How to Read a File in Ruby

Reading a file is a simple yet common programming task that can be accomplished in a few different ways in Ruby. In this post, we're going to cover the basics of reading files in Ruby, including a few different ways to open and read files, both by loading them into memory and by streaming them.

Reading text from a file is one of the most common programming tasks. However, a few gotchas await new programmers, and some of them have bitten me in the past. So in this post, we'll explore a few different ways to read data from a file and when to choose each.

TL;DR

If the file is small, slurp it:

content = File.read "data.txt"

If the file is large, stream it:

File.foreach("data.txt") { |line| puts line }

Let's assume we have a file named companies.csv containing the following data:

id,company,product
1,Microsoft,windows
2,Apple,iphone
3,Meta,facebook
4,Google,search
5,Amazon,ecommerce

Let's see a few different ways to read this file in Ruby.

Using File.new

The simplest way to access this file in Ruby is to create an instance of the File class, passing the name of the file. Using this file instance, you can manipulate the file to your heart's content: read it, write to it, inspect the permissions, etc.

To read the whole file at once, use the read method.

file = File.new "companies.csv"

contents = file.read

# "id,company,product\n1,Microsoft,windows\n2,Apple,iphone\n ..."

Once you're done processing the file, don't forget to close the file, to avoid leaking resources.

file.close  # Important!

💡
The read method is defined on the IO class, which is the superclass of the File class. The IO class represents anything that can be read from as input or written to as output in Ruby.
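
If you do use File.new, one way to make sure the file is closed even when reading raises an exception is an ensure clause. The following is a minimal sketch, not the only way to do it. It writes its own small companies.csv into a temporary directory, which is an assumption made purely so the example runs on its own.

```ruby
require "tmpdir"

contents = nil
file = nil

Dir.mktmpdir do |dir|
  # Sample file created only so this sketch is self-contained.
  path = File.join(dir, "companies.csv")
  File.write(path, "id,company,product\n1,Microsoft,windows\n")

  file = File.new(path)
  begin
    contents = file.read
  ensure
    file.close # runs even if read raises an exception
  end
end

puts contents
puts file.closed?     # true
puts File.superclass  # IO
```

The last line confirms the note above: File inherits from IO, which is where read comes from.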

Using File.open

In the previous solution, you have to remember to close the file. Ruby provides a better solution with the File.open method. It takes a block that receives an instance of the File class representing the underlying file, and it closes the file automatically at the end of the block.

File.open("companies.csv") do |file|
end

Using the file instance, you can read the whole file at once using the read method.

File.open("companies.csv") do |file|
  content = file.read
end
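
One detail worth knowing: a variable first assigned inside the block is local to that block. File.open returns the value of the block, so you can assign the result directly instead. Here is a small sketch of that, again using a sample file written to a temporary directory (an assumption for the sake of a runnable example). The handle variable exists only to show that the file really is closed afterwards.

```ruby
require "tmpdir"

content = nil
handle = nil

Dir.mktmpdir do |dir|
  # Sample file created only so this sketch is self-contained.
  path = File.join(dir, "companies.csv")
  File.write(path, "id,company,product\n1,Microsoft,windows\n")

  # File.open returns whatever the block returns.
  content = File.open(path) do |file|
    handle = file # kept only to inspect the file after the block
    file.read
  end
end

puts content
puts handle.closed? # true: the file was closed when the block finished
```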

Using File.read

In everyday programming, the simplest (and most readable) way is to directly read the contents of the file using the File.read class method.

content = File.read "companies.csv"

If all you need is to read the file, and you don't need the file object for anything else, use this method. Ruby takes care of opening and closing the file behind the scenes, so you don't have to worry about it.

💡
The File.new and File.open methods each take an optional mode argument, given as a string. For more details, check out the docs. By default, if you don't pass a mode, the file is opened in read-only mode, which prevents accidental writes.
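
To make the mode argument concrete, here is a short sketch that opens a file with the explicit "r" (read-only) mode, shows that writing to it raises an IOError, and then appends a line using the "a" mode. The sample file and its contents are assumptions made so the example is runnable by itself.

```ruby
require "tmpdir"

error = nil
appended = nil

Dir.mktmpdir do |dir|
  # Sample file created only so this sketch is self-contained.
  path = File.join(dir, "companies.csv")
  File.write(path, "id,company,product\n")

  # "r" is the default mode; writing to a read-only file raises IOError.
  begin
    File.open(path, "r") { |file| file.write("oops") }
  rescue IOError => e
    error = e
  end

  # "a" opens the file for appending instead.
  File.open(path, "a") { |file| file.puts("1,Microsoft,windows") }
  appended = File.read(path)
end

puts error.class # IOError
puts appended
```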

Reading All Lines

To read all lines from the file into an array, Ruby provides the readlines method.

File.open("companies.csv") do |file|
  lines = file.readlines
end

# Output

[
    [0] "id,company,product\n",
    [1] "1,Microsoft,windows\n",
    [2] "2,Apple,iphone\n",
    [3] "3,Meta,facebook\n",
    [4] "4,Google,search\n",
    [5] "5,Amazon,ecommerce"
]

Also, just like the read method, you can call the readlines method on the File class itself.

lines = File.readlines("companies.csv")

# Output

[
    [0] "id,company,product\n",
    [1] "1,Microsoft,windows\n",
    [2] "2,Apple,iphone\n",
    [3] "3,Meta,facebook\n",
    [4] "4,Google,search\n",
    [5] "5,Amazon,ecommerce"
]
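
Notice the trailing "\n" on most lines in the output above. If you'd rather not have it, readlines accepts a chomp: true keyword argument (available in reasonably recent Ruby versions). A minimal sketch, using a shortened sample file created on the fly as an assumption:

```ruby
require "tmpdir"

lines = nil

Dir.mktmpdir do |dir|
  # Sample file created only so this sketch is self-contained.
  path = File.join(dir, "companies.csv")
  File.write(path, "id,company,product\n1,Microsoft,windows\n2,Apple,iphone")

  # chomp: true removes the line separator from each element.
  lines = File.readlines(path, chomp: true)
end

p lines
# ["id,company,product", "1,Microsoft,windows", "2,Apple,iphone"]
```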

Now let's address a common issue with reading files. Often, during development, you're working with small files. You read the file and everything works fine. Then you deploy to production, where the same code has to handle large files, and suddenly your program crashes. Oops!

Read Large Files as Streams

All the solutions we've seen so far load the whole file into memory at once. If the file is huge, it will consume too much memory. What's more, good luck loading a 10 GB log file on a machine with 8 GB of memory.

To read a huge file, a better solution is to treat it as a flowing stream.

We don't need to have the whole file in memory at once to process it. You can process the file one line at a time, or even one character at a time.

File.open("companies.csv") do |f|
  f.each_line do |line|
    puts line
  end

  # OR

  f.each do |line|
    puts line
  end
end

# OR

File.foreach "companies.csv" do |line|
  puts line
end

The advantage of treating a file as a stream is this: at no point do we have the whole file in memory. Only the current line is held at any time, so memory usage stays roughly constant as the file grows.
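
Streaming doesn't have to be line-based. As a rough sketch, each_char yields one character at a time, and read with a length argument returns the file in fixed-size chunks, which is handy when a file has very long lines or no line breaks at all. The sample data and the chunk size of 8 bytes are arbitrary assumptions chosen to keep the example small.

```ruby
require "tmpdir"

chars = []
chunks = []

Dir.mktmpdir do |dir|
  # Sample file created only so this sketch is self-contained.
  path = File.join(dir, "companies.csv")
  File.write(path, "id,company\n1,Apple\n")

  File.open(path) do |f|
    # each_char yields one character at a time.
    f.each_char { |char| chars << char }
  end

  File.open(path) do |f|
    # read(n) returns at most n bytes, and nil at the end of the file.
    while (chunk = f.read(8))
      chunks << chunk
    end
  end
end

puts chars.length # 19
p chunks          # ["id,compa", "ny\n1,App", "le\n"]
```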

Bonus: File is an Enumerable

As you can see in the above example, you can use the each method on the File instance to read each line. In addition, the File class includes the Enumerable module via its superclass, the IO class.

Since the Enumerable module contains useful methods such as map, filter, reduce, etc. that operate on collections, you can use these methods directly on the lines of a file, without first reading them into an array yourself. In other words, you can manipulate the lines in a file much like the elements of an array. Keep in mind that eager methods such as map and drop still return arrays with one element per line, so for very large files, prefer each or a lazy enumerator.

For example, to group the above CSV records by company name, you could write:

File.open "companies.csv" do |f|
  puts f.drop(1)
      .map { |line| { company: line.split(',')[1], product: line.split(',')[2] } }
      .group_by { |record| record[:company] }
end

# Output
# 
# {"Microsoft"=>[{:company=>"Microsoft", :product=>"windows\n"}], "Apple"=>[{:company=>"Apple", :product=>"iphone\n"}], "Meta"=>[{:company=>"Meta", :product=>"facebook\n"}], "Google"=>[{:company=>"Google", :product=>"search\n"}], "Amazon"=>[{:company=>"Amazon", :product=>"ecommerce"}]}
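
If you only need part of the file, a lazy enumerator can avoid both the intermediate arrays and the trailing newlines seen in the output above. The sketch below is one possible approach rather than a definitive recipe: File.foreach without a block returns an Enumerator, lazy defers each step until a value is requested, and first(2) stops reading once two records have been produced. The sample file is an assumption created just for this example.

```ruby
require "tmpdir"

first_two = nil

Dir.mktmpdir do |dir|
  # Sample file created only so this sketch is self-contained.
  path = File.join(dir, "companies.csv")
  File.write(path, "id,company,product\n1,Microsoft,windows\n2,Apple,iphone\n3,Meta,facebook\n")

  first_two = File.foreach(path, chomp: true) # Enumerator over chomped lines
                  .lazy                       # process lines on demand
                  .drop(1)                    # skip the header row
                  .map { |line| line.split(",") }
                  .map { |fields| { company: fields[1], product: fields[2] } }
                  .first(2)                   # stop after two records
end

p first_two
```

This prints an array containing the Microsoft and Apple records only; the Meta line is never turned into a hash.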

That's a wrap. I hope you found this article helpful and you learned something new.

As always, if you have any questions or feedback, didn't understand something, or found a mistake, please leave a comment below or send me an email. I reply to all emails I get from developers, and I look forward to hearing from you.
