Dan Wellman's Blog

Text File Processing with Ruby from the Command Line

My colleague David showed me some useful ways to use Ruby for simple text processing from the command line. I'm blogging about it here to remind myself how it works. (Thanks for the tip, David!)

The Ruby interpreter can be run in a mode which executes a one-line script against a text file. You can choose to print the results of the script to standard out or to replace the file. In this way, Ruby becomes a text processing tool like awk, sed, and grep.

For example, if you've got a file with a list of recipe names (in "recipe_index.txt") and you'd like it to contain only those recipes that have 'banana' in the title, you'd do something like this:

ruby -ni.bak -e 'print if /banana/i' recipe_index.txt

-n feeds every line of the input file (here, recipe_index.txt) into your script, one line at a time, much like sed -n does. Nothing is printed unless your script prints it.

-i applies the changes to recipe_index.txt in place; without it, the results are printed to standard out. If you specify any text after -i, such as "-i.bak", Ruby will first make a backup copy of your input file with that extension appended to its name. In this example, you'd end up with recipe_index.txt.bak

-e specifies a one-line Ruby script to execute. Here, the script prints the line only if it matches the case-insensitive regular expression /banana/i
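
To make the three flags a little more concrete, here is a rough sketch of what the one-liner above amounts to when written out as an ordinary Ruby script. This is only an approximation of what the interpreter does, not its actual implementation. The method name filter_in_place and the sample recipe names are made up for illustration.

```ruby
require "fileutils"
require "tmpdir"

# Rough equivalent of: ruby -ni.bak -e 'print if /banana/i' FILE
# (filter_in_place is a hypothetical helper, not part of Ruby.)
def filter_in_place(path, pattern, backup_ext = ".bak")
  # -i.bak: keep the original contents under FILE.bak
  FileUtils.cp(path, path + backup_ext)
  # -n plus the -e script: look at each line, keep only the matches
  kept = File.readlines(path).select { |line| line =~ pattern }
  # -i: the script's output replaces the original file
  File.write(path, kept.join)
end

# Try it on a throwaway file with made-up recipe names.
Dir.mktmpdir do |dir|
  path = File.join(dir, "recipe_index.txt")
  File.write(path, "Banana Bread\nApple Pie\nChocolate BANANA Muffins\n")
  filter_in_place(path, /banana/i)
  print File.read(path)
end
```

Running this prints only the two banana recipes. The one-liner does the same job without any of the scaffolding, which is the whole appeal.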

You can also use -p in place of -n if you want every line of the input file printed along with your script's output. For example, if you wanted to prepend the number of characters in each line to the beginning of the line, you'd say:

ruby -pe 'print $_.size' input_file

Note the magic variable $_ (dollar sign underscore), which holds the last line read by gets().
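
For comparison, here is a small sketch of roughly what the -p example does, written as a regular script. The method name prefix_sizes and the sample input are invented for illustration, and an explicit variable stands in for $_ to keep the sketch easy to follow.

```ruby
require "stringio"

# Approximates: ruby -pe 'print $_.size' input_file
# (prefix_sizes is a hypothetical helper, not part of Ruby.)
def prefix_sizes(io)
  out = +""
  while (line = io.gets)    # `line` plays the role of $_
    out << line.size.to_s   # what the -e script prints
    out << line             # what -p prints after the script runs
  end
  out
end

print prefix_sizes(StringIO.new("kiwi\nbanana\n"))
# prints:
# 5kiwi
# 7banana
```

Notice that the counts include the trailing newline on each line. If you'd rather not count it, use $_.chomp.size in the one-liner.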