Friday, November 20, 2009

Perl Record Separator

You may have thought Perl only worked line by line. I thought so too, until I needed to do a multi-line match and had to look into it. It turns out that the special variable $/ (the input record separator) lets you specify where each read ends.

Normally, $/ is set to "\n", which means that each read returns data up to and including the next newline. In certain cases, you might want to read tab-delimited data a field at a time. In such a case, set $/="\t".
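
Here is a minimal sketch of the tab-delimited case. It assumes the data is already in a string and reads it through an in-memory filehandle; the sub name read_fields and the sample data are made up for illustration. Using local keeps the change to $/ confined to the sub, and chomp removes whatever $/ currently holds from the end of each record.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: split a string into tab-separated fields by
# reading it through an in-memory filehandle with $/ set to a tab.
sub read_fields {
    my ($data) = @_;
    local $/ = "\t";                 # restored automatically when the sub returns
    open my $fh, '<', \$data or die "Cannot open in-memory file: $!";
    my @fields;
    while ( my $field = <$fh> ) {
        chomp $field;                # strips the trailing tab, if present
        push @fields, $field;
    }
    close $fh;
    return @fields;
}

# Sample data is an assumption, not taken from a real file.
print join( "|", read_fields("alpha\tbeta\tgamma") ), "\n";
```

Keep in mind that once $/ is a tab, newlines are no longer special: in multi-line data, a line break ends up inside a field rather than ending a read.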

Another example is a file of quotes separated by some special characters, say ---###---. The file might look like this:

I cannot undertake to lay my finger on that article of the Constitution which granted a right to Congress of expending, on objects of benevolence, the money of their constituents. (James Madison)
---###---
No more evidence is needed; the verdict is in: nothing is more intolerant of a diversity of opinion than a 'liberal' society touting the virtues of tolerance and diversity. (Theodore Pappas "Plagiarism and the Culture War")
---###---
Suppose you were an idiot and suppose you were a member of Congress. But I repeat myself. (Mark Twain).


In this case, set $/="---###---". There would be three "reads", each one returning an entire quote, with the separator still attached to the end of the first two. The quotes can be printed, one at a time, by this code:

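#!/usr/bin/perl
use strict;
use warnings;

$/ = "---###---";    # read up to each separator instead of each newline
while ( <> ) {
    chomp;                        # remove the trailing separator
    print "$1\n\n" if /(.+)/;     # . does not match "\n", so this is the first non-empty line
}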
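
Because . does not match a newline, the pattern above only prints the first non-empty line of each record, which is enough for the one-line quotes in the sample file. For quotes that span several lines, something like the following sketch might work. The sub name split_quotes and the sample text are assumptions for illustration, and the data is read from a string through an in-memory filehandle rather than from a file.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return the quotes in $text as a list, one element
# per quote, with separators and surrounding whitespace removed.
sub split_quotes {
    my ($text) = @_;
    local $/ = "---###---";          # one read per quote
    open my $fh, '<', \$text or die "Cannot open in-memory file: $!";
    my @quotes;
    while ( my $record = <$fh> ) {
        chomp $record;               # drop the trailing separator
        $record =~ s/^\s+|\s+$//g;   # trim the newlines left around the quote
        push @quotes, $record if length $record;
    }
    close $fh;
    return @quotes;
}

# Made-up sample with a two-line quote followed by a one-line quote.
my $sample = "First quote,\nsecond line. (A)\n---###---\nSecond quote. (B)\n";
print "$_\n\n" for split_quotes($sample);
```

Trimming the whole record instead of matching /(.+)/ keeps the line breaks inside a quote intact while still discarding the newlines that surround the separator.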
