Description
  • Assume that the data file is sorted by its first field. grep might seem to do this job well, but consider a file of 10000 lines in which the lines of interest are the first 100. grep will print them, but it will then go on examining the remaining 9900 lines.
  • In the following example, PAT is '70613' and the field separator is a space.
Raw Input
7254 romans@abc.net
7254 stack@bcd.org
7254 giant@ab.cd.com
70613 mega@true.edu.tw
70613 meggy@false.com
70613 antims@msgotohell.cx
70613 blah@blah.bon.org
70613 ramma@comics.co.jp
enc gary@enc.com
enc devin@enc.com
enc roy@enc.com
Desired Output
70613 mega@true.edu.tw
70613 meggy@false.com
70613 antims@msgotohell.cx
70613 blah@blah.bon.org
70613 ramma@comics.co.jp
Script and Comments
Script1
[ 1] /^PAT /!d
[ 2] $!N
[ 3] /\nPAT /{
[ 4] P
[ 5] D
[ 6] }
[ 7] s/\n.*//
[ 8] q
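As a quick check, Script1 can be run against the sample data with PAT replaced by 70613. This is only a minimal sketch: the temporary file created with mktemp and the variable names data and result are assumptions made for illustration, and the step numbers are dropped because they are not sed syntax.

```shell
#!/bin/sh
# Write the sample "Raw Input" to a temporary file (hypothetical location).
data=$(mktemp)
cat > "$data" <<'EOF'
7254 romans@abc.net
7254 stack@bcd.org
7254 giant@ab.cd.com
70613 mega@true.edu.tw
70613 meggy@false.com
70613 antims@msgotohell.cx
70613 blah@blah.bon.org
70613 ramma@comics.co.jp
enc gary@enc.com
enc devin@enc.com
enc roy@enc.com
EOF

# Script1 with PAT replaced by 70613, one command per line.
result=$(sed '
/^70613 /!d
$!N
/\n70613 /{
P
D
}
s/\n.*//
q
' "$data")

# Prints the five 70613 lines shown under "Desired Output".
printf '%s\n' "$result"
rm -f "$data"
```

Because of the final q, sed stops at the first "enc" line and never reads the two lines after it.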
Comments
  1. Step [1] discards the lines you are not interested in.
  2. The first line of interest takes sed on to Step [2], which appends the next input line to the current one unless the current line is the last line of input.
    • If the next line has the same first field (Step [3]), sed prints the current line (Step [4]), then deletes it and makes the next line current (Step [5]). Because of how the 'D' command works, sed starts a new cycle without reading new input (back to Step [1], then [2], ...).
    • If there are no more lines (i.e., end of file has been reached) or the next line does not have the same first field, Step [7] deletes the appended next line, if there is one, and the 'q' command (Step [8]) prints the current line and then quits.
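The end-of-file branch described above can be exercised with a short input in which the matching block comes last. The following is a sketch under stated assumptions: PAT is taken to be 'enc', the four input lines are borrowed from the sample data, and the variable name result is hypothetical.

```shell
#!/bin/sh
# The "enc" block ends the input, so on the last line $!N (Step [2])
# does nothing, Step [7] has nothing to delete, and q prints the line.
result=$(printf '%s\n' \
    '70613 ramma@comics.co.jp' \
    'enc gary@enc.com' \
    'enc devin@enc.com' \
    'enc roy@enc.com' |
sed '
/^enc /!d
$!N
/\nenc /{
P
D
}
s/\n.*//
q
')

# Prints the three "enc" lines, in their original order.
printf '%s\n' "$result"
```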