Description
Given a sorted data file in which each line consists of several fields separated by colons (':'), a block consists of consecutive lines whose first fields have the same value. We want to append
*********
Duplicate
*********
after every block that contains more than one line.
Raw Input
Monica:pc-s01.hq:2003/05/01:ME
Kelly:pc-x01.plant:2002/06/05:Linux
Kelly:pc-a01.hq:2003/12/25:XP
Qoo:mail.hq:2000/04/12:Linux
John:tester.plant:2004/09/01:XP
Johnson:mail.plant:2003/11/12:Linux
Johnson:x31.plant:2004/01/01:Linux
Johnson:x40.plant:2004/09/07:None
Desired Output
Monica:pc-s01.hq:2003/05/01:ME
Kelly:pc-x01.plant:2002/06/05:Linux
Kelly:pc-a01.hq:2003/12/25:XP
*********
Duplicate
*********
Qoo:mail.hq:2000/04/12:Linux
John:tester.plant:2004/09/01:XP
Johnson:mail.plant:2003/11/12:Linux
Johnson:x31.plant:2004/01/01:Linux
Johnson:x40.plant:2004/09/07:None
*********
Duplicate
*********
Script and Comments
Script1
[ 1] p
[ 2] $!N
[ 3] /^\([^:]*:\).*\n\1/{h;D}
[ 4] x
[ 5] /./{
[ 6] i\
[ 7] *********\
[ 8] Duplicate\
[ 9] *********
[10] s/^.*//
[11] }
[12] x
[13] D
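To try Script1, the bracketed step numbers have to be removed first, since they are only labels for the comments and not sed syntax. The following is a minimal sketch, assuming a POSIX shell and GNU sed; the file names script1.sed and datafile and the temporary directory are hypothetical, and the banner is written with asterisks to match the desired output.

```shell
#!/bin/sh
# Work in a throwaway directory (all file names here are hypothetical).
tmpdir=$(mktemp -d)

# Script1 with the step numbers stripped.
cat > "$tmpdir/script1.sed" <<'EOF'
p
$!N
/^\([^:]*:\).*\n\1/{h;D}
x
/./{
i\
*********\
Duplicate\
*********
s/^.*//
}
x
D
EOF

# The raw input from the Description.
cat > "$tmpdir/datafile" <<'EOF'
Monica:pc-s01.hq:2003/05/01:ME
Kelly:pc-x01.plant:2002/06/05:Linux
Kelly:pc-a01.hq:2003/12/25:XP
Qoo:mail.hq:2000/04/12:Linux
John:tester.plant:2004/09/01:XP
Johnson:mail.plant:2003/11/12:Linux
Johnson:x31.plant:2004/01/01:Linux
Johnson:x40.plant:2004/09/07:None
EOF

# Run without -n: each line is printed by the explicit 'p',
# and every cycle ends in 'D', which does not auto-print.
result=$(sed -f "$tmpdir/script1.sed" "$tmpdir/datafile")
printf '%s\n' "$result"

rm -rf "$tmpdir"
```

The script is run without the -n option on purpose: all printing is done by 'p' and 'i', so adding -n would not change the output, while replacing the final 'D' with another command could make lines appear twice.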
Comments
  1. Pattern Space and Hold Space are abbreviated to PS and HS, respectively.
  2. Every line of the datafile will be printed by command 'p' of Step [1], one per cycle.
  3. Step [2] appends the next line to PS unless the current line is the last one.
  4. Now we have two lines in PS, the second being the line just appended (on the last line of the file PS holds a single line, so the test of Step [3] simply fails):
    • If the second line has the same first field as the first one (the previous line), command 'h' of Step [3] copies PS into HS, where it serves as a mark; command 'D' then deletes the first line of PS and makes sed jump to Step [1] without reading a new line.
    • Otherwise, we have to check whether there is a mark in HS. To perform this check, we first exchange the contents of PS and HS with command 'x' of Step [4].
    • If there is a mark, the line we just printed is the last line of a block: command 'i' of Steps [6] through [9] prints the message immediately, and Step [10] clears the mark.
    • Command 'x' of Step [12] will exchange PS and HS again.
    • Command 'D' of Step [13] will delete the first line of PS, then make sed jump to Step [1].
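The two mechanisms the comments rely on can be observed in isolation. The sketch below, assuming a POSIX shell and GNU sed, first uses the 'l' command to show the two-line window that '$!N' and 'D' slide over the input, then applies the address of Step [3] to two hand-made pairs of lines. The sample data and the variable names window, same and other are made up for illustration.

```shell
#!/bin/sh
# Sliding window: '$!N' appends the next line, 'l' prints PS with the
# embedded newline shown as \n, and 'D' drops the first line and
# restarts the cycle without reading new input.
window=$(printf 'a\nb\nc\n' | sed -n '$!N;l;D')
printf '%s\n' "$window"

# Step [3] test: \1 holds the first field including its colon, so two
# lines match only when their first fields are identical.
same=$(printf 'Kelly:x\nKelly:y\n' | sed -n 'N; /^\([^:]*:\).*\n\1/p')
other=$(printf 'John:x\nJohnson:y\n' | sed -n 'N; /^\([^:]*:\).*\n\1/p')
printf 'same first field:  [%s]\n' "$same"
printf 'other first field: [%s]\n' "$other"
```

The first command prints a\nb$, b\nc$ and c$ on three lines, which shows that every line except the first is seen once as the second line of PS and once as the first. The variable other stays empty because "John:" is not a prefix of "Johnson:mail...", which is why the colon is kept inside the \( \) group.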