Description
  • The file is assumed to be sorted, so lines with the same tag are adjacent.
  • The tag of each line is separated from the rest of that line by a space.
What we want is to join all lines that share the same tag, remove the tags, then print the result. Three scripts are provided below. The first is tedious but straightforward, the second is an improved version, and the third is the neat one.
Raw Input
7254 romans@abc.net
7254 stack@bcd.org
7254 giant@ab.cd.com
70613 mega@true.edu.tw
70613 meggy@false.com
70613 antims@msgotohell.cx
70613 ramma@comics.co.jp
enc gary@enc.com
enc devin@enc.com
enc roy@enc.com
Desired Output
romans@abc.net stack@bcd.org giant@ab.cd.com
mega@true.edu.tw meggy@false.com antims@msgotohell.cx ramma@comics.co.jp
gary@enc.com devin@enc.com roy@enc.com
Script and Comments
Script1
[ 1] :loop
[ 2] $!{
[ 3] N
[ 4] /^\([^ ]* \).*\n\1/{
[ 5] s/\n[^ ]*//
[ 6] b loop
[ 7] }
[ 8] s/^[^ ]* //
[ 9] P
[10] D
[11] }
[12] s/^[^ ]* //
Comments
  1. To facilitate further explanations, take a 3-line data file as an example:
    T1 D1
    T1 D2
    T2 D3
    The changes to the pattern space (abbreviated to PS) are listed below:
    Operation                                          Pattern Space After Operation
    Initially                                          empty
    read the first line                                T1 D1
    join next line                                     T1 D1\nT1 D2
    remove tag and new line of recently joined line    T1 D1 D2
    join next line                                     T1 D1 D2\nT2 D3
    remove tag of first line of PS                     D1 D2\nT2 D3
    print then delete the first line of PS             T2 D3
  2. If the current line is not the last one of the file, sed enters the block consisting of Steps [2-11]:
    • Step [3] (command N) joins the next line to PS.
    • If PS matches the RE of Step [4], the line joined by Step [3] has the same tag. Step [5] then removes the newline character and the tag that follows it, and Step [6] makes sed branch back to Step [1].
    • Otherwise, Step [8] removes the tag of the first line of PS, and the P-then-D sequence (Steps [9-10]) prints then deletes that first line. Because PS still holds the remaining line, D restarts the cycle without reading new input.
  3. When the last line of the file is reached, Step [12] removes the tag from PS. Since the end of the script is reached, sed prints the contents of PS and then stops.
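As a rough check of the trace above, Script1 can be run on the 3-line example. This is only a sketch: it assumes a POSIX shell and a sed that accepts a multi-line script as one argument (GNU sed does), and the variable names script1 and result are illustrative. The step labels are dropped because they are not part of the sed syntax.

```shell
# Script1 without the [n] step labels (they only exist for the explanation).
script1=':loop
$!{
N
/^\([^ ]* \).*\n\1/{
s/\n[^ ]*//
b loop
}
s/^[^ ]* //
P
D
}
s/^[^ ]* //'

# Feed the 3-line example (T1 D1 / T1 D2 / T2 D3) through the script.
result=$(printf 'T1 D1\nT1 D2\nT2 D3\n' | sed "$script1")
printf '%s\n' "$result"
```

The run should print "D1 D2" followed by "D3": the first line comes from the P command of Step [9], the second from the automatic print at the end of the script.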
Script2
[ 1] :loop
[ 2] $!{
[ 3] N
[ 4] s/^\(\([^ ]* \).*\)\n\2/\1 / 
[ 5] t loop
[ 6] s/^[^ ]* //
[ 7] P
[ 8] D
[ 9] }
[10] s/^[^ ]* //
Comments
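  1. Steps [4-5] of this script are equivalent to Steps [4-7] of the first one: the substitution in Step [4] succeeds only when the joined line has the same tag, and command t in Step [5] branches only after a successful substitution.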
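The equivalence can be illustrated on a single joined pair. The sketch below assumes a POSIX shell; the variable names (pair, via_match, via_subst, unchanged) are illustrative and each sed call joins just two lines rather than looping.

```shell
pair='T1 D1
T1 D2'

# Script1 style: an address tests for the same tag, then s deletes
# the newline and the tag that follows it.
via_match=$(printf '%s\n' "$pair" | sed 'N
/^\([^ ]* \).*\n\1/s/\n[^ ]*//')

# Script2 style: one substitution both tests the tag (\2) and joins the lines.
via_subst=$(printf '%s\n' "$pair" | sed 'N
s/^\(\([^ ]* \).*\)\n\2/\1 /')

# With different tags the substitution does not match and PS is left alone.
unchanged=$(printf 'T1 D1\nT2 D3\n' | sed 'N
s/^\(\([^ ]* \).*\)\n\2/\1 /')

printf '%s\n' "$via_match" "$via_subst" "$unchanged"
```

Both via_match and via_subst should hold "T1 D1 D2", while unchanged keeps its two lines. Since the substitution itself reports whether the tags matched, command t can replace the explicit test plus b of Script1.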
Script3
[ 1] :loop
[ 2] $!N
[ 3] s/^\(\([^ ]* \).*\)\n\2/\1 / 
[ 4] t loop
[ 5] s/^[^ ]* //
[ 6] P
[ 7] D
Comments
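  1. The neat version. Restricting only command N with $! (Step [2]) removes the need for the block and for the separate tag removal on the last line: Steps [5-7] handle the last line as well.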
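To try Script3 on the Raw Input, something like the following can be used. It is a sketch under the same assumptions as before (a POSIX shell, and a sed that accepts a multi-line script argument); the variable names input, script3 and result are illustrative.

```shell
input='7254 romans@abc.net
7254 stack@bcd.org
7254 giant@ab.cd.com
70613 mega@true.edu.tw
70613 meggy@false.com
70613 antims@msgotohell.cx
70613 ramma@comics.co.jp
enc gary@enc.com
enc devin@enc.com
enc roy@enc.com'

# Script3 without the [n] step labels.
script3=':loop
$!N
s/^\(\([^ ]* \).*\)\n\2/\1 /
t loop
s/^[^ ]* //
P
D'

result=$(printf '%s\n' "$input" | sed "$script3")
printf '%s\n' "$result"
```

The printed lines should match the Desired Output section: one line per tag, with the tags removed.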