Description
In the following example:
  • We want to extract the third through the seventh matches of [0-9]+.
  • The scripts can also be adapted to extract a discrete set of matches, for example the third, the seventh, and the eleventh.
Raw Input
First 111802 Second 22235 Third 33392
Fourth 44410984 Fifth 555432583
Sixth 6667181 Seventh 7778297 Eighth 888133
Ninth 9997 Tenth 00086 Eleventh 11123401
Twelfth 12229999
Desired Output
33392
44410984
555432583
6667181
7778297
Script and Comments
Script1
[ 1] s/[0-9]+/\n&/
[ 2] /\n/!d
[ 3] s/^[^\n]*\n//
[ 4] G
[ 5] s/$/\n/
[ 6] h
[ 7] s/^[^\n]*\n//
[ 8] x
[ 9] s/[0-9]+/&\n/
[10] /^[^\n]*\n[^\n]*\n\n{3,7}$/P
[11] s/\n*$//
[12] D
Comments -r
  1. A counter is needed to track how many matches have been processed. This script stores the counter in HS (the hold space) as a run of newline characters: a count of n is represented by n newlines.
  2. After a line is read into PS (the pattern space),
    • Step [1] inserts a newline character before the first match of [0-9]+ in that line.
    • If that line does not have any match of [0-9]+, it will be discarded by Step [2].
  3. Step [3] deletes everything from the start of the line up to, but not including, the first match.
  4. Since there is no way to increment the counter's value (kept in HS) directly, the following steps are used:
    • Command `G' of Step [4] appends the counter's value to PS, separating it from the original data of PS with a newline character.
    • Step [5] appends a newline character at the end of PS. This is equivalent to incrementing the counter by one.
    • Step [6] overwrites HS with PS.
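    • Step [7] deletes everything up to and including the first newline character. Remember that command `G' of Step [4] adds an extra newline character to separate the appended data from the original contents of PS, so only the incremented counter remains in PS.
    • Command `x' of Step [8] exchanges the contents of PS and HS. HS now holds the updated counter, and PS holds the data followed by the separator and the counter.
    • Step [9] inserts a newline character after the first match of [0-9]+ to separate it from the data behind it. Now the first line of PS consists only of the match, the second line holds the remaining data, and the counter newlines follow.
    • Step [10] tests the counter. If the match is one we want, that is, if three to seven newline characters follow the separator, command `P' prints it, since `P' prints only the first line of PS.
    • Step [11] deletes the `counter part' of PS.
    • Command `D' of Step [12] deletes the first line of PS (the match) and makes sed jump to Step [1] without reading new input, so that the remaining data of that line is processed. If nothing remains, sed reads the next input line.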
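As a quick check, Script1 can be run against the raw input from a shell. This is only a sketch: it assumes GNU sed (for the -r option and for \n inside bracket expressions and replacements), and the variable names input, script1, and result are illustrative. The step numbers are dropped because they are not part of sed syntax.

```shell
# Sample data copied from the Raw Input section.
input='First 111802 Second 22235 Third 33392
Fourth 44410984 Fifth 555432583
Sixth 6667181 Seventh 7778297 Eighth 888133
Ninth 9997 Tenth 00086 Eleventh 11123401
Twelfth 12229999'

# Script1, one command per line, without the step numbers.
script1='s/[0-9]+/\n&/
/\n/!d
s/^[^\n]*\n//
G
s/$/\n/
h
s/^[^\n]*\n//
x
s/[0-9]+/&\n/
/^[^\n]*\n[^\n]*\n\n{3,7}$/P
s/\n*$//
D'

# -n is not needed: every path through the script ends in d or D,
# so sed never reaches its automatic print.
result=$(printf '%s\n' "$input" | sed -r -e "$script1")
printf '%s\n' "$result"
```

With the sample input this should print the five lines listed under Desired Output.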
Script2
[ 1] s/[0-9]+/\n&/
[ 2] /\n/!d
[ 3] s/^[^\n]*\n//
[ 4] G
[ 5] s/$/\n/
[ 6] h
[ 7] s/^[^\n]*\n//
[ 8] x
[ 9] s/[0-9]+/&\n/
[10] /^[^\n]*\n[^\n]*\n\n{7}$/{
[11] s/\n.*$//
[12] q
[13] }
[14] /^[^\n]*\n[^\n]*\n\n{3,6}$/P
[15] s/\n*$//
[16] D
Comments -r
  1. The drawback of the first script is that it does not stop when there are no more desired matches. For example, it keeps processing input after the seventh match has been found. In this script, Steps [10] thru [13] are inserted to handle the last desired match: Step [11] deletes everything after the match, and command `q' of Step [12] prints PS and terminates sed.
  2. The second number of the brace-enclosed pair in Step [14] is changed to the ordinal of the second-to-last desired match (6), because the last desired match is now printed by Steps [10] thru [13].
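The early exit can be tried on a small input with one number per line. This is a sketch under the same assumptions as before (GNU sed with -r); the names script2 and early are illustrative. It also assumes sed is run without -n, because the last desired match is printed by the automatic print that `q' triggers, not by `P'.

```shell
# Script2, one command per line, without the step numbers.
script2='s/[0-9]+/\n&/
/\n/!d
s/^[^\n]*\n//
G
s/$/\n/
h
s/^[^\n]*\n//
x
s/[0-9]+/&\n/
/^[^\n]*\n[^\n]*\n\n{7}$/{
s/\n.*$//
q
}
/^[^\n]*\n[^\n]*\n\n{3,6}$/P
s/\n*$//
D'

# Nine one-number lines; sed quits at the seventh,
# so 80 and 90 are never processed.
early=$(printf '%s\n' 10 20 30 40 50 60 70 80 90 | sed -r -e "$script2")
printf '%s\n' "$early"
```

The output should be 30 through 70, one per line. The first four come from `P' in Step [14], the last one from `q' in Step [12].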
Script3
[ 1] s/[0-9]+/\n&/
[ 2] /\n/!d
[ 3] G
[ 4] s/^[^\n]*\n(.*)/\1\n/
[ 5] h
[ 6] s/^[^\n]*\n//
[ 7] x
[ 8] s/[0-9]+/&\n/
[ 9] s/^([^\n]*)\n[^\n]*\n\n{7}$/\1/
[10] /\n/!q
[11] /^[^\n]*\n[^\n]*\n\n{3,6}$/P
[12] s/\n*$//
[13] D
Comments -r
  1. A neater version of Script2, where
    • Steps [3] and [5] of Script2 are combined into Step [4].
    • Steps [10] thru [13] of Script2 are reduced to Steps [9] and [10].
  2. To get only the third, the seventh, the ninth, and the eleventh matches,
    • Change {7} of Step [9] to {11}, the ordinal of the last desired match.
    • Change \n{3,6} of Step [11] to (\n{3}|\n{7}|\n{9}), one alternative for each of the other desired matches.
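The discrete variant of Script3 can be sketched as follows for the third, seventh, ninth, and eleventh matches. It assumes GNU sed with -r, run without -n; the names input, script3, and picked are illustrative. Step [9] uses {11} for the last desired ordinal, and the alternation in Step [11] carries one newline count for each of the other desired ordinals (3, 7, and 9).

```shell
# Sample data copied from the Raw Input section.
input='First 111802 Second 22235 Third 33392
Fourth 44410984 Fifth 555432583
Sixth 6667181 Seventh 7778297 Eighth 888133
Ninth 9997 Tenth 00086 Eleventh 11123401
Twelfth 12229999'

# Script3 adapted for matches 3, 7, 9, and 11 (step numbers dropped).
script3='s/[0-9]+/\n&/
/\n/!d
G
s/^[^\n]*\n(.*)/\1\n/
h
s/^[^\n]*\n//
x
s/[0-9]+/&\n/
s/^([^\n]*)\n[^\n]*\n\n{11}$/\1/
/\n/!q
/^[^\n]*\n[^\n]*\n(\n{3}|\n{7}|\n{9})$/P
s/\n*$//
D'

picked=$(printf '%s\n' "$input" | sed -r -e "$script3")
printf '%s\n' "$picked"
```

With the sample input this should print 33392, 7778297, 9997, and 11123401, one per line, and sed should stop before reading the last input line.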