Description
Given a file that contains comments, where
  • Each comment begins with an opening delimiter and ends with a closing delimiter.
  • A comment may span multiple lines.
  • Comments can NOT be nested.
  • The following sample uses /* and */ as the opening and closing delimiters, respectively.
  • We want to extract every string that matches [0-9]+ and is not part of a comment.
Raw Input:
no numbers at all
num_a 101 num_b 102 /* no number
 1030 1040 */ /* 1050 1060 */ num_c 107
num_d 108 /* 1090 1100
no number
1110 1120 */ num_e 113 num_f 114
/* 1150 1160
1170 1180
*/ num_g 119

Desired Output:
101
102
107
108
113
114
119
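As a cross-check for the sample above, here is a minimal Python sketch of the same specification. It is not the sed approach used on this page: it reads the whole input at once, replaces each comment with a newline, and then collects the matches of [0-9]+. The names `SAMPLE` and `extract_numbers` are hypothetical, and treating an unterminated comment as running to the end of the input is an assumption, since the description does not cover that case.

```python
import re

SAMPLE = """\
no numbers at all
num_a 101 num_b 102 /* no number
 1030 1040 */ /* 1050 1060 */ num_c 107
num_d 108 /* 1090 1100
no number
1110 1120 */ num_e 113 num_f 114
/* 1150 1160
1170 1180
*/ num_g 119
"""


def extract_numbers(text, open_delim="/*", close_delim="*/"):
    """Return every [0-9]+ match that lies outside a comment."""
    # Non-greedy match: a comment ends at the first closing delimiter,
    # because comments do not nest. DOTALL lets a comment span lines.
    # Assumption: an unterminated comment extends to the end of the input.
    comment = re.compile(
        re.escape(open_delim) + r".*?(?:" + re.escape(close_delim) + r"|\Z)",
        re.DOTALL,
    )
    # Substitute a newline (not an empty string) so that digits on either
    # side of a comment are never joined into one number.
    stripped = comment.sub("\n", text)
    return re.findall(r"[0-9]+", stripped)


if __name__ == "__main__":
    print("\n".join(extract_numbers(SAMPLE)))
```

Run on the sample input, this prints the seven numbers listed under the desired output, so it can serve as a reference when testing a sed script against other inputs.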
Script and Comments
Script1
:top
# Case 1: line without opening delimiter
/\/\*/!{
    s/[0-9]+/\n&\n/g
    /\n/!d
    s/[^\n]*\n([^\n]*\n)/\1/g
    s/\n[^\n]*$//
    b
}
# Case 2: line with opening but no closing delimiter
/\*\//!{
    # drop the comment text, then print the numbers that precede it
    s/\/\*.*$//
    s/\n/\n\n/g
    s/[0-9]+/\n&\n/g
    s/[^\n]*\n([^\n]*\n)/\1/g
    s/\n\n/\n/g
    s/\n[^\n]*$//p
    b in_comment
}
# Case 3: delete every complete `/* ... */' pair,
# using newlines as temporary markers around it
:loop
/\*\//s/\/\*/\n&/
s/\*\//&\n/
s/\n\/\*[^\n]*\*\/\n//
t loop
b top
:in_comment
# until a closing delimiter shows up,
# replace the pattern space with the next line
/\*\//!{
    $!N
    s/^.*\n//
    $!b in_comment
}
# drop everything up to and including the closing delimiter
s/\*\//&\n/
s/^[^\n]*\*\/\n//
b top
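The control flow of the script (handle the text outside comments, then skip input until the closing delimiter appears) can be sketched as a small line-by-line state machine. The Python below is only an illustration of that idea under stated assumptions, not a translation of the sed commands: the function name `extract_numbers_by_line` and the `in_comment` flag are hypothetical, and its behaviour on malformed input (for example a stray closing delimiter) is not claimed to match the script.

```python
import re


def extract_numbers_by_line(lines, open_delim="/*", close_delim="*/"):
    """Yield every [0-9]+ match outside comments, one input line at a time."""
    in_comment = False  # plays the role of the in_comment part of the script
    for line in lines:
        pos = 0
        while pos < len(line):
            if in_comment:
                end = line.find(close_delim, pos)
                if end == -1:
                    break  # rest of the line is comment; read the next line
                pos = end + len(close_delim)
                in_comment = False
            else:
                start = line.find(open_delim, pos)
                # Text from pos up to the next opening delimiter is not comment.
                outside = line[pos:] if start == -1 else line[pos:start]
                yield from re.findall(r"[0-9]+", outside)
                if start == -1:
                    break  # no further comment starts on this line
                pos = start + len(open_delim)
                in_comment = True


if __name__ == "__main__":
    demo = [
        "num_d 108 /* 1090 1100",
        "no number",
        "1110 1120 */ num_e 113 num_f 114",
    ]
    print(list(extract_numbers_by_line(demo)))  # ['108', '113', '114']
```

Because the state is carried from one line to the next, the input never has to be held in memory as a whole, which is the same constraint the sed script works under.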
Comments
  1. Please read script3 of `extract every matched string' first.
  2. The `-r' option of GNU sed must be used to interpret REs as EREs.
  3. The flow chart is shown as follows: