Description
  • Given a data file in which each line may contain one or more \cite{...} commands.
  • Each \cite{...} may contain one or more members separated by commas. For example, \cite{AA}, \cite{BB,CC}
  • We want to prepend `xx_' to each member of every \cite{...}.
  • There may be some brace-enclosed structures other than \cite{...}s in a line.
Raw Input
..\cite{AA}....
...\cite{BB}...\mbox{CC}...\cite{DD}....
........
.....\cite{EE,FF}....\em{GG}....
......\cite{HH,II}.....\cite{JJ,KK,LL}....
....\cite{MM}...\mbox{NN}...\cite{OO,PP}....\cite{QQ}...
Desired Output
..\cite{xx_AA}....
...\cite{xx_BB}...\mbox{CC}...\cite{xx_DD}....
........
.....\cite{xx_EE,xx_FF}....\em{GG}....
......\cite{xx_HH,xx_II}.....\cite{xx_JJ,xx_KK,xx_LL}....
....\cite{xx_MM}...\mbox{NN}...\cite{xx_OO,xx_PP}....\cite{xx_QQ}...
Script and Comments
Script1
[ 1] s/\\cite\{/&\n/g
[ 2] /\n/!b
[ 3] :loop
[ 4] s/\n([^,}]*[,}])/xx_\1\n/g
[ 5] s/\}\n/}/g
[ 6] /\n/b loop
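A minimal sketch of running Script1 from a shell, assuming GNU sed. The -E option is an assumption, since the text does not say how the script is invoked: it selects extended regular expressions, which the unescaped ( ) group and the \{ literals rely on. The \n in the replacement text is a GNU extension. The function name add_cite_prefix is hypothetical.

```shell
# Hypothetical wrapper around Script1 (step numbers removed).
# Assumes GNU sed: -E for extended regexes, \n allowed in the replacement.
add_cite_prefix() {
  sed -E '
    # [1] mark the first member of every \cite{...} with a newline
    s/\\cite\{/&\n/g
    # [2] no mark means no \cite: branch to the end, line is printed as-is
    /\n/!b
    :loop
    # [4] prefix each marked member, move the mark past its , or }
    s/\n([^,}]*[,}])/xx_\1\n/g
    # [5] a mark right after } has no next member: delete it
    s/\}\n/}/g
    # [6] repeat while any mark remains
    /\n/b loop
  '
}

printf '%s\n' '.....\cite{EE,FF}....\em{GG}....' | add_cite_prefix
```

With this sample line from the table, the command should print .....\cite{xx_EE,xx_FF}....\em{GG}.... and leave \em{GG} untouched.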
Comments
  1. After sed reads a line into the pattern space (PS), Step [1] inserts a newline character as a mark before the first member of every \cite{...}.
  2. A member preceded by a newline character is called a marked member.
  3. For any line without \cite{...}, Step [2] branches to the end of the script; sed prints the line unchanged and starts a new cycle.
  4. Steps [3] through [6] constitute a loop. Each iteration of this loop will
    • prepend `xx_' to every marked member (Step [4]);
    • move the mark to the next member if one exists (Step [4]);
    • otherwise, delete the mark, which then sits just after the closing brace (Step [5]).
    Step [6] repeats the loop as long as any mark remains in the pattern space.
  5. Using ......\cite{HH,II}.....\cite{JJ,KK,LL}.... as an example:
    After Step  Pattern Space
    0           ......\cite{HH,II}.....\cite{JJ,KK,LL}....
    1           ......\cite{\nHH,II}.....\cite{\nJJ,KK,LL}....
    4           ......\cite{xx_HH,\nII}.....\cite{xx_JJ,\nKK,LL}....
    4           ......\cite{xx_HH,xx_II}\n.....\cite{xx_JJ,xx_KK,\nLL}....
    5           ......\cite{xx_HH,xx_II}.....\cite{xx_JJ,xx_KK,\nLL}....
    4           ......\cite{xx_HH,xx_II}.....\cite{xx_JJ,xx_KK,xx_LL}\n....
    5           ......\cite{xx_HH,xx_II}.....\cite{xx_JJ,xx_KK,xx_LL}....
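To reproduce the table above, one option is to dump the pattern space with sed's l command after Steps [1], [4] and [5]. This is only a sketch, assuming GNU sed invoked with -E; the function name trace_cite_prefix is hypothetical. The l command shows the newline mark as \n, doubles each backslash, and ends each dumped line with $. The last line printed is sed's normal output.

```shell
# Hypothetical tracing variant of Script1; the added l commands only print,
# they do not change the pattern space. Assumes GNU sed with -E.
trace_cite_prefix() {
  sed -E '
    s/\\cite\{/&\n/g
    /\n/!b
    # dump after Step [1]
    l
    :loop
    s/\n([^,}]*[,}])/xx_\1\n/g
    # dump after Step [4]
    l
    s/\}\n/}/g
    # dump after Step [5]
    l
    /\n/b loop
  '
}

printf '%s\n' '......\cite{HH,II}.....\cite{JJ,KK,LL}....' | trace_cite_prefix
```

Unlike the table, the dump also includes the first pass through Step [5], where no mark follows a closing brace yet and the pattern space is unchanged.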