Here are some sed scripts to guide you in the art of mastering
sed.
4.2 Increment a Number
4.3 Rename Files to Lower Case
4.4 Print bash Environment
4.5 Reverse Characters of Lines

Emulating standard utilities:

4.6 Reverse Lines of Files
4.7 Numbering Lines
4.8 Numbering Non-blank Lines
4.9 Counting Characters
4.10 Counting Words
4.11 Counting Lines
4.12 Printing the First Lines
4.13 Printing the Last Lines
4.14 Make Duplicate Lines Unique
4.15 Print Duplicated Lines of Input
4.16 Remove All Duplicated Lines
4.17 Squeezing Blank Lines
This script centers all lines of a file within a width of 80 columns.
To change that width, the number in \{…\} must be
replaced, and the number of added spaces must be changed to match.
Note how the buffer commands are used to separate parts in the regular expressions to be matched; this is a common technique.
#!/usr/bin/sed -f
# Put 80 spaces in the buffer
1 {
x
s/^$/          /
s/^.*$/&&&&&&&&/
x
}
# convert tabs to spaces (\t is a GNU sed escape for a tab),
# then del leading and trailing spaces
y/\t/ /
s/^ *//
s/ *$//
# add a newline and 80 spaces to end of line
G
# keep first 81 chars (80 + a newline)
s/^\(.\{81\}\).*$/\1/
# \2 matches half of the spaces, which are moved to the beginning
s/^\(.*\)\n\(.*\)\2/\2\1/
This script is one of a few that demonstrate how to do arithmetic
in sed. This is indeed possible,(7) but must be done manually.
To increment a number, you add 1 to its last digit, replacing it with the following digit. There is one exception: when the digit is a nine, the preceding digits must also be incremented until you reach one that is not a nine.
This solution by Bruno Haible is very clever because
it uses a single buffer; if you don't have this limitation, the
algorithm used in Numbering Lines is faster.
It works by replacing trailing nines with an underscore, then
using multiple s commands to increment the last digit,
and then again substituting underscores with zeros.
#!/usr/bin/sed -f
/[^0-9]/ d
# replace all trailing 9s by _ (any other character except digits could
# be used)
:d
s/9\(_*\)$/_\1/
td
# incr last digit only. The first line adds a most-significant
# digit of 1 if we have to add a digit.
#
# The |
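The listing above is cut off partway through its comments. As a hedged sketch only, the three steps described earlier (turn trailing nines into underscores, bump the last remaining digit, turn the underscores into zeros) could be completed as follows. The substitutions after the first loop and the wrapper name `increment` are assumptions made for illustration, not a recovery of the missing lines, and the sketch assumes GNU sed.

```shell
# increment: hypothetical wrapper; reads one non-negative integer per line
# and prints each one incremented by 1. Assumes GNU sed.
increment() {
  sed '
    # drop lines that are not plain numbers
    /[^0-9]/ d

    # replace all trailing 9s by _
    :d
    s/9\(_*\)$/_\1/
    td

    # increment the last digit only; the first substitution adds a
    # most-significant 1 when every digit was a 9
    s/^\(_*\)$/1\1/; tn
    s/8\(_*\)$/9\1/; tn
    s/7\(_*\)$/8\1/; tn
    s/6\(_*\)$/7\1/; tn
    s/5\(_*\)$/6\1/; tn
    s/4\(_*\)$/5\1/; tn
    s/3\(_*\)$/4\1/; tn
    s/2\(_*\)$/3\1/; tn
    s/1\(_*\)$/2\1/; tn
    s/0\(_*\)$/1\1/; tn

    # turn the underscores back into zeros
    :n
    y/_/0/
  '
}

printf '%s\n' 7 199 999 | increment
```

With the three sample inputs this prints 8, 200 and 1000. The substitutions are ordered from 8 down to 0 so that a digit that has just been incremented cannot be matched again by a later line.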
This is a pretty strange use of sed. We transform text
into shell commands, then just feed them to the shell.
Don't worry, even worse hacks are done when using sed; I have
seen a script converting the output of date into a bc
program!
The main body of this is the sed script, which remaps the name
from lower to upper (or vice-versa) and even checks
whether the remapped name is the same as the original name.
Note how the script is parameterized using shell
variables and proper quoting.
#! /bin/sh
# rename files to lower/upper case...
#
# usage:
# move-to-lower *
# move-to-upper *
# or
# move-to-lower -R .
# move-to-upper -R .
#
help()
{
cat << eof
Usage: $0 [-n] [-R] [-h] files...
-n do nothing, only see what would be done
-R recursive (use find)
-h this message
files files to remap to lower case
Examples:
$0 -n * (see if everything is ok, then...)
$0 *
$0 -R .
eof
}
apply_cmd='sh'
finder='echo "$@" | tr " " "\n"'
files_only=
while :
do
case "$1" in
-n) apply_cmd='cat' ;;
-R) finder='find "$@" -type f';;
-h) help ; exit 1 ;;
*) break ;;
esac
shift
done
if [ -z "$1" ]; then
echo Usage: $0 [-h] [-n] [-R] files...
exit 1
fi
LOWER='abcdefghijklmnopqrstuvwxyz'
UPPER='ABCDEFGHIJKLMNOPQRSTUVWXYZ'
case `basename $0` in
*upper*) TO=$UPPER; FROM=$LOWER ;;
*) FROM=$UPPER; TO=$LOWER ;;
esac
eval $finder | sed -n '
# remove all trailing slashes
s/\/*$//
# add ./ if there is no path, only a filename
/\//! s/^/.\//
# save path+filename
h
# remove path
s/.*\///
# do conversion only on filename
y/'$FROM'/'$TO'/
# now line contains original path+file, while
# hold space contains the new filename
x
# add converted file name to line, which now contains
# path/file-name\nconverted-file-name
G
# check if converted file name is equal to original file name,
# if it is, print nothing
/^.*\/\(.*\)\n\1/b
# now, transform path/fromfile\n, into
# mv path/fromfile path/tofile and print it
s/^\(.*\/\)\(.*\)\n\(.*\)$/mv "\1\2" "\1\3"/p
' | $apply_cmd
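To see what the sed part produces on its own, the sketch below runs the same commands on a few path names and prints the generated `mv` commands instead of executing them, much like the -n option does. The helper name `lower_mv_cmds` and the sample paths are made up for illustration, and the conversion direction is fixed to lower case.

```shell
# lower_mv_cmds: hypothetical helper; reads one path per line on stdin
# and prints the mv commands that would be handed to the shell.
FROM='ABCDEFGHIJKLMNOPQRSTUVWXYZ'
TO='abcdefghijklmnopqrstuvwxyz'

lower_mv_cmds() {
  sed -n '
    s/\/*$//
    /\//! s/^/.\//
    h
    s/.*\///
    y/'"$FROM"'/'"$TO"'/
    x
    G
    /^.*\/\(.*\)\n\1/b
    s/^\(.*\/\)\(.*\)\n\(.*\)$/mv "\1\2" "\1\3"/p
  '
}

printf '%s\n' 'README' 'src/Makefile' 'notes.txt' | lower_mv_cmds
```

This prints `mv "./README" "./readme"` and `mv "src/Makefile" "src/makefile"`; nothing is printed for notes.txt because its name is already lower case. Only the last path component is converted, so directories keep their spelling.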
This script strips the definitions of the shell functions
from the output of the set Bourne-shell command.
#!/bin/sh
set | sed -n '
:x
# if no occurrence of `=()' print and load next line
/=()/! { p; b; }
/ () $/! { p; b; }
# possible start of functions section
# save the line in case this is a var like FOO="() "
h
# if the next line has a brace, we quit because
# nothing comes after functions
n
/^{/ q
# print the old line
x; p
# work on the new line now
x; bx
'
This script can be used to reverse the position of characters in lines. The technique moves two characters at a time, hence it is faster than more intuitive implementations.
Note the tx command before the definition of the label.
This is often needed to reset the flag that is tested by
the t command.
Imaginative readers will find uses for this script. An example
is reversing the output of banner.(8)
#!/usr/bin/sed -f
/../! b
# Reverse a line. Begin embedding the line between two newlines
s/^.*$/\
&\
/
# Move first character at the end. The regexp matches until
# there are zero or one characters between the markers
tx
:x
s/\(\n.\)\(.*\)\(.\n\)/\3\2\1/
tx
# Remove the newline markers
s/\n//g
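As a quick way to try the technique, here is a minimal sketch that wraps the same commands in a shell function. The name `rev_line` is made up for illustration, and the sketch writes the newline markers as `\n` in the replacement, which is a GNU sed feature rather than the portable backslash-newline form.

```shell
# rev_line: hypothetical wrapper; reverses the characters of each input line.
# Uses \n in the replacement text, which assumes GNU sed.
rev_line() {
  sed '
    # lines shorter than two characters are already reversed
    /../! b

    # embed the line between two newline markers
    s/^.*$/\n&\n/

    # reset the t flag, then swap the characters next to the markers
    # until zero or one characters remain between them
    tx
    :x
    s/\(\n.\)\(.*\)\(.\n\)/\3\2\1/
    tx

    # remove the newline markers
    s/\n//g
  '
}

printf '%s\n' 'abcde' 'sed' | rev_line
```

For the two sample lines this prints edcba and des. Each pass of the loop moves one character from each end, so a line of length n needs about n/2 substitutions.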
This one begins a series of totally useless (yet interesting)
scripts emulating various Unix commands. This, in particular,
is a tac workalike.
Note that on implementations other than GNU sed
this script might easily overflow internal buffers.
#!/usr/bin/sed -nf
# reverse all lines of input, i.e. first line became last, ...
# from the second line, the buffer (which contains all previous lines)
# is *appended* to current line, so, the order will be reversed
1! G
# on the last line we're done -- print everything
$ p
# store everything on the buffer again
h
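A small usage sketch may help to see the hold space at work. The wrapper name `tac_sed` is an assumption for illustration; the three commands are the ones from the script.

```shell
# tac_sed: hypothetical wrapper around the three commands of the script
tac_sed() {
  sed -n '
    # from the second line on, append the hold space (all previous
    # lines, already reversed) to the current line
    1! G
    # on the last line, print everything
    $ p
    # store the accumulated text in the hold space again
    h
  '
}

printf '%s\n' one two three | tac_sed
```

This prints three, two, one on separate lines. Because the whole input accumulates in the hold space, memory use grows with the size of the file.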
This script replaces `cat -n'; in fact it formats its output
exactly like GNU cat does.
Of course this is completely useless, for two reasons: first, somebody else has already done it in C; second, the following Bourne-shell script could be used for the same purpose and would be much faster:
#! /bin/sh
sed -e "=" "$@" | sed -e '
s/^/      /
N
s/^ *\(......\)\n/\1 /
'
It uses sed to print the line number, then groups lines two
by two using N. Of course, this script does not teach as much as
the one presented below.
The algorithm used for incrementing uses both buffers, so the line
is printed as soon as possible and then discarded. The number
is split so that changing digits go in a buffer and unchanged ones go
in the other; the changed digits are modified in a single step
(using a y command). The line number for the next line
is then composed and stored in the hold space, to be used in the
next iteration.
#!/usr/bin/sed -nf
# Prime the pump on the first line
x
/^$/ s/^.*$/1/
# Add the correct line number before the pattern
G
h
# Format it and print it
s/^/      /
s/^ *\(......\)\n/\1 /p
# Get the line number from hold space; add a zero
# if we're going to add a digit on the next line
g
s/\n.*$//
/^9*$/ s/^/0/
# separate changing/unchanged digits with an x
s/.9*$/x&/
# keep changing digits in hold space
h
s/^.*x//
y/0123456789/1234567890/
x
# keep unchanged digits in pattern space
s/x.*$//
# compose the new number, remove the newline implicitly added by G
G
s/\n//
h
Emulating `cat -b' is almost the same as `cat -n': we only have to select which lines are to be numbered and which are not.
The part that is common to this script and the previous one is
not commented to show how important it is to comment sed
scripts properly...
#!/usr/bin/sed -nf
/^$/ {
p
b
}
# Same as cat -n from now
x
/^$/ s/^.*$/1/
G
h
s/^/      /
s/^ *\(......\)\n/\1 /p
x
s/\n.*$//
/^9*$/ s/^/0/
s/.9*$/x&/
h
s/^.*x//
y/0123456789/1234567890/
x
s/x.*$//
G
s/\n//
h
This script shows another way to do arithmetic with sed.
In this case we have to add possibly large numbers, so implementing
this by successive increments would not be feasible (and possibly
even more complicated to contrive than this script).
The approach is to map numbers to letters, kind of an abacus
implemented with sed. `a's are units, `b's are
tens and so on: we simply add the number of characters
on the current line as units, and then propagate the carry
to tens, hundreds, and so on.
As usual, running totals are kept in hold space.
On the last line, we convert the abacus form back to decimal.
For the sake of variety, this is done with a loop rather than
with some 80 s commands(9): first we
convert units, removing `a's from the number; then we
rotate letters so that tens become `a's, and so on
until no more letters remain.
#!/usr/bin/sed -nf
# Add n+1 a's to hold space (+1 is for the newline)
s/./a/g
H
x
s/\n/a/
# Do the carry. The t's and b's are not necessary,
# but they do speed up the thing
t a
: a; s/aaaaaaaaaa/b/g; t b; b done
: b; s/bbbbbbbbbb/c/g; t c; b done
: c; s/cccccccccc/d/g; t d; b done
: d; s/dddddddddd/e/g; t e; b done
: e; s/eeeeeeeeee/f/g; t f; b done
: f; s/ffffffffff/g/g; t g; b done
: g; s/gggggggggg/h/g; t h; b done
: h; s/hhhhhhhhhh//g
: done
$! {
h
b
}
# On the last line, convert back to decimal
: loop
/a/! s/[b-h]*/&0/
s/aaaaaaaaa/9/
s/aaaaaaaa/8/
s/aaaaaaa/7/
s/aaaaaa/6/
s/aaaaa/5/
s/aaaa/4/
s/aaa/3/
s/aa/2/
s/a/1/
: next
y/bcdefgh/abcdefg/
/[a-h]/ b loop
p
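The abacus idea is easier to follow when the two halves are run separately. The sketch below is an illustration under stated assumptions: the helper names `carry` and `to_decimal` are made up, `carry` only propagates up to the letter d, and the input is a string of 123 a's standing for 123 characters.

```shell
# carry: hypothetical helper; collapses every ten a's into a b, every ten
# b's into a c, and every ten c's into a d, like the carry step above.
carry() {
  sed '
    s/aaaaaaaaaa/b/g
    s/bbbbbbbbbb/c/g
    s/cccccccccc/d/g
  '
}

# to_decimal: hypothetical helper; converts the abacus form back to decimal
# with the same loop as the script (letters up to h).
to_decimal() {
  sed '
    :loop
    /a/! s/[b-h]*/&0/
    s/aaaaaaaaa/9/
    s/aaaaaaaa/8/
    s/aaaaaaa/7/
    s/aaaaaa/6/
    s/aaaaa/5/
    s/aaaa/4/
    s/aaa/3/
    s/aa/2/
    s/a/1/
    y/bcdefgh/abcdefg/
    /[a-h]/ b loop
  '
}

# 123 characters, written as 123 a's
units=$(printf '%0123d' 0 | tr 0 a)
printf '%s\n' "$units" | carry
printf '%s\n' "$units" | carry | to_decimal
```

The first command prints cbbaaa (one hundred, two tens, three units) and the second prints 123. Each pass of the loop converts the current units digit and then shifts every letter down one place, which is why higher letters must always stay to the left of lower ones.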
This script is almost the same as the previous one, once each of the words on the line is converted to a single `a' (in the previous script each letter was changed to an `a').
It is interesting that real wc programs have optimized
loops for `wc -c', so they are much slower at counting
words than characters. This script's bottleneck,
instead, is arithmetic, and hence the word-counting one
is faster (it has to manage smaller numbers).
Again, the common parts are not commented to show the importance
of commenting sed scripts.
#!/usr/bin/sed -nf
# Convert words to a's (\t is a GNU sed escape for a tab)
s/[ \t][ \t]*/ /g
s/^/ /
s/ [^ ][^ ]*/a /g
s/ //g
# Append them to hold space
H
x
s/\n//
# From here on it is the same as in wc -c.
/aaaaaaaaaa/! bx; s/aaaaaaaaaa/b/g
/bbbbbbbbbb/! bx; s/bbbbbbbbbb/c/g
/cccccccccc/! bx; s/cccccccccc/d/g
/dddddddddd/! bx; s/dddddddddd/e/g
/eeeeeeeeee/! bx; s/eeeeeeeeee/f/g
/ffffffffff/! bx; s/ffffffffff/g/g
/gggggggggg/! bx; s/gggggggggg/h/g
s/hhhhhhhhhh//g
:x
$! { h; b; }
:y
/a/! s/[b-h]*/&0/
s/aaaaaaaaa/9/
s/aaaaaaaa/8/
s/aaaaaaa/7/
s/aaaaaa/6/
s/aaaaa/5/
s/aaaa/4/
s/aaa/3/
s/aa/2/
s/a/1/
y/bcdefgh/abcdefg/
/[a-h]/ by
p
No strange things are done now, because sed gives us
`wc -l' functionality for free!!! Look:
#!/usr/bin/sed -nf
$=
This script is probably the simplest useful sed script.
It displays the first 10 lines of input; the number of displayed
lines is right before the q command.
#!/usr/bin/sed -f
10q
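Since the line count is just the address in front of q, it is easy to turn into a parameter. A minimal sketch, assuming a hypothetical wrapper named `head_sed` whose argument is a positive integer (it is not validated):

```shell
# head_sed: hypothetical wrapper; prints the first N lines (default 10).
# N is pasted into the sed script unchecked and must be at least 1.
head_sed() {
  sed "${1:-10}q"
}

printf '%s\n' a b c d e | head_sed 2
```

This prints a and b. The q command stops sed as soon as line N has been printed, so the rest of the input is never read.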
Printing the last n lines rather than the first is more complex but indeed possible. n is encoded in the second line, before the bang character.
This script is similar to the tac script in that it keeps the
final output in the hold space and prints it at the end:
#!/usr/bin/sed -nf
1! {; H; g; }
1,10 !s/[^\n]*\n//
$p
h
Mainly, the script keeps a window of 10 lines and slides it
by adding a line and deleting the oldest (the substitution command
on the second line works like a D command but does not
restart the loop).
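The window size can be made a parameter by splicing it into the address of the substitution. This is a sketch under assumptions: the wrapper name `tail_sed` is made up, the argument must be a positive integer, and the `\n` inside the bracket expression relies on GNU sed.

```shell
# tail_sed: hypothetical wrapper; prints the last N lines (default 10).
# N must be a positive integer; it is pasted into the script unchecked.
tail_sed() {
  n=${1:-10}
  sed -n '
    # from the second line on, append the line to the saved window
    1! { H; g; }
    # once the window holds more than N lines, drop the oldest one
    1,'"$n"' !s/[^\n]*\n//
    # on the last line, print the window
    $p
    # save the window for the next cycle
    h
  '
}

printf '%s\n' 1 2 3 4 5 | tail_sed 3
```

This prints 3, 4 and 5. The hold space never contains more than N lines, so memory use depends on the window size rather than on the length of the input.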
The "sliding window" technique is a very powerful way to write
efficient and complex sed scripts, because commands like
P would require a lot of work if implemented manually.
To introduce the technique, which is fully demonstrated in the
rest of this chapter and is based on the N, P
and D commands, here is an implementation of tail
using a simple "sliding window."
This looks complicated but in fact the working is the same as
the last script: after we have kicked in the appropriate number
of lines, however, we stop using the hold space to keep inter-line
state, and instead use N and D to slide pattern
space by one line:
#!/usr/bin/sed -f
1h
2,10 {; H; g; }
$q
1,9d
N
D
Note how the first, second and fourth lines are inactive after the first ten lines of input. After that, all the script does is exit on the last line of input, append the next input line to pattern space, and remove the first line.
This is an example of the art of using the N, P
and D commands, probably the most difficult to master.
#!/usr/bin/sed -f
h
:b
# On the last line, print and exit
$b
N
/^\(.*\)\n\1$/ {
# The two lines are identical. Undo the effect of
# the N command.
g
bb
}
# If the |
As you can see, we maintain a 2-line window using P and D.
This technique is often used in advanced sed scripts.
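The listing above breaks off in the middle of a comment. As a hedged sketch, a complete script built on the same 2-line window could look like the following. The commands after the closing brace are an assumption based on the description (print with P, slide with D), not a recovery of the missing lines, and `uniq_sed` is an illustrative wrapper name.

```shell
# uniq_sed: hypothetical wrapper; prints adjacent duplicate lines only once
uniq_sed() {
  sed '
    h

    :b
    # on the last line, print and exit
    $b
    N
    /^\(.*\)\n\1$/ {
      # the two lines are identical: restore the saved copy,
      # undoing the effect of the N command
      g
      bb
    }

    # assumed continuation: if N just read the last line, print and exit
    $b

    # assumed continuation: the lines differ, so print the first
    # and go back to work on the second
    P
    D
  '
}

printf '%s\n' a a b b b c | uniq_sed
```

With the sample input this prints a, b and c. D restarts the script without reading a new line, so the second line of the window becomes the first line of the next comparison.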
This script prints only duplicated lines, like `uniq -d'.
#!/usr/bin/sed -nf
$b
N
/^\(.*\)\n\1$/ {
# Print the first of the duplicated lines
s/.*\n//
p
# Loop until we get a different line
:b
$b
N
/^\(.*\)\n\1$/ {
s/.*\n//
bb
}
}
# The last line cannot be followed by duplicates
$b
# Found a different one. Leave it alone in the pattern space
# and go back to the top, hunting its duplicates
D
This script prints only unique lines, like `uniq -u'.
#!/usr/bin/sed -f
# Search for a duplicate line --- until that, print what you find.
$b
N
/^\(.*\)\n\1$/ ! {
P
D
}
:c
# Got two equal lines in pattern space. At the
# end of the file we simply exit
$d
# Else, we keep reading lines with |
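This listing is also cut off. The sketch below shows one way the rest could go, assuming the loop keeps reading with N while the lines stay equal and then discards the last copy with D. Everything after the `$d` command is an assumption made for illustration, as is the wrapper name `uniq_u_sed`.

```shell
# uniq_u_sed: hypothetical wrapper; prints only lines that are not repeated
uniq_u_sed() {
  sed '
    # search for a duplicated line; until then, print what you find
    $b
    N
    /^\(.*\)\n\1$/ ! {
      P
      D
    }

    :c
    # two equal lines in pattern space; at the end of the file, just exit
    $d

    # assumed continuation: keep reading lines with N until a
    # different one shows up
    s/.*\n//
    N
    /^\(.*\)\n\1$/ {
      bc
    }

    # assumed continuation: remove the last copy of the duplicated
    # line and go back to the top
    D
  '
}

printf '%s\n' a b b c c c d | uniq_u_sed
```

With the sample input this prints a and d, the only lines that have no adjacent duplicate.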
As a final example, here are three scripts, of increasing complexity and speed, that implement the same function as `cat -s', that is squeezing blank lines.
The first leaves a blank line at the beginning and end if there are some already.
#!/usr/bin/sed -f
# on empty lines, join with next
# Note there is a star in the regexp
:x
/^\n*$/ {
N
bx
}
# now, squeeze all '\n', this can be also done by:
# s/^\(\n\)*/\1/
s/\n*/\
/
This one is a bit more complex and removes all empty lines at the beginning. It does leave a single blank line at end if one was there.
#!/usr/bin/sed -f
# delete all leading empty lines
1,/^./{
/./!d
}
# on an empty line we remove it and all the following
# empty lines, but one
:x
/./!{
N
s/^\n$//
tx
}
This removes leading and trailing blank lines. It is also the
fastest. Note that loops are completely done with n and
b, without relying on sed to restart
the script automatically at the end of a line.
#!/usr/bin/sed -nf
# delete all (leading) blanks
/./!d
# get here: so there is a non empty
:x
# print it
p
# get next
n
# got chars? print it again, etc...
/./bx
# no, don't have chars: got an empty line
:z
# get next, if last line we finish here so no trailing
# empty lines are written
n
# also empty? then ignore it, and get next... this will
# remove ALL empty lines
/./!bz
# all empty lines were deleted/ignored, but we have a non empty. As
# what we want to do is to squeeze, insert a blank line artificially
i\

bx
This document was generated on July, 20 2009 using texi2html 1.76.