These commands operate on individual characters.
9.1 tr: Translate, squeeze, and/or delete characters
9.2 expand: Convert tabs to spaces
9.3 unexpand: Convert spaces to tabs
9.1 tr: Translate, squeeze, and/or delete characters

Synopsis:

    tr [option]… set1 [set2]

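tr copies standard input to standard output, performing
one of the following operations: translating, and optionally squeezing
repeated characters in the result; squeezing repeated characters;
deleting characters; or deleting characters and then squeezing repeated
characters from the result.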
The set1 and (if given) set2 arguments define ordered
sets of characters, referred to below as set1 and set2. These
sets are the characters of the input that tr operates on.
The `--complement' (`-c', `-C') option replaces
set1 with its
complement (all of the characters that are not in set1).
Currently tr fully supports only single-byte characters.
Eventually it will support multibyte characters; when it does, the
`-C' option will cause it to complement the set of characters,
whereas `-c' will cause it to complement the set of values.
This distinction will matter only when some values are not characters,
and this is possible only in locales using multibyte encodings when
the input contains encoding errors.
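As a small illustration of complementing set1, the sketch below keeps only digits and newlines by deleting every character in the complement of that set. The sample input is made up for the illustration, and the `-d' option it relies on is described in the section on squeezing repeats and deleting.

```shell
# Delete every character that is NOT a digit or a newline.
# The input string is an arbitrary sample, not taken from the manual.
digits=$(printf 'order 66, aisle 7\n' | tr -cd '[:digit:]\n')
printf '%s\n' "$digits"
```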
The program accepts the `--help' and `--version' options. See section Common options. Options must precede operands.
An exit status of zero indicates success, and a nonzero value indicates failure.
9.1.1 Specifying sets of characters
9.1.2 Translating: Changing one set of characters to another.
9.1.3 Squeezing repeats and deleting
9.1.1 Specifying sets of characters
The format of the set1 and set2 arguments resembles the format of regular expressions; however, they are not regular expressions, only lists of characters. Most characters simply represent themselves in these strings, but the strings can contain the shorthands listed below, for convenience. Some of them can be used only in set1 or set2, as noted below.
The following backslash escape sequences are recognized:
`\a'    Control-G.
`\b'    Control-H.
`\f'    Control-L.
`\n'    Control-J.
`\r'    Control-M.
`\t'    Control-I.
`\v'    Control-K.
`\ooo'  The character with the value given by ooo, which is 1 to 3 octal digits.
`\\'    A backslash.
While a backslash followed by a character not listed above is interpreted as that character, the backslash also effectively removes any special significance, so it is useful to escape `[', `]', `*', and `-'.
The notation `m-n' expands to all of the characters from m through n, in ascending order. m should collate before n; if it doesn't, an error results. As an example, `0-9' is the same as `0123456789'.
GNU tr does not support the System V syntax that uses square
brackets to enclose ranges. Translations specified in that format
sometimes work as expected, since the brackets are often transliterated
to themselves. However, they should be avoided because they sometimes
behave unexpectedly. For example, `tr -d '[0-9]'' deletes brackets
as well as digits.
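The difference is easy to see on a short input. The following sketch uses a made-up sample string and compares the bracketed form with the plain range:

```shell
# System V style brackets are treated as ordinary members of the set,
# so they are deleted along with the digits.
with_brackets=$(printf 'a[1]b2\n' | tr -d '[0-9]')

# Without the brackets in the set, only the digits are removed.
digits_only=$(printf 'a[1]b2\n' | tr -d '0-9')

printf '%s\n%s\n' "$with_brackets" "$digits_only"
```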
Many historically common and even accepted uses of ranges are not
portable. For example, on EBCDIC hosts using the `A-Z'
range will not do what most would expect because `A' through `Z'
are not contiguous as they are in ASCII.
If you can rely on a POSIX compliant version of tr, then
the best way to work around this is to use character classes (see below).
Otherwise, it is most portable (and most ugly) to enumerate the members
of the ranges.
The notation `[c*n]' in set2 expands to n copies of character c. Thus, `[y*6]' is the same as `yyyyyy'. The notation `[c*]' in set2 expands to as many copies of c as are needed to make set2 as long as set1. If n begins with `0', it is interpreted in octal, otherwise in decimal.
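A brief sketch of the repeat notation, assuming GNU tr and using arbitrary sample strings:

```shell
# `[x*]' grows to match the length of set1, so a, b and c all map to x.
padded=$(printf 'abcabc\n' | tr 'abc' '[x*]')

# `[y*6]' is six copies of y, one for each of the six characters in set1;
# the g is not in set1 and passes through unchanged.
counted=$(printf 'abcdefg\n' | tr 'abcdef' '[y*6]')

# The repeat can follow ordinary characters: a maps to A, b and c map to '.'.
mixed=$(printf 'abc\n' | tr 'abc' 'A[.*]')

printf '%s\n%s\n%s\n' "$padded" "$counted" "$mixed"
```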
The notation `[:class:]' expands to all of the characters in
the (predefined) class class. The characters expand in no
particular order, except for the upper and lower classes,
which expand in ascending order. When the `--delete' (`-d')
and `--squeeze-repeats' (`-s') options are both given, any
character class can be used in set2. Otherwise, only the
character classes lower and upper are accepted in
set2, and then only if the corresponding character class
(upper and lower, respectively) is specified in the same
relative position in set1. Doing this specifies case conversion.
The class names are given below; an error results when an invalid class
name is given.
alnum   Letters and digits.
alpha   Letters.
blank   Horizontal whitespace.
cntrl   Control characters.
digit   Digits.
graph   Printable characters, not including space.
lower   Lowercase letters.
print   Printable characters, including space.
punct   Punctuation characters.
space   Horizontal or vertical whitespace.
upper   Uppercase letters.
xdigit  Hexadecimal digits.
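Two short uses of character classes, given as a sketch with a made-up input line and assuming an ASCII-compatible locale: the first pairs lower with upper for case conversion, the second deletes every member of punct.

```shell
# Case conversion: lower in set1 paired with upper in set2.
upper=$(printf 'Hello, World 42\n' | tr '[:lower:]' '[:upper:]')

# Deletion: remove every punctuation character.
no_punct=$(printf 'Hello, World 42\n' | tr -d '[:punct:]')

printf '%s\n%s\n' "$upper" "$no_punct"
```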
The syntax `[=c=]' expands to all of the characters that are
equivalent to c, in no particular order. Equivalence classes are
a relatively recent invention intended to support non-English alphabets.
But there seems to be no standard way to define them or determine their
contents. Therefore, they are not fully implemented in GNU tr;
each character's equivalence class consists only of that character,
which is of no particular use.
9.1.2 Translating
tr performs translation when set1 and set2 are
both given and the `--delete' (`-d') option is not given.
tr translates each character of its input that is in set1
to the corresponding character in set2. Characters not in
set1 are passed through unchanged. When a character appears more
than once in set1 and the corresponding characters in set2
are not all the same, only the final one is used. For example, these
two commands are equivalent:
    tr aaa xyz
    tr a z
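The equivalence can be checked on a sample word (the input is arbitrary):

```shell
# With `a' repeated in set1, only the last mapping (a -> z) takes effect.
first=$(printf 'banana\n' | tr aaa xyz)
second=$(printf 'banana\n' | tr a z)
printf '%s\n%s\n' "$first" "$second"
```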
A common use of tr is to convert lowercase characters to
uppercase. This can be done in many ways. Here are three of them:
    tr abcdefghijklmnopqrstuvwxyz ABCDEFGHIJKLMNOPQRSTUVWXYZ
    tr a-z A-Z
    tr '[:lower:]' '[:upper:]'
But note that using ranges like a-z above is not portable.
When tr is performing translation, set1 and set2
typically have the same length. If set1 is shorter than
set2, the extra characters at the end of set2 are ignored.
On the other hand, making set1 longer than set2 is not
portable; POSIX says that the result is undefined. In this situation,
BSD tr pads set2 to the length of set1 by repeating
the last character of set2 as many times as necessary. System V
tr truncates set1 to the length of set2.
By default, GNU tr handles this case like BSD tr.
When the `--truncate-set1' (`-t') option is given,
GNU tr handles this case like the System V tr
instead. This option is ignored for operations other than translation.
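The two behaviors can be compared directly. This sketch assumes GNU tr and uses an arbitrary four-character input with a two-character set2:

```shell
# Default (BSD-like): set2 is padded with its last character,
# so c and d both map to y.
padded=$(printf 'abcd\n' | tr abcd xy)

# With -t (System V-like): set1 is truncated to `ab',
# so c and d pass through unchanged.
truncated=$(printf 'abcd\n' | tr -t abcd xy)

printf '%s\n%s\n' "$padded" "$truncated"
```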
Acting like System V tr in this case breaks the relatively common
BSD idiom:
    tr -cs A-Za-z0-9 '\012'
because it converts only zero bytes (the first element in the complement of set1), rather than all non-alphanumerics, to newlines.
By the way, the above idiom is not portable because it uses ranges, and
it assumes that the octal code for newline is 012.
Assuming a POSIX compliant tr, here is a better way to write it:
    tr -cs '[:alnum:]' '[\n*]'
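Applied to a sample sentence (made up for this illustration), the command puts each word on its own line:

```shell
# Every run of non-alphanumeric characters becomes a single newline.
words=$(printf 'Hello, world! 42 times\n' | tr -cs '[:alnum:]' '[\n*]')
printf '%s\n' "$words"
```

Here the comma and space after `Hello' are both translated to newlines, and `-s' then squeezes the pair into one.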
9.1.3 Squeezing repeats and deleting
When given just the `--delete' (`-d') option, tr
removes any input characters that are in set1.
When given just the `--squeeze-repeats' (`-s') option,
tr replaces each input sequence of a repeated character that
is in set1 with a single occurrence of that character.
When given both `--delete' and `--squeeze-repeats', tr
first performs any deletions using set1, then squeezes repeats
from any remaining characters using set2.
The `--squeeze-repeats' option may also be used when translating,
in which case tr first performs translation, then squeezes
repeats from any remaining characters using set2.
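The three uses of `-s' described above can be sketched as follows, with arbitrary sample inputs:

```shell
# -s alone: squeeze runs of characters in set1 (a and b); c is untouched.
squeezed=$(printf 'aaabbbccc\n' | tr -s 'ab')

# -d and -s together: delete set1 (a), then squeeze set2 (c).
deleted_then_squeezed=$(printf 'aaabbbccc\n' | tr -ds 'a' 'c')

# -s while translating: map a and b to x, then squeeze runs of x.
translated_then_squeezed=$(printf 'aabbcc\n' | tr -s 'ab' 'xx')

printf '%s\n%s\n%s\n' "$squeezed" "$deleted_then_squeezed" "$translated_then_squeezed"
```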
Here are some examples to illustrate various combinations of options:
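Remove all zero bytes:

    tr -d '\0'

Put all words on lines by themselves. This converts all non-alphanumeric characters to newlines, then squeezes each string of repeated newlines into a single newline:

    tr -cs '[:alnum:]' '[\n*]'

Convert each sequence of repeated newlines to a single newline:

    tr -s '\n'

Find doubled occurrences of words in a document. The Bourne shell script below first converts each sequence of punctuation and blank characters to a single newline, which puts each word on a line by itself. Next it maps all uppercase characters to lowercase, and finally it runs
uniq with the `-d' option to print out only the words
that were repeated.

    #!/bin/sh
    cat -- "$@" \
      | tr -s '[:punct:][:blank:]' '[\n*]' \
      | tr '[:upper:]' '[:lower:]' \
      | uniq -d

Deleting a small set of characters is usually straightforward. For example, to remove all `a's, `x's, and `M's you would do this:

    tr -d axM
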
However, when `-' is one of those characters, it can be tricky because
`-' has special meanings. Performing the same task as above but also
removing all `-' characters, we might try tr -d -axM, but
that would fail because tr would try to interpret `-a' as
a command-line option. Alternatively, we could try putting the hyphen
inside the string, tr -d a-xM, but that wouldn't work either because
it would make tr interpret a-x as the range of characters
`a'…`x' rather than the three characters `a', `-', and `x'.
One way to solve the problem is to put the hyphen at the end of the list
of characters:
    tr -d axM-

Or you can use `--' to terminate option processing:

    tr -d -- -axM

More generally, use the equivalence class notation [=c=]
with `-' (or any other character) in place of the `c':

    tr -d '[=-=]axM'

Note how single quotes are used in the above example to protect the square brackets from interpretation by a shell.
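The three working forms, and the range pitfall, can be compared on one input. This is a sketch assuming GNU tr; the sample string is arbitrary.

```shell
input='a-x-M-bz'

# Hyphen last in the set: taken literally.
at_end=$(printf '%s\n' "$input" | tr -d 'axM-')

# `--' ends option processing, so -axM is read as set1.
after_dashes=$(printf '%s\n' "$input" | tr -d -- -axM)

# [=-=] names the hyphen explicitly.
equiv=$(printf '%s\n' "$input" | tr -d '[=-=]axM')

# Pitfall: a-x is read as a range, so b is deleted and the hyphens survive.
as_range=$(printf '%s\n' "$input" | tr -d a-xM)

printf '%s\n%s\n%s\n%s\n' "$at_end" "$after_dashes" "$equiv" "$as_range"
```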
9.2 expand: Convert tabs to spaces

expand writes the contents of each given file, or standard
input if none are given or for a file of `-', to standard
output, with tab characters converted to the appropriate number of
spaces. Synopsis:

    expand [option]… [file]…

By default, expand converts all tabs to spaces. It preserves
backspace characters in the output; they decrement the column count for
tab calculations. The default action is equivalent to `-t 8' (set
tabs every 8 columns).
The program accepts the following options. Also see Common options.
`-t tab1[,tab2]…'
`--tabs=tab1[,tab2]…'
If only one tab stop is given, set the tabs tab1 spaces apart (default is 8). Otherwise, set the tabs at columns tab1, tab2, … (numbered from 0), and replace any tabs beyond the last tab stop given with single spaces. Tab stops can be separated by blanks as well as by commas.
For compatibility, GNU expand also accepts the obsolete
option syntax, `-t1[,t2]…'. New scripts
should use `-t t1[,t2]…' instead.
`-i'
`--initial'
Only convert initial tabs (those that precede all characters other than spaces and tabs) on each line to spaces.
An exit status of zero indicates success, and a nonzero value indicates failure.
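The following sketch exercises the tab-stop and initial-tab options on made-up input lines, assuming GNU expand:

```shell
# One tab stop value: tabs every 4 columns, so the tab after `a'
# becomes 3 spaces.
every4=$(printf 'a\tb\n' | expand -t 4)

# -i: only the leading tab is converted; the tab after `a' is kept.
initial=$(printf '\ta\tb\n' | expand -i -t 4)

# Explicit stops at columns 4 and 10; the third tab lies beyond the
# last stop and is replaced by a single space.
listed=$(printf 'a\tb\tc\td\n' | expand -t 4,10)

printf '%s\n%s\n%s\n' "$every4" "$initial" "$listed"
```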
9.3 unexpand: Convert spaces to tabs

unexpand writes the contents of each given file, or
standard input if none are given or for a file of `-', to
standard output, converting blanks at the beginning of each line into
as many tab characters as needed. In the default POSIX
locale, a blank is a space or a tab; other locales may specify
additional blank characters. Synopsis:

    unexpand [option]… [file]…

By default, unexpand converts only initial blanks (those
that precede all non-blank characters) on each line. It
preserves backspace characters in the output; they decrement the column
count for tab calculations. By default, tabs are set at every 8th
column.
The program accepts the following options. Also see Common options.
`-t tab1[,tab2]…'
`--tabs=tab1[,tab2]…'
If only one tab stop is given, set the tabs tab1 columns apart instead of the default 8. Otherwise, set the tabs at columns tab1, tab2, … (numbered from 0), and leave blanks beyond the tab stops given unchanged. Tab stops can be separated by blanks as well as by commas. This option implies the `-a' option.
For compatibility, GNU unexpand supports the obsolete option syntax,
`-tab1[,tab2]…', where tab stops must be
separated by commas. (Unlike `-t', this obsolete option does
not imply `-a'.) New scripts should use `--first-only -t
tab1[,tab2]…' instead.
`-a'
`--all'
Also convert all sequences of two or more blanks just before a tab stop, even if they occur after non-blank characters in a line.
An exit status of zero indicates success, and a nonzero value indicates failure.
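A short sketch of the default behavior and of `-a', assuming GNU unexpand and the default tab stops every 8 columns; the input lines are made up:

```shell
# Eight leading spaces reach the first tab stop and become one tab.
leading=$(printf '        x\n' | unexpand)

# By default, blanks after a non-blank character are left alone.
interior_default=$(printf 'a       b\n' | unexpand)

# With -a, the seven spaces ending at column 8 are converted to a tab.
interior_all=$(printf 'a       b\n' | unexpand -a)

printf '%s\n%s\n%s\n' "$leading" "$interior_default" "$interior_all"
```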
This document was generated on January, 20 2010 using texi2html 1.76.