
6. Summarizing files

These commands generate a few numbers that summarize the entire contents of files.



6.1 wc: Print newline, word, and byte counts

wc counts the number of bytes, characters, whitespace-separated words, and newlines in each given file, or standard input if none are given or for a file of `-'. Synopsis:

 
wc [option]… [file]…

wc prints one line of counts for each file, and if the file was given as an argument, it prints the file name following the counts. If more than one file is given, wc prints a final line containing the cumulative counts, with the file name `total'. The counts are printed in this order: newlines, words, characters, bytes, maximum line length. Each count is printed right-justified in a field with at least one space between fields so that the numbers and file names normally line up nicely in columns. The width of the count fields varies depending on the inputs, so you should not depend on a particular field width. However, as a GNU extension, if only one count is printed, it is guaranteed to be printed without leading spaces.
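Because a single requested count carries no leading spaces, a script can capture it directly. The following is a minimal sketch. It assumes GNU wc, a POSIX shell, and the common mktemp utility, and the sample text is made up for illustration.

```shell
#!/bin/sh
# Hypothetical sample input: two lines, three words.
input=$(mktemp)
printf 'one two\nthree\n' > "$input"

# Only one count (-l) is requested, so GNU wc prints it with no leading
# spaces. Input comes from stdin, so no file name follows the count.
lines=$(wc -l < "$input")

# Naming the file as an argument appends the name after the count.
named=$(wc -l "$input")

rm -f "$input"
echo "lines=[$lines] named=[$named]"
```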

By default, wc prints three counts: the newline, words, and byte counts. Options can specify that only certain counts be printed. Options do not undo others previously given, so

 
wc --bytes --words

prints both the byte counts and the word counts.
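The counts always appear in the fixed order given above, whatever order the options were given in, and field widths may vary. This sketch therefore splits the output on whitespace instead of cutting fixed columns. It assumes GNU wc and a POSIX shell, and the input text is illustrative.

```shell
#!/bin/sh
# --bytes is given first, but wc still prints words before bytes.
counts=$(printf 'one two\nthree\n' | wc --bytes --words)

# Split on whitespace; do not rely on column positions.
set -- $counts
words=$1
bytes=$2
echo "words=$words bytes=$bytes"
```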

With the `--max-line-length' option, wc prints the length of the longest line per file, and if there is more than one file it prints the maximum (not the sum) of those lengths. The line lengths here are measured in screen columns, according to the current locale and assuming tab stops every 8 columns.
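This sketch shows how tabs widen a line and why the `total' line holds the maximum rather than the sum. It assumes GNU wc, ASCII-only input, a POSIX shell, and mktemp. The file names are hypothetical.

```shell
#!/bin/sh
# Tab stops every 8 columns: "a" takes column 1, the tab fills
# columns 2 to 8, and "b" lands in column 9.
tab_width=$(printf 'a\tb\n' | wc -L)

# Without tabs the longest line wins: "abc" (3) beats "ab" (2).
plain_width=$(printf 'abc\nab\n' | wc -L)

# With several files, the "total" line holds the maximum, not the sum.
dir=$(mktemp -d)
printf 'abc\n' > "$dir/short"
printf 'abcdefgh\n' > "$dir/long"
max=$(wc -L "$dir/short" "$dir/long" | tail -n1 | awk '{print $1}')
rm -rf "$dir"

echo "tab=$tab_width plain=$plain_width max=$max"
```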

The program accepts the following options. Also see Common options.

`-c'
`--bytes'

Print only the byte counts.

`-m'
`--chars'

Print only the character counts.

`-w'
`--words'

Print only the word counts.

`-l'
`--lines'

Print only the newline counts.

`-L'
`--max-line-length'

Print only the maximum line lengths.

`--files0-from=file'

Disallow processing files named on the command line, and instead process those named in file file, where each name is terminated by a zero byte (ASCII NUL). This is useful when the list of file names is so long that it may exceed a command-line length limit. In such cases, running wc via xargs is undesirable because xargs splits the list into pieces, making wc print a total for each sublist rather than for the entire list. One way to produce a list of ASCII NUL terminated file names is with GNU find, using its `-print0' predicate. If file is `-', the ASCII NUL terminated file names are read from standard input.

For example, to find the length of the longest line in any `.c' or `.h' file in the current hierarchy, do this:

 
find . -name '*.[ch]' -print0 |
  wc -L --files0-from=- | tail -n1
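The same pattern works for any NUL-terminated list, not only one produced by find. This minimal sketch builds the list with printf and keeps only the final `total' line. It assumes GNU wc, a POSIX shell, and mktemp, and the file names are hypothetical.

```shell
#!/bin/sh
dir=$(mktemp -d)
cd "$dir" || exit 1
printf 'a\nb\n' > first.txt       # 2 lines
printf 'c\n' > 'second file.txt'  # 1 line; the space in the name is harmless

# Each name ends with a NUL byte, so spaces or newlines in names are safe.
total=$(printf 'first.txt\0second file.txt\0' |
  wc -l --files0-from=- | tail -n1 | awk '{print $1}')

cd / && rm -rf "$dir"
echo "total lines: $total"
```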

An exit status of zero indicates success, and a nonzero value indicates failure.



6.2 sum: Print checksum and block counts

sum computes a 16-bit checksum for each given file, or standard input if none are given or for a file of `-'. Synopsis:

 
sum [option]… [file]…

sum prints the checksum for each file followed by the number of blocks in the file (rounded up). If more than one file is given, file names are also printed by default. With the `--sysv' option, file names are printed whenever there is at least one file argument.

By default, GNU sum computes checksums using an algorithm compatible with BSD sum and prints file sizes in units of 1024-byte blocks.
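To show what the BSD-compatible checksum does, here is a sketch that re-expresses it in shell arithmetic. For each byte, the 16-bit checksum is rotated right by one bit, the byte is added, and the result is reduced to 16 bits. This is an illustration under stated assumptions, not GNU's source. It assumes a POSIX shell, od(1), mktemp, and GNU sum to compare against. The function name bsd_sum is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: BSD-style 16-bit rotating checksum of stdin.
bsd_sum() {
  ck=0
  for byte in $(od -An -v -tu1); do          # byte values in decimal
    ck=$(( (ck >> 1) + ((ck & 1) << 15) ))   # rotate right by one bit
    ck=$(( (ck + byte) & 65535 ))            # add byte, keep low 16 bits
  done
  echo "$ck"
}

tmp=$(mktemp)
printf 'hello\n' > "$tmp"
mine=$(bsd_sum < "$tmp")
# GNU sum pads the BSD checksum with leading zeros; awk's "+ 0" drops them.
theirs=$(sum < "$tmp" | awk '{print $1 + 0}')
blocks=$(sum < "$tmp" | awk '{print $2}')   # 1024-byte blocks, rounded up
rm -f "$tmp"
echo "sketch=$mine sum=$theirs blocks=$blocks"
```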

The program accepts the following options. Also see Common options.

`-r'

Use the default (BSD compatible) algorithm. This option is included for compatibility with the System V sum. Unless `-s' was also given, it has no effect.

`-s'
`--sysv'

Compute checksums using an algorithm compatible with System V sum's default, and print file sizes in units of 512-byte blocks.
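The System V algorithm is simpler. It adds up all byte values and then folds the total into 16 bits. The sketch below re-expresses that in shell and also shows the different block units. It is an illustration, not GNU's source, and it assumes a POSIX shell, od(1), dd(1), mktemp, and GNU sum for comparison. The function name sysv_sum is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: System V style checksum of stdin.
sysv_sum() {
  s=0
  for byte in $(od -An -v -tu1); do
    s=$(( s + byte ))                        # plain sum of byte values
  done
  # Fold the 32-bit total into 16 bits, then fold in the carry.
  r=$(( (s & 0xFFFF) + ((s & 0xFFFFFFFF) >> 16) ))
  echo $(( (r & 0xFFFF) + (r >> 16) ))
}

tmp=$(mktemp)
printf 'hello\n' > "$tmp"
mine=$(sysv_sum < "$tmp")
theirs=$(sum -s < "$tmp" | awk '{print $1}')

# Block units differ: 600 bytes is 1 block of 1024 bytes (BSD)
# but 2 blocks of 512 bytes (System V).
dd if=/dev/zero of="$tmp" bs=600 count=1 2>/dev/null
bsd_blocks=$(sum < "$tmp" | awk '{print $2}')
sysv_blocks=$(sum -s < "$tmp" | awk '{print $2}')
rm -f "$tmp"
echo "sketch=$mine sum=$theirs blocks: bsd=$bsd_blocks sysv=$sysv_blocks"
```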

sum is provided for compatibility; the cksum program (see next section) is preferable in new applications.

An exit status of zero indicates success, and a nonzero value indicates failure.



6.3 cksum: Print CRC checksum and byte counts

cksum computes a cyclic redundancy check (CRC) checksum for each given file, or standard input if none are given or for a file of `-'. Synopsis:

 
cksum [option]… [file]…

cksum prints the CRC checksum for each file along with the number of bytes in the file, and the file name unless no arguments were given.

cksum is typically used to ensure that files transferred by unreliable means (e.g., netnews) have not been corrupted, by comparing the cksum output for the received files with the cksum output for the original files (typically given in the distribution).
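A minimal sketch of that comparison follows. It assumes GNU cksum, a POSIX shell, and mktemp, and the file contents are made up. Both files are read through standard input, so no file name appears in the output. The two lines are then equal exactly when both the CRC and the byte count match.

```shell
#!/bin/sh
dir=$(mktemp -d)
printf 'payload\n' > "$dir/original"
cp "$dir/original" "$dir/received"

# Compare "CRC BYTECOUNT" lines for the original and the received copy.
compare() {
  if [ "$(cksum < "$dir/original")" = "$(cksum < "$dir/received")" ]; then
    echo intact
  else
    echo corrupted
  fi
}

first=$(compare)
printf 'paylaod\n' > "$dir/received"   # same length, two bytes swapped
second=$(compare)
rm -rf "$dir"
echo "before: $first, after: $second"
```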

The CRC algorithm is specified by the POSIX standard. It is not compatible with the BSD or System V sum algorithms (see the previous section); it is more robust.
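The POSIX CRC is a 32-bit CRC with generator polynomial 0x04C11DB7, processed most-significant bit first. It runs over the file's bytes, then over the file's length (least-significant byte first, using only as many bytes as needed), and the final value is complemented. The sketch below is a slow, bit-at-a-time shell rendering of that description, not GNU's table-driven code. It assumes a POSIX shell with 64-bit arithmetic and od(1). The function names crc_update and posix_cksum are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: feed one byte ($1) into the global CRC register.
crc_update() {
  crc=$(( crc ^ ($1 << 24) ))
  i=0
  while [ "$i" -lt 8 ]; do
    if [ $(( crc & 0x80000000 )) -ne 0 ]; then
      crc=$(( ((crc << 1) ^ 0x04C11DB7) & 0xFFFFFFFF ))
    else
      crc=$(( (crc << 1) & 0xFFFFFFFF ))
    fi
    i=$(( i + 1 ))
  done
}

# Hypothetical helper: read stdin and print "CRC LENGTH", like cksum.
posix_cksum() {
  crc=0
  len=0
  for byte in $(od -An -v -tu1); do
    crc_update "$byte"
    len=$(( len + 1 ))
  done
  # Append the length, least-significant byte first, minimal bytes.
  n=$len
  while [ "$n" -gt 0 ]; do
    crc_update $(( n & 255 ))
    n=$(( n >> 8 ))
  done
  echo "$(( ~crc & 0xFFFFFFFF )) $len"
}

printf '123456789' | posix_cksum
printf '123456789' | cksum
```

Both commands should print the same line. Real implementations precompute a 256-entry table so that each byte costs one lookup instead of eight shifts.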

The only options are `--help' and `--version'. See section Common options.

An exit status of zero indicates success, and a nonzero value indicates failure.



6.4 md5sum: Print or check MD5 digests

md5sum computes a 128-bit checksum (or fingerprint or message-digest) for each specified file.

Note: The MD5 digest is more reliable than a simple CRC (provided by the cksum command) for detecting accidental file corruption, as the chance of two files accidentally having identical MD5 digests is vanishingly small. However, it should not be considered truly secure against malicious tampering. Finding a file with a given MD5 fingerprint, or modifying a file so as to retain its MD5, is considered infeasible at the moment. It is known, however, how to produce different files with identical MD5 digests (a "collision"), which can be a security issue in certain contexts. For more secure hashes, consider using SHA-1 or SHA-2. See section sha1sum: Print or check SHA-1 digests, and sha2 utilities: Print or check SHA-2 digests.

If a file is specified as `-', or if no files are given, md5sum computes the checksum for the standard input. md5sum can also determine whether a file and checksum are consistent. Synopsis:

 
md5sum [option]… [file]…

For each file, `md5sum' outputs the MD5 checksum, a flag indicating a binary or text input file, and the file name. If a file name contains a backslash or newline, the line starts with a backslash, and each problematic character in the file name is escaped with a backslash, making the output unambiguous even in the presence of arbitrary file names. If file is omitted or specified as `-', standard input is read.
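The sketch below shows the ordinary output format and the escaping of an awkward file name. It assumes GNU md5sum on a GNU system, where the default flag is ` ' (text), plus a POSIX shell and mktemp. The file names are hypothetical.

```shell
#!/bin/sh
dir=$(mktemp -d)
cd "$dir" || exit 1

# Ordinary name: 32 hex digits, a space, the ' ' text flag, the name.
printf 'hello\n' > hello.txt
plain=$(md5sum hello.txt)
echo "$plain"

# A newline in the name: the line starts with a backslash and the
# newline is escaped, so each file still occupies exactly one line.
odd_name=$(printf 'two\nlines')
printf 'x\n' > "$odd_name"
escaped=$(md5sum "$odd_name")
echo "$escaped"

cd / && rm -rf "$dir"
```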

The program accepts the following options. Also see Common options.

`-b'
`--binary'

Treat each input file as binary, by reading it in binary mode and outputting a `*' flag. This is the inverse of `--text'. On systems like GNU that do not distinguish between binary and text files, this option merely flags each input file as binary: the MD5 checksum is unaffected. This option is the default on systems like MS-DOS that distinguish between binary and text files, except for reading standard input when standard input is a terminal.

`-c'
`--check'

Read file names and checksum information (not data) from each file (or from stdin if no file was specified) and report whether the checksums match the contents of the named files. The input to this mode of md5sum is usually the output of a prior, checksum-generating run of `md5sum'. Each valid line of input consists of an MD5 checksum, a binary/text flag, and then a file name. Binary files are marked with `*', text with ` '. For each such line, md5sum reads the named file and computes its MD5 checksum. Then, if the computed message digest does not match the one on the line with the file name, the file is noted as having failed the test. Otherwise, the file passes the test. By default, for each valid line, one line is written to standard output indicating whether the named file passed the test. After all checks have been performed, if there were any failures, a warning is issued to standard error. Use the `--status' option to inhibit that output. If any listed file cannot be opened or read, if any valid line has an MD5 checksum inconsistent with the associated file, or if no valid line is found, md5sum exits with nonzero status. Otherwise, it exits successfully.

`--quiet'

This option is useful only when verifying checksums. When verifying checksums, don't generate an `OK' message per successfully checked file. Files that fail the verification are reported in the default one-line-per-file format. If there is any checksum mismatch, print a warning summarizing the failures to standard error.

`--status'

This option is useful only when verifying checksums. When verifying checksums, don't generate the default one-line-per-file diagnostic and don't output the warning summarizing any failures. Failures to open or read a file still evoke individual diagnostics to standard error. If all listed files are readable and are consistent with the associated MD5 checksums, exit successfully. Otherwise exit with a status code indicating there was a failure.

`-t'
`--text'

Treat each input file as text, by reading it in text mode and outputting a ` ' flag. This is the inverse of `--binary'. This option is the default on systems like GNU that do not distinguish between binary and text files. On other systems, it is the default for reading standard input when standard input is a terminal.

`-w'
`--warn'

When verifying checksums, warn about improperly formatted MD5 checksum lines. This option is useful only if all but a few lines in the checked input are valid.
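The options above combine into a typical workflow: record checksums, verify them, detect a modified file, and notice a malformed line. This is a minimal sketch assuming GNU md5sum, a POSIX shell, and mktemp. The file names and contents are made up, and exit statuses are captured with `||' so the script also works under `set -e'.

```shell
#!/bin/sh
dir=$(mktemp -d)
cd "$dir" || exit 1
printf 'alpha\n' > a.txt
printf 'beta\n' > b.txt
md5sum a.txt b.txt > sums.md5          # record checksums

ok_status=0
md5sum -c sums.md5 || ok_status=$?     # prints "a.txt: OK", "b.txt: OK"

quiet_out=$(md5sum --quiet -c sums.md5)   # empty when everything matches

printf 'gamma\n' > b.txt               # simulate corruption
bad_status=0
md5sum --status -c sums.md5 || bad_status=$?   # silent; status says it all

# Restore b.txt and append a malformed line; -w reports it on stderr.
printf 'beta\n' > b.txt
printf 'not a checksum line\n' >> sums.md5
warn_status=0
warn_msg=$(md5sum -w -c sums.md5 2>&1 >/dev/null) || warn_status=$?

cd / && rm -rf "$dir"
echo "ok=$ok_status bad=$bad_status warn=$warn_status"
```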

An exit status of zero indicates success, and a nonzero value indicates failure.



6.5 sha1sum: Print or check SHA-1 digests

sha1sum computes a 160-bit checksum for each specified file. The usage and options of this command are precisely the same as for md5sum. See section md5sum: Print or check MD5 digests.
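Because the interface matches md5sum, generating and verifying SHA-1 checksums looks the same. Only the digest length changes: 160 bits is 40 hexadecimal digits. This minimal sketch assumes GNU sha1sum, a POSIX shell, and mktemp, with a hypothetical file.

```shell
#!/bin/sh
dir=$(mktemp -d)
cd "$dir" || exit 1
printf 'data\n' > f.txt

sha1sum f.txt > sums.sha1               # same format as md5sum output
digest=$(cut -d' ' -f1 sums.sha1)

check_status=0
sha1sum --status -c sums.sha1 || check_status=$?

cd / && rm -rf "$dir"
echo "${#digest} hex digits, check status $check_status"
```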

Note: The SHA-1 digest is more secure than MD5. However, collisions (different files having the same fingerprint) can be produced with considerable, but not unreasonable, resources, and a practical SHA-1 collision was publicly demonstrated in 2017. For this reason, it is generally considered that SHA-1 should be phased out in favor of the more secure SHA-2 hash algorithms. See section sha2 utilities: Print or check SHA-2 digests.



6.6 sha2 utilities: Print or check SHA-2 digests

The commands sha224sum, sha256sum, sha384sum and sha512sum compute checksums of various lengths (respectively 224, 256, 384 and 512 bits), collectively known as the SHA-2 hashes. The usage and options of these commands are precisely the same as for md5sum. See section md5sum: Print or check MD5 digests.
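Each SHA-2 command prints bits/4 hexadecimal digits, followed by two spaces and the file name (`-' for standard input). The loop below makes the four lengths visible. It assumes GNU coreutils and a POSIX shell, and the input string `abc' is arbitrary.

```shell
#!/bin/sh
lengths=
for bits in 224 256 384 512; do
  digest=$(printf 'abc' | "sha${bits}sum" | cut -d' ' -f1)
  echo "sha${bits}sum: ${#digest} hex digits"
  lengths="${lengths}${lengths:+ }${#digest}"   # collect for checking
done
```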

Note: The SHA384 and SHA512 digests operate on 64-bit words, so on 32-bit computers they are considerably slower to compute than SHA224 or SHA256.



This document was generated on January 20, 2010 using texi2html 1.76.