author Jim Meyering <jim@meyering.net> 2001-05-21 12:37:24 +0000
committer Jim Meyering <jim@meyering.net> 2001-05-21 12:37:24 +0000
commit 4893d6c53bd8dfbf1a27bbca5ed54dfb40b39f79
tree 1c3bec8beece5e9b401e2fd4d779f52db88b3f59 /doc
parent c025720350aad78a2c8b809b654c1e77df2546a0
.
Diffstat (limited to 'doc')
-rw-r--r-- doc/textutils.texi 4521
1 file changed, 0 insertions, 4521 deletions
diff --git a/doc/textutils.texi b/doc/textutils.texi
deleted file mode 100644
index ec7e2a49a..000000000
--- a/doc/textutils.texi
+++ /dev/null
@@ -1,4521 +0,0 @@
-\input texinfo
-@c %**start of header
-@setfilename textutils.info
-@settitle @sc{gnu} text utilities
-@c %**end of header
-
-@include version.texi
-@include constants.texi
-
-@c Define new indices.
-@defcodeindex op
-
-@c Put everything in one index (arbitrarily chosen to be the concept index).
-@syncodeindex fn cp
-@syncodeindex ky cp
-@syncodeindex op cp
-@syncodeindex pg cp
-@syncodeindex vr cp
-
-@ifinfo
-@format
-START-INFO-DIR-ENTRY
-* Text utilities: (textutils). GNU text utilities.
-* cat: (textutils)cat invocation. Concatenate and write files.
-* cksum: (textutils)cksum invocation. Print @sc{posix} CRC checksum.
-* comm: (textutils)comm invocation. Compare sorted files by line.
-* csplit: (textutils)csplit invocation. Split by context.
-* cut: (textutils)cut invocation. Print selected parts of lines.
-* expand: (textutils)expand invocation. Convert tabs to spaces.
-* fmt: (textutils)fmt invocation. Reformat paragraph text.
-* fold: (textutils)fold invocation. Wrap long input lines.
-* head: (textutils)head invocation. Output the first part of files.
-* join: (textutils)join invocation. Join lines on a common field.
-* md5sum: (textutils)md5sum invocation. Print or check message-digests.
-* nl: (textutils)nl invocation. Number lines and write files.
-* od: (textutils)od invocation. Dump files in octal, etc.
-* paste: (textutils)paste invocation. Merge lines of files.
-* pr: (textutils)pr invocation. Paginate or columnate files.
-* ptx: (textutils)ptx invocation. Produce permuted indexes.
-* sort: (textutils)sort invocation. Sort text files.
-* split: (textutils)split invocation. Split into fixed-size pieces.
-* sum: (textutils)sum invocation. Print traditional checksum.
-* tac: (textutils)tac invocation. Reverse files.
-* tail: (textutils)tail invocation. Output the last part of files.
-* tsort: (textutils)tsort invocation. Topological sort.
-* tr: (textutils)tr invocation. Translate characters.
-* unexpand: (textutils)unexpand invocation. Convert spaces to tabs.
-* uniq: (textutils)uniq invocation. Uniquify files.
-* wc: (textutils)wc invocation. Byte, word, and line counts.
-END-INFO-DIR-ENTRY
-@end format
-@end ifinfo
-
-@ifinfo
-This file documents the GNU text utilities.
-
-Copyright (C) 1994, 95, 96 Free Software Foundation, Inc.
-
-Permission is granted to copy, distribute and/or modify this document
-under the terms of the GNU Free Documentation License, Version 1.1
-or any later version published by the Free Software Foundation;
-with no Invariant Sections, with no
-Front-Cover Texts, and with no Back-Cover Texts.
-A copy of the license is included in the section entitled ``GNU
-Free Documentation License''.
-
-@end ifinfo
-
-@titlepage
-@title @sc{gnu} @code{textutils}
-@subtitle A set of text utilities
-@subtitle for version @value{VERSION}, @value{UPDATED}
-@author David MacKenzie et al.
-
-@page
-@vskip 0pt plus 1filll
-Copyright @copyright{} 1994, 95, 96 Free Software Foundation, Inc.
-
-Permission is granted to copy, distribute and/or modify this document
-under the terms of the GNU Free Documentation License, Version 1.1
-or any later version published by the Free Software Foundation;
-with no Invariant Sections, with no
-Front-Cover Texts, and with no Back-Cover Texts.
-A copy of the license is included in the section entitled ``GNU
-Free Documentation License''.
-@end titlepage
-
-
-@c If your makeinfo doesn't grok this @ifnottex directive, then either
-@c get a newer version of makeinfo or do s/ifnottex/ifinfo/ here and on
-@c the matching @end directive below.
-@ifnottex
-@node Top
-@top GNU text utilities
-
-@cindex text utilities
-@cindex utilities for text handling
-
-This manual documents version @value{VERSION} of the @sc{gnu} text utilities.
-
-@menu
-* Introduction:: Caveats, overview, and authors.
-* Common options:: Common options.
-* Output of entire files:: cat tac nl od
-* Formatting file contents:: fmt pr fold
-* Output of parts of files:: head tail split csplit
-* Summarizing files:: wc sum cksum md5sum
-* Operating on sorted files:: sort uniq comm ptx tsort
-* Operating on fields within a line:: cut paste join
-* Operating on characters:: tr expand unexpand
-* Opening the software toolbox:: The software tools philosophy.
-* Index:: General index.
-
-@detailmenu
- --- The Detailed Node Listing ---
-
-Output of entire files
-
-* cat invocation:: Concatenate and write files.
-* tac invocation:: Concatenate and write files in reverse.
-* nl invocation:: Number lines and write files.
-* od invocation:: Write files in octal or other formats.
-
-Formatting file contents
-
-* fmt invocation:: Reformat paragraph text.
-* pr invocation:: Paginate or columnate files for printing.
-* fold invocation:: Wrap input lines to fit in specified width.
-
-Output of parts of files
-
-* head invocation:: Output the first part of files.
-* tail invocation:: Output the last part of files.
-* split invocation:: Split a file into fixed-size pieces.
-* csplit invocation:: Split a file into context-determined pieces.
-
-Summarizing files
-
-* wc invocation:: Print byte, word, and line counts.
-* sum invocation:: Print checksum and block counts.
-* cksum invocation:: Print CRC checksum and byte counts.
-* md5sum invocation:: Print or check message-digests.
-
-Operating on sorted files
-
-* sort invocation:: Sort text files.
-* uniq invocation:: Uniquify files.
-* comm invocation:: Compare two sorted files line by line.
-* ptx invocation:: Produce a permuted index of file contents.
-* tsort invocation:: Topological sort.
-
-@code{ptx}: Produce permuted indexes
-
-* General options in ptx:: Options which affect general program behavior.
-* Charset selection in ptx:: Underlying character set considerations.
-* Input processing in ptx:: Input fields, contexts, and keyword selection.
-* Output formatting in ptx:: Types of output format, and sizing the fields.
-* Compatibility in ptx:: The GNU extensions to @code{ptx}
-
-Operating on fields within a line
-
-* cut invocation:: Print selected parts of lines.
-* paste invocation:: Merge lines of files.
-* join invocation:: Join lines on a common field.
-
-Operating on characters
-
-* tr invocation:: Translate, squeeze, and/or delete characters.
-* expand invocation:: Convert tabs to spaces.
-* unexpand invocation:: Convert spaces to tabs.
-
-@code{tr}: Translate, squeeze, and/or delete characters
-
-* Character sets:: Specifying sets of characters.
-* Translating:: Changing one character to another.
-* Squeezing:: Squeezing repeats and deleting.
-* Warnings in tr:: Warning messages.
-
-Opening the software toolbox
-
-* Toolbox introduction:: Toolbox introduction
-* I/O redirection:: I/O redirection
-* The who command:: The @code{who} command
-* The cut command:: The @code{cut} command
-* The sort command:: The @code{sort} command
-* The uniq command:: The @code{uniq} command
-* Putting the tools together:: Putting the tools together
-
-@end detailmenu
-@end menu
-
-@end ifnottex
-
-
-@node Introduction
-@chapter Introduction
-
-@cindex introduction
-
-This manual is incomplete: No attempt is made to explain basic concepts
-in a way suitable for novices. Thus, if you are interested, please get
-involved in improving this manual. The entire @sc{gnu} community will
-benefit.
-
-@cindex POSIX.2
-The @sc{gnu} text utilities are mostly compatible with the @sc{posix.2}
-standard.
-
-@c This paragraph appears in all of fileutils.texi, textutils.texi, and
-@c sh-utils.texi too -- so be sure to keep them consistent.
-@cindex bugs, reporting
-Please report bugs to @email{bug-textutils@@gnu.org}. Remember
-to include the version number, machine architecture, input files, and
-any other information needed to reproduce the bug: your input, what you
-expected, what you got, and why it is wrong. Diffs are welcome, but
-please include a description of the problem as well, since this is
-sometimes difficult to infer. @xref{Bugs, , , gcc, GNU CC}.
-
-This manual was originally derived from the Unix man pages in the
-distribution, which were written by David MacKenzie and updated by Jim
-Meyering. What you are reading now is the authoritative documentation
-for these utilities; the man pages are no longer being maintained.
-The original @code{fmt} man page was written by Ross Paterson.
-Fran@,{c}ois Pinard did the initial conversion to Texinfo format.
-Karl Berry did the indexing, some reorganization, and editing of the results.
-Richard Stallman contributed his usual invaluable insights to the
-overall process.
-
-
-@node Common options
-@chapter Common options
-
-@cindex common options
-
-Certain options are available in all of these programs. Rather than
-writing identical descriptions for each of the programs, they are
-described here. (In fact, every @sc{gnu} program accepts (or should accept)
-these options.)
-
-@vindex POSIXLY_CORRECT
-Normally options and operands can appear in any order, and programs act
-as if all the options appear before any operands. For example,
-@samp{sort -r passwd -t :} acts like @samp{sort -r -t : passwd}, since
-@samp{:} is an option-argument of @option{-t}. However, if the
-@env{POSIXLY_CORRECT} environment variable is set, options must appear
-before operands, unless otherwise specified for a particular command.
-
-Some of these programs recognize the @samp{--help} and @samp{--version}
-options only when one of them is the sole command line argument.
-
-@table @samp
-
-@item --help
-@opindex --help
-@cindex help, online
-Print a usage message listing all available options, then exit successfully.
-
-@item --version
-@opindex --version
-@cindex version number, finding
-Print the version number, then exit successfully.
-
-@item --
-@opindex --
-@cindex option delimiter
-Delimit the option list. Later arguments, if any, are treated as
-operands even if they begin with @samp{-}. For example, @samp{sort --
--r} reads from the file named @file{-r}.
-
-@end table
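The option-ordering rules above can be illustrated with Python's standard getopt module, whose gnu_getopt function follows the same GNU conventions: it moves options ahead of operands, stops at the first operand when POSIXLY_CORRECT is set, treats a lone "-" as an operand, and ends option processing at "--". This is a sketch only; the option string "rt:" (a -r flag and a -t option taking an argument) is a hypothetical stand-in for sort's real option set, and parse_sort_args is a helper introduced here for illustration.

```python
import getopt
import os

def parse_sort_args(argv, posixly_correct=False):
    """Split ARGV into (options, operands) using GNU-style getopt.

    "rt:" is a hypothetical subset of sort's options: -r and -t SEP.
    """
    saved = os.environ.pop("POSIXLY_CORRECT", None)
    try:
        if posixly_correct:
            os.environ["POSIXLY_CORRECT"] = "1"
        return getopt.gnu_getopt(argv, "rt:")
    finally:
        # Restore the caller's environment exactly as it was.
        os.environ.pop("POSIXLY_CORRECT", None)
        if saved is not None:
            os.environ["POSIXLY_CORRECT"] = saved

# "sort -r passwd -t :" acts like "sort -r -t : passwd".
print(parse_sort_args(["-r", "passwd", "-t", ":"]))
# -> ([('-r', ''), ('-t', ':')], ['passwd'])

# With POSIXLY_CORRECT set, options must come before operands.
print(parse_sort_args(["-r", "passwd", "-t", ":"], posixly_correct=True))
# -> ([('-r', '')], ['passwd', '-t', ':'])

# "--" ends the option list, so "-r" is a file name.
print(parse_sort_args(["--", "-r"]))
# -> ([], ['-r'])
```

A single "-" is left in the operand list, matching the convention that it names standard input or output rather than an option.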
-
-@cindex standard input
-@cindex standard output
-A single @samp{-} is not really an option, though it looks like one. It
-stands for standard input, or for standard output if that is clear from
-the context, and it can be used either as an operand or as an
-option-argument. For example, @samp{sort -o - -} outputs to standard
-output and reads from standard input, and is equivalent to plain
-@samp{sort}. Unless otherwise specified, @samp{-} can appear in any
-context that requires a file name.
-
-@node Output of entire files
-@chapter Output of entire files
-
-@cindex output of entire files
-@cindex entire files, output of
-
-These commands read and write entire files, possibly transforming them
-in some way.
-
-@menu
-* cat invocation:: Concatenate and write files.
-* tac invocation:: Concatenate and write files in reverse.
-* nl invocation:: Number lines and write files.
-* od invocation:: Write files in octal or other formats.
-@end menu
-
-@node cat invocation
-@section @code{cat}: Concatenate and write files
-
-@pindex cat
-@cindex concatenate and write files
-@cindex copying files
-
-@code{cat} copies each @var{file} (@samp{-} means standard input), or
-standard input if none are given, to standard output. Synopsis:
-
-@example
-cat [@var{option}]@dots{} [@var{file}]@dots{}
-@end example
-
-The program accepts the following options. Also see @ref{Common options}.
-
-@table @samp
-
-@item -A
-@itemx --show-all
-@opindex -A
-@opindex --show-all
-Equivalent to @samp{-vET}.
-
-@item -B
-@itemx --binary
-@opindex -B
-@opindex --binary
-@cindex binary and text I/O in cat
-On MS-DOS and MS-Windows only, read and write the files in binary mode.
-By default, @code{cat} on MS-DOS/MS-Windows uses binary mode only when
-standard output is redirected to a file or a pipe; this option overrides
-that. Binary file I/O is used so that the files retain their format
-(Unix text as opposed to DOS text and binary), because @code{cat} is
-frequently used as a file-copying program. Some options (see below)
-cause @code{cat} to read and write files in text mode because in those
-cases the original file contents aren't important (e.g., when lines are
-numbered by @code{cat}, or when line endings should be marked). This is
-so these options work as DOS/Windows users would expect; for example,
-DOS-style text files have their lines end with the CR-LF pair of
-characters, which won't be processed as an empty line by @samp{-b} unless
-the file is read in text mode.
-
-@item -b
-@itemx --number-nonblank
-@opindex -b
-@opindex --number-nonblank
-Number all nonblank output lines, starting with 1. On MS-DOS and
-MS-Windows, this option causes @code{cat} to read and write files in
-text mode.
-
-@item -e
-@opindex -e
-Equivalent to @samp{-vE}.
-
-@item -E
-@itemx --show-ends
-@opindex -E
-@opindex --show-ends
-Display a @samp{$} after the end of each line. On MS-DOS and
-MS-Windows, this option causes @code{cat} to read and write files in
-text mode.
-
-@item -n
-@itemx --number
-@opindex -n
-@opindex --number
-Number all output lines, starting with 1. On MS-DOS and MS-Windows,
-this option causes @code{cat} to read and write files in text mode.
-
-@item -s
-@itemx --squeeze-blank
-@opindex -s
-@opindex --squeeze-blank
-@cindex squeezing blank lines
-Replace multiple adjacent blank lines with a single blank line. On
-MS-DOS and MS-Windows, this option causes @code{cat} to read and write
-files in text mode.
-
-@item -t
-@opindex -t
-Equivalent to @samp{-vT}.
-
-@item -T
-@itemx --show-tabs
-@opindex -T
-@opindex --show-tabs
-Display TAB characters as @samp{^I}.
-
-@item -u
-@opindex -u
-Ignored; for Unix compatibility.
-
-@item -v
-@itemx --show-nonprinting
-@opindex -v
-@opindex --show-nonprinting
-Display control characters except for LFD and TAB using
-@samp{^} notation and precede characters that have the high bit set with
-@samp{M-}. On MS-DOS and MS-Windows, this option causes @code{cat} to
-read files and standard input in DOS binary mode, so the CR
-characters at the end of each line are also visible.
-
-@end table
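The notation used by @samp{-v} (with @samp{-T} and @samp{-E} layered on top) can be sketched in Python as a byte-by-byte mapping; cat_show_nonprinting is a hypothetical name, and the sketch ignores numbering, squeezing and the MS-DOS text/binary distinction. Showing high-bit TAB and LFD bytes as M-^I and M-^J (only plain TAB and LFD are exempt) is an assumption about real @code{cat}; with both flags set the output corresponds to @samp{-A}.

```python
def cat_show_nonprinting(data, show_tabs=False, show_ends=False):
    """Render bytes as cat -v describes: ^X for controls, M- for high bit.

    show_tabs adds the -T behavior (TAB as ^I); show_ends adds -E ($ at EOL).
    """
    out = []
    for byte in data:
        if byte == 0x0A:                      # LFD is passed through
            out.append("$\n" if show_ends else "\n")
            continue
        if byte == 0x09 and not show_tabs:    # TAB is passed through unless -T
            out.append("\t")
            continue
        prefix = ""
        if byte >= 0x80:                      # high bit set: mark with M-
            prefix = "M-"
            byte -= 0x80
        if byte < 0x20:                       # control character: ^@ .. ^_
            out.append(prefix + "^" + chr(byte + 0x40))
        elif byte == 0x7F:                    # DEL is shown as ^?
            out.append(prefix + "^?")
        else:
            out.append(prefix + chr(byte))
    return "".join(out)

print(cat_show_nonprinting(b"a\tb\x01\x7f\xe9\n", show_tabs=True, show_ends=True))
# -> a^Ib^A^?M-i$
```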
-
-
-@node tac invocation
-@section @code{tac}: Concatenate and write files in reverse
-
-@pindex tac
-@cindex reversing files
-
-@code{tac} copies each @var{file} (@samp{-} means standard input), or
-standard input if none are given, to standard output, reversing the
-records (lines by default) in each separately. Synopsis:
-
-@example
-tac [@var{option}]@dots{} [@var{file}]@dots{}
-@end example
-
-@dfn{Records} are separated by instances of a string (newline by
-default). By default, this separator string is attached to the end of
-the record that it follows in the file.
-
-The program accepts the following options. Also see @ref{Common options}.
-
-@table @samp
-
-@item -b
-@itemx --before
-@opindex -b
-@opindex --before
-The separator is attached to the beginning of the record that it
-precedes in the file.
-
-@item -r
-@itemx --regex
-@opindex -r
-@opindex --regex
-Treat the separator string as a regular expression. Users of @code{tac}
-on MS-DOS/MS-Windows should note that, since @code{tac} reads files in
-binary mode, each line of a text file might end with a CR/LF pair
-instead of the Unix-style LF.
-
-@item -s @var{separator}
-@itemx --separator=@var{separator}
-@opindex -s
-@opindex --separator
-Use @var{separator} as the record separator, instead of newline.
-
-@end table
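A minimal Python sketch of the record model described above, for a fixed separator string only (the @samp{-r} regular-expression mode is not covered); tac_records is a hypothetical name. By default each separator stays attached to the end of the record it follows; with before=True, mirroring @samp{-b}, it is attached to the start of the record it precedes. In this sketch a final record without a trailing separator is simply moved to the front, so the input "a", newline, "b" comes out as "b", "a", newline.

```python
def tac_records(text, separator="\n", before=False):
    """Reverse the records of TEXT for a fixed, non-empty separator string."""
    pieces = text.split(separator)
    if before:
        # -b: each separator starts the record that follows it.
        records = [pieces[0]] + [separator + p for p in pieces[1:]]
    else:
        # Default: each separator ends the record that precedes it.
        records = [p + separator for p in pieces[:-1]] + [pieces[-1]]
    # Drop the empty piece left by a leading (-b) or trailing separator.
    records = [r for r in records if r]
    return "".join(reversed(records))

print(repr(tac_records("one\ntwo\nthree\n")))        # 'three\ntwo\none\n'
print(repr(tac_records("1,2,3,", separator=",")))    # '3,2,1,'
```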
-
-
-@node nl invocation
-@section @code{nl}: Number lines and write files
-
-@pindex nl
-@cindex numbering lines
-@cindex line numbering
-
-@code{nl} writes each @var{file} (@samp{-} means standard input), or
-standard input if none are given, to standard output, with line numbers
-added to some or all of the lines. Synopsis:
-
-@example
-nl [@var{option}]@dots{} [@var{file}]@dots{}
-@end example
-
-@cindex logical pages, numbering on
-@code{nl} decomposes its input into (logical) pages; by default, the
-line number is reset to 1 at the top of each logical page. @code{nl}
-treats all of the input files as a single document; it does not reset
-line numbers or logical pages between files.
-
-@cindex headers, numbering
-@cindex body, numbering
-@cindex footers, numbering
-A logical page consists of three sections: header, body, and footer.
-Any of the sections can be empty. Each can be numbered in a different
-style from the others.
-
-The beginnings of the sections of logical pages are indicated in the
-input file by a line containing exactly one of these delimiter strings:
-
-@table @samp
-@item \:\:\:
-start of header;
-@item \:\:
-start of body;
-@item \:
-start of footer.
-@end table
-
-The two characters from which these strings are made can be changed from
-@samp{\} and @samp{:} via options (see below), but the pattern and
-length of each string cannot be changed.
-
-A section delimiter is replaced by an empty line on output. Any text
-that comes before the first section delimiter string in the input file
-is considered to be part of a body section, so @code{nl} treats a
-file that contains no section delimiters as a single body section.
-
-The program accepts the following options. Also see @ref{Common options}.
-
-@table @samp
-
-@item -b @var{style}
-@itemx --body-numbering=@var{style}
-@opindex -b
-@opindex --body-numbering
-Select the numbering style for lines in the body section of each
-logical page. When a line is not numbered, the current line number
-is not incremented, but the line number separator character is still
-prepended to the line. The styles are:
-
-@table @samp
-@item a
-number all lines,
-@item t
-number only nonempty lines (default for body),
-@item n
-do not number lines (default for header and footer),
-@item p@var{regexp}
-number only lines that contain a match for @var{regexp}.
-@end table
-
-@item -d @var{cd}
-@itemx --section-delimiter=@var{cd}
-@opindex -d
-@opindex --section-delimiter
-@cindex section delimiters of pages
-Set the section delimiter characters to @var{cd}; default is
-@samp{\:}. If only @var{c} is given, the second remains @samp{:}.
-(Remember to protect @samp{\} or other metacharacters from shell
-expansion with quotes or extra backslashes.)
-
-@item -f @var{style}
-@itemx --footer-numbering=@var{style}
-@opindex -f
-@opindex --footer-numbering
-Analogous to @samp{--body-numbering}.
-
-@item -h @var{style}
-@itemx --header-numbering=@var{style}
-@opindex -h
-@opindex --header-numbering
-Analogous to @samp{--body-numbering}.
-
-@item -i @var{number}
-@itemx --page-increment=@var{number}
-@opindex -i
-@opindex --page-increment
-Increment line numbers by @var{number} (default 1).
-
-@item -l @var{number}
-@itemx --join-blank-lines=@var{number}
-@opindex -l
-@opindex --join-blank-lines
-@cindex empty lines, numbering
-@cindex blank lines, numbering
-Consider @var{number} (default 1) consecutive empty lines to be one
-logical line for numbering, and only number the last one. Where fewer
-than @var{number} consecutive empty lines occur, do not number them.
-An empty line is one that contains no characters, not even spaces
-or tabs.
-
-@item -n @var{format}
-@itemx --number-format=@var{format}
-@opindex -n
-@opindex --number-format
-Select the line numbering format (default is @code{rn}):
-
-@table @samp
-@item ln
-@opindex ln @r{format for @code{nl}}
-left justified, no leading zeros;
-@item rn
-@opindex rn @r{format for @code{nl}}
-right justified, no leading zeros;
-@item rz
-@opindex rz @r{format for @code{nl}}
-right justified, leading zeros.
-@end table
-
-@item -p
-@itemx --no-renumber
-@opindex -p
-@opindex --no-renumber
-Do not reset the line number at the start of a logical page.
-
-@item -s @var{string}
-@itemx --number-separator=@var{string}
-@opindex -s
-@opindex --number-separator
-Separate the line number from the text line in the output with
-@var{string} (default is the TAB character).
-
-@item -v @var{number}
-@itemx --starting-line-number=@var{number}
-@opindex -v
-@opindex --starting-line-number
-Set the initial line number on each logical page to @var{number} (default 1).
-
-@item -w @var{number}
-@itemx --number-width=@var{number}
-@opindex -w
-@opindex --number-width
-Use @var{number} characters for line numbers (default 6).
-
-@end table
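The numbering styles (a, t, n, p@var{regexp}), the ln/rn/rz formats, and the @samp{-i}, @samp{-s}, @samp{-v} and @samp{-w} settings combine as in the following Python sketch for the lines of a single body section. nl_body is a hypothetical helper; it ignores headers, footers, section delimiters and @samp{-l}, and uses Python's re module as a stand-in for @code{nl}'s regular expressions. Padding unnumbered lines with blanks as wide as the number field before the separator is an assumption made to keep text aligned, not a statement of exactly what @code{nl} emits.

```python
import re

def nl_body(lines, style="t", number_format="rn", width=6, separator="\t",
            start=1, increment=1):
    """Number the lines of one body section (LINES carry no newlines)."""
    justify = {"ln": str.ljust, "rn": str.rjust, "rz": str.zfill}[number_format]
    number = start
    output = []
    for line in lines:
        if style == "a":                      # number all lines
            numbered = True
        elif style == "t":                    # number nonempty lines only
            numbered = line != ""
        elif style == "n":                    # number no lines
            numbered = False
        elif style.startswith("p"):           # number lines matching REGEXP
            numbered = re.search(style[1:], line) is not None
        else:
            raise ValueError("unknown numbering style: " + style)
        if numbered:
            output.append(justify(str(number), width) + separator + line)
            number += increment
        else:
            # Unnumbered lines do not advance the counter.
            output.append(" " * width + separator + line)
    return output

print("\n".join(nl_body(["alpha", "", "beta"])))
```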
-
-
-@node od invocation
-@section @code{od}: Write files in octal or other formats
-
-@pindex od
-@cindex octal dump of files
-@cindex hex dump of files
-@cindex ASCII dump of files
-@cindex file contents, dumping unambiguously
-
-@code{od} writes an unambiguous representation of each @var{file}
-(@samp{-} means standard input), or standard input if none are given.
-Synopsis:
-
-@example
-od [@var{option}]@dots{} [@var{file}]@dots{}
-od -C [@var{file}] [[+]@var{offset} [[+]@var{label}]]
-@end example
-
-Each line of output consists of the offset in the input, followed by
-groups of data from the file. By default, @code{od} prints the offset in
-octal, and each group of file data is two bytes of input printed as a
-single octal number.
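The default layout just described can be sketched in Python with the standard struct module: an octal offset, then 2-byte words printed as octal numbers, 16 input bytes per line, and a final line holding the total byte count. od_default_lines is a hypothetical name, and several details are assumptions chosen to resemble typical GNU/Linux x86 output: the 7- and 6-digit field widths, zero padding of an odd trailing byte, the closing count line, and little-endian byte order (real @code{od} uses the host's order). The sketch also does not collapse repeated lines into an asterisk (see @samp{-v}).

```python
import struct

def od_default_lines(data, byteorder="<"):
    """Sketch od's default output: octal offsets, octal 2-byte words."""
    lines = []
    for offset in range(0, len(data), 16):
        chunk = data[offset:offset + 16]
        if len(chunk) % 2:
            chunk += b"\0"                    # pad an odd trailing byte
        words = struct.unpack(byteorder + "%dH" % (len(chunk) // 2), chunk)
        lines.append("%07o" % offset + "".join(" %06o" % w for w in words))
    lines.append("%07o" % len(data))          # closing line: byte count
    return lines

print("\n".join(od_default_lines(b"ab\n")))
# -> 0000000 061141 000012
#    0000003
```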
-
-The program accepts the following options. Also see @ref{Common options}.
-
-@table @samp
-
-@item -A @var{radix}
-@itemx --address-radix=@var{radix}
-@opindex -A
-@opindex --address-radix
-@cindex radix for file offsets
-@cindex file offset radix
-Select the base in which file offsets are printed. @var{radix} can
-be one of the following:
-
-@table @samp
-@item d
-decimal;
-@item o
-octal;
-@item x
-hexadecimal;
-@item n
-none (do not print offsets).
-@end table
-
-The default is octal.
-
-@item -j @var{bytes}
-@itemx --skip-bytes=@var{bytes}
-@opindex -j
-@opindex --skip-bytes
-Skip @var{bytes} input bytes before formatting and writing. If
-@var{bytes} begins with @samp{0x} or @samp{0X}, it is interpreted in
-hexadecimal; otherwise, if it begins with @samp{0}, in octal; otherwise,
-in decimal. Appending @samp{b} multiplies @var{bytes} by 512, @samp{k}
-by 1024, and @samp{m} by 1048576.
-
-@item -N @var{bytes}
-@itemx --read-bytes=@var{bytes}
-@opindex -N
-@opindex --read-bytes
-Output at most @var{bytes} bytes of the input. Prefixes and suffixes on
-@var{bytes} are interpreted as for the @samp{-j} option.
-
-@item -s [@var{n}]
-@itemx --strings[=@var{n}]
-@opindex -s
-@opindex --strings
-@cindex string constants, outputting
-Instead of the normal output, output only @dfn{string constants}: at
-least @var{n} (3 by default) consecutive @sc{ascii} graphic characters,
-followed by a null (zero) byte.
-
-@item -t @var{type}
-@itemx --format=@var{type}
-@opindex -t
-@opindex --format
-Select the format in which to output the file data. @var{type} is a
-string of one or more of the below type indicator characters. If you
-include more than one type indicator character in a single @var{type}
-string, or use this option more than once, @code{od} writes one copy
-of each output line using each of the data types that you specified,
-in the order that you specified.
-
-Adding a trailing ``z'' to any type specification appends a display
-of the @sc{ascii} character representation of the printable characters
-to the output line generated by the type specification.
-
-@table @samp
-@item a
-named character,
-@item c
-@sc{ascii} character or backslash escape,
-@item d
-signed decimal,
-@item f
-floating point,
-@item o
-octal,
-@item u
-unsigned decimal,
-@item x
-hexadecimal.
-@end table
-
-The type @code{a} outputs things like @samp{sp} for space, @samp{nl} for
-newline, and @samp{nul} for a null (zero) byte. Type @code{c} outputs
-@samp{ }, @samp{\n}, and @samp{\0}, respectively.
-
-@cindex type size
-Except for types @samp{a} and @samp{c}, you can specify the number
-of bytes to use in interpreting each number in the given data type
-by following the type indicator character with a decimal integer.
-Alternatively, you can specify the size of one of the C compiler's
-built-in data types by following the type indicator character with
-one of the following characters. For integers (@samp{d}, @samp{o},
-@samp{u}, @samp{x}):
-
-@table @samp
-@item C
-char
-@item S
-short
-@item I
-int
-@item L
-long
-@end table
-
-For floating point (@code{f}):
-
-@table @asis
-@item F
-float
-@item D
-double
-@item L
-long double
-@end table
-
-@item -v
-@itemx --output-duplicates
-@opindex -v
-@opindex --output-duplicates
-Output consecutive lines that are identical. By default, when two or
-more consecutive output lines would be identical, @code{od} outputs only
-the first line, and puts just an asterisk on the following line to
-indicate the elision.
-
-@item -w[@var{n}]
-@itemx --width[=@var{n}]
-@opindex -w
-@opindex --width
-Dump @var{n} input bytes per output line. This must be a multiple of
-the least common multiple of the sizes associated with the specified
-output types. If @var{n} is omitted, the default is 32. If this option
-is not given at all, the default is 16.
-
-@end table
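The @samp{-t} size rules and the @samp{-w} constraint can be sketched together in Python. type_size (a hypothetical name) maps one type specification to the bytes each value consumes, using Python's ctypes to look up the running C compiler's sizes for the C, S, I, L and F, D, L letters; width_is_valid (also hypothetical) checks that a width is a multiple of the least common multiple of those sizes. To stay within what this section documents, the sketch requires an explicit size on numeric types rather than guessing @code{od}'s defaults.

```python
import ctypes
from functools import reduce
from math import gcd

# Size letters from the tables above, mapped to this compiler's C types.
INTEGER_SIZES = {"C": ctypes.c_char, "S": ctypes.c_short,
                 "I": ctypes.c_int, "L": ctypes.c_long}
FLOAT_SIZES = {"F": ctypes.c_float, "D": ctypes.c_double,
               "L": ctypes.c_longdouble}

def type_size(spec):
    """Bytes consumed per value by one -t spec such as 'x2', 'dL' or 'fD'."""
    kind, size = spec[0], spec[1:]
    if size.endswith("z"):        # trailing z only adds a character display
        size = size[:-1]
    if kind in "ac":              # named and ASCII characters: one byte
        return 1
    if kind in "doux":
        table = INTEGER_SIZES
    elif kind == "f":
        table = FLOAT_SIZES
    else:
        raise ValueError("unknown type indicator: " + kind)
    if size.isdigit():
        return int(size)
    if size in table:
        return ctypes.sizeof(table[size])
    # Assumption for this sketch: numeric types must carry an explicit size.
    raise ValueError("sketch needs an explicit size in: " + spec)

def width_is_valid(width, specs):
    """True if WIDTH is a multiple of the LCM of the sizes of all SPECS."""
    sizes = [type_size(spec) for spec in specs]
    unit = reduce(lambda a, b: a * b // gcd(a, b), sizes, 1)
    return width % unit == 0

print(width_is_valid(16, ["x2", "d4"]), width_is_valid(6, ["x2", "d4"]))
# -> True False
```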
-
-The next several options map the old, pre-@sc{posix} format specification
-options to the corresponding @sc{posix} format specs.
-@sc{gnu} @code{od} accepts
-any combination of old- and new-style options. Format specification
-options accumulate.
-
-@table @samp
-
-@item -a
-@opindex -a
-Output as named characters. Equivalent to @samp{-ta}.
-
-@item -b
-@opindex -b
-Output as octal bytes. Equivalent to @samp{-toC}.
-
-@item -c
-@opindex -c
-Output as @sc{ascii} characters or backslash escapes. Equivalent to
-@samp{-tc}.
-
-@item -d
-@opindex -d
-Output as unsigned decimal shorts. Equivalent to @samp{-tu2}.
-
-@item -f
-@opindex -f
-Output as floats. Equivalent to @samp{-tfF}.
-
-@item -h
-@opindex -h
-Output as hexadecimal shorts. Equivalent to @samp{-tx2}.
-
-@item -i
-@opindex -i
-Output as decimal shorts. Equivalent to @samp{-td2}.
-
-@item -l
-@opindex -l
-Output as decimal longs. Equivalent to @samp{-td4}.
-
-@item -o
-@opindex -o
-Output as octal shorts. Equivalent to @samp{-to2}.
-
-@item -x
-@opindex -x
-Output as hexadecimal shorts. Equivalent to @samp{-tx2}.
-
-@item -C
-@itemx --traditional
-@opindex --traditional
-Recognize the pre-@sc{posix} non-option arguments that traditional @code{od}
-accepted. The following syntax:
-
-@smallexample
-od --traditional [@var{file}] [[+]@var{offset}[.][b] [[+]@var{label}[.][b]]]
-@end smallexample
-
-@noindent
-can be used to specify at most one file and optional arguments
-specifying an offset and a pseudo-start address, @var{label}. By
-default, @var{offset} is interpreted as an octal number specifying how
-many input bytes to skip before formatting and writing. The optional
-trailing decimal point forces the interpretation of @var{offset} as a
-decimal number. If no decimal point is specified and the offset begins with
-@samp{0x} or @samp{0X} it is interpreted as a hexadecimal number. If
-there is a trailing @samp{b}, the number of bytes skipped will be
-@var{offset} multiplied by 512. The @var{label} argument is interpreted
-just like @var{offset}, but it specifies an initial pseudo-address. The
-pseudo-addresses are displayed in parentheses following any normal
-address.
-
-@end table
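The two number syntaxes in this section, the @samp{-j}/@samp{-N} byte counts and the traditional @samp{[+]offset[.][b]} form, can be compared in a short Python sketch; parse_byte_count and parse_traditional_offset are hypothetical names. One subtlety: after a 0x prefix, b is a hexadecimal digit, so this sketch reads 0x1b as 27 rather than as 1 times 512. That reading is an assumption about how @code{od} resolves the ambiguity, and the traditional form is handled only as far as the text above describes it.

```python
# Multipliers for the -j/-N suffixes.
SUFFIX_MULTIPLIERS = {"": 1, "b": 512, "k": 1024, "m": 1048576}

def parse_byte_count(text):
    """Parse a -j/-N argument: 0x.. hex, 0.. octal, else decimal, plus b/k/m."""
    if text[:2] in ("0x", "0X"):
        base, digits, body = 16, "0123456789abcdefABCDEF", text[2:]
    elif text.startswith("0"):
        base, digits, body = 8, "01234567", text
    else:
        base, digits, body = 10, "0123456789", text
    end = 0
    while end < len(body) and body[end] in digits:
        end += 1
    number, suffix = body[:end], body[end:]
    if not number or suffix not in SUFFIX_MULTIPLIERS:
        raise ValueError("invalid byte count: " + text)
    return int(number, base) * SUFFIX_MULTIPLIERS[suffix]

def parse_traditional_offset(text):
    """Parse a --traditional offset or label: [+]OFFSET[.][b]."""
    body = text[1:] if text.startswith("+") else text
    if body[:2] in ("0x", "0X"):           # hexadecimal; b is a digit here
        return int(body[2:], 16)
    multiplier = 1
    if body.endswith("b"):                 # trailing b: units of 512 bytes
        multiplier, body = 512, body[:-1]
    base = 8                               # octal unless a '.' follows
    if body.endswith("."):
        base, body = 10, body[:-1]
    return int(body, base) * multiplier

print(parse_byte_count("2k"), parse_traditional_offset("+10"))   # 2048 8
```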
-
-
-@node Formatting file contents
-@chapter Formatting file contents
-
-@cindex formatting file contents
-
-These commands reformat the contents of files.
-
-@menu
-* fmt invocation:: Reformat paragraph text.
-* pr invocation:: Paginate or columnate files for printing.
-* fold invocation:: Wrap input lines to fit in specified width.
-@end menu
-
-
-@node fmt invocation
-@section @code{fmt}: Reformat paragraph text
-
-@pindex fmt
-@cindex reformatting paragraph text
-@cindex paragraphs, reformatting
-@cindex text, reformatting
-
-@code{fmt} fills and joins lines to produce output lines of (at most)
-a given number of characters (75 by default). Synopsis:
-
-@example
-fmt [@var{option}]@dots{} [@var{file}]@dots{}
-@end example
-
-@code{fmt} reads from the specified @var{file} arguments (or standard
-input if none are given), and writes to standard output.
-
-By default, blank lines, spaces between words, and indentation are
-preserved in the output; successive input lines with different
-indentation are not joined; tabs are expanded on input and introduced on
-output.
-
-@cindex line-breaking
-@cindex sentences and line-breaking
-@cindex Knuth, Donald E.
-@cindex Plass, Michael F.
-@code{fmt} prefers breaking lines at the end of a sentence, and tries to
-avoid line breaks after the first word of a sentence or before the last
-word of a sentence. A @dfn{sentence break} is defined as either the end
-of a paragraph or a word ending in any of @samp{.?!}, followed by two
-spaces or end of line, ignoring any intervening parentheses or quotes.
-Like @TeX{}, @code{fmt} reads entire ``paragraphs'' before choosing line
-breaks; the algorithm is a variant of that in ``Breaking Paragraphs Into
-Lines'' (Donald E. Knuth and Michael F. Plass, @cite{Software---Practice
-and Experience}, 11 (1981), 1119--1184).
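The sentence-break rule can be expressed as a small Python check. sentence_break_words is a hypothetical helper covering only the rule as worded here: a word ending in ., ? or !, optionally followed by closing parentheses, brackets or quotes, and then two spaces or the end of the line. Treating only spaces (not tabs) as the separating blanks is a simplification, and the Knuth-Plass line-breaking algorithm itself is not sketched.

```python
import re

# A sentence-ending word: . ? or ! followed only by closing ) ] " or '.
SENTENCE_END = re.compile(r"""[.?!][)\]"']*$""")

def sentence_break_words(line):
    """Return the words of LINE that the rule treats as sentence ends."""
    breaks = []
    for match in re.finditer(r"(\S+)( *)", line):
        word, gap = match.group(1), match.group(2)
        at_end_of_line = match.end() == len(line)
        if SENTENCE_END.search(word) and (at_end_of_line or len(gap) >= 2):
            breaks.append(word)
    return breaks

print(sentence_break_words("Hi there.  This is it. Really (yes!)  Done."))
# -> ['there.', '(yes!)', 'Done.']
```

Note that "it." is not a break because only one space follows it, which is why abbreviations such as "e.g." followed by a single space do not end a sentence.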
-
-The program accepts the following options. Also see @ref{Common options}.
-
-@table @samp
-
-@item -c
-@itemx --crown-margin
-@opindex -c
-@opindex --crown-margin
-@cindex crown margin
-@dfn{Crown margin} mode: preserve the indentation of the first two
-lines within a paragraph, and align the left margin of each subsequent
-line with that of the second line.
-
-@item -t
-@itemx --tagged-paragraph
-@opindex -t
-@opindex --tagged-paragraph
-@cindex tagged paragraphs
-@dfn{Tagged paragraph} mode: like crown margin mode, except that if
-the indentation of the first line of a paragraph is the same as the
-indentation of the second, the first line is treated as a one-line
-paragraph.
-
-@item -s
-@itemx --split-only
-@opindex -s
-@opindex --split-only
-Split lines only. Do not join short lines to form longer ones. This
-prevents sample lines of code and other such ``formatted'' text from
-being unduly combined.
-
-@item -u
-@itemx --uniform-spacing
-@opindex -u
-@opindex --uniform-spacing
-Uniform spacing. Reduce spacing between words to one space, and spacing
-between sentences to two spaces.
-
-@item -@var{width}
-@itemx -w @var{width}
-@itemx --width=@var{width}
-@opindex -@var{width}
-@opindex -w
-@opindex --width
-Fill output lines up to @var{width} characters (default 75). @code{fmt}
-initially tries to make lines about 7% shorter than this, to give it
-room to balance line lengths.
-
-@item -p @var{prefix}
-@itemx --prefix=@var{prefix}
-@opindex -p
-@opindex --prefix
-Only lines beginning with @var{prefix} (possibly preceded by whitespace)
-are subject to formatting. The prefix and any preceding whitespace are
-stripped for the formatting and then re-attached to each formatted output
-line. One use is to format certain kinds of program comments, while
-leaving the code unchanged.
-
-@end table
-
-
-@node pr invocation
-@section @code{pr}: Paginate or columnate files for printing
-
-@pindex pr
-@cindex printing, preparing files for
-@cindex multicolumn output, generating
-@cindex merging files in parallel
-
-@code{pr} writes each @var{file} (@samp{-} means standard input), or
-standard input if none are given, to standard output, paginating and
-optionally outputting in multicolumn format; optionally, it merges all
-@var{file}s, printing them in parallel, one per column. Synopsis:
-
-@example
-pr [@var{option}]@dots{} [@var{file}]@dots{}
-@end example
-
-@vindex LC_MESSAGES
-By default, a 5-line header is printed on each page: two blank lines;
-a line with the date, the file name, and the page count; and two more
-blank lines. A footer of five blank lines is also printed. With the @samp{-F}
-option, a 3-line header is printed: the leading two blank lines are
-omitted; no footer is used. The default @var{page_length} in both cases is 66
-lines. The default number of text lines changes from 56 (without @samp{-F})
-to 63 (with @samp{-F}). The text line of the header takes the form
-@samp{@var{date} @var{string} @var{page}}, with spaces inserted around
-@var{string} so that the line takes up the full @var{page_width}. Here,
-@var{date} is the date (see the @option{-D} or @option{--date-format}
-option for details), @var{string} is the centered header string, and
-@var{page} identifies the page number. The @env{LC_MESSAGES} locale
-category affects the spelling of @var{page}; in the default C locale, it
-is @samp{Page @var{number}} where @var{number} is the decimal page
-number.
-
-Form feeds in the input cause page breaks in the output. Multiple form
-feeds produce empty pages.
-
-Columns are of equal width, separated by an optional string (default
-is @samp{space}). For multicolumn output, lines are always truncated to
-@var{page_width} (default 72), unless you use the @samp{-J} option. For
-single column output no line truncation occurs by default; use the
-@samp{-W} option to truncate lines in that case.
-
-The following changes were made in version 1.22i and apply to later
-versions of @command{pr}:
-@c FIXME: this whole section here sounds very awkward to me. I
-@c made a few small changes, but really it all needs to be redone. - Brian
-@c OK, I fixed another sentence or two, but some of it I just don't understand.
-@c - Brian
-@itemize @bullet
-
-@item
-Some lowercase options (@samp{-s}, @samp{-w}) have been
-redefined for better @sc{posix} compliance. The output in some other
-cases has been adapted to match other Unix systems. These changes are not
-compatible with earlier versions of the program.
-
-@item
-New uppercase options (@samp{-J}, @samp{-S}, @samp{-W})
-have been introduced to turn off unexpected interactions among the
-lowercase options. The @samp{-N} option and the optional second argument
-@var{last_page} of @samp{+@var{first_page}} offer more flexibility.
-Detailed handling of form feeds in the input files requires the
-@samp{-T} option.
-
-@item
-Uppercase options override lowercase ones.
-
-@item
-Some option-arguments (those of @samp{-s}, @samp{-S}, @samp{-e},
-@samp{-i}, and @samp{-n}) cannot be given as separate arguments from the
-preceding option letter, as the @sc{posix} specification already states.
-@end itemize
-
-The program accepts the following options. Also see @ref{Common options}.
-
-@table @samp
-
-@item +@var{first_page}[:@var{last_page}]
-@itemx --pages=@var{first_page}[:@var{last_page}]
-@opindex +@var{first_page}[:@var{last_page}]
-@opindex --pages
-Begin printing with page @var{first_page} and stop with @var{last_page}.
-If @samp{:@var{last_page}} is omitted, printing continues to the end of
-the file. When counting the pages to skip, each form feed in the input
-file starts a new page. Page numbering is the same with and without
-@samp{+@var{first_page}}: by default, counting starts with the first page
-of the input file, not the first page printed. Line numbering may be
-altered by the @samp{-N} option.
-
-@item -@var{column}
-@itemx --columns=@var{column}
-@opindex -@var{column}
-@opindex --columns
-@cindex down columns
-With each single @var{file}, produce @var{column} columns of output
-(default is 1) and print columns down, unless @samp{-a} is used. The
-column width is automatically decreased as @var{column} increases, unless
-you use the @samp{-W/-w} option to increase @var{page_width} as well,
-so this option may well cause some lines to be truncated. The number of
-lines in the columns on each page is balanced. The options @samp{-e}
-and @samp{-i} are on for multiple text-column output. With the
-@samp{-J} option, column alignment and line truncation are turned off;
-lines of full length are joined in a free field format, and the
-@samp{-S} option may set field separators. @samp{-@var{column}} may not
-be used with the @samp{-m} option.
-
-@item -a
-@itemx --across
-@opindex -a
-@opindex --across
-@cindex across columns
-With each single @var{file}, print columns across rather than down. The
-@samp{-@var{column}} option must be given with @var{column} greater than one.
-If a line is too long to fit in a column, it is truncated.
-
-@item -c
-@itemx --show-control-chars
-@opindex -c
-@opindex --show-control-chars
-Print control characters using hat notation (e.g., @samp{^G}); print
-other nonprinting characters in octal backslash notation. By default,
-nonprinting characters are not changed.
-
-@item -d
-@itemx --double-space
-@opindex -d
-@opindex --double-space
-@cindex double spacing
-Double space the output.
-
-@item -D @var{format}
-@itemx --date-format=@var{format}
-@opindex -D
-@opindex --date-format
-@cindex time formats
-@cindex formatting times
-Format header dates using @var{format}, using the same conventions as
-for the command @samp{date +@var{format}} (@pxref{date invocation, ,
-,sh-utils,GNU shell utilities}). Except for directives, which start with
-@samp{%}, characters in @var{format} are printed unchanged. You can use
-this option to specify an arbitrary string in place of the header date,
-e.g., @samp{--date-format="Monday morning"}.
-
-@vindex POSIXLY_CORRECT
-@vindex LC_TIME
-If the @env{POSIXLY_CORRECT} environment variable is not set, the date
-format defaults to @samp{%Y-%m-%d %H:%M} (for example, @samp{2001-12-04
-23:59}); otherwise, the format depends on the @env{LC_TIME} locale
-category, with the default being @samp{%b %e %H:%M %Y} (for example,
-@samp{Dec@ @ 4 23:59 2001}).
-
-@item -e[@var{in-tabchar}[@var{in-tabwidth}]]
-@itemx --expand-tabs[=@var{in-tabchar}[@var{in-tabwidth}]]
-@opindex -e
-@opindex --expand-tabs
-@cindex input tabs
-Expand tabs to spaces on input. Optional argument @var{in-tabchar} is
-the input tab character (default is the TAB character). Second optional
-argument @var{in-tabwidth} is the input tab character's width (default
-is 8).
-
-@item -f
-@itemx -F
-@itemx --form-feed
-@opindex -F
-@opindex -f
-@opindex --form-feed
-Use a form feed instead of newlines to separate output pages. The default
-page length of 66 lines is not altered, but the number of lines of text
-per page changes from the default 56 to 63.
-
-@item -h @var{header}
-@itemx --header=@var{header}
-@opindex -h
-@opindex --header
-Replace the filename in the header with the centered string @var{header}.
-When using the shell, @var{header} should be quoted and should be
-separated from @option{-h} by a space.
-
-@item -i[@var{out-tabchar}[@var{out-tabwidth}]]
-@itemx --output-tabs[=@var{out-tabchar}[@var{out-tabwidth}]]
-@opindex -i
-@opindex --output-tabs
-@cindex output tabs
-Replace spaces with tabs on output. Optional argument @var{out-tabchar}
-is the output tab character (default is the TAB character). Second optional
-argument @var{out-tabwidth} is the output tab character's width (default
-is 8).
-
-@item -J
-@itemx --join-lines
-@opindex -J
-@opindex --join-lines
-Merge lines of full length. This option is used together with the column
-options @samp{-@var{column}}, @samp{-a -@var{column}}, or @samp{-m}. It
-turns off @samp{-W/-w} line truncation and column alignment, and may be
-used with @samp{-S[@var{string}]}.
-@samp{-J} was introduced (together with @samp{-W} and @samp{-S})
-to untangle how the old (@sc{posix}-compliant) options @samp{-w} and
-@samp{-s} interact with the three column options.
-
-@item -l @var{page_length}
-@itemx --length=@var{page_length}
-@opindex -l
-@opindex --length
-Set the page length to @var{page_length} (default 66) lines, including
-the lines of the header (and of the footer, if any). If @var{page_length}
-is less than or equal to 10 (or to 3 with @samp{-F}), the header and
-footer are omitted, and all form feeds in the input files are
-eliminated, as if the @samp{-T} option had been given.
-
-@item -m
-@itemx --merge
-@opindex -m
-@opindex --merge
-Merge and print all @var{file}s in parallel, one in each column. If a
-line is too long to fit in a column, it is truncated, unless the @samp{-J}
-option is used. @samp{-S[@var{string}]} may be used. An empty page in
-some @var{file} (caused by a form feed) produces an empty column, still
-marked by @var{string}. The result is continuous line numbering and
-column marking throughout the whole merged file. Completely empty merged
-pages show no separators or line numbers. The default header becomes
-@samp{@var{date} @var{page}} with spaces inserted in the middle; the
-@option{-h} or @option{--header} option may be used to fill the blank
-middle part.
-
-@item -n[@var{number-separator}[@var{digits}]]
-@itemx --number-lines[=@var{number-separator}[@var{digits}]]
-@opindex -n
-@opindex --number-lines
-Provide @var{digits}-digit line numbering (default for @var{digits} is
-5). With multicolumn output the number occupies the first @var{digits}
-column positions of each text column, or of each line of @samp{-m}
-output. With single column output the number precedes each line, just as
-with @samp{-m}. By default, line numbering starts with the first line of
-the input file, not the first line printed (compare the @samp{--pages}
-and @samp{-N} options).
-Optional argument @var{number-separator} is the character appended to
-the line number to separate it from the text that follows. The default
-separator is the TAB character. Strictly speaking, a TAB is printed only
-with single column output; its width varies with its position, e.g.,
-with the left @var{margin} specified by the @samp{-o} option. With
-multicolumn output, priority is given to equal width of output columns
-(a @sc{posix} requirement): the TAB width is fixed to the value for the
-first column and does not change with different values of the left
-@var{margin}, so a fixed number of spaces is always printed in place of
-the separator TAB. The tabification depends on the output position.
-
-@item -N @var{line_number}
-@itemx --first-line-number=@var{line_number}
-@opindex -N
-@opindex --first-line-number
-Start line counting with the number @var{line_number} at the first line
-of the first page printed (in most cases not the first line of the input file).
-
-@item -o @var{margin}
-@itemx --indent=@var{margin}
-@opindex -o
-@opindex --indent
-@cindex indenting lines
-@cindex left margin
-Indent each line with a margin @var{margin} spaces wide (default is zero).
-The total page width is the size of the margin plus the @var{page_width}
-set with the @samp{-W/-w} option. A limited overflow may occur with
-numbered single column output (compare @samp{-n} option).
-
-@item -r
-@itemx --no-file-warnings
-@opindex -r
-@opindex --no-file-warnings
-Do not print a warning message when an argument @var{file} cannot be
-opened. (The exit status will still be nonzero, however.)
-
-@item -s[@var{char}]
-@itemx --separator[=@var{char}]
-@opindex -s
-@opindex --separator
-Separate columns by a single character @var{char}. The default for
-@var{char} is the TAB character without @samp{-w}, and no character
-with @samp{-w}. Without @samp{-s}, the default separator is a
-@samp{space}. @samp{-s[@var{char}]} turns off line truncation for all
-three column options (@samp{-@var{column}}, @samp{-a -@var{column}},
-@samp{-m}) unless @samp{-w} is set. This is a @sc{posix}-compliant formulation.
-
-
-@item -S[@var{string}]
-@itemx --sep-string[=@var{string}]
-@opindex -S
-@opindex --sep-string
-Use @var{string} to separate output columns. The @samp{-S} option doesn't
-affect the @samp{-W/-w} option, unlike the @samp{-s} option which does. It
-does not affect line truncation or column alignment.
-Without @samp{-S}, and with @samp{-J}, @code{pr} uses the default output
-separator, TAB.
-Without @samp{-S} or @samp{-J}, @code{pr} uses a @samp{space}
-(same as @samp{-S" "}).
-Using @samp{-S} with no @var{string} is equivalent to @samp{-S""}.
-Note that for some of @code{pr}'s options the single-letter option
-character must be followed immediately by any corresponding argument;
-there may not be any intervening white space.
-@samp{-S} and @samp{-s} are among them, so don't use
-@samp{-S "@var{string}"}; @sc{posix} requires this.
-
-@item -t
-@itemx --omit-header
-@opindex -t
-@opindex --omit-header
-Do not print the usual header (and footer) on each page, and do not fill
-out the bottom of pages (with blank lines or a form feed). No page
-structure is produced, but form feeds in the input files are retained.
-The predefined pagination is not changed. @samp{-t} or @samp{-T} may be
-useful together with other options; e.g., @samp{-t -e4} expands TAB
-characters in the input file to 4 spaces but makes no other changes.
-Use of @samp{-t} overrides @samp{-h}.
-
-@item -T
-@itemx --omit-pagination
-@opindex -T
-@opindex --omit-pagination
-Do not print the header (and footer). In addition, eliminate all form
-feeds in the input files.
-
-@item -v
-@itemx --show-nonprinting
-@opindex -v
-@opindex --show-nonprinting
-Print nonprinting characters in octal backslash notation.
-
-@item -w @var{page_width}
-@itemx --width=@var{page_width}
-@opindex -w
-@opindex --width
-Set page width to @var{page_width} characters for multiple text-column
-output only (default for @var{page_width} is 72). @samp{-s[@var{char}]}
-turns off the default page width and any line truncation and column
-alignment. Lines of full length are merged, regardless of the column
-options set. No @var{page_width} setting is possible with single column
-output. This is a @sc{posix}-compliant formulation.
-
-@item -W @var{page_width}
-@itemx --page_width=@var{page_width}
-@opindex -W
-@opindex --page_width
-Set the page width to @var{page_width} characters, with or without a
-column option. Text lines are truncated, unless @samp{-J} is used.
-Together with one of the three column options (@samp{-@var{column}},
-@samp{-a -@var{column}} or @samp{-m}) column alignment is always used.
-The separator options @samp{-S} and @samp{-s} don't affect the @samp{-W}
-option. The default is 72 characters. Without @samp{-W @var{page_width}}
-and without any of the column options, no line truncation is used (for
-backward compatibility, and because this suits the most frequent tasks);
-this is equivalent to @samp{-W 72 -J}. The header line is never truncated.
-
-@end table
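-
-As an illustration, the following commands lay out the numbers 1
-through 9 in three columns without headers. The numbers come from
-@code{seq}, which is an arbitrary choice of input. The first command
-fills the columns down and the second fills them across:
-
-@example
-seq 9 | pr -3 -t
-seq 9 | pr -3 -a -t
-@end example
-
-@noindent
-The first prints three rows, the first of which holds @samp{1}, @samp{4}
-and @samp{7}. The second prints rows beginning with @samp{1}, @samp{2}
-and @samp{3}. The exact spacing between columns depends on the separator
-and tab options in effect.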
-
-
-@node fold invocation
-@section @code{fold}: Wrap input lines to fit in specified width
-
-@pindex fold
-@cindex wrapping long input lines
-@cindex folding long input lines
-
-@code{fold} writes each @var{file} (@samp{-} means standard input), or
-standard input if none are given, to standard output, breaking long
-lines. Synopsis:
-
-@example
-fold [@var{option}]@dots{} [@var{file}]@dots{}
-@end example
-
-By default, @code{fold} breaks lines wider than 80 columns. The output
-is split into as many lines as necessary.
-
-@cindex screen columns
-@code{fold} counts screen columns by default; thus, a tab may count more
-than one column, backspace decreases the column count, and carriage
-return sets the column to zero.
-
-The program accepts the following options. Also see @ref{Common options}.
-
-@table @samp
-
-@item -b
-@itemx --bytes
-@opindex -b
-@opindex --bytes
-Count bytes rather than columns, so that tabs, backspaces, and carriage
-returns are each counted as taking up one column, just like other
-characters.
-
-@item -s
-@itemx --spaces
-@opindex -s
-@opindex --spaces
-Break at word boundaries: the line is broken after the last blank before
-the maximum line length. If the line contains no such blanks, the line
-is broken at the maximum line length as usual.
-
-@item -w @var{width}
-@itemx --width=@var{width}
-@opindex -w
-@opindex --width
-Use a maximum line length of @var{width} columns instead of 80.
-
-@end table
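-
-For example, with a width of 8 columns, the line @samp{hello world foo}
-is broken in the middle of a word by default. The sample text is
-arbitrary:
-
-@example
-$ printf 'hello world foo\n' | fold -w 8
-hello wo
-rld foo
-@end example
-
-@noindent
-If @samp{-s} is added, the same input is instead broken after blanks,
-giving the three lines @samp{hello }, @samp{world } and @samp{foo}. Each
-blank is kept at the end of the line it precedes the break of.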
-
-
-@node Output of parts of files
-@chapter Output of parts of files
-
-@cindex output of parts of files
-@cindex parts of files, output of
-
-These commands output pieces of the input.
-
-@menu
-* head invocation:: Output the first part of files.
-* tail invocation:: Output the last part of files.
-* split invocation:: Split a file into fixed-size pieces.
-* csplit invocation:: Split a file into context-determined pieces.
-@end menu
-
-@node head invocation
-@section @code{head}: Output the first part of files
-
-@pindex head
-@cindex initial part of files, outputting
-@cindex first part of files, outputting
-
-@code{head} prints the first part (10 lines by default) of each
-@var{file}; it reads from standard input if no files are given or
-when given a @var{file} of @samp{-}. Synopses:
-
-@example
-head [@var{option}]@dots{} [@var{file}]@dots{}
-head -@var{number} [@var{option}]@dots{} [@var{file}]@dots{}
-@end example
-
-If more than one @var{file} is specified, @code{head} prints a
-one-line header consisting of
-@example
-==> @var{file name} <==
-@end example
-@noindent
-before the output for each @var{file}.
-
-@code{head} accepts two option formats: the new one, in which numbers
-are arguments to the options (@samp{-q -n 1}), and the old one, in which
-the number precedes any option letters (@samp{-1q}).
-
-The program accepts the following options. Also see @ref{Common options}.
-
-@table @samp
-
-@item -@var{count}@var{options}
-@opindex -@var{count}
-This option is only recognized if it is specified first. @var{count} is
-a decimal number optionally followed by a size letter (@samp{b},
-@samp{k}, @samp{m}) as in @code{-c}, or @samp{l} to mean count by lines,
-or other option letters (@samp{cqv}).
-
-@item -c @var{bytes}
-@itemx --bytes=@var{bytes}
-@opindex -c
-@opindex --bytes
-Print the first @var{bytes} bytes, instead of initial lines. Appending
-@samp{b} multiplies @var{bytes} by 512, @samp{k} by 1024, and @samp{m}
-by 1048576.
-
-@item -n @var{n}
-@itemx --lines=@var{n}
-@opindex -n
-@opindex --lines
-Output the first @var{n} lines.
-
-@item -q
-@itemx --quiet
-@itemx --silent
-@opindex -q
-@opindex --quiet
-@opindex --silent
-Never print file name headers.
-
-@item -v
-@itemx --verbose
-@opindex -v
-@opindex --verbose
-Always print file name headers.
-
-@end table
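-
-As a sketch of typical use, the commands below read generated input.
-The file names @file{a} and @file{b} are arbitrary, and the comments
-describe the expected effect:
-
-@example
-seq 100 | head -n 3           # lines 1, 2 and 3
-head -c 2k /dev/zero | wc -c  # 2048, since k multiplies by 1024
-seq 3 > a; seq 4 6 > b
-head -q -n 1 a b              # 1 and 4, with no file name headers
-@end example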
-
-
-@node tail invocation
-@section @code{tail}: Output the last part of files
-
-@pindex tail
-@cindex last part of files, outputting
-
-@code{tail} prints the last part (10 lines by default) of each
-@var{file}; it reads from standard input if no files are given or
-when given a @var{file} of @samp{-}. Synopses:
-
-@example
-tail [@var{option}]@dots{} [@var{file}]@dots{}
-tail -@var{number} [@var{option}]@dots{} [@var{file}]@dots{}
-tail +@var{number} [@var{option}]@dots{} [@var{file}]@dots{} # obsolescent
-@end example
-
-If more than one @var{file} is specified, @code{tail} prints a
-one-line header consisting of
-@example
-==> @var{file name} <==
-@end example
-@noindent
-before the output for each @var{file}.
-
-@cindex BSD @code{tail}
-@sc{gnu} @code{tail} can output any amount of data (some other versions of
-@code{tail} cannot). It also has no @samp{-r} option (print in
-reverse), since reversing a file is really a different job from printing
-the end of a file; BSD @code{tail} (which is the one with @code{-r}) can
-only reverse files that are at most as large as its buffer, which is
-typically 32k. A more reliable and versatile way to reverse files is
-the @sc{gnu} @code{tac} command.
-
-@code{tail} accepts two option formats: the new one, in which numbers
-are arguments to the options (@samp{-n 1}), and the obsolescent one, in
-which the number precedes any option letters (@samp{-1} or @samp{+1}).
-Warning: support for the @samp{+1} form will be withdrawn, as future
-versions of @sc{posix} will not allow it.
-
-If any option-argument is a number @var{n} starting with a @samp{+},
-@code{tail} begins printing with the @var{n}th item from the start of
-each file, instead of from the end.
-
-The program accepts the following options. Also see @ref{Common options}.
-
-@table @samp
-
-@item -@var{count}
-@itemx +@var{count}
-@opindex -@var{count}
-@opindex +@var{count}
-This option is only recognized if it is specified first. @var{count} is
-a decimal number optionally followed by a size letter (@samp{b},
-@samp{k}, @samp{m}) as in @code{-c}, or @samp{l} to mean count by lines,
-or other option letters (@samp{cfqv}).
-
-Warning: the @samp{+@var{count}} usage is obsolescent. Future versions
-of @sc{posix} will require that support for it be withdrawn. Use
-@samp{-n +@var{count}} instead.
-
-@item -c @var{bytes}
-@itemx --bytes=@var{bytes}
-@opindex -c
-@opindex --bytes
-Output the last @var{bytes} bytes, instead of final lines. Appending
-@samp{b} multiplies @var{bytes} by 512, @samp{k} by 1024, and @samp{m}
-by 1048576.
-
-@item -f
-@itemx --follow[=@var{how}]
-@opindex -f
-@opindex --follow
-@cindex growing files
-@vindex name @r{follow option}
-@vindex descriptor @r{follow option}
-Loop forever trying to read more characters at the end of the file,
-presumably because the file is growing. This option is ignored when
-reading from a pipe.
-If more than one file is given, @code{tail} prints a header whenever it
-gets output from a different file, to indicate which file that output is
-from.
-
-There are two ways to specify how you'd like to track files with this option,
-but that difference is noticeable only when a followed file is removed or
-renamed.
-If you'd like to continue to track the end of a growing file even after
-it has been unlinked, use @samp{--follow=descriptor}. This is the default
-behavior, but it is not useful if you're tracking a log file that may be
-rotated (removed or renamed, then reopened). In that case, use
-@samp{--follow=name} to track the named file by reopening it periodically
-to see if it has been removed and recreated by some other program.
-
-No matter which method you use, if the tracked file is determined to have
-shrunk, @code{tail} prints a message saying the file has been truncated
-and resumes tracking the end of the file from the newly-determined endpoint.
-
-When a file is removed, @code{tail}'s behavior depends on whether it is
-following the name or the descriptor. When following by name, tail can
-detect that a file has been removed and gives a message to that effect,
-and if @samp{--retry} has been specified it will continue checking
-periodically to see if the file reappears.
-When following a descriptor, tail does not detect that the file has
-been unlinked or renamed and issues no message; even though the file
-may no longer be accessible via its original name, it may still be
-growing.
-
-The option values @samp{descriptor} and @samp{name} may be specified only
-with the long form of the option, not with @samp{-f}.
-
-@item --retry
-@opindex --retry
-This option is meaningful only when following by name.
-Without this option, when tail encounters a file that doesn't
-exist or is otherwise inaccessible, it reports that fact and
-never checks it again.
-
-@item --sleep-interval=@var{n}
-@opindex --sleep-interval
-Change the number of seconds to wait between iterations (the default is 1).
-During one iteration, every specified file is checked to see if it has
-changed size.
-
-@item --pid=@var{pid}
-@opindex --pid
-When following by name or by descriptor, you may specify the process ID,
-@var{pid}, of the sole writer of all @var{file} arguments. Then, shortly
-after that process terminates, tail will also terminate. This will
-work properly only if the writer and the tailing process are running on
-the same machine. For example, to save the output of a build in a file
-and to watch the file grow, if you invoke @code{make} and @code{tail}
-like this, then the tail process will stop when your build completes.
-Without this option, you would have to kill the @code{tail -f}
-process yourself.
-@example
-$ make >& makerr & tail --pid=$! -f makerr
-@end example
-If you specify a @var{pid} that is not in use or that does not correspond
-to the process that is writing to the tailed files, then @code{tail}
-may terminate long before any @var{file}s stop growing or it may not
-terminate until long after the real writer has terminated.
-Note that @samp{--pid} cannot be supported on some systems; @code{tail}
-will print a warning if this is the case.
-
-@item --max-unchanged-stats=@var{n}
-@opindex --max-unchanged-stats
-When tailing a file by name, if there have been @var{n} (default
-@value{DEFAULT_MAX_N_UNCHANGED_STATS_BETWEEN_OPENS}) consecutive
-iterations for which the size has remained the same, then
-@code{open}/@code{fstat} the file to determine if that file name is
-still associated with the same device/inode-number pair as before.
-When following a log file that is rotated, this is approximately the
-number of seconds between when tail prints the last pre-rotation lines
-and when it prints the lines that have accumulated in the new log file.
-This option is meaningful only when following by name.
-
-@item -n @var{n}
-@itemx --lines=@var{n}
-@opindex -n
-@opindex --lines
-Output the last @var{n} lines.
-
-@item -q
-@itemx --quiet
-@itemx --silent
-@opindex -q
-@opindex --quiet
-@opindex --silent
-Never print file name headers.
-
-@item -v
-@itemx --verbose
-@opindex -v
-@opindex --verbose
-Always print file name headers.
-
-@end table
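-
-A few illustrative commands follow, again reading generated input. The
-comments describe what each one prints. The second shows the recommended
-replacement for the obsolescent @samp{+@var{count}} form:
-
-@example
-seq 100 | tail -n 3               # 98, 99 and 100
-seq 100 | tail -n +98             # the same, starting at line 98
-printf 'hello world' | tail -c 5  # world
-@end example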
-
-
-@node split invocation
-@section @code{split}: Split a file into fixed-size pieces
-
-@pindex split
-@cindex splitting a file into pieces
-@cindex pieces, splitting a file into
-
-@code{split} creates output files containing consecutive sections of
-@var{input} (standard input if none is given or @var{input} is
-@samp{-}). Synopsis:
-
-@example
-split [@var{option}]@dots{} [@var{input} [@var{prefix}]]
-@end example
-
-By default, @code{split} puts 1000 lines of @var{input} (or whatever is
-left over for the last section) into each output file.
-
-@cindex output file name prefix
-The output files' names consist of @var{prefix} (@samp{x} by default)
-followed by a group of letters @samp{aa}, @samp{ab}, and so on, such
-that concatenating the output files in sorted order by file name produces
-the original input file. (If more than 676 output files are required,
-@code{split} uses @samp{zaa}, @samp{zab}, etc.)
-
-The program accepts the following options. Also see @ref{Common options}.
-
-@table @samp
-
-@item -@var{lines}
-@itemx -l @var{lines}
-@itemx --lines=@var{lines}
-@opindex -l
-@opindex --lines
-Put @var{lines} lines of @var{input} into each output file.
-
-@item -b @var{bytes}
-@itemx --bytes=@var{bytes}
-@opindex -b
-@opindex --bytes
-Put @var{bytes} bytes of @var{input} into each output file.
-Appending @samp{b} multiplies @var{bytes} by 512, @samp{k} by 1024, and
-@samp{m} by 1048576.
-
-@item -C @var{bytes}
-@itemx --line-bytes=@var{bytes}
-@opindex -C
-@opindex --line-bytes
-Put into each output file as many complete lines of @var{input} as
-possible without exceeding @var{bytes} bytes. For lines longer than
-@var{bytes} bytes, put @var{bytes} bytes into each output file until
-less than @var{bytes} bytes of the line are left, then continue
-normally. @var{bytes} has the same format as for the @samp{--bytes}
-option.
-
-@item --verbose
-@opindex --verbose
-Write a diagnostic to standard error just before each output file is opened.
-
-@end table
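-
-For instance, the following commands split a five-line file into pieces
-of two lines each. The names @file{input} and @samp{part.} are
-arbitrary. The last command checks that concatenating the pieces in file
-name order reproduces the input:
-
-@example
-seq 5 > input
-split -l 2 input part.      # creates part.aa, part.ab and part.ac
-cat part.* | cmp - input    # no output: the contents are identical
-@end example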
-
-
-@node csplit invocation
-@section @code{csplit}: Split a file into context-determined pieces
-
-@pindex csplit
-@cindex context splitting
-@cindex splitting a file into pieces by context
-
-@code{csplit} creates zero or more output files containing sections of
-@var{input} (standard input if @var{input} is @samp{-}). Synopsis:
-
-@example
-csplit [@var{option}]@dots{} @var{input} @var{pattern}@dots{}
-@end example
-
-The contents of the output files are determined by the @var{pattern}
-arguments, as detailed below. An error occurs if a @var{pattern}
-argument refers to a nonexistent line of the input file (e.g., if no
-remaining line matches a given regular expression). After every
-@var{pattern} has been matched, any remaining input is copied into one
-last output file.
-
-By default, @code{csplit} prints the number of bytes written to each
-output file after it has been created.
-
-The types of pattern arguments are:
-
-@table @samp
-
-@item @var{n}
-Create an output file containing the input up to but not including line
-@var{n} (a positive integer). If followed by a repeat count, also
-create an output file containing the next @var{n} lines of the input
-file once for each repeat.
-
-@item /@var{regexp}/[@var{offset}]
-Create an output file containing the current line up to (but not
-including) the next line of the input file that contains a match for
-@var{regexp}. The optional @var{offset} is a @samp{+} or @samp{-}
-followed by a positive integer. If it is given, the input up to the
-matching line plus or minus @var{offset} is put into the output file,
-and the line after that begins the next section of input.
-
-@item %@var{regexp}%[@var{offset}]
-Like the previous type, except that it does not create an output
-file, so that section of the input file is effectively ignored.
-
-@item @{@var{repeat-count}@}
-Repeat the previous pattern @var{repeat-count} additional
-times. @var{repeat-count} can either be a positive integer or an
-asterisk, meaning repeat as many times as necessary until the input is
-exhausted.
-
-@end table
-
-The output files' names consist of a prefix (@samp{xx} by default)
-followed by a suffix. By default, the suffix is an ascending sequence
-of two-digit decimal numbers from @samp{00} to @samp{99}. In any case,
-concatenating the output files in sorted order by filename produces the
-original input file.
-
-By default, if @code{csplit} encounters an error or receives a hangup,
-interrupt, quit, or terminate signal, it removes any output files
-that it has created so far before it exits.
-
-The program accepts the following options. Also see @ref{Common options}.
-
-@table @samp
-
-@item -f @var{prefix}
-@itemx --prefix=@var{prefix}
-@opindex -f
-@opindex --prefix
-@cindex output file name prefix
-Use @var{prefix} as the output file name prefix.
-
-@item -b @var{suffix}
-@itemx --suffix=@var{suffix}
-@opindex -b
-@opindex --suffix
-@cindex output file name suffix
-Use @var{suffix} as the output file name suffix. When this option is
-specified, the suffix string must include exactly one
-@code{printf(3)}-style conversion specification, possibly including
-format specification flags, a field width, a precision specification,
-or all of these kinds of modifiers. The format letter must convert a
-binary integer argument to readable form; thus, only @samp{d}, @samp{i},
-@samp{u}, @samp{o}, @samp{x}, and @samp{X} conversions are allowed. The
-entire @var{suffix} is given (with the current output file number) to
-@code{sprintf(3)} to form the file name suffixes for each of the
-individual output files in turn. If this option is used, the
-@samp{--digits} option is ignored.
-
-@item -n @var{digits}
-@itemx --digits=@var{digits}
-@opindex -n
-@opindex --digits
-Use output file names containing numbers that are @var{digits} digits
-long instead of the default 2.
-
-@item -k
-@itemx --keep-files
-@opindex -k
-@opindex --keep-files
-Do not remove output files when errors are encountered.
-
-@item -z
-@itemx --elide-empty-files
-@opindex -z
-@opindex --elide-empty-files
-Suppress the generation of zero-length output files. (In cases where
-the section delimiters of the input file are supposed to mark the first
-lines of each of the sections, the first output file will generally be a
-zero-length file unless you use this option.) The output file sequence
-numbers always run consecutively starting from 0, even when this option
-is specified.
-
-@item -s
-@itemx -q
-@itemx --silent
-@itemx --quiet
-@opindex -s
-@opindex -q
-@opindex --silent
-@opindex --quiet
-Do not print counts of output file sizes.
-
-@end table
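-
-For example, the first command below splits the numbers 1 through 10,
-one per line, before lines 4 and 7. @file{xx00} receives lines 1 to 3,
-@file{xx01} receives lines 4 to 6, and @file{xx02} receives the rest.
-The byte counts 6, 6 and 9 are printed. The last two commands create a
-small sample file, whose name @file{book} and contents are arbitrary.
-They then start a new piece at each line beginning with @samp{Chapter}
-and write the pieces to @file{chap00}, @file{chap01} and @file{chap02}:
-
-@example
-seq 10 | csplit - 4 7
-printf 'intro\nChapter 1\na\nChapter 2\nb\n' > book
-csplit -f chap book '/^Chapter/' '@{*@}'
-@end example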
-
-
-@node Summarizing files
-@chapter Summarizing files
-
-@cindex summarizing files
-
-These commands generate just a few numbers representing entire
-contents of files.
-
-@menu
-* wc invocation:: Print byte, word, and line counts.
-* sum invocation:: Print checksum and block counts.
-* cksum invocation:: Print CRC checksum and byte counts.
-* md5sum invocation:: Print or check message-digests.
-@end menu
-
-
-@node wc invocation
-@section @code{wc}: Print byte, word, and line counts
-
-@pindex wc
-@cindex byte count
-@cindex character count
-@cindex word count
-@cindex line count
-
-@code{wc} counts the number of bytes, characters, whitespace-separated
-words, and newlines in each given @var{file}, or standard input if none
-are given or for a @var{file} of @samp{-}. Synopsis:
-
-@example
-wc [@var{option}]@dots{} [@var{file}]@dots{}
-@end example
-
-@cindex total counts
-@vindex POSIXLY_CORRECT
-@code{wc} prints one line of counts for each file, and if the file was
-given as an argument, it prints the file name following the counts. If
-more than one @var{file} is given, @code{wc} prints a final line
-containing the cumulative counts, with the file name @file{total}. The
-counts are printed in this order: newlines, words, characters, bytes.
-By default, each count is output right-justified in a 7-byte field with
-one space between fields so that the numbers and file names line up nicely
-in columns. However, @sc{posix} requires that there be exactly one space
-separating columns. You can make @code{wc} use the @sc{posix}-mandated
-output format by setting the @env{POSIXLY_CORRECT} environment variable.
-
-By default, @code{wc} prints three counts: the newline, word, and byte
-counts. Options can specify that only certain counts be printed.
-Options do not undo others previously given, so
-
-@example
-wc --bytes --words
-@end example
-
-@noindent
-prints both the byte counts and the word counts.
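-
-As a small illustration, the commands below count the lines and words
-of a two-line input and report the length of its longest line
-(@samp{three four five}, 15 characters). Only the commands are shown
-here, because the padding of the printed counts depends on the output
-format in effect:
-
-@example
-printf 'one two\nthree four five\n' | wc -l    # 2 lines
-printf 'one two\nthree four five\n' | wc -w    # 5 words
-printf 'one two\nthree four five\n' | wc -L    # longest line: 15
-@end example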
-
-With the @code{--max-line-length} option, @code{wc} prints the length
-of the longest line per file, and if there is more than one file it
-prints the maximum (not the sum) of those lengths.
-
-The program accepts the following options. Also see @ref{Common options}.
-
-@table @samp
-
-@item -c
-@itemx --bytes
-@opindex -c
-@opindex --bytes
-Print only the byte counts.
-
-@item -m
-@itemx --chars
-@opindex -m
-@opindex --chars
-Print only the character counts.
-
-@item -w
-@itemx --words
-@opindex -w
-@opindex --words
-Print only the word counts.
-
-@item -l
-@itemx --lines
-@opindex -l
-@opindex --lines
-Print only the newline counts.
-
-@item -L
-@itemx --max-line-length
-@opindex -L
-@opindex --max-line-length
-Print only the maximum line lengths.
-
-@end table
-
-
-@node sum invocation
-@section @code{sum}: Print checksum and block counts
-
-@pindex sum
-@cindex 16-bit checksum
-@cindex checksum, 16-bit
-
-@code{sum} computes a 16-bit checksum for each given @var{file}, or
-standard input if none are given or for a @var{file} of @samp{-}. Synopsis:
-
-@example
-sum [@var{option}]@dots{} [@var{file}]@dots{}
-@end example
-
-@code{sum} prints the checksum for each @var{file} followed by the
-number of blocks in the file (rounded up). If more than one @var{file}
-is given, file names are also printed (by default). (With the
-@samp{--sysv} option, corresponding file names are printed when there is
-at least one file argument.)
-
-By default, @sc{gnu} @code{sum} computes checksums using an algorithm
-compatible with BSD @code{sum} and prints file sizes in units of
-1024-byte blocks.
-
-The program accepts the following options. Also see @ref{Common options}.
-
-@table @samp
-
-@item -r
-@opindex -r
-@cindex BSD @code{sum}
-Use the default (BSD compatible) algorithm. This option is included for
-compatibility with the System V @code{sum}. Unless @samp{-s} was also
-given, it has no effect.
-
-@item -s
-@itemx --sysv
-@opindex -s
-@opindex --sysv
-@cindex System V @code{sum}
-Compute checksums using an algorithm compatible with System V
-@code{sum}'s default, and print file sizes in units of 512-byte blocks.
-
-@end table
-
-@code{sum} is provided for compatibility; the @code{cksum} program (see
-next section) is preferable in new applications.
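-
-The difference in block units is easy to see with a file of exactly
-1024 bytes (@file{kb.dat} is a hypothetical name). The second column
-of output is @samp{1} with the default algorithm, which counts
-1024-byte blocks, but @samp{2} with @samp{-s}, which counts 512-byte
-blocks:
-
-@example
-# kb.dat is a hypothetical scratch file of 1024 zero bytes
-head -c 1024 /dev/zero > kb.dat
-sum kb.dat
-sum -s kb.dat
-@end example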
-
-
-@node cksum invocation
-@section @code{cksum}: Print CRC checksum and byte counts
-
-@pindex cksum
-@cindex cyclic redundancy check
-@cindex CRC checksum
-
-@code{cksum} computes a cyclic redundancy check (CRC) checksum for each
-given @var{file}, or standard input if none are given or for a
-@var{file} of @samp{-}. Synopsis:
-
-@example
-cksum [@var{option}]@dots{} [@var{file}]@dots{}
-@end example
-
-@code{cksum} prints the CRC checksum for each file along with the number
-of bytes in the file, and the filename unless no arguments were given.
-
-@code{cksum} is typically used to ensure that files
-transferred by unreliable means (e.g., netnews) have not been corrupted,
-by comparing the @code{cksum} output for the received files with the
-@code{cksum} output for the original files (typically given in the
-distribution).
-
-The CRC algorithm is specified by the @sc{posix}.2 standard. It is not
-compatible with the BSD or System V @code{sum} algorithms (see the
-previous section); it is more robust.
-
-The only options are @samp{--help} and @samp{--version}. @xref{Common
-options}.
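-
-As a quick sanity check, the CRC of empty input is @samp{4294967295}
-and its byte count is @samp{0}. Because standard input is read and no
-file argument is given, no file name is printed:
-
-@example
-cksum < /dev/null    # prints: 4294967295 0
-@end example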
-
-
-@node md5sum invocation
-@section @code{md5sum}: Print or check message-digests
-
-@pindex md5sum
-@cindex 128-bit checksum
-@cindex checksum, 128-bit
-@cindex fingerprint, 128-bit
-@cindex message-digest, 128-bit
-
-@code{md5sum} computes a 128-bit checksum (or @dfn{fingerprint} or
-@dfn{message-digest}) for each specified @var{file}.
-If a @var{file} is specified as @samp{-} or if no files are given
-@code{md5sum} computes the checksum for the standard input.
-@code{md5sum} can also determine whether a file and checksum are
-consistent. Synopses:
-
-@example
-md5sum [@var{option}]@dots{} [@var{file}]@dots{}
-md5sum [@var{option}]@dots{} --check [@var{file}]
-@end example
-
-For each @var{file}, @code{md5sum} outputs the MD5 checksum, a flag
-indicating a binary or text input file, and the filename.
-
-The program accepts the following options. Also see @ref{Common options}.
-
-@table @samp
-
-@item -b
-@itemx --binary
-@opindex -b
-@opindex --binary
-@cindex binary input files
-Treat all input files as binary. This option has no effect on Unix
-systems, since they don't distinguish between binary and text files.
-This option is useful on systems that have different internal and
-external character representations. On MS-DOS and MS-Windows, this is
-the default.
-
-@item -c
-@itemx --check
-@opindex -c
-@opindex --check
-Read filenames and checksum information from the single @var{file}
-(or from stdin if no @var{file} was specified) and report whether
-each named file and the corresponding checksum data are consistent.
-The input to this mode of @code{md5sum} is usually the output of
-a prior, checksum-generating run of @code{md5sum}.
-Each valid line of input consists of an MD5 checksum, a binary/text
-flag, and then a filename.
-Binary files are marked with @samp{*}, text with @samp{ }.
-For each such line, @code{md5sum} reads the named file and computes its
-MD5 checksum. Then, if the computed message digest does not match the
-one on the line with the filename, the file is noted as having
-failed the test. Otherwise, the file passes the test.
-By default, for each valid line, one line is written to standard
-output indicating whether the named file passed the test.
-After all checks have been performed, if there were any failures,
-a warning is issued to standard error.
-Use the @samp{--status} option to inhibit that output.
-If any listed file cannot be opened or read, if any valid line has
-an MD5 checksum inconsistent with the associated file, or if no valid
-line is found, @code{md5sum} exits with nonzero status. Otherwise,
-it exits successfully.
-
-@item --status
-@opindex --status
-@cindex verifying MD5 checksums
-This option is useful only when verifying checksums.
-When verifying checksums, don't generate the default one-line-per-file
-diagnostic and don't output the warning summarizing any failures.
-Failures to open or read a file still evoke individual diagnostics to
-standard error.
-If all listed files are readable and are consistent with the associated
-MD5 checksums, exit successfully. Otherwise exit with a status code
-indicating there was a failure.
-
-@item -t
-@itemx --text
-@opindex -t
-@opindex --text
-@cindex text input files
-Treat all input files as text files. This is the reverse of
-@samp{--binary}.
-
-@item -w
-@itemx --warn
-@opindex -w
-@opindex --warn
-@cindex verifying MD5 checksums
-When verifying checksums, warn about improperly formatted MD5 checksum lines.
-This option is useful only if all but a few lines in the checked input
-are valid.
-
-@end table
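-
-A typical round trip, sketched here with hypothetical file names,
-records checksums in @file{SUMS} and later verifies them. The second
-command prints one line per file; with @samp{--status}, only the exit
-status reports the result:
-
-@example
-# a.txt, b.txt and SUMS are hypothetical file names
-md5sum a.txt b.txt > SUMS
-md5sum --check SUMS
-md5sum --status --check SUMS && echo all files match
-@end example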
-
-
-@node Operating on sorted files
-@chapter Operating on sorted files
-
-@cindex operating on sorted files
-@cindex sorted files, operations on
-
-These commands work with (or produce) sorted files.
-
-@menu
-* sort invocation:: Sort text files.
-* uniq invocation:: Uniquify files.
-* comm invocation:: Compare two sorted files line by line.
-* tsort invocation:: Topological sort.
-* ptx invocation:: Produce a permuted index of file contents.
-@end menu
-
-
-@node sort invocation
-@section @code{sort}: Sort text files
-
-@pindex sort
-@cindex sorting files
-
-@code{sort} sorts, merges, or compares all the lines from the given
-files, or standard input if none are given or for a @var{file} of
-@samp{-}. By default, @code{sort} writes the results to standard
-output. Synopsis:
-
-@example
-sort [@var{option}]@dots{} [@var{file}]@dots{}
-@end example
-
-@code{sort} has three modes of operation: sort (the default), merge,
-and check for sortedness. The following options change the operation
-mode:
-
-@table @samp
-
-@item -c
-@itemx --check
-@opindex -c
-@opindex --check
-@cindex checking for sortedness
-Check whether the given files are already sorted: if they are not all
-sorted, print an error message and exit with a status of 1.
-Otherwise, exit successfully.
-
-@item -m
-@itemx --merge
-@opindex -m
-@opindex --merge
-@cindex merging sorted files
-Merge the given files, each of which must already be sorted, into a
-single sorted output. Sorting instead of merging always works; merging
-is provided because it is faster when the inputs are already sorted.
-
-@end table
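-
-As a sketch, assuming two hypothetical files @file{a.sorted} and
-@file{b.sorted} that are each already sorted, the first command merges
-them into one sorted stream, and the second reports whether a file is
-sorted by way of its exit status:
-
-@example
-# a.sorted and b.sorted are hypothetical, pre-sorted files
-sort -m a.sorted b.sorted
-sort -c a.sorted && echo sorted
-@end example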
-
-@vindex LC_COLLATE
-A pair of lines is compared as follows: if any key fields have been
-specified, @code{sort} compares each pair of fields, in the order
-specified on the command line, according to the associated ordering
-options, until a difference is found or no fields are left.
-Unless otherwise specified, all comparisons use the character
-collating sequence specified by the @env{LC_COLLATE} locale.
-
-If any of the global options @samp{bdfgiMnr} are given but no key fields
-are specified, @code{sort} compares entire lines according to the
-global options.
-
-Finally, as a last resort when all keys compare equal (or if no
-ordering options were specified at all), @code{sort} compares the entire
-lines. The last resort comparison
-honors the @option{--reverse} (@option{-r}) global option.
-The @option{--stable} (@option{-s}) option
-disables this last-resort comparison so that lines in which all fields
-compare equal are left in their original relative order. If no fields
-or global options are specified, @option{--stable} (@option{-s}) has no
-effect.
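-
-For instance, the two input lines below have equal keys in field 2.
-The last-resort comparison puts @samp{a 1} first, whereas with
-@samp{-s} the lines keep their input order. This sketch sets
-@env{LC_ALL} to @samp{C} so that the byte order of the letters is
-predictable:
-
-@example
-printf 'b 1\na 1\n' | LC_ALL=C sort -k 2,2      # a 1, then b 1
-printf 'b 1\na 1\n' | LC_ALL=C sort -s -k 2,2   # b 1, then a 1
-@end example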
-
-@sc{gnu} @code{sort} (as specified for all @sc{gnu} utilities) has no limits on
-input line length or restrictions on bytes allowed within lines. In
-addition, if the final byte of an input file is not a newline, @sc{gnu}
-@code{sort} silently supplies one. A line's trailing newline is not
-part of the line for comparison purposes.@footnote{@sc{posix}.2-1992
-requires that the trailing newline be part of the comparison, and some
-@code{sort} implementations obey this requirement, but it is widely
-considered to be a bug in the standard and the next version of
-@sc{posix}.2 will likely remove this requirement.}
-
-Upon any error, @code{sort} exits with a status of @samp{2}.
-
-@vindex TMPDIR
-If the environment variable @env{TMPDIR} is set, @code{sort} uses its
-value as the directory for temporary files instead of @file{/tmp}. The
-@option{--temporary-directory} (@option{-T}) option in turn overrides
-the environment variable.
-
-The following options affect the ordering of output lines. They may be
-specified globally or as part of a specific key field. If no key
-fields are specified, global options apply to comparison of entire
-lines; otherwise the global options are inherited by key fields that do
-not specify any special options of their own. In pre-@sc{posix}
-versions of @command{sort}, global options affect only later key fields,
-so portable shell scripts should specify global options first.
-
-@table @samp
-
-@item -b
-@itemx --ignore-leading-blanks
-@opindex -b
-@opindex --ignore-leading-blanks
-@cindex blanks, ignoring leading
-@vindex LC_CTYPE
-Ignore leading blanks when finding sort keys in each line.
-The @env{LC_CTYPE} locale determines character types.
-
-@item -d
-@itemx --dictionary-order
-@opindex -d
-@opindex --dictionary-order
-@cindex dictionary order
-@cindex phone directory order
-@cindex telephone directory order
-@vindex LC_CTYPE
-Sort in @dfn{phone directory} order: ignore all characters except
-letters, digits and blanks when sorting.
-The @env{LC_CTYPE} locale determines character types.
-
-@item -f
-@itemx --ignore-case
-@opindex -f
-@opindex --ignore-case
-@cindex ignoring case
-@cindex case folding
-@vindex LC_CTYPE
-Fold lowercase characters into the equivalent uppercase characters when
-comparing so that, for example, @samp{b} and @samp{B} sort as equal.
-The @env{LC_CTYPE} locale determines character types.
-
-@item -g
-@itemx --general-numeric-sort
-@opindex -g
-@opindex --general-numeric-sort
-@cindex general numeric sort
-@vindex LC_NUMERIC
-Sort numerically, using the standard C function @code{strtod} to convert
-a prefix of each line to a double-precision floating point number.
-This allows floating point numbers to be specified in scientific notation,
-like @code{1.0e-34} and @code{10e100}.
-The @env{LC_NUMERIC} locale determines the decimal-point character.
-Do not report overflow, underflow, or conversion errors.
-Use the following collating sequence:
-
-@itemize @bullet
-@item
-Lines that do not start with numbers (all considered to be equal).
-@item
-NaNs (``Not a Number'' values, in IEEE floating point arithmetic)
-in a consistent but machine-dependent order.
-@item
-Minus infinity.
-@item
-Finite numbers in ascending numeric order (with @math{-0} and @math{+0} equal).
-@item
-Plus infinity.
-@end itemize
-
-Use this option only if there is no alternative; it is much slower than
-@option{--numeric-sort} (@option{-n}) and it can lose information when
-converting to floating point.
-
-@item -i
-@itemx --ignore-nonprinting
-@opindex -i
-@opindex --ignore-nonprinting
-@cindex nonprinting characters, ignoring
-@cindex unprintable characters, ignoring
-@vindex LC_CTYPE
-Ignore nonprinting characters.
-The @env{LC_CTYPE} locale determines character types.
-
-@item -M
-@itemx --month-sort
-@opindex -M
-@opindex --month-sort
-@cindex months, sorting by
-@vindex LC_TIME
-An initial string, consisting of any amount of whitespace, followed
-by a month name abbreviation, is folded to UPPER case and
-compared in the order @samp{JAN} < @samp{FEB} < @dots{} < @samp{DEC}.
-Invalid names compare low to valid names. The @env{LC_TIME} locale
-determines the month spellings.
-
-@item -n
-@itemx --numeric-sort
-@opindex -n
-@opindex --numeric-sort
-@cindex numeric sort
-@vindex LC_NUMERIC
-Sort numerically: the number begins each line; specifically, it consists
-of optional whitespace, an optional @samp{-} sign, and zero or more
-digits possibly separated by thousands separators, optionally followed
-by a decimal-point character and zero or more digits. The @env{LC_NUMERIC}
-locale specifies the decimal-point character and thousands separator.
-
-Numeric sort uses what might be considered an unconventional method
-to compare strings representing floating point numbers. Rather than
-first converting each string to the C @code{double} type and then
-comparing those values, @command{sort} aligns the decimal-point
-characters in the two
-strings and compares the strings a character at a time. One benefit
-of using this approach is its speed. In practice this is much more
-efficient than performing the two corresponding string-to-double (or even
-string-to-integer) conversions and then comparing doubles. In addition,
-there is no corresponding loss of precision. Converting each string to
-@code{double} before comparison would limit precision to about 16 digits
-on most systems.
-
-Neither a leading @samp{+} nor exponential notation is recognized.
-To compare such strings numerically, use the
-@option{--general-numeric-sort} (@option{-g}) option.
-
-@item -r
-@itemx --reverse
-@opindex -r
-@opindex --reverse
-@cindex reverse sorting
-Reverse the result of comparison, so that lines with greater key values
-appear earlier in the output instead of later.
-
-@end table
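-
-The difference between @samp{-n} and @samp{-g} shows up with
-exponential notation. @samp{-n} reads @samp{1e3} as the number 1
-followed by ignored text, whereas @samp{-g} reads it as 1000. This
-sketch sets @env{LC_ALL} to @samp{C} to fix the decimal-point
-character:
-
-@example
-printf '1e3\n20\n' | LC_ALL=C sort -n    # 1e3 sorts first
-printf '1e3\n20\n' | LC_ALL=C sort -g    # 20 sorts first
-@end example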
-
-Other options are:
-
-@table @samp
-
-@item -o @var{output-file}
-@itemx --output=@var{output-file}
-@opindex -o
-@opindex --output
-@cindex overwriting of input, allowed
-Write output to @var{output-file} instead of standard output.
-If necessary, @command{sort} reads input before opening
-@var{output-file}, so you can safely sort a file in place by using
-commands like @code{sort -o F F} and @code{cat F | sort -o F}.
-
-@vindex POSIXLY_CORRECT
-If @option{-c} is not also specified, @option{-o} may appear after an
-input file even if @env{POSIXLY_CORRECT} is set, e.g., @samp{sort F -o
-F}. Warning: this usage is obsolescent. Future versions of @sc{posix}
-will require that support for it be withdrawn. Portable scripts should
-specify @samp{-o @var{output-file}} before any input files.
-
-@item -S @var{size}
-@itemx --buffer-size=@var{size}
-@opindex -S
-@opindex --buffer-size
-@cindex size for main memory sorting
-Use a main-memory sort buffer of the given @var{size}. By default,
-@var{size} is in units of 1,024 bytes. Appending @samp{%} causes
-@var{size} to be interpreted as a percentage of physical memory.
-Appending @samp{k} multiplies @var{size} by 1,024 (the default),
-@samp{M} by 1,048,576, @samp{G} by 1,073,741,824, and so on for
-@samp{T}, @samp{P}, @samp{E}, @samp{Z}, and @samp{Y}. Appending
-@samp{b} causes @var{size} to be interpreted as a byte count, with no
-multiplication.
-
-This option can improve the performance of @command{sort} by causing it
-to start with a larger or smaller sort buffer than the default.
-However, this option affects only the initial buffer size. The buffer
-grows beyond @var{size} if @command{sort} encounters input lines larger
-than @var{size}.
-
-@item -t @var{separator}
-@itemx --field-separator=@var{separator}
-@opindex -t
-@opindex --field-separator
-@cindex field separator character
-Use character @var{separator} as the field separator when finding the
-sort keys in each line. By default, fields are separated by the empty
-string between a non-whitespace character and a whitespace character.
-That is, given the input line @w{@samp{ foo bar}}, @code{sort} breaks it
-into fields @w{@samp{ foo}} and @w{@samp{ bar}}. The field separator is
-not considered to be part of either the field preceding or the field
-following. Note, however, that sort fields that extend to the end of the
-line, such as @samp{-k 2}, or that consist of a range, such as
-@samp{-k 2,3}, retain the field separators present between the
-endpoints of the range.
-
-@item -T @var{tempdir}
-@itemx --temporary-directory=@var{tempdir}
-@opindex -T
-@opindex --temporary-directory
-@cindex temporary directory
-@vindex TMPDIR
-Use directory @var{tempdir} to store temporary files, overriding the
-@env{TMPDIR} environment variable. If this option is given more than
-once, temporary files are stored in all the directories given. If you
-have a large sort or merge that is I/O-bound, you can often improve
-performance by using this option to specify directories on different
-disks and controllers.
-
-@item -u
-@itemx --unique
-@opindex -u
-@opindex --unique
-@cindex uniquifying output
-
-Output only the first of a sequence of lines that compare equal.
-With @option{--check} (@option{-c}), check instead that no pair of
-consecutive lines compares equal.
-
-@item -k @var{pos1}[,@var{pos2}]
-@itemx --key=@var{pos1}[,@var{pos2}]
-@opindex -k
-@opindex --key
-@cindex sort field
-Specify a sort field that
-consists of the part of the line between @var{pos1} and @var{pos2} (or the
-end of the line, if @var{pos2} is omitted), @emph{inclusive}.
-Fields and character positions are numbered starting with 1.
-So to sort on the second field, you'd use @samp{--key=2,2} (@samp{-k 2,2}).
-See below for more examples.
-
-@item -z
-@itemx --zero-terminated
-@opindex -z
-@opindex --zero-terminated
-@cindex sort zero-terminated lines
-Treat the input as a set of lines, each terminated by a zero byte (@sc{ascii}
-@sc{nul} (Null) character) instead of an @sc{ascii} @sc{lf} (Line Feed).
-This option can be useful in conjunction with @samp{perl -0} or
-@samp{find -print0} and @samp{xargs -0}, which do the same in order to
-reliably handle arbitrary pathnames (even those that contain Line Feed
-characters).
-
-@item +@var{pos1} [-@var{pos2}]
-The obsolescent, traditional option for specifying a sort field. The field
-consists of the part of the line starting at @var{pos1} and extending up
-to but @emph{not including} @var{pos2} (or to the end of the line if
-@var{pos2} is omitted). Fields
-and character positions are numbered starting with 0. See below.
-
-Warning: the @samp{+@var{pos1}} usage is obsolescent. Future versions of
-@sc{posix} will require that support for it be withdrawn. Use
-@option{--key} (@option{-k}) instead.
-
-@end table
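-
-A short sketch of two of these options, using a hypothetical file
-@file{names.txt}: the first command sorts the file in place, and the
-second prints each distinct line once, treating lines that differ only
-in case as equal.
-
-@example
-# names.txt is a hypothetical file name
-sort -o names.txt names.txt
-sort -f -u names.txt
-@end example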
-
-Historical (BSD and System V) implementations of @code{sort} have
-differed in their interpretation of some options, particularly
-@samp{-b}, @samp{-f}, and @samp{-n}. @sc{gnu} sort follows the @sc{posix}
-behavior, which is usually (but not always!) like the System V behavior.
-According to @sc{posix}, @samp{-n} no longer implies @samp{-b}. For
-consistency, @samp{-M} has been changed in the same way. This may
-affect the meaning of character positions in field specifications in
-obscure cases. The only fix is to add an explicit @samp{-b}.
-
-A position in a sort field specified with the @samp{-k} or @samp{+}
-option has the form @samp{@var{f}.@var{c}}, where @var{f} is the number
-of the field to use and @var{c} is the number of the first character
-from the beginning of the field (for @samp{+@var{pos}}) or from the end
-of the previous field (for @samp{-@var{pos}}). If the @samp{.@var{c}}
-is omitted, it is taken to be the first character in the field. If the
-@samp{-b} option was specified, the @samp{.@var{c}} part of a field
-specification is counted from the first nonblank character of the field
-(for @samp{+@var{pos}}) or from the first nonblank character following
-the previous field (for @samp{-@var{pos}}).
-
-A sort key option may also have any of the option letters @samp{Mbdfinr}
-appended to it, in which case the global ordering options are not used
-for that particular field. The @samp{-b} option may be independently
-attached to either or both of the @samp{+@var{pos}} and
-@samp{-@var{pos}} parts of a field specification, and if it is inherited
-from the global options it will be attached to both.
-Keys may span multiple fields.
-
-Here are some examples to illustrate various combinations of options.
-In them, the @sc{posix} @samp{-k} option is used to specify sort keys rather
-than the obsolescent @samp{+@var{pos1}-@var{pos2}} syntax.
-
-@itemize @bullet
-
-@item
-Sort in descending (reverse) numeric order.
-
-@example
-sort -nr
-@end example
-
-@item
-Sort alphabetically, omitting the first and second fields.
-This uses a single key composed of the characters beginning
-at the start of field three and extending to the end of each line.
-
-@example
-sort -k 3
-@end example
-
-@item
-Sort numerically on the second field and resolve ties by sorting
-alphabetically on the third and fourth characters of field five.
-Use @samp{:} as the field delimiter.
-
-@example
-sort -t : -k 2,2n -k 5.3,5.4
-@end example
-
-Note that if you had written @samp{-k 2} instead of @samp{-k 2,2},
-@command{sort} would have used all characters beginning in the second field
-and extending to the end of the line as the primary @emph{numeric}
-key. For the large majority of applications, treating keys spanning
-more than one field as numeric will not do what you expect.
-
-Also note that the @samp{n} modifier was applied to the field-end
-specifier for the first key. It would have been equivalent to
-specify @samp{-k 2n,2} or @samp{-k 2n,2n}. All modifiers except
-@samp{b} apply to the associated @emph{field}, regardless of whether
-the modifier character is attached to the field-start and/or the
-field-end part of the key specifier.
-
-@item
-Sort the password file on the fifth field and ignore any
-leading white space. Sort lines with equal values in field five
-on the numeric user ID in field three.
-
-@example
-sort -t : -k 5b,5 -k 3,3n /etc/passwd
-@end example
-
-An alternative is to use the global numeric modifier @samp{-n}.
-
-@example
-sort -t : -n -k 5b,5 -k 3,3 /etc/passwd
-@end example
-
-Finally, to ignore both leading and trailing white space, you
-could have applied the @samp{b} modifier to the field-end specifier
-for the first key,
-
-@example
-sort -t : -n -k 5b,5b -k 3,3 /etc/passwd
-@end example
-
-or used the global @samp{-b} modifier instead of @samp{-n}
-and an explicit @samp{n} with the second key specifier.
-
-@example
-sort -t : -b -k 5,5 -k 3,3n /etc/passwd
-@end example
-
-@item
-Generate a tags file in case-insensitive sorted order.
-
-@smallexample
-find src -type f -print0 | sort -t / -z -f | xargs -0 etags --append
-@end smallexample
-
-The use of @samp{-print0}, @samp{-z}, and @samp{-0} in this case means
-that pathnames that contain Line Feed characters will not get broken up
-by the sort operation.
-
-@c This example is a bit contrived and needs more explanation.
-@c @item
-@c Sort records separated by an arbitrary string by using a pipe to convert
-@c each record delimiter string to @samp{\0}, then using sort's -z option,
-@c and converting each @samp{\0} back to the original record delimiter.
-@c
-@c @example
-@c printf 'c\n\nb\n\na\n'|perl -0pe 's/\n\n/\n\0/g'|sort -z|perl -0pe 's/\0/\n/g'
-@c @end example
-
-@end itemize
-
-
-@node uniq invocation
-@section @code{uniq}: Uniquify files
-
-@pindex uniq
-@cindex uniquify files
-
-@code{uniq} writes the unique lines in the given @var{input}, or
-standard input if nothing is given or for an @var{input} name of
-@samp{-}. Synopsis:
-
-@example
-uniq [@var{option}]@dots{} [@var{input} [@var{output}]]
-@end example
-
-By default, @code{uniq} prints the unique lines in a sorted file, i.e.,
-discards all but one of identical successive lines. Optionally, it can
-instead show only lines that appear exactly once, or lines that appear
-more than once.
-
-The input must be sorted. If your input is not sorted, perhaps you want
-to use @code{sort -u}.
-
-If no @var{output} file is specified, @code{uniq} writes to standard
-output.
-
-The program accepts the following options. Also see @ref{Common options}.
-
-@table @samp
-
-@item -@var{n}
-@itemx -f @var{n}
-@itemx --skip-fields=@var{n}
-@opindex -@var{n}
-@opindex -f
-@opindex --skip-fields
-Skip @var{n} fields on each line before checking for uniqueness. Fields
-are sequences of non-space non-tab characters that are separated from
-each other by at least one space or tab.
-
-@item +@var{n}
-@itemx -s @var{n}
-@itemx --skip-chars=@var{n}
-@opindex +@var{n}
-@opindex -s
-@opindex --skip-chars
-Skip @var{n} characters before checking for uniqueness. If you use both
-the field and character skipping options, fields are skipped over first.
-
-Warning: the @samp{+@var{n}} usage is obsolescent. Future versions of
-@sc{posix} will require that support for it be withdrawn. Use @samp{-s
-@var{n}} instead.
-
-@item -c
-@itemx --count
-@opindex -c
-@opindex --count
-Print the number of times each line occurred along with the line.
-
-@item -i
-@itemx --ignore-case
-@opindex -i
-@opindex --ignore-case
-Ignore differences in case when comparing lines.
-
-@item -d
-@itemx --repeated
-@opindex -d
-@opindex --repeated
-@cindex duplicate lines, outputting
-Print only duplicate lines.
-
-@item -D
-@itemx --all-repeated
-@opindex -D
-@opindex --all-repeated
-@cindex all duplicate lines, outputting
-Print all duplicate lines and only duplicate lines.
-This option is useful mainly in conjunction with other options, e.g.,
-to ignore case or to compare only selected fields.
-This is a @sc{gnu} extension.
-@c FIXME: give an example showing *how* it's useful
-
-@item -u
-@itemx --unique
-@opindex -u
-@opindex --unique
-@cindex unique lines, outputting
-Print only unique lines.
-
-@item -w @var{n}
-@itemx --check-chars=@var{n}
-@opindex -w
-@opindex --check-chars
-Compare @var{n} characters on each line (after skipping any specified
-fields and characters). By default, the entire remainder of each line
-is compared.
-
-@end table
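-
-A few sketches on small, already sorted inputs. The last one shows a
-use of @samp{-D}: combined with @samp{-i}, it prints every member of
-each group of lines that differ only in case.
-
-@example
-printf 'a\na\nb\n' | uniq -c              # a counted twice, b once
-printf 'a\na\nb\n' | uniq -d              # prints a
-printf 'a\na\nb\n' | uniq -u              # prints b
-printf 'Cat\ncat\ndog\n' | uniq -D -i     # prints Cat and cat
-@end example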
-
-
-@node comm invocation
-@section @code{comm}: Compare two sorted files line by line
-
-@pindex comm
-@cindex line-by-line comparison
-@cindex comparing sorted files
-
-@code{comm} writes to standard output lines that are common, and lines
-that are unique, to two input files; a file name of @samp{-} means
-standard input. Synopsis:
-
-@example
-comm [@var{option}]@dots{} @var{file1} @var{file2}
-@end example
-
-@vindex LC_COLLATE
-Before @code{comm} can be used, the input files must be sorted using the
-collating sequence specified by the @env{LC_COLLATE} locale.
-If an input file ends in a non-newline
-character, a newline is silently appended. The @code{sort} command with
-no options always outputs a file that is suitable input to @code{comm}.
-
-@cindex differing lines
-@cindex common lines
-With no options, @code{comm} produces three-column output. Column one
-contains lines unique to @var{file1}, column two contains lines unique
-to @var{file2}, and column three contains lines common to both files.
-Columns are separated by a single TAB character.
-@c FIXME: when there's an option to supply an alternative separator
-@c string, append `by default' to the above sentence.
-
-@opindex -1
-@opindex -2
-@opindex -3
-The options @samp{-1}, @samp{-2}, and @samp{-3} suppress printing of
-the corresponding columns. Also see @ref{Common options}.
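-
-For example, with two hypothetical sorted files @file{old.txt} and
-@file{new.txt}, suppressing columns one and two prints only the lines
-common to both files, and suppressing columns two and three prints only
-the lines unique to @file{old.txt}:
-
-@example
-# old.txt and new.txt are hypothetical, pre-sorted files
-comm -12 old.txt new.txt
-comm -23 old.txt new.txt
-@end example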
-
-Unlike some other comparison utilities, @code{comm} has an exit
-status that does not depend on the result of the comparison.
-Upon normal completion @code{comm} produces an exit code of zero.
-If there is an error it exits with nonzero status.
-
-
-@node tsort invocation
-@section @code{tsort}: Topological sort
-
-@pindex tsort
-@cindex topological sort
-
-@code{tsort} performs a topological sort on the given @var{file}, or
-standard input if no input file is given or for a @var{file} of
-@samp{-}. Synopsis:
-
-@example
-tsort [@var{option}] [@var{file}]
-@end example
-
-@code{tsort} reads its input as pairs of strings, separated by blanks,
-indicating a partial ordering. The output is a total ordering that
-corresponds to the given partial ordering.
-
-For example,
-
-@example
-tsort <<EOF
-a b c
-d
-e f
-b c d e
-EOF
-@end example
-
-@noindent
-will produce the output
-
-@example
-a
-b
-c
-d
-e
-f
-@end example
-
-@code{tsort} detects cycles in the input and writes the first cycle
-encountered to standard error.
-
-Note that for a given partial ordering, generally there is no unique
-total ordering.
-
-The only options are @samp{--help} and @samp{--version}. @xref{Common
-options}.
-
-
-@node ptx invocation
-@section @code{ptx}: Produce permuted indexes
-
-@pindex ptx
-
-@code{ptx} reads a text file and essentially produces a permuted index, with
-each keyword in its context. It can be invoked in either of two ways:
-
-@example
-ptx [@var{option} @dots{}] [@var{file} @dots{}]
-ptx -G [@var{option} @dots{}] [@var{input} [@var{output}]]
-@end example
-
-The @samp{-G} (or its equivalent: @samp{--traditional}) option disables
-all @sc{gnu} extensions and reverts to traditional mode, thus introducing some
-limitations and changing several of the program's default option values.
-When @samp{-G} is not specified, @sc{gnu} extensions are always enabled.
-@sc{gnu} extensions to @code{ptx} are documented wherever appropriate in this
-document. For the full list, see @ref{Compatibility in ptx}.
-
-Individual options are explained in the following sections.
-
-When @sc{gnu} extensions are enabled, there may be zero, one or several
-@var{file}s after the options. If there is no @var{file}, the program
-reads the standard input. If there are one or more @var{file}s, they
-name input files that are read in turn, as if they were all
-concatenated. However, there is a full contextual break between
-files and, when automatic referencing is requested,
-file names and line numbers refer to individual text input files. In
-all cases, the program outputs the permuted index to the standard
-output.
-
-When @sc{gnu} extensions are @emph{not} enabled, that is, when the program
-operates in traditional mode, there may be zero, one or two parameters
-besides the options. If there are no parameters, the program reads the
-standard input and outputs the permuted index to the standard output.
-If there is only one parameter, it names the text @var{input} to be read
-instead of the standard input. If two parameters are given, they give
-respectively the name of the @var{input} file to read and the name of
-the @var{output} file to produce. @emph{Be very careful} to note that,
-in this case, the contents of the file named by the second parameter are
-destroyed. This behavior is dictated by System V @code{ptx}
-compatibility; @sc{gnu} Standards normally discourage output parameters not
-introduced by an option.
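-
-A minimal sketch contrasting the two modes (the file names are
-hypothetical). In @sc{gnu} mode both operands are read as input, while
-with @samp{-G} the second operand is the output file and is overwritten:
-
-@example
-printf 'The quick brown fox\n' > chap1.txt
-printf 'jumps over the lazy dog\n' > chap2.txt
-# GNU mode: both files are read; the index goes to standard output.
-ptx chap1.txt chap2.txt > index.txt
-# Traditional mode: index.roff is created or overwritten.
-ptx -G chap1.txt index.roff
-@end example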
-
-Note that for @emph{any} file named as the value of an option or as an
-input text file, a single dash @kbd{-} may be used, in which case
-standard input is assumed. However, it would not make sense to use this
-convention more than once per program invocation.
-
-@menu
-* General options in ptx:: Options which affect general program behavior.
-* Charset selection in ptx:: Underlying character set considerations.
-* Input processing in ptx:: Input fields, contexts, and keyword selection.
-* Output formatting in ptx:: Types of output format, and sizing the fields.
-* Compatibility in ptx:: The @sc{gnu} extensions to @code{ptx}.
-@end menu
-
-
-@node General options in ptx
-@subsection General options
-
-@table @samp
-
-@item -C
-@itemx --copyright
-Print a short note about the copyright and copying conditions, then
-exit without further processing.
-
-@item -G
-@itemx --traditional
-As already explained, this option disables all @sc{gnu} extensions to
-@code{ptx} and switches to traditional mode.
-
-@item --help
-Print a short help message on standard output, then exit without further
-processing.
-
-@item --version
-Print the program version on standard output, then exit without further
-processing.
-
-@end table
-
-
-@node Charset selection in ptx
-@subsection Charset selection
-
-@c FIXME: People don't necessarily know what an IBM-PC was these days.
-As it is set up now, the program assumes that the input file is coded
-using the 8-bit ISO 8859-1 (Latin-1) character set,
-@emph{unless} it is compiled for MS-DOS, in which case it uses the
-character set of the IBM-PC. (@sc{gnu} @code{ptx} is not known to work on
-smaller MS-DOS machines anymore.) Compared to 7-bit @sc{ascii}, the set
-of characters which are letters is different; this alters the behavior
-of regular expression matching. Thus, the default regular expression
-for a keyword allows foreign or diacriticized letters. Keyword sorting,
-however, is still crude; it obeys the underlying character set ordering
-quite blindly.
-
-@table @samp
-
-@item -f
-@itemx --ignore-case
-Fold lower case letters to upper case for sorting.
-
-@end table
-
-
-@node Input processing in ptx
-@subsection Word selection and input processing
-
-@table @samp
-
-@item -b @var{file}
-@itemx --break-file=@var{file}
-
-This option provides an alternative to @samp{-W} for describing
-which characters make up words. It introduces the name of a
-file that contains a list of characters which can@emph{not} be part of
-a word; this file is called the @dfn{Break file}. Any character that
-is not in the Break file is a word constituent. If both options
-@samp{-b} and @samp{-W} are specified, then @samp{-W} has precedence and
-@samp{-b} is ignored.
-
-When @sc{gnu} extensions are enabled, the only way to avoid newline as a
-break character is to write all the break characters in the file with no
-newline at all, not even at the end of the file. When @sc{gnu} extensions
-are disabled, spaces, tabs and newlines are always considered as break
-characters even if not included in the Break file.
-
-@item -i @var{file}
-@itemx --ignore-file=@var{file}
-
-The file associated with this option contains a list of words which will
-never be taken as keywords in concordance output. It is called the
-@dfn{Ignore file}. The file contains exactly one word in each line; the
-end of line separation of words is not subject to the value of the
-@samp{-S} option.
-
-There is a default Ignore file used by @code{ptx} when this option is
-not specified, usually found in @file{/usr/local/lib/eign} if this has
-not been changed at installation time. To deactivate the
-default Ignore file, specify @file{/dev/null} instead.
-
-@item -o @var{file}
-@itemx --only-file=@var{file}
-
-The file associated with this option contains a list of words which will
-be retained in concordance output; any word not mentioned in this file
-is ignored. The file is called the @dfn{Only file}. The file contains
-exactly one word in each line; the end of line separation of words is
-not subject to the value of the @samp{-S} option.
-
-There is no default for the Only file. When both an Only file and an
-Ignore file are specified, a word is considered a keyword only
-if it is listed in the Only file and not in the Ignore file.
-
-@item -r
-@itemx --references
-
-On each input line, the leading sequence of non-whitespace characters is
-taken to be a reference that identifies this input
-line in the resulting permuted index. For more information about reference
-production, see @ref{Output formatting in ptx}.
-Using this option changes the default value for option @samp{-S}.
-
-Using this option, the program does not try very hard to remove
-references from contexts in output, but it succeeds in doing so
-@emph{when} the context ends exactly at the newline. If option
-@samp{-r} is used with the default value of @samp{-S}, or when @sc{gnu}
-extensions are disabled, this condition is always met and references are
-completely excluded from the output contexts.
-
-@item -S @var{regexp}
-@itemx --sentence-regexp=@var{regexp}
-
-This option selects which regular expression describes the end of a
-line or the end of a sentence. In fact, this regular expression is not
-the only distinction between end of lines or end of sentences, and input
-line boundaries have no special significance outside this option. By
-default, when @sc{gnu} extensions are enabled and the @samp{-r} option is
-not used, sentence ends are used. In this case, the default @var{regexp}
-is imported from @sc{gnu} Emacs:
-
-@example
-[.?!][]\"')@}]*\\($\\|\t\\| \\)[ \t\n]*
-@end example
-
-Whenever @sc{gnu} extensions are disabled or the @samp{-r} option is used,
-line ends are used; in this case, the default @var{regexp} is just:
-
-@example
-\n
-@end example
-
-Using an empty @var{regexp} is equivalent to completely disabling end of
-line or end of sentence recognition. In this case, the whole file is
-considered to be a single big line or sentence. The user might want to
-disallow all truncation flag generation as well, through option @samp{-F
-""}. @xref{Regexps, , Syntax of Regular Expressions, emacs, The GNU Emacs
-Manual}.
-
-When the keywords happen to be near the beginning of the input line or
-sentence, this often creates an unused area at the beginning of the
-output context line; when the keywords happen to be near the end of the
-input line or sentence, this often creates an unused area at the end of
-the output context line. The program tries to fill those unused areas
-by wrapping around context in them; the tail of the input line or
-sentence is used to fill the unused area on the left of the output line;
-the head of the input line or sentence is used to fill the unused area
-on the right of the output line.
-
-As a matter of convenience to the user, many usual backslashed escape
-sequences from the C language are recognized and converted to the
-corresponding characters by @code{ptx} itself.
-
-@item -W @var{regexp}
-@itemx --word-regexp=@var{regexp}
-
-This option selects which regular expression will describe each keyword.
-By default, if @sc{gnu} extensions are enabled, a word is a sequence of
-letters; the @var{regexp} used is @samp{\w+}. When @sc{gnu} extensions are
-disabled, a word is by default anything which ends with a space, a tab
-or a newline; the @var{regexp} used is @samp{[^ \t\n]+}.
-
-An empty @var{regexp} is equivalent to not using this option.
-@xref{Regexps, , Syntax of Regular Expressions, emacs, The GNU Emacs
-Manual}.
-
-As a matter of convenience to the user, many usual backslashed escape
-sequences, as found in the C language, are recognized and converted to
-the corresponding characters by @code{ptx} itself.
-
-@end table
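-
-As an illustration (file names hypothetical), the following commands
-restrict keywords with an Only file, and then use @file{/dev/null} as
-the Ignore file so that every word is a candidate keyword:
-
-@example
-printf 'The quick brown fox\n' > notes.txt
-printf 'quick\nfox\n' > only.txt
-ptx -o only.txt notes.txt     # keywords: quick and fox (two lines)
-ptx -i /dev/null notes.txt    # keywords: all four words (four lines)
-@end example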
-
-
-@node Output formatting in ptx
-@subsection Output formatting
-
-Output format is mainly controlled by the @samp{-O} and @samp{-T} options
-described in the table below. When neither @samp{-O} nor @samp{-T} is
-selected, and if @sc{gnu} extensions are enabled, the program chooses an
-output format suitable for a dumb terminal. Each keyword occurrence is
-output to the center of one line, surrounded by its left and right
-contexts. Each field is properly justified, so the concordance output
-can be readily observed. As a special feature, if automatic
-references are selected by option @samp{-A} and are output before the
-left context, that is, if option @samp{-R} is @emph{not} selected, then
-a colon is added after the reference; this nicely interfaces with @sc{gnu}
-Emacs @code{next-error} processing. In this default output format, each
-white space character, like newline and tab, is merely changed to
-exactly one space, with no special attempt to compress consecutive
-spaces. This might change in the future. Except for those white space
-characters, every other character of the underlying set of 256
-characters is transmitted verbatim.
-
-Output format is further controlled by the following options.
-
-@table @samp
-
-@item -g @var{number}
-@itemx --gap-size=@var{number}
-
-Select the size of the minimum white space gap between the fields on the
-output line.
-
-@item -w @var{number}
-@itemx --width=@var{number}
-
-Select the maximum output width of each final line. If references are
-used, they are included or excluded from the maximum output width
-depending on the value of option @samp{-R}. If @samp{-R} is not
-selected, that is, when references are output before the left context,
-the maximum output width takes into account the maximum length of all
-references. If @samp{-R} is selected, that is, when references are
-output after the right context, the maximum output width does not take
-into account the space taken by references, nor the gap that precedes
-them.
-
-@item -A
-@itemx --auto-reference
-
-Select automatic references. Each input line will have an automatic
-reference made up of the file name and the line ordinal, with a single
-colon between them. However, the file name will be empty when standard
-input is being read. If both @samp{-A} and @samp{-r} are selected, then
-the input reference is still read and skipped, but the automatic
-reference is used at output time, overriding the input reference.
-
-@item -R
-@itemx --right-side-refs
-
-In the default output format, when option @samp{-R} is not used, any
-references produced by the effect of options @samp{-r} or @samp{-A} are
-placed at the beginning of each output line, before the left context.
-With the default output format, when the @samp{-R} option is specified,
-references are instead placed at the far right of output lines, after
-the right context. For any other output format, option @samp{-R} is
-ignored, with one exception: with @samp{-R} the width of references
-is @emph{not} taken into account in total output width given by @samp{-w}.
-
-This option is automatically selected whenever @sc{gnu} extensions are
-disabled.
-
-@item -F @var{string}
-@itemx --flag-truncation=@var{string}
-
-This option requests that any truncation in the output be reported
-using the string @var{string}. Most output fields theoretically extend
-towards the beginning or the end of the current line, or current
-sentence, as selected with option @samp{-S}. But there is a maximum
-allowed output line width, changeable through option @samp{-w}, which is
-further divided into space for various output fields. When a field
-cannot extend all the way to the beginning or end of the current line
-within its share of that width, it is truncated. By default,
-the string used is a single slash, as in @samp{-F /}.
-
-@var{string} may have more than one character, as in @samp{-F ...}.
-In the particular case where @var{string} is empty (@samp{-F ""}),
-truncation flagging is disabled and no truncation marks are appended.
-
-As a matter of convenience to the user, many usual backslashed escape
-sequences, as found in the C language, are recognized and converted to
-the corresponding characters by @code{ptx} itself.
-
-@item -M @var{string}
-@itemx --macro-name=@var{string}
-
-Select another @var{string} to be used instead of @samp{xx}, while
-generating output suitable for @code{nroff}, @code{troff} or @TeX{}.
-
-@item -O
-@itemx --format=roff
-
-Choose an output format suitable for @code{nroff} or @code{troff}
-processing. Each output line will look like:
-
-@smallexample
-.xx "@var{tail}" "@var{before}" "@var{keyword_and_after}" "@var{head}" "@var{ref}"
-@end smallexample
-
-so it will be possible to write a @samp{.xx} roff macro to take care of
-the output typesetting. This is the default output format when @sc{gnu}
-extensions are disabled. Option @samp{-M} can be used to change
-@samp{xx} to another macro name.
-
-In this output format, each non-graphical character, like newline and
-tab, is merely changed to exactly one space, with no special attempt to
-compress consecutive spaces. Each quote character @kbd{"} is doubled
-so it will be correctly processed by @code{nroff} or @code{troff}.
-
-@item -T
-@itemx --format=tex
-
-Choose an output format suitable for @TeX{} processing. Each output
-line will look like:
-
-@smallexample
-\xx @{@var{tail}@}@{@var{before}@}@{@var{keyword}@}@{@var{after}@}@{@var{head}@}@{@var{ref}@}
-@end smallexample
-
-@noindent
-so it will be possible to write a @code{\xx} definition to take care of
-the output typesetting. Note that when references are not being
-produced, that is, neither option @samp{-A} nor option @samp{-r} is
-selected, the last parameter of each @code{\xx} call is omitted.
-Option @samp{-M} can be used to change @samp{xx} to another macro
-name.
-
-In this output format, some special characters, like @kbd{$}, @kbd{%},
-@kbd{&}, @kbd{#} and @kbd{_} are automatically protected with a
-backslash. Curly braces @kbd{@{}, @kbd{@}} are protected with a
-backslash and a pair of dollar signs (to force mathematical mode). The
-backslash itself produces the sequence @code{\backslash@{@}}.
-Circumflex and tilde diacritics produce the sequence @code{^\@{ @}} and
-@code{~\@{ @}} respectively. Other diacriticized characters of the
-underlying character set produce an appropriate @TeX{} sequence as far
-as possible. The other non-graphical characters, like newline and tab,
-and all other characters which are not part of @sc{ascii}, are merely
-changed to exactly one space, with no special attempt to compress
-consecutive spaces. Let me know how to improve this special character
-processing for @TeX{}.
-
-@end table
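-
-A short sketch of the output options above, using a hypothetical
-one-line input file. The comments describe the general shape of each
-output line rather than its exact spacing:
-
-@example
-printf 'The quick brown fox\n' > notes.txt
-ptx -A notes.txt          # references like notes.txt:1 come first
-ptx -A -R notes.txt       # the same references come last
-ptx -O notes.txt          # each line starts with .xx "
-ptx -O -M IX notes.txt    # each line starts with .IX "
-ptx -T notes.txt          # each line starts with \xx
-@end example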
-
-
-@node Compatibility in ptx
-@subsection The @sc{gnu} extensions to @code{ptx}
-
-This version of @code{ptx} contains a few features which do not exist in
-System V @code{ptx}. These extra features are suppressed by using the
-@samp{-G} command line option, unless overridden by other command line
-options. Some @sc{gnu} extensions cannot be recovered by overriding, so the
-simple rule is to avoid @samp{-G} if you care about @sc{gnu} extensions.
-Here are the differences between this program and System V @code{ptx}.
-
-@itemize @bullet
-
-@item
-This program can read many input files at once, and it always writes the
-resulting concordance on standard output. On the other hand, System V
-@code{ptx} reads only one file and sends the result to standard output
-or, if a second @var{file} parameter is given on the command, to that
-@var{file}.
-
-Having output parameters not introduced by options is a dangerous
-practice which @sc{gnu} avoids as far as possible. So, to use @code{ptx}
-portably between @sc{gnu} and System V, you should always use it with a
-single input file, and always expect the result on standard output. You
-might also want to automatically configure in a @samp{-G} option to
-@code{ptx} calls in products using @code{ptx}, if the configurator finds
-that the installed @code{ptx} accepts @samp{-G}.
-
-@item
-The only options available in System V @code{ptx} are options @samp{-b},
-@samp{-f}, @samp{-g}, @samp{-i}, @samp{-o}, @samp{-r}, @samp{-t} and
-@samp{-w}. All other options are @sc{gnu} extensions and are not repeated in
-this enumeration. Moreover, some options have a slightly different
-meaning when @sc{gnu} extensions are enabled, as explained below.
-
-@item
-By default, concordance output is not formatted for @code{troff} or
-@code{nroff}. It is rather formatted for a dumb terminal. @code{troff}
-or @code{nroff} output may still be selected through option @samp{-O}.
-
-@item
-Unless @samp{-R} option is used, the maximum reference width is
-subtracted from the total output line width. With @sc{gnu} extensions
-disabled, width of references is not taken into account in the output
-line width computations.
-
-@item
-All 256 characters, even @kbd{NUL}s, are always read and processed from
-input file with no adverse effect, even if @sc{gnu} extensions are disabled.
-However, System V @code{ptx} does not accept 8-bit characters, a few
-control characters are rejected, and the tilde @kbd{~} is also rejected.
-
-@item
-Input line length is only limited by available memory, even if @sc{gnu}
-extensions are disabled. However, System V @code{ptx} processes only
-the first 200 characters in each line.
-
-@item
-By default, the break (non-word) characters are all characters except the
-letters of the underlying character set, diacriticized or not. When @sc{gnu}
-extensions are disabled, the break characters default to space, tab and
-newline only.
-
-@item
-The program makes better use of output line width. If @sc{gnu} extensions
-are disabled, the program rather tries to imitate System V @code{ptx},
-but there are still some slight layout glitches that this program does
-not completely reproduce.
-
-@item
-The user can specify both an Ignore file and an Only file. This is not
-allowed with System V @code{ptx}.
-
-@end itemize
-
-
-@node Operating on fields within a line
-@chapter Operating on fields within a line
-
-@menu
-* cut invocation:: Print selected parts of lines.
-* paste invocation:: Merge lines of files.
-* join invocation:: Join lines on a common field.
-@end menu
-
-
-@node cut invocation
-@section @code{cut}: Print selected parts of lines
-
-@pindex cut
-@code{cut} writes to standard output selected parts of each line of each
-input file, or standard input if no files are given or for a file name of
-@samp{-}. Synopsis:
-
-@example
-cut [@var{option}]@dots{} [@var{file}]@dots{}
-@end example
-
-In the table which follows, the @var{byte-list}, @var{character-list},
-and @var{field-list} are one or more numbers or ranges (two numbers
-separated by a dash) separated by commas. Bytes, characters, and
-fields are numbered starting at 1. Incomplete ranges may be
-given: @samp{-@var{m}} means @samp{1-@var{m}}; @samp{@var{n}-} means
-@samp{@var{n}} through end of line or last field.
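-
-For example, a single list may mix positions, closed ranges and
-open-ended ranges:
-
-@example
-echo abcdefgh | cut -c 2-4,7-    # prints bcdgh
-echo abcdefgh | cut -c -3        # prints abc
-@end example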
-
-The program accepts the following options. Also see @ref{Common
-options}.
-
-@table @samp
-
-@item -b @var{byte-list}
-@itemx --bytes=@var{byte-list}
-@opindex -b
-@opindex --bytes
-Print only the bytes in positions listed in @var{byte-list}. Tabs and
-backspaces are treated like any other character; they take up 1 byte.
-
-@item -c @var{character-list}
-@itemx --characters=@var{character-list}
-@opindex -c
-@opindex --characters
-Print only characters in positions listed in @var{character-list}.
-The same as @samp{-b} for now, but internationalization will change
-that. Tabs and backspaces are treated like any other character; they
-take up 1 character.
-
-@item -f @var{field-list}
-@itemx --fields=@var{field-list}
-@opindex -f
-@opindex --fields
-Print only the fields listed in @var{field-list}. Fields are
-separated by a TAB character by default.
-Also print any line that contains no delimiter character, unless
-the @samp{--only-delimited} (@samp{-s}) option is specified.
-
-@item -d @var{input_delim_byte}
-@itemx --delimiter=@var{input_delim_byte}
-@opindex -d
-@opindex --delimiter
-For @samp{-f}, fields are separated in the input by the first character
-in @var{input_delim_byte} (default is TAB).
-
-@item -n
-@opindex -n
-Do not split multi-byte characters (no-op for now).
-
-@item -s
-@itemx --only-delimited
-@opindex -s
-@opindex --only-delimited
-For @samp{-f}, do not print lines that do not contain the field separator
-character.
-
-@item --output-delimiter=@var{output_delim_string}
-@opindex --output-delimiter
-For @samp{-f}, output fields are separated by @var{output_delim_string}.
-The default is to use the input delimiter.
-
-
-@end table
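-
-A sketch of field selection (the input file name is hypothetical). The
-second input line contains no @samp{:}, so it is passed through
-unchanged unless @samp{-s} is given:
-
-@example
-printf 'root:x:0:0\nno delimiter here\n' > fields.txt
-cut -d : -f 1,3 fields.txt
-# prints root:0, then the delimiter-free line unchanged
-cut -s -d : -f 1,3 fields.txt
-# prints only root:0
-cut -d : -f 1,3 --output-delimiter=' ' fields.txt
-# prints root 0, then the delimiter-free line unchanged
-@end example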
-
-
-@node paste invocation
-@section @code{paste}: Merge lines of files
-
-@pindex paste
-@cindex merging files
-
-@code{paste} writes to standard output lines consisting of sequentially
-corresponding lines of each given file, separated by a TAB character.
-Standard input is used for a file name of @samp{-} or if no input files
-are given.
-
-Synopsis:
-
-@example
-paste [@var{option}]@dots{} [@var{file}]@dots{}
-@end example
-
-The program accepts the following options. Also see @ref{Common options}.
-
-@table @samp
-
-@item -s
-@itemx --serial
-@opindex -s
-@opindex --serial
-Paste the lines of one file at a time rather than one line from each
-file.
-
-@item -d @var{delim-list}
-@itemx --delimiters=@var{delim-list}
-@opindex -d
-@opindex --delimiters
-Consecutively use the characters in @var{delim-list} instead of
-TAB to separate merged lines. When @var{delim-list} is
-exhausted, start again at its beginning.
-
-@end table
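-
-For example, with two hypothetical three-line files, the first command
-below merges corresponding lines and the second uses @samp{-s} to join
-all lines of a single file:
-
-@example
-printf '1\n2\n3\n' > nums.txt
-printf 'a\nb\nc\n' > letters.txt
-paste -d , nums.txt letters.txt   # three lines: 1,a  2,b  3,c
-paste -s -d , nums.txt            # one line: 1,2,3
-@end example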
-
-
-@node join invocation
-@section @code{join}: Join lines on a common field
-
-@pindex join
-@cindex common field, joining on
-
-@code{join} writes to standard output a line for each pair of input
-lines that have identical join fields. Synopsis:
-
-@example
-join [@var{option}]@dots{} @var{file1} @var{file2}
-@end example
-
-@vindex LC_COLLATE
-Either @var{file1} or @var{file2} (but not both) can be @samp{-},
-meaning standard input. @var{file1} and @var{file2} should be already
-sorted in increasing textual order on the join fields, using the
-collating sequence specified by the @env{LC_COLLATE} locale. Unless
-the @samp{-t} option is given, the input should be sorted ignoring blanks at
-the start of the join field, as in @code{sort -b}. If the
-@samp{--ignore-case} option is given, lines should be sorted without
-regard to the case of characters in the join field, as in @code{sort -f}.
-
-The defaults are: the join field is the first field in each line;
-fields in the input are separated by one or more blanks, with leading
-blanks on the line ignored; fields in the output are separated by a
-space; each output line consists of the join field, the remaining
-fields from @var{file1}, then the remaining fields from @var{file2}.
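-
-For example, with the default settings and two hypothetical files
-already sorted on their first field, only the keys present in both
-files (@samp{1} and @samp{3}) produce output:
-
-@example
-printf '1 alice\n2 bob\n3 carol\n' > users.txt
-printf '1 admin\n3 staff\n4 guest\n' > roles.txt
-join users.txt roles.txt
-# prints:
-# 1 alice admin
-# 3 carol staff
-@end example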
-
-The program accepts the following options. Also see @ref{Common options}.
-
-@table @samp
-
-@item -a @var{file-number}
-@opindex -a
-Print a line for each unpairable line in file @var{file-number} (either
-@samp{1} or @samp{2}), in addition to the normal output.
-
-@item -e @var{string}
-@opindex -e
-Replace those output fields that are missing in the input with
-@var{string}.
-
-@item -i
-@itemx --ignore-case
-@opindex -i
-@opindex --ignore-case
-Ignore differences in case when comparing keys.
-With this option, the lines of the input files must be sorted
-without regard to case, as produced by @samp{sort -f}.
-
-@item -1 @var{field}
-@itemx -j1 @var{field}
-@opindex -1
-@opindex -j1
-Join on field @var{field} (a positive integer) of file 1.
-
-@item -2 @var{field}
-@itemx -j2 @var{field}
-@opindex -2
-@opindex -j2
-Join on field @var{field} (a positive integer) of file 2.
-
-@item -j @var{field}
-Equivalent to @samp{-1 @var{field} -2 @var{field}}.
-
-@item -o @var{field-list}@dots{}
-Construct each output line according to the format in @var{field-list}.
-Each element in @var{field-list} is either the single character @samp{0} or
-has the form @var{m.n} where the file number, @var{m}, is @samp{1} or
-@samp{2} and @var{n} is a positive field number.
-
-A field specification of @samp{0} denotes the join field.
-In most cases, the functionality of the @samp{0} field spec
-may be reproduced using the explicit @var{m.n} that corresponds
-to the join field. However, when printing unpairable lines
-(using either of the @samp{-a} or @samp{-v} options), there is no way
-to specify the join field using @var{m.n} in @var{field-list}
-if there are unpairable lines in both files.
-To give @code{join} that functionality, @sc{posix} invented the @samp{0}
-field specification notation.
-
-The elements in @var{field-list}
-are separated by commas or blanks. Multiple @var{field-list}
-arguments can be given after a single @samp{-o} option; the values
-of all lists given with @samp{-o} are concatenated together.
-All output lines, including those printed because of any @samp{-a} or
-@samp{-v} option, are subject to the specified @var{field-list}.
-
-@item -t @var{char}
-Use character @var{char} as the input and output field separator.
-
-@item -v @var{file-number}
-Print a line for each unpairable line in file @var{file-number}
-(either @samp{1} or @samp{2}), instead of the normal output.
-
-@end table
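-
-The following sketch uses two small hypothetical files, already sorted
-on their first field, to show @samp{-a}, @samp{-v}, and a combination
-of @samp{-a}, @samp{-e} and @samp{-o}. The @samp{0} in the field list
-supplies the join field for unpairable lines from either file:
-
-@example
-printf '1 alice\n2 bob\n3 carol\n' > users.txt
-printf '1 admin\n3 staff\n4 guest\n' > roles.txt
-join -a 1 users.txt roles.txt   # also prints the unpairable line 2 bob
-join -v 2 users.txt roles.txt   # prints only 4 guest
-join -a 1 -a 2 -e NONE -o 0,1.2,2.2 users.txt roles.txt
-# prints:
-# 1 alice admin
-# 2 bob NONE
-# 3 carol staff
-# 4 NONE guest
-@end example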
-
-In addition, when @sc{gnu} @code{join} is invoked with exactly one argument,
-options @samp{--help} and @samp{--version} are recognized. @xref{Common
-options}.
-
-
-@node Operating on characters
-@chapter Operating on characters
-
-@cindex operating on characters
-
-These commands operate on individual characters.
-
-@menu
-* tr invocation:: Translate, squeeze, and/or delete characters.
-* expand invocation:: Convert tabs to spaces.
-* unexpand invocation:: Convert spaces to tabs.
-@end menu
-
-
-@node tr invocation
-@section @code{tr}: Translate, squeeze, and/or delete characters
-
-@pindex tr
-
-Synopsis:
-
-@example
-tr [@var{option}]@dots{} @var{set1} [@var{set2}]
-@end example
-
-@code{tr} copies standard input to standard output, performing
-one of the following operations:
-
-@itemize @bullet
-@item
-translate, and optionally squeeze repeated characters in the result,
-@item
-squeeze repeated characters,
-@item
-delete characters,
-@item
-delete characters, then squeeze repeated characters from the result.
-@end itemize
-
-The @var{set1} and (if given) @var{set2} arguments define ordered
-sets of characters, referred to below as @var{set1} and @var{set2}. These
-sets are the characters of the input that @code{tr} operates on.
-The @samp{--complement} (@samp{-c}) option replaces @var{set1} with its
-complement (all of the characters that are not in @var{set1}).
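-
-For example, combining @samp{-c} with @samp{-d} (@samp{--delete})
-removes every character that is @emph{not} in @var{set1}; here
-@var{set1} keeps lowercase letters and newline:
-
-@example
-echo 'tr: 2001.' | tr -cd 'a-z\n'    # prints tr
-@end example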
-
-@menu
-* Character sets:: Specifying sets of characters.
-* Translating:: Changing one character to another.
-* Squeezing:: Squeezing repeats and deleting.
-* Warnings in tr:: Warning messages.
-@end menu
-
-
-@node Character sets
-@subsection Specifying sets of characters
-
-@cindex specifying sets of characters
-
-The format of the @var{set1} and @var{set2} arguments resembles
-the format of regular expressions; however, they are not regular
-expressions, only lists of characters. Most characters simply
-represent themselves in these strings, but the strings can contain
-the shorthands listed below, for convenience. Some of them can be
-used only in @var{set1} or @var{set2}, as noted below.
-
-@table @asis
-
-@item Backslash escapes
-@cindex backslash escapes
-
-A backslash followed by a character not listed below causes an error
-message.
-
-@table @samp
-@item \a
-Control-G.
-@item \b
-Control-H.
-@item \f
-Control-L.
-@item \n
-Control-J.
-@item \r
-Control-M.
-@item \t
-Control-I.
-@item \v
-Control-K.
-@item \@var{ooo}
-The character with the value given by @var{ooo}, which is 1 to 3
-octal digits.
-@item \\
-A backslash.
-@end table
-
-@item Ranges
-@cindex ranges
-
-The notation @samp{@var{m}-@var{n}} expands to all of the characters
-from @var{m} through @var{n}, in ascending order. @var{m} should
-collate before @var{n}; if it doesn't, an error results. As an example,
-@samp{0-9} is the same as @samp{0123456789}.
-
-@sc{gnu} @code{tr} does not support the System V syntax that uses square
-brackets to enclose ranges. Translations specified in that format
-sometimes work as expected, since the brackets are often transliterated
-to themselves. However, they should be avoided because they sometimes
-behave unexpectedly. For example, @samp{tr -d '[0-9]'} deletes brackets
-as well as digits.
-
-Many historically common and even accepted uses of ranges are not
-portable. For example, on @sc{ebcdic} hosts using the @samp{A-Z}
-range will not do what most would expect because @samp{A} through @samp{Z}
-are not contiguous as they are in @sc{ascii}.
-If you can rely on a @sc{posix} compliant version of @code{tr}, then
-the best way to work around this is to use character classes (see below).
-Otherwise, it is most portable (and most ugly) to enumerate the members
-of the ranges.
-
-@item Repeated characters
-@cindex repeated characters
-
-The notation @samp{[@var{c}*@var{n}]} in @var{set2} expands to @var{n}
-copies of character @var{c}. Thus, @samp{[y*6]} is the same as
-@samp{yyyyyy}. The notation @samp{[@var{c}*]} in @var{set2} expands
-to as many copies of @var{c} as are needed to make @var{set2} as long as
-@var{set1}. If @var{n} begins with @samp{0}, it is interpreted in
-octal, otherwise in decimal.
-
-@item Character classes
-@cindex character classes
-
-The notation @samp{[:@var{class}:]} expands to all of the characters in
-the (predefined) class @var{class}. The characters expand in no
-particular order, except for the @code{upper} and @code{lower} classes,
-which expand in ascending order. When the @samp{--delete} (@samp{-d})
-and @samp{--squeeze-repeats} (@samp{-s}) options are both given, any
-character class can be used in @var{set2}. Otherwise, only the
-character classes @code{lower} and @code{upper} are accepted in
-@var{set2}, and then only if the corresponding character class
-(@code{upper} and @code{lower}, respectively) is specified in the same
-relative position in @var{set1}. Doing this specifies case conversion.
-The class names are given below; an error results when an invalid class
-name is given.
-
-@table @code
-@item alnum
-@opindex alnum
-Letters and digits.
-@item alpha
-@opindex alpha
-Letters.
-@item blank
-@opindex blank
-Horizontal whitespace.
-@item cntrl
-@opindex cntrl
-Control characters.
-@item digit
-@opindex digit
-Digits.
-@item graph
-@opindex graph
-Printable characters, not including space.
-@item lower
-@opindex lower
-Lowercase letters.
-@item print
-@opindex print
-Printable characters, including space.
-@item punct
-@opindex punct
-Punctuation characters.
-@item space
-@opindex space
-Horizontal or vertical whitespace.
-@item upper
-@opindex upper
-Uppercase letters.
-@item xdigit
-@opindex xdigit
-Hexadecimal digits.
-@end table
-
-@item Equivalence classes
-@cindex equivalence classes
-
-The syntax @samp{[=@var{c}=]} expands to all of the characters that are
-equivalent to @var{c}, in no particular order. Equivalence classes are
-a relatively recent invention intended to support non-English alphabets.
-But there seems to be no standard way to define them or determine their
-contents. Therefore, they are not fully implemented in @sc{gnu} @code{tr};
-each character's equivalence class consists only of that character,
-which is of no particular use.
-
-@end table
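The range, repeat, and class notations above can be tried with a short script. This is a sketch that assumes @sc{gnu} @code{tr} and the @samp{C} locale (so that ranges follow @sc{ascii} order); the sample strings and variable names are made up for illustration.

```shell
# Sketch only: assumes GNU tr and the C locale (ASCII ordering).
LC_ALL=C
export LC_ALL

# Range: 0-9 expands to 0123456789, so every digit is deleted.
no_digits=$(printf 'r2d2 c3po\n' | tr -d '0-9')

# [c*] in SET2: repeat '#' until SET2 is as long as SET1 (ten digits).
masked=$(printf 'PIN 4071\n' | tr '0-9' '[#*]')

# [c*n] with a leading 0 is octal: [x*010] is eight x's, then z.
repeated=$(printf 'abcdefghi\n' | tr 'a-i' '[x*010]z')

# Classes: [:lower:] and [:upper:] in matching positions convert case.
upper=$(printf 'Hello, World\n' | tr '[:lower:]' '[:upper:]')

printf '%s\n' "$no_digits" "$masked" "$repeated" "$upper"
```

Quoting each set protects the brackets and @samp{*} from the shell.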
-
-
-@node Translating
-@subsection Translating
-
-@cindex translating characters
-
-@code{tr} performs translation when @var{set1} and @var{set2} are
-both given and the @samp{--delete} (@samp{-d}) option is not given.
-@code{tr} translates each character of its input that is in @var{set1}
-to the corresponding character in @var{set2}. Characters not in
-@var{set1} are passed through unchanged. When a character appears more
-than once in @var{set1} and the corresponding characters in @var{set2}
-are not all the same, only the final one is used. For example, these
-two commands are equivalent:
-
-@example
-tr aaa xyz
-tr a z
-@end example
-
-A common use of @code{tr} is to convert lowercase characters to
-uppercase. This can be done in many ways. Here are three of them:
-
-@example
-tr abcdefghijklmnopqrstuvwxyz ABCDEFGHIJKLMNOPQRSTUVWXYZ
-tr a-z A-Z
-tr '[:lower:]' '[:upper:]'
-@end example
-
-@noindent
-But note that using ranges like @code{a-z} above is not portable.
-
-When @code{tr} is performing translation, @var{set1} and @var{set2}
-typically have the same length. If @var{set1} is shorter than
-@var{set2}, the extra characters at the end of @var{set2} are ignored.
-
-On the other hand, making @var{set1} longer than @var{set2} is not
-portable; @sc{posix.2} says that the result is undefined. In this situation,
-BSD @code{tr} pads @var{set2} to the length of @var{set1} by repeating
-the last character of @var{set2} as many times as necessary. System V
-@code{tr} truncates @var{set1} to the length of @var{set2}.
-
-By default, @sc{gnu} @code{tr} handles this case like BSD @code{tr}. When
-the @samp{--truncate-set1} (@samp{-t}) option is given, @sc{gnu} @code{tr}
-handles this case like the System V @code{tr} instead. This option is
-ignored for operations other than translation.
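To see the difference concretely, here is a sketch assuming @sc{gnu} @code{tr}, where @var{set1} has five characters and @var{set2} only two; the input string is arbitrary.

```shell
# Default (BSD-like): SET2 "xy" is padded to "xyyyy", so c, d, e become y.
padded=$(printf 'abcde\n' | tr 'abcde' 'xy')

# With -t (System V-like): SET1 is truncated to "ab"; c, d, e pass through.
truncated=$(printf 'abcde\n' | tr -t 'abcde' 'xy')

printf '%s\n' "$padded" "$truncated"
```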
-
-Acting like System V @code{tr} in this case breaks the relatively common
-BSD idiom:
-
-@example
-tr -cs A-Za-z0-9 '\012'
-@end example
-
-@noindent
-because it converts only zero bytes (the first element in the
-complement of @var{set1}), rather than all non-alphanumerics, to
-newlines.
-
-@noindent
-By the way, the above idiom is not portable because it uses ranges.
-Assuming a @sc{posix}-compliant @code{tr}, here is a better way to write it:
-
-@example
-tr -cs '[:alnum:]' '[\n*]'
-@end example
-
-
-@node Squeezing
-@subsection Squeezing repeats and deleting
-
-@cindex squeezing repeat characters
-@cindex deleting characters
-
-When given just the @samp{--delete} (@samp{-d}) option, @code{tr}
-removes any input characters that are in @var{set1}.
-
-When given just the @samp{--squeeze-repeats} (@samp{-s}) option,
-@code{tr} replaces each input sequence of a repeated character that
-is in @var{set1} with a single occurrence of that character.
-
-When given both @samp{--delete} and @samp{--squeeze-repeats}, @code{tr}
-first performs any deletions using @var{set1}, then squeezes repeats
-from any remaining characters using @var{set2}.
-
-The @samp{--squeeze-repeats} option may also be used when translating,
-in which case @code{tr} first performs translation, then squeezes
-repeats from any remaining characters using @var{set2}.
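The two combined modes can be sketched like this, again assuming @sc{gnu} @code{tr}; the inputs are arbitrary examples.

```shell
# -d with -s: delete every 'a' (SET1), then squeeze runs of 'b' (SET2).
# The "cc" is left alone because 'c' is not in SET2.
del_squeeze=$(printf 'aabbbcc\n' | tr -ds 'a' 'b')

# -s while translating: spaces become '_', then runs of '_' are squeezed.
# The doubled 'l' is untouched because 'l' is not in SET2.
tr_squeeze=$(printf 'hello   world\n' | tr -s ' ' '_')

printf '%s\n' "$del_squeeze" "$tr_squeeze"
```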
-
-Here are some examples to illustrate various combinations of options:
-
-@itemize @bullet
-
-@item
-Remove all zero bytes:
-
-@example
-tr -d '\000'
-@end example
-
-@item
-Put all words on lines by themselves. This converts all
-non-alphanumeric characters to newlines, then squeezes each string
-of repeated newlines into a single newline:
-
-@example
-tr -cs '[:alnum:]' '[\n*]'
-@end example
-
-@item
-Convert each sequence of repeated newlines to a single newline:
-
-@example
-tr -s '\n'
-@end example
-
-@item
-Find doubled occurrences of words in a document.
-For example, people often write ``the the'' with the duplicated words
-separated by a newline. The Bourne shell script below works first
-by converting each sequence of punctuation and blank characters to a
-single newline. That puts each ``word'' on a line by itself.
-Next it maps all uppercase characters to lowercase, and finally it
-runs @code{uniq} with the @samp{-d} option to print out only the words
-that were adjacent duplicates.
-
-@example
-#!/bin/sh
-cat "$@@" \
- | tr -s '[:punct:][:blank:]' '\n' \
- | tr '[:upper:]' '[:lower:]' \
- | uniq -d
-@end example
-
-@item
-Deleting a small set of characters is usually straightforward. For example,
-to remove all @samp{a}s, @samp{x}s, and @samp{M}s you would do this:
-
-@example
-tr -d axM
-@end example
-
-However, when @samp{-} is one of those characters, it can be tricky because
-@samp{-} has special meanings. Performing the same task as above but also
-removing all @samp{-} characters, we might try @code{tr -d -axM}, but
-that would fail because @code{tr} would try to interpret @samp{-a} as
-a command-line option. Alternatively, we could try putting the hyphen
-inside the string, @code{tr -d a-xM}, but that wouldn't work either because
-it would make @code{tr} interpret @code{a-x} as the range of characters
-@samp{a}@dots{}@samp{x} rather than the three characters @samp{a},
-@samp{-}, and @samp{x}.
-One way to solve the problem is to put the hyphen at the end of the list
-of characters:
-
-@example
-tr -d axM-
-@end example
-
-More generally, use the equivalence class notation @code{[=c=]}
-with @samp{-} (or any other character) in place of the @samp{c}:
-
-@example
-tr -d '[=-=]axM'
-@end example
-
-Note how single quotes are used in the above example to protect the
-square brackets from interpretation by a shell.
-
-@end itemize
-
-
-@node Warnings in tr
-@subsection Warning messages
-
-@vindex POSIXLY_CORRECT
-Setting the environment variable @env{POSIXLY_CORRECT} turns off the
-following warning and error messages, for strict compliance with
-@sc{posix.2}. Otherwise, the following diagnostics are issued:
-
-@enumerate
-
-@item
-When the @samp{--delete} option is given but @samp{--squeeze-repeats}
-is not, and @var{set2} is given, @sc{gnu} @code{tr} by default prints
-a usage message and exits, because @var{set2} would not be used.
-The @sc{posix} specification says that @var{set2} must be ignored in
-this case. Silently ignoring arguments is a bad idea.
-
-@item
-When an ambiguous octal escape is given. For example, @samp{\400}
-is actually @samp{\40} followed by the digit @samp{0}, because the
-value 400 octal does not fit into a single byte.
-
-@end enumerate
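A minimal sketch of the second diagnostic, assuming @sc{gnu} @code{tr}: the escape @samp{\400} yields the two characters space and @samp{0}, and the warning itself goes to standard error, which is discarded here.

```shell
# '\400' is read as '\40' (space) followed by '0', so SET1 is " 0"
# and SET2 "xy" maps space to x and 0 to y.
# The ambiguity warning is written to stderr; discard it here.
ambiguous=$(printf 'a 0b\n' | tr '\400' 'xy' 2>/dev/null)
printf '%s\n' "$ambiguous"
```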
-
-@sc{gnu} @code{tr} does not provide complete BSD or System V compatibility.
-For example, it is impossible to disable interpretation of the @sc{posix}
-constructs @samp{[:alpha:]}, @samp{[=c=]}, and @samp{[c*10]}. Also, @sc{gnu}
-@code{tr} does not delete zero bytes automatically, unlike traditional
-Unix versions, which provide no way to preserve zero bytes.
-
-
-@node expand invocation
-@section @code{expand}: Convert tabs to spaces
-
-@pindex expand
-@cindex tabs to spaces, converting
-@cindex converting tabs to spaces
-
-@code{expand} writes the contents of each given @var{file}, or standard
-input if none are given or for a @var{file} of @samp{-}, to standard
-output, with tab characters converted to the appropriate number of
-spaces. Synopsis:
-
-@example
-expand [@var{option}]@dots{} [@var{file}]@dots{}
-@end example
-
-By default, @code{expand} converts all tabs to spaces. It preserves
-backspace characters in the output; they decrement the column count for
-tab calculations. The default action is equivalent to @samp{-8} (set
-tabs every 8 columns).
-
-The program accepts the following options. Also see @ref{Common options}.
-
-@table @samp
-
-@item -@var{tab1}[,@var{tab2}]@dots{}
-@itemx -t @var{tab1}[,@var{tab2}]@dots{}
-@itemx --tabs=@var{tab1}[,@var{tab2}]@dots{}
-@opindex -@var{tab}
-@opindex -t
-@opindex --tabs
-@cindex tabstops, setting
-If only one tab stop is given, set the tabs @var{tab1} spaces apart
-(default is 8). Otherwise, set the tabs at columns @var{tab1},
-@var{tab2}, @dots{} (numbered from 0), and replace any tabs beyond the
-last tab stop given with single spaces. If the tab stops are specified
-with the @samp{-t} or @samp{--tabs} option, they can be separated by
-blanks as well as by commas.
-
-@item -i
-@itemx --initial
-@opindex -i
-@opindex --initial
-@cindex initial tabs, converting
-Only convert initial tabs (those that precede all characters other than
-spaces and tabs) on each line to spaces.
-
-@end table
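A short sketch of these options, assuming @sc{gnu} @code{expand}; @code{printf} is used only to produce input containing literal tabs.

```shell
# Default: tab stops every 8 columns, so "a<TAB>b" gets 7 padding spaces.
default=$(printf 'a\tb\n' | expand)

# -t 4: tab stops every 4 columns, so each tab here becomes 3 spaces.
four=$(printf 'a\tb\tc\n' | expand -t 4)

# -i: only the leading tab is expanded; the tab after "x" is kept.
initial=$(printf '\tx\ty\n' | expand -i -t 4)

printf '%s\n' "$default" "$four" "$initial"
```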
-
-
-@node unexpand invocation
-@section @code{unexpand}: Convert spaces to tabs
-
-@pindex unexpand
-
-@code{unexpand} writes the contents of each given @var{file}, or
-standard input if none are given or for a @var{file} of @samp{-}, to
-standard output, with strings of two or more space or tab characters
-converted to as many tabs as possible followed by as many spaces as are
-needed. Synopsis:
-
-@example
-unexpand [@var{option}]@dots{} [@var{file}]@dots{}
-@end example
-
-By default, @code{unexpand} converts only initial spaces and tabs (those
-that precede all characters other than spaces and tabs) on each line. It
-preserves backspace characters in the output; they decrement the column
-count for tab calculations. By default, tabs are set at every 8th
-column.
-
-The program accepts the following options. Also see @ref{Common options}.
-
-@table @samp
-
-@item -@var{tab1}[,@var{tab2}]@dots{}
-@itemx -t @var{tab1}[,@var{tab2}]@dots{}
-@itemx --tabs=@var{tab1}[,@var{tab2}]@dots{}
-@opindex -@var{tab}
-@opindex -t
-@opindex --tabs
-If only one tab stop is given, set the tabs @var{tab1} spaces apart
-instead of the default 8. Otherwise, set the tabs at columns
-@var{tab1}, @var{tab2}, @dots{} (numbered from 0), and leave spaces and
-tabs beyond the tab stops given unchanged. If the tab stops are specified
-with the @samp{-t} or @samp{--tabs} option, they can be separated by
-blanks as well as by commas. This option implies the @samp{-a} option.
-
-@item -a
-@itemx --all
-@opindex -a
-@opindex --all
-Convert all strings of two or more spaces or tabs, not just initial
-ones, to tabs.
-
-@end table
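Similarly for @code{unexpand}, here is a sketch assuming the @sc{gnu} version and made-up input lines:

```shell
# Default: only leading blanks are converted; 8 spaces become one tab,
# while the two spaces between x and y are left alone.
lead=$(printf '        x  y\n' | unexpand)

# -a: blank runs anywhere are converted when they end at a tab stop.
all=$(printf 'a       b\n' | unexpand -a)

# -t 4 implies -a: four leading spaces become one tab.
four=$(printf '    x\n' | unexpand -t 4)

printf '%s\n' "$lead" "$all" "$four"
```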
-
-@c What's GNU?
-@c Arnold Robbins
-@node Opening the software toolbox
-@chapter Opening the software toolbox
-
-This chapter originally appeared in @cite{Linux Journal}, volume 1,
-number 2, in the @cite{What's GNU?} column. It was written by Arnold
-Robbins.
-
-@menu
-* Toolbox introduction:: Toolbox introduction
-* I/O redirection:: I/O redirection
-* The who command:: The @code{who} command
-* The cut command:: The @code{cut} command
-* The sort command:: The @code{sort} command
-* The uniq command:: The @code{uniq} command
-* Putting the tools together:: Putting the tools together
-@end menu
-
-
-@node Toolbox introduction
-@unnumberedsec Toolbox introduction
-
-This month's column is only peripherally related to the @sc{gnu} Project, in
-that it describes a number of the @sc{gnu} tools on your Linux system and how
-they might be used. What it's really about is the ``Software Tools'' philosophy
-of program development and usage.
-
-The software tools philosophy was an important and integral concept
-in the initial design and development of Unix (of which Linux and @sc{gnu} are
-essentially clones). Unfortunately, in the modern-day press of
-Internetworking and flashy GUIs, it seems to have fallen by the
-wayside. This is a shame, since it provides a powerful mental model
-for solving many kinds of problems.
-
-Many people carry a Swiss Army knife around in their pants pockets (or
-purse). A Swiss Army knife is a handy tool to have: it has several knife
-blades, a screwdriver, tweezers, toothpick, nail file, corkscrew, and perhaps
-a number of other things on it. For the everyday, small miscellaneous jobs
-where you need a simple, general-purpose tool, it's just the thing.
-
-On the other hand, an experienced carpenter doesn't build a house using
-a Swiss Army knife. Instead, he has a toolbox chock full of specialized
-tools---a saw, a hammer, a screwdriver, a plane, and so on. And he knows
-exactly when and where to use each tool; you won't catch him hammering nails
-with the handle of his screwdriver.
-
-The Unix developers at Bell Labs were all professional programmers and trained
-computer scientists. They had found that while a one-size-fits-all program
-might appeal to a user because there's only one program to use, in practice
-such programs are
-
-@enumerate a
-@item
-difficult to write,
-
-@item
-difficult to maintain and
-debug, and
-
-@item
-difficult to extend to meet new situations.
-@end enumerate
-
-Instead, they felt that programs should be specialized tools. In short, each
-program ``should do one thing well.'' No more and no less. Such programs are
-simpler to design, write, and get right---they only do one thing.
-
-Furthermore, they found that with the right machinery for hooking programs
-together, the whole was greater than the sum of the parts. By combining
-several special-purpose programs, you could accomplish a specific task
-that none of the programs was designed for, and accomplish it much more
-quickly and easily than if you had to write a special-purpose program.
-We will see some (classic) examples of this further on in the column.
-(An important additional point was that, if you don't already have
-something appropriate in the toolbox, you should take a detour and
-build the software tools you need first.)
-
-@node I/O redirection
-@unnumberedsec I/O redirection
-
-Hopefully, you are familiar with the basics of I/O redirection in the
-shell, in particular the concepts of ``standard input,'' ``standard output,''
-and ``standard error''. Briefly, ``standard input'' is a data source, where
-data comes from. A program should not need to either know or care if the
-data source is a disk file, a keyboard, a magnetic tape, or even a punched
-card reader. Similarly, ``standard output'' is a data sink, where data goes
-to. The program should neither know nor care where this might be.
-Programs that only read their standard input, do something to the data,
-and then send it on, are called ``filters'', by analogy to filters in a
-water pipeline.
-
-With the Unix shell, it's very easy to set up data pipelines:
-
-@smallexample
-program_to_create_data | filter1 | .... | filterN > final.pretty.data
-@end smallexample
-
-We start out by creating the raw data; each filter applies some successive
-transformation to the data, until by the time it comes out of the pipeline,
-it is in the desired form.
-
-This is fine and good for standard input and standard output. Where does the
-standard error come into play? Well, think about @code{filter1} in
-the pipeline above. What happens if it encounters an error in the data it
-sees? If it writes an error message to standard output, it will just
-disappear down the pipeline into @code{filter2}'s input, and the
-user will probably never see it. So programs need a place where they can send
-error messages so that the user will notice them. This is standard error,
-and it is usually connected to your console or window, even if you have
-redirected standard output of your program away from your screen.
-
-For filter programs to work together, the format of the data has to be
-agreed upon. The most straightforward and easiest format to use is simply
-lines of text. Unix data files are generally just streams of bytes, with
-lines delimited by the @sc{ascii} @sc{lf} (Line Feed) character,
-conventionally called a ``newline'' in the Unix literature. (This is
-@code{'\n'} if you're a C programmer.) This is the format used by all
-the traditional filtering programs. (Many earlier operating systems
-had elaborate facilities and special purpose programs for managing
-binary data. Unix has always shied away from such things, under the
-philosophy that it's easiest to simply be able to view and edit your
-data with a text editor.)
-
-OK, enough introduction. Let's take a look at some of the tools, and then
-we'll see how to hook them together in interesting ways. In the following
-discussion, we will only present those command line options that interest
-us. As you should always do, double check your system documentation
-for the full story.
-
-@node The who command
-@unnumberedsec The @code{who} command
-
-The first program is the @code{who} command. By itself, it generates a
-list of the users who are currently logged in. Although I'm writing
-this on a single-user system, we'll pretend that several people are
-logged in:
-
-@example
-$ who
-arnold   console Jan 22 19:57
-miriam   ttyp0   Jan 23 14:19(:0.0)
-bill     ttyp1   Jan 21 09:32(:0.0)
-arnold   ttyp2   Jan 23 20:48(:0.0)
-@end example
-
-Here, the @samp{$} is the usual shell prompt, at which I typed @code{who}.
-There are three people logged in, and I am logged in twice. On traditional
-Unix systems, user names are never more than eight characters long. This
-little bit of trivia will be useful later. The output of @code{who} is nice,
-but the data is not all that exciting.
-
-@node The cut command
-@unnumberedsec The @code{cut} command
-
-The next program we'll look at is the @code{cut} command. This program
-cuts out columns or fields of input data. For example, we can tell it
-to print just the login name and full name from the @file{/etc/passwd}
-file. The @file{/etc/passwd} file has seven fields, separated by
-colons:
-
-@example
-arnold:xyzzy:2076:10:Arnold D. Robbins:/home/arnold:/bin/ksh
-@end example
-
-To get the first and fifth fields, we would use @code{cut} like this:
-
-@example
-$ cut -d: -f1,5 /etc/passwd
-root:Operator
-@dots{}
-arnold:Arnold D. Robbins
-miriam:Miriam A. Robbins
-@dots{}
-@end example
-
-With the @samp{-c} option, @code{cut} will cut out specific characters
-(i.e., columns) in the input lines. This command looks like it might be
-useful for data filtering.
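For example, if user names are padded to eight columns as in the @code{who} output shown earlier, @samp{-c1-8} extracts just the name. The line below is a made-up sample, not real @code{who} output:

```shell
# Hypothetical who-style line; the user name is padded to 8 columns.
line='miriam   ttyp0   Jan 23 14:19'

# Keep columns 1 through 8; the name keeps its trailing padding spaces.
name=$(printf '%s\n' "$line" | cut -c1-8)

printf '[%s]\n' "$name"
```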
-
-
-@node The sort command
-@unnumberedsec The @code{sort} command
-
-Next we'll look at the @code{sort} command. This is one of the most
-powerful commands on a Unix-style system, one that you will often find
-yourself using when setting up fancy data plumbing. The @code{sort}
-command reads and sorts each file named on the command line. It then
-merges the sorted data and writes it to standard output. It will read
-standard input if no files are given on the command line (thus
-making it into a filter). The sort is based on the character collating
-sequence or based on user-supplied ordering criteria.
-
-
-@node The uniq command
-@unnumberedsec The @code{uniq} command
-
-Finally (at least for now), we'll look at the @code{uniq} program. When
-sorting data, you will often end up with duplicate lines, lines that
-are identical. Usually, all you need is one instance of each line.
-This is where @code{uniq} comes in. The @code{uniq} program reads its
-standard input, which it expects to be sorted. It only prints out one
-copy of each duplicated line. It does have several options. Later on,
-we'll use the @samp{-c} option, which prints each unique line, preceded
-by a count of the number of times that line occurred in the input.
-
-
-@node Putting the tools together
-@unnumberedsec Putting the tools together
-
-Now, let's suppose this is a large BBS system with dozens of users
-logged in. The management wants the SysOp to write a program that will
-generate a sorted list of logged-in users. Furthermore, even if a user
-is logged in multiple times, his or her name should only show up in the
-output once.
-
-The SysOp could sit down with the system documentation and write a C
-program that did this. It would take perhaps a couple of hundred lines
-of code and about two hours to write it, test it, and debug it.
-However, knowing the software toolbox, the SysOp can instead start out
-by generating just a list of logged-in users:
-
-@example
-$ who | cut -c1-8
-arnold
-miriam
-bill
-arnold
-@end example
-
-Next, sort the list:
-
-@example
-$ who | cut -c1-8 | sort
-arnold
-arnold
-bill
-miriam
-@end example
-
-Finally, run the sorted list through @code{uniq}, to weed out duplicates:
-
-@example
-$ who | cut -c1-8 | sort | uniq
-arnold
-bill
-miriam
-@end example
-
-The @code{sort} command actually has a @samp{-u} option that does what
-@code{uniq} does. However, @code{uniq} has other uses for which one
-cannot substitute @samp{sort -u}.
-
-The SysOp puts this pipeline into a shell script, and makes it available for
-all the users on the system:
-
-@example
-# cat > /usr/local/bin/listusers
-who | cut -c1-8 | sort | uniq
-^D
-# chmod +x /usr/local/bin/listusers
-@end example
-
-There are four major points to note here. First, with just four
-programs, on one command line, the SysOp was able to save about two
-hours' worth of work. Furthermore, the shell pipeline is just about as
-efficient as the C program would be, and it is much more efficient in
-terms of programmer time. People time is much more expensive than
-computer time, and in our modern ``there's never enough time to do
-everything'' society, saving two hours of programmer time is no mean
-feat.
-
-Second, it is also important to emphasize that with the
-@emph{combination} of the tools, it is possible to do a special
-purpose job never imagined by the authors of the individual programs.
-
-Third, it is also valuable to build up your pipeline in stages, as we did here.
-This allows you to view the data at each stage in the pipeline, which helps
-you acquire the confidence that you are indeed using these tools correctly.
-
-Finally, by bundling the pipeline in a shell script, you let other users
-run your command without having to remember the fancy plumbing you set up
-for them. In terms of how you run them, shell scripts and compiled programs
-are indistinguishable.
-
-After the previous warm-up exercise, we'll look at two additional, more
-complicated pipelines. For them, we need to introduce two more tools.
-
-The first is the @code{tr} command, which stands for ``transliterate.''
-The @code{tr} command works on a character-by-character basis, changing
-characters. Normally it is used for things like mapping upper case to
-lower case:
-
-@example
-$ echo ThIs ExAmPlE HaS MIXED case! | tr '[:upper:]' '[:lower:]'
-this example has mixed case!
-@end example
-
-There are several options of interest:
-
-@table @samp
-@item -c
-work on the complement of the listed characters, i.e.,
-operations apply to characters not in the given set
-
-@item -d
-delete characters in the first set from the output
-
-@item -s
-squeeze repeated characters in the output into just one character.
-@end table
-
-We will be using all three options in a moment.
-
-The other command we'll look at is @code{comm}. The @code{comm}
-command takes two sorted input files as input data, and prints out the
-files' lines in three columns. The output columns are the data lines
-unique to the first file, the data lines unique to the second file, and
-the data lines that are common to both. The @samp{-1}, @samp{-2}, and
-@samp{-3} command line options omit the respective columns. (This is
-non-intuitive and takes a little getting used to.) For example:
-
-@example
-$ cat f1
-11111
-22222
-33333
-44444
-$ cat f2
-00000
-22222
-33333
-55555
-$ comm f1 f2
-        00000
-11111
-                22222
-                33333
-44444
-        55555
-@end example
-
-As with many of these tools, a single dash (@samp{-}) as a filename tells
-@code{comm} to read standard input instead of a regular file.
-
-Now we're ready to build a fancy pipeline. The first application is a word
-frequency counter. This helps an author determine if he or she is over-using
-certain words.
-
-The first step is to change the case of all the letters in our input file
-to one case. ``The'' and ``the'' are the same word when doing counting.
-
-@example
-$ tr '[:upper:]' '[:lower:]' < whats.gnu | ...
-@end example
-
-The next step is to get rid of punctuation. Quoted words and unquoted words
-should be treated identically; it's easiest to just get the punctuation out of
-the way.
-
-@smallexample
-$ tr '[:upper:]' '[:lower:]' < whats.gnu | tr -cd '[:alnum:]_ \012' | ...
-@end smallexample
-
-The second @code{tr} command operates on the complement of the listed
-characters, which are all the letters, the digits, the underscore, and
-the blank. The @samp{\012} represents the newline character; it has to
-be left alone. (The @sc{ascii} tab character should also be included for
-good measure in a production script.)
-
-At this point, we have data consisting of words separated by blank space.
-The words only contain alphanumeric characters (and the underscore). The
-next step is to break the data apart so that we have one word per line. This
-makes the counting operation much easier, as we will see shortly.
-
-@smallexample
-$ tr '[:upper:]' '[:lower:]' < whats.gnu | tr -cd '[:alnum:]_ \012' |
-> tr -s ' ' '\012' | ...
-@end smallexample
-
-This command turns blanks into newlines. The @samp{-s} option squeezes
-multiple newline characters in the output into just one. This helps us
-avoid blank lines. (The @samp{>} is the shell's ``secondary prompt.''
-This is what the shell prints when it notices you haven't finished
-typing in all of a command.)
-
-We now have data consisting of one word per line, no punctuation, all one
-case. We're ready to count each word:
-
-@smallexample
-$ tr '[:upper:]' '[:lower:]' < whats.gnu | tr -cd '[:alnum:]_ \012' |
-> tr -s ' ' '\012' | sort | uniq -c | ...
-@end smallexample
-
-At this point, the data might look something like this:
-
-@example
-     60 a
-      2 able
-      6 about
-      1 above
-      2 accomplish
-      1 acquire
-      1 actually
-      2 additional
-@end example
-
-The output is sorted by word, not by count! What we want is the most
-frequently used words first. Fortunately, this is easy to accomplish,
-with the help of two more @code{sort} options:
-
-@table @samp
-@item -n
-do a numeric sort, not a textual one
-
-@item -r
-reverse the order of the sort
-@end table
-
-The final pipeline looks like this:
-
-@smallexample
-$ tr '[:upper:]' '[:lower:]' < whats.gnu | tr -cd '[:alnum:]_ \012' |
-> tr -s ' ' '\012' | sort | uniq -c | sort -nr
-    156 the
-     60 a
-     58 to
-     51 of
-     51 and
-    ...
-@end smallexample
-
-Whew! That's a lot to digest. Yet, the same principles apply. With six
-commands, on two lines (really one long one split for convenience), we've
-created a program that does something interesting and useful, in much
-less time than we could have written a C program to do the same thing.
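The whole word-frequency pipeline can be tried on a small in-line sample instead of @file{whats.gnu}. This sketch uses made-up text, and it also keeps and splits on the tab character (@samp{\011}), as suggested above for a production script:

```shell
# Made-up sample text standing in for whats.gnu.
text='The cat saw the dog.
The dog saw THE cat!'

# Same steps as the pipeline above: fold case, drop punctuation,
# one word per line (spaces and tabs become newlines), count, rank.
freq=$(printf '%s\n' "$text" |
  tr '[:upper:]' '[:lower:]' |
  tr -cd '[:alnum:]_ \011\012' |
  tr -s ' \011' '\012' |
  sort | uniq -c | sort -nr)

printf '%s\n' "$freq"
```

Here ``the'' occurs four times and every other word twice, so it heads the list.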
-
-A minor modification to the above pipeline can give us a simple spelling
-checker! To determine if you've spelled a word correctly, all you have to
-do is look it up in a dictionary. If it is not there, then chances are
-that your spelling is incorrect. So, we need a dictionary. If you
-have the Slackware Linux distribution, you have the file
-@file{/usr/lib/ispell/ispell.words}, which is a sorted, 38,400-word
-dictionary.
-
-Now, how to compare our file with the dictionary? As before, we generate
-a sorted list of words, one per line:
-
-@smallexample
-$ tr '[:upper:]' '[:lower:]' < whats.gnu | tr -cd '[:alnum:]_ \012' |
-> tr -s ' ' '\012' | sort -u | ...
-@end smallexample
-
-Now, all we need is a list of words that are @emph{not} in the
-dictionary. Here is where the @code{comm} command comes in.
-
-@smallexample
-$ tr '[:upper:]' '[:lower:]' < whats.gnu | tr -cd '[:alnum:]_ \012' |
-> tr -s ' ' '\012' | sort -u |
-> comm -23 - /usr/lib/ispell/ispell.words
-@end smallexample
-
-The @samp{-2} and @samp{-3} options eliminate lines that are only in the
-dictionary (the second file), and lines that are in both files. Lines
-only in the first file (standard input, our stream of words) are
-words that are not in the dictionary. These are likely candidates for
-spelling errors. This pipeline was the first cut at a production
-spelling checker on Unix.
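The spelling-checker pipeline can be sketched in the same way. Here a tiny hypothetical dictionary stands in for @file{/usr/lib/ispell/ispell.words}, and the locale is fixed to @samp{C} so that @code{sort} and @code{comm} agree on the ordering, which @code{comm} requires:

```shell
# Fix the collation order so sort and comm see the same ordering.
LC_ALL=C
export LC_ALL

# Hypothetical miniature dictionary in a temporary file.
dict=$(mktemp)
printf '%s\n' a cat dog saw the | sort > "$dict"

# Made-up input with one misspelling ("teh").
text='The cat saw teh dog'

# Same steps as above; comm -23 keeps words found only on stdin.
misspelled=$(printf '%s\n' "$text" |
  tr '[:upper:]' '[:lower:]' |
  tr -cd '[:alnum:]_ \012' |
  tr -s ' ' '\012' |
  sort -u |
  comm -23 - "$dict")

rm -f "$dict"
printf '%s\n' "$misspelled"
```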
-
-There are some other tools that deserve brief mention.
-
-@table @code
-@item grep
-search files for text that matches a regular expression
-
-@item egrep
-like @code{grep}, but with more powerful regular expressions
-
-@item wc
-count lines, words, characters
-
-@item tee
-a T-fitting for data pipes, copies data to files and to standard output
-
-@item sed
-the stream editor, an advanced tool
-
-@item awk
-a data manipulation language, another advanced tool
-@end table
-
-The software tools philosophy also espoused the following bit of
-advice: ``Let someone else do the hard part.'' This means, take
-something that gives you most of what you need, and then massage it the
-rest of the way until it's in the form that you want.
-
-To summarize:
-
-@enumerate 1
-@item
-Each program should do one thing well. No more, no less.
-
-@item
-Combining programs with appropriate plumbing leads to results where
-the whole is greater than the sum of the parts. It also leads to novel
-uses of programs that the authors might never have imagined.
-
-@item
-Programs should never print extraneous header or trailer data, since these
-could get sent on down a pipeline. (A point we didn't mention earlier.)
-
-@item
-Let someone else do the hard part.
-
-@item
-Know your toolbox! Use each program appropriately. If you don't have an
-appropriate tool, build one.
-@end enumerate
-
-As of this writing, all the programs we've discussed are available via
-anonymous @code{ftp} from @code{prep.ai.mit.edu} as
-@file{/pub/gnu/textutils-1.9.tar.gz}.@footnote{Version 1.9 was current
-when this column was written. Check the nearest @sc{gnu} archive for the
-current version. The main @sc{gnu} FTP site is now @code{ftp.gnu.org}.}
-
-None of what I have presented in this column is new. The Software Tools
-philosophy was first introduced in the book @cite{Software Tools},
-by Brian Kernighan and P.J. Plauger (Addison-Wesley, ISBN
-0-201-03669-X). This book showed how to write and use software
-tools. It was written in 1976, using a preprocessor for FORTRAN named
-@code{ratfor} (RATional FORtran). At the time, C was not as ubiquitous
-as it is now; FORTRAN was. The last chapter presented a @code{ratfor}
-to FORTRAN processor, written in @code{ratfor}. @code{ratfor} looks an
-awful lot like C; if you know C, you won't have any problem following
-the code.
-
-In 1981, the book was updated and made available as @cite{Software
-Tools in Pascal} (Addison-Wesley, ISBN 0-201-10342-7). Both books
-remain in print, and are well worth reading if you're a programmer.
-They certainly made a major change in how I view programming.
-
-Initially, the programs in both books were available (on 9-track tape)
-from Addison-Wesley. Unfortunately, this is no longer the case,
-although you might be able to find copies floating around the Internet.
-For a number of years, there was an active Software Tools Users Group,
-whose members had ported the original @code{ratfor} programs to essentially
-every computer system with a FORTRAN compiler. The popularity of the
-group waned in the middle '80s as Unix began to spread beyond universities.
-
-With the current proliferation of @sc{gnu} code and other clones of Unix
-programs, these programs now receive little attention; modern C versions are
-much more efficient and do more than these programs do. Nevertheless, as
-exposition of good programming style, and evangelism for a still-valuable
-philosophy, these books are unparalleled, and I recommend them highly.
-
-Acknowledgment: I would like to express my gratitude to Brian Kernighan
-of Bell Labs, the original Software Toolsmith, for reviewing this column.
-
-
-@node Index
-@unnumbered Index
-
-@printindex cp
-
-@contents
-@bye
-
-@c Local variables:
-@c texinfo-column-for-description: 32
-@c End: