Diffstat (limited to 'doc')
-rw-r--r-- | doc/textutils.texi | 130 |
1 files changed, 66 insertions, 64 deletions
diff --git a/doc/textutils.texi b/doc/textutils.texi
index 2deeef748..968c0312e 100644
--- a/doc/textutils.texi
+++ b/doc/textutils.texi
@@ -1,7 +1,7 @@
 \input texinfo
 @c %**start of header
 @setfilename textutils.info
-@settitle GNU text utilities
+@settitle @sc{gnu} text utilities
 @c %**end of header
 
 @include version.texi
@@ -20,7 +20,7 @@
 @ifinfo
 @format
 START-INFO-DIR-ENTRY
-* Text utilities: (textutils). GNU text utilities.
+* Text utilities: (textutils). GNU text utilities.
 * cat: (textutils)cat invocation. Concatenate and write files.
 * cksum: (textutils)cksum invocation. Print @sc{posix} CRC checksum.
 * comm: (textutils)comm invocation. Compare sorted files by line.
@@ -79,7 +79,7 @@ by the Foundation.
 @end ifinfo
 
 @titlepage
-@title GNU @code{textutils}
+@title @sc{gnu} @code{textutils}
 @subtitle A set of text utilities
 @subtitle for version @value{VERSION}, @value{UPDATED}
 @author David MacKenzie et al.
@@ -114,7 +114,7 @@ by the Foundation.
 @cindex text utilities
 @cindex utilities for text handling
 
-This manual documents version @value{VERSION} of the GNU text utilities.
+This manual documents version @value{VERSION} of the @sc{gnu} text utilities.
 
 @menu
 * Introduction:: Caveats, overview, and authors.
@@ -217,11 +217,12 @@ Opening the software toolbox
 
 This manual is incomplete: No attempt is made to explain basic concepts
 in a way suitable for novices. Thus, if you are interested, please get
-involved in improving this manual. The entire GNU community will
+involved in improving this manual. The entire @sc{gnu} community will
 benefit.
 
 @cindex POSIX.2
-The GNU text utilities are mostly compatible with the @sc{posix.2} standard.
+The @sc{gnu} text utilities are mostly compatible with the @sc{posix.2}
+standard.
 
 @c This paragraph appears in all of fileutils.texi, textutils.texi, and
 @c sh-utils.texi too -- so be sure to keep them consistent.
@@ -251,7 +252,7 @@ overall process.
 
 Certain options are available in all of these programs. Rather than
 writing identical descriptions for each of the programs, they are
-described here. (In fact, every GNU program accepts (or should accept)
+described here. (In fact, every @sc{gnu} program accepts (or should accept)
 these options.)
 
 Some of these programs recognize the @samp{--help} and @samp{--version}
@@ -763,7 +764,8 @@ is not given at all, the default is 16.
 @end table
 
 The next several options map the old, pre-@sc{posix} format specification
-options to the corresponding @sc{posix} format specs. GNU @code{od} accepts
+options to the corresponding @sc{posix} format specs.
+@sc{gnu} @code{od} accepts
 any combination of old- and new-style options. Format specification
 options accumulate.
 
@@ -1445,13 +1447,13 @@ one-line header consisting of
 before the output for each @var{file}.
 
 @cindex BSD @code{tail}
-GNU @code{tail} can output any amount of data (some other versions of
+@sc{gnu} @code{tail} can output any amount of data (some other versions of
 @code{tail} cannot). It also has no @samp{-r} option (print in reverse),
 since reversing a file is really a different job from printing the end
 of a file; BSD @code{tail} (which is the one with @code{-r}) can only
 reverse files that are at most as large as its buffer, which is
 typically 32k. A more reliable and versatile way to reverse files is
-the GNU @code{tac} command.
+the @sc{gnu} @code{tac} command.
 
 @code{tail} accepts two option formats: the new one, in which numbers
 are arguments to the options (@samp{-n 1}), and the old one, in which
@@ -1901,7 +1903,7 @@ is given, file names are also printed (by default).
 (With the @samp{--sysv} option, corresponding file names are printed
 when there is at least one file argument.)
 
-By default, GNU @code{sum} computes checksums using an algorithm
+By default, @sc{gnu} @code{sum} computes checksums using an algorithm
 compatible with BSD @code{sum} and prints file sizes in units of
 1024-byte blocks.
 
@@ -2133,9 +2135,9 @@ disables this last-resort comparison so that lines in which all fields
 compare equal are left in their original relative order. If no fields or
 global options are specified, @samp{-s} has no effect.
 
-GNU @code{sort} (as specified for all GNU utilities) has no limits on
+@sc{gnu} @code{sort} (as specified for all @sc{gnu} utilities) has no limits on
 input line length or restrictions on bytes allowed within lines. In
-addition, if the final byte of an input file is not a newline, GNU
+addition, if the final byte of an input file is not a newline, @sc{gnu}
 @code{sort} silently supplies one. A line's trailing newline is not
 part of the line for comparison purposes.@footnote{@sc{posix}.2-1992
 requires that the trailing newline be part of the comparison, and some
@@ -2333,13 +2335,13 @@ and character positions are numbered starting with 0. See below.
 
 @end table
 
-In addition, when GNU @code{sort} is invoked with exactly one argument,
+In addition, when @sc{gnu} @code{sort} is invoked with exactly one argument,
 options @samp{--help} and @samp{--version} are recognized. @xref{Common
 options}.
 
 Historical (BSD and System V) implementations of @code{sort} have
 differed in their interpretation of some options, particularly
-@samp{-b}, @samp{-f}, and @samp{-n}. GNU sort follows the @sc{posix}
+@samp{-b}, @samp{-f}, and @samp{-n}. @sc{gnu} sort follows the @sc{posix}
 behavior, which is usually (but not always!) like the System V behavior.
 According to @sc{posix}, @samp{-n} no longer implies @samp{-b}. For
 consistency, @samp{-M} has been changed in the same way. This may
@@ -2538,7 +2540,7 @@ Print only duplicate lines.
 Print all duplicate lines and only duplicate lines. This option is useful
 mainly in conjunction with other options e.g., to ignore case or to compare
 only selected fields.
-This is a GNU extension.
+This is a @sc{gnu} extension.
 @c FIXME: give an example showing *how* it's useful
 
 @item -u
@@ -2667,15 +2669,15 @@ ptx -G [@var{option} @dots{}] [@var{input} [@var{output}]]
 @end example
 
 The @samp{-G} (or its equivalent: @samp{--traditional}) option disables
-all GNU extensions and reverts to traditional mode, thus introducing some
+all @sc{gnu} extensions and reverts to traditional mode, thus introducing some
 limitations and changing several of the program's default option values.
-When @samp{-G} is not specified, GNU extensions are always enabled. GNU
-extensions to @code{ptx} are documented wherever appropriate in this
+When @samp{-G} is not specified, @sc{gnu} extensions are always enabled.
+@sc{gnu} extensions to @code{ptx} are documented wherever appropriate in this
 document. For the full list, see @xref{Compatibility in ptx}.
 
 Individual options are explained in the following sections.
 
-When GNU extensions are enabled, there may be zero, one or several
+When @sc{gnu} extensions are enabled, there may be zero, one or several
 @var{file}s after the options. If there is no @var{file}, the program
 reads the standard input. If there is one or several @var{file}s, they
 give the name of input files which are all read in turn, as if all the
@@ -2685,7 +2687,7 @@ file names and line numbers refer to individual text input files.
 In all cases, the program outputs the permuted index to the standard
 output.
 
-When GNU extensions are @emph{not} enabled, that is, when the program
+When @sc{gnu} extensions are @emph{not} enabled, that is, when the program
 operates in traditional mode, there may be zero, one or two parameters
 besides the options. If there are no parameters, the program reads the
 standard input and outputs the permuted index to the standard output.
@@ -2695,7 +2697,7 @@ respectively the name of the @var{input} file to read and the name of the
 @var{output} file to produce. @emph{Be very careful} to note that, in
 this case, the contents of file given by the second parameter is
 destroyed. This behavior is dictated by System V @code{ptx}
-compatibility; GNU Standards normally discourage output parameters not
+compatibility; @sc{gnu} Standards normally discourage output parameters not
 introduced by an option.
 
 Note that for @emph{any} file named as the value of an option or as an
@@ -2724,7 +2726,7 @@ exit without further processing.
 
 @item -G
 @itemx --traditional
-As already explained, this option disables all GNU extensions to
+As already explained, this option disables all @sc{gnu} extensions to
 @code{ptx} and switches to traditional mode.
 
 @item --help
@@ -2745,7 +2747,7 @@ processing.
 As it is set up now, the program assumes that the input file is coded
 using 8-bit ISO 8859-1 code, also known as Latin-1 character set,
 @emph{unless} it is compiled for MS-DOS, in which case it uses the
-character set of the IBM-PC. (GNU @code{ptx} is not known to work on
+character set of the IBM-PC. (@sc{gnu} @code{ptx} is not known to work on
 smaller MS-DOS machines anymore.) Compared to 7-bit @sc{ascii}, the set
 of characters which are letters is different; this alters the behavior
 of regular expression matching. Thus, the default regular expression
@@ -2778,9 +2780,9 @@ is not part of the Break file is a word constituent.
 If both options @samp{-b} and @samp{-W} are specified, then @samp{-W}
 has precedence and @samp{-b} is ignored.
 
-When GNU extensions are enabled, the only way to avoid newline as a
+When @sc{gnu} extensions are enabled, the only way to avoid newline as a
 break character is to write all the break characters in the file with no
-newline at all, not even at the end of the file. When GNU extensions
+newline at all, not even at the end of the file. When @sc{gnu} extensions
 are disabled, spaces, tabs and newlines are always considered as break
 characters even if not included in the Break file.
 
@@ -2823,7 +2825,7 @@ Using this option changes the default value for option @samp{-S}.
 Using this option, the program does not try very hard to remove
 references from contexts in output, but it succeeds in doing so
 @emph{when} the context ends exactly at the newline. If option
-@samp{-r} is used with @samp{-S} default value, or when GNU extensions
+@samp{-r} is used with @samp{-S} default value, or when @sc{gnu} extensions
 are disabled, this condition is always met and references are
 completely excluded from the output contexts.
 
@@ -2834,15 +2836,15 @@ This option selects which regular expression will describe the end of a
 line or the end of a sentence. In fact, this regular expression is not
 the only distinction between end of lines or end of sentences, and input
 line boundaries have no special significance outside this option. By
-default, when GNU extensions are enabled and if @samp{-r} option is not
+default, when @sc{gnu} extensions are enabled and if @samp{-r} option is not
 used, end of sentences are used. In this case, this @var{regex} is
-imported from GNU Emacs:
+imported from @sc{gnu} Emacs:
 
 @example
 [.?!][]\"')@}]*\\($\\|\t\\| \\)[ \t\n]*
 @end example
 
-Whenever GNU extensions are disabled or if @samp{-r} option is used, end
+Whenever @sc{gnu} extensions are disabled or if @samp{-r} option is used, end
 of lines are used; in this case, the default @var{regexp} is just:
 
 @example
@@ -2874,8 +2876,8 @@ corresponding characters by @code{ptx} itself.
 @itemx --word-regexp=@var{regexp}
 This option selects which regular expression will describe each keyword.
 
-By default, if GNU extensions are enabled, a word is a sequence of
-letters; the @var{regexp} used is @samp{\w+}. When GNU extensions are
+By default, if @sc{gnu} extensions are enabled, a word is a sequence of
+letters; the @var{regexp} used is @samp{\w+}. When @sc{gnu} extensions are
 disabled, a word is by default anything which ends with a space, a tab
 or a newline; the @var{regexp} used is @samp{[^ \t\n]+}.
 
@@ -2895,14 +2897,14 @@ the corresponding characters by @code{ptx} itself.
 
 Output format is mainly controlled by the @samp{-O} and @samp{-T} options
 described in the table below. When neither @samp{-O} nor @samp{-T} are
-selected, and if GNU extensions are enabled, the program chooses an
+selected, and if @sc{gnu} extensions are enabled, the program chooses an
 output format suitable for a dumb terminal. Each keyword occurrence is
 output to the center of one line, surrounded by its left and right
 contexts. Each field is properly justified, so the concordance output
 can be readily observed. As a special feature, if automatic references
 are selected by option @samp{-A} and are output before the left context,
 that is, if option @samp{-R} is @emph{not} selected, then
-a colon is added after the reference; this nicely interfaces with GNU
+a colon is added after the reference; this nicely interfaces with @sc{gnu}
 Emacs @code{next-error} processing. In this default output format, each
 white space character, like newline and tab, is merely changed to
 exactly one space, with no special attempt to compress consecutive
@@ -2955,7 +2957,7 @@ context. For any other output format, option @samp{-R} is ignored, with
 one exception: with @samp{-R} the width of references is @emph{not}
 taken into account in total output width given by @samp{-w}.
 
-This option is automatically selected whenever GNU extensions are
+This option is automatically selected whenever @sc{gnu} extensions are
 disabled.
 
 @item -F @var{string}
@@ -2997,7 +2999,7 @@ processing. Each output line will look like:
 @end smallexample
 
 so it will be possible to write a @samp{.xx} roff macro to take care of
-the output typesetting. This is the default output format when GNU
+the output typesetting. This is the default output format when @sc{gnu}
 extensions are disabled. Option @samp{-M} can be used to change
 @samp{xx} to another macro name.
 
@@ -3042,13 +3044,13 @@ processing for @TeX{}.
 
 
 @node Compatibility in ptx
-@subsection The GNU extensions to @code{ptx}
+@subsection The @sc{gnu} extensions to @code{ptx}
 
 This version of @code{ptx} contains a few features which do not exist in
 System V @code{ptx}. These extra features are suppressed by using the
 @samp{-G} command line option, unless overridden by other command line
-options. Some GNU extensions cannot be recovered by overriding, so the
-simple rule is to avoid @samp{-G} if you care about GNU extensions.
+options. Some @sc{gnu} extensions cannot be recovered by overriding, so the
+simple rule is to avoid @samp{-G} if you care about @sc{gnu} extensions.
 
 Here are the differences between this program and System V @code{ptx}.
 @itemize @bullet
@@ -3061,8 +3063,8 @@ or, if a second @var{file} parameter is given on the command, to that
 @var{file}.
 
 Having output parameters not introduced by options is a dangerous
-practice which GNU avoids as far as possible. So, for using @code{ptx}
-portably between GNU and System V, you should always use it with a
+practice which @sc{gnu} avoids as far as possible. So, for using @code{ptx}
+portably between @sc{gnu} and System V, you should always use it with a
 single input file, and always expect the result on standard output. You
 might also want to automatically configure in a @samp{-G} option to
 @code{ptx} calls in products using @code{ptx}, if the configurator finds
@@ -3071,9 +3073,9 @@ that the installed @code{ptx} accepts @samp{-G}.
 @item
 The only options available in System V @code{ptx} are options @samp{-b},
 @samp{-f}, @samp{-g}, @samp{-i}, @samp{-o}, @samp{-r}, @samp{-t} and
-@samp{-w}. All other options are GNU extensions and are not repeated in
+@samp{-w}. All other options are @sc{gnu} extensions and are not repeated in
 this enumeration. Moreover, some options have a slightly different
-meaning when GNU extensions are enabled, as explained below.
+meaning when @sc{gnu} extensions are enabled, as explained below.
 
 @item
 By default, concordance output is not formatted for @code{troff} or
@@ -3082,29 +3084,29 @@ or @code{nroff} output may still be selected through option @samp{-O}.
 
 @item
 Unless @samp{-R} option is used, the maximum reference width is
-subtracted from the total output line width. With GNU extensions
+subtracted from the total output line width. With @sc{gnu} extensions
 disabled, width of references is not taken into account in the output
 line width computations.
 
 @item
 All 256 characters, even @kbd{NUL}s, are always read and processed from
-input file with no adverse effect, even if GNU extensions are disabled.
+input file with no adverse effect, even if @sc{gnu} extensions are disabled.
 However, System V @code{ptx} does not accept 8-bit characters, a few
 control characters are rejected, and the tilde @kbd{~} is also rejected.
 
 @item
-Input line length is only limited by available memory, even if GNU
+Input line length is only limited by available memory, even if @sc{gnu}
 extensions are disabled. However, System V @code{ptx} processes only
 the first 200 characters in each line.
 
 @item
 The break (non-word) characters default to be every character except all
-letters of the underlying character set, diacriticized or not. When GNU
+letters of the underlying character set, diacriticized or not. When @sc{gnu}
 extensions are disabled, the break characters default to space, tab and
 newline only.
 
 @item
-The program makes better use of output line width. If GNU extensions
+The program makes better use of output line width. If @sc{gnu} extensions
 are disabled, the program rather tries to imitate System V @code{ptx},
 but still, there are some slight disposition glitches this program does
 not completely reproduce.
@@ -3339,7 +3341,7 @@ Print a line for each unpairable line in file @var{file-number}
 
 @end table
 
-In addition, when GNU @code{join} is invoked with exactly one argument,
+In addition, when @sc{gnu} @code{join} is invoked with exactly one argument,
 options @samp{--help} and @samp{--version} are recognized. @xref{Common
 options}.
 
@@ -3447,7 +3449,7 @@ from @var{m} through @var{n}, in ascending order. @var{m} should collate
 before @var{n}; if it doesn't, an error results. As an example,
 @samp{0-9} is the same as @samp{0123456789}.
 
-GNU @code{tr} does not support the System V syntax that uses square
+@sc{gnu} @code{tr} does not support the System V syntax that uses square
 brackets to enclose ranges. Translations specified in that format
 sometimes work as expected, since the brackets are often transliterated
 to themselves. However, they should be avoided because they sometimes
@@ -3535,7 +3537,7 @@ The syntax @samp{[=@var{c}=]} expands to all of the characters that are
 equivalent to @var{c}, in no particular order. Equivalence classes are
 a relatively recent invention intended to support non-English alphabets.
 But there seems to be no standard way to define them or determine their
-contents. Therefore, they are not fully implemented in GNU @code{tr};
+contents. Therefore, they are not fully implemented in @sc{gnu} @code{tr};
 each character's equivalence class consists only of that character,
 which is of no particular use.
 
@@ -3583,8 +3585,8 @@ BSD @code{tr} pads @var{set2} to the length of @var{set1} by repeating
 the last character of @var{set2} as many times as necessary. System V
 @code{tr} truncates @var{set1} to the length of @var{set2}.
 
-By default, GNU @code{tr} handles this case like BSD @code{tr}. When
-the @samp{--truncate-set1} (@samp{-t}) option is given, GNU @code{tr}
+By default, @sc{gnu} @code{tr} handles this case like BSD @code{tr}. When
+the @samp{--truncate-set1} (@samp{-t}) option is given, @sc{gnu} @code{tr}
 handles this case like the System V @code{tr} instead. This option is
 ignored for operations other than translation.
 
@@ -3723,7 +3725,7 @@ following warning and error messages, for strict compliance with
 
 @item
 When the @samp{--delete} option is given but @samp{--squeeze-repeats}
-is not, and @var{set2} is given, GNU @code{tr} by default prints
+is not, and @var{set2} is given, @sc{gnu} @code{tr} by default prints
 a usage message and exits, because @var{set2} would not be used. The
 @sc{posix} specification says that @var{set2} must be ignored in this
 case. Silently ignoring arguments is a bad idea.
@@ -3735,9 +3737,9 @@ value 400 octal does not fit into a single byte.
 
 @end enumerate
 
-GNU @code{tr} does not provide complete BSD or System V compatibility.
+@sc{gnu} @code{tr} does not provide complete BSD or System V compatibility.
 For example, it is impossible to disable interpretation of the @sc{posix}
-constructs @samp{[:alpha:]}, @samp{[=c=]}, and @samp{[c*10]}. Also, GNU
+constructs @samp{[:alpha:]}, @samp{[=c=]}, and @samp{[c*10]}. Also, @sc{gnu}
 @code{tr} does not delete zero bytes automatically, unlike traditional
 Unix versions, which provide no way to preserve zero bytes.
 
@@ -3862,13 +3864,13 @@ Robbins.
 @node Toolbox introduction
 @unnumberedsec Toolbox introduction
 
-This month's column is only peripherally related to the GNU Project, in
-that it describes a number of the GNU tools on your Linux system and how they
-might be used. What it's really about is the ``Software Tools'' philosophy
+This month's column is only peripherally related to the @sc{gnu} Project, in
+that it describes a number of the @sc{gnu} tools on your Linux system and how
+they might be used. What it's really about is the ``Software Tools'' philosophy
 of program development and usage.
 
 The software tools philosophy was an important and integral concept
-in the initial design and development of Unix (of which Linux and GNU are
+in the initial design and development of Unix (of which Linux and @sc{gnu} are
 essentially clones). Unfortunately, in the modern day press of
 Internetworking and flashy GUIs, it seems to have fallen by the
 wayside. This is a shame, since it provides a powerful mental model
@@ -4361,8 +4363,8 @@ appropriate tool, build one.
 As of this writing, all the programs we've discussed are available via
 anonymous @code{ftp} from @code{prep.ai.mit.edu} as
 @file{/pub/gnu/textutils-1.9.tar.gz}.@footnote{Version 1.9 was current
-when this column was written. Check the nearest GNU archive for the
-current version. The main GNU FTP site is now @code{ftp.gnu.org}.}
+when this column was written. Check the nearest @sc{gnu} archive for the
+current version. The main @sc{gnu} FTP site is now @code{ftp.gnu.org}.}
 
 None of what I have presented in this column is new. The Software Tools
 philosophy was first introduced in the book @cite{Software Tools},
@@ -4388,8 +4390,8 @@ whose members had ported the original @code{ratfor} programs to
 essentially every computer system with a FORTRAN compiler. The popularity
 of the group waned in the middle '80s as Unix began to spread beyond
 universities.
-With the current proliferation of GNU code and other clones of Unix programs,
-these programs now receive little attention; modern C versions are
+With the current proliferation of @sc{gnu} code and other clones of Unix
+programs, these programs now receive little attention; modern C versions are
 much more efficient and do more than these programs do. Nevertheless, as
 exposition of good programming style, and evangelism for a still-valuable
 philosophy, these books are unparalleled, and I recommend them highly.
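Every change in this diff is the same mechanical edit: a bare "GNU" in running text becomes Texinfo's `@sc{gnu}` small-caps markup. As a minimal sketch, assuming Python 3 and only the standard `re` module, such a pass could be scripted as below. The function name `smallcaps_gnu` is hypothetical, and nothing in the commit says it was produced this way. The sketch leaves the Info dir entry alone (the diff keeps "GNU" there) and does not re-fill lines that grow too long, which the diff did by hand for the `od` and `@sc{posix.2}` lines.

```python
import re

# Whole-word, upper-case "GNU".  The replacement "@sc{gnu}" is lower-case,
# so running the pass a second time does not wrap it again.
BARE_GNU = re.compile(r"\bGNU\b")


def smallcaps_gnu(texinfo_source: str) -> str:
    """Wrap every bare 'GNU' in texinfo_source as '@sc{gnu}'.

    Lines between START-INFO-DIR-ENTRY and END-INFO-DIR-ENTRY are copied
    unchanged, matching the diff, which keeps the dir entry as plain "GNU".
    Lines that become too long are not re-filled.
    """
    out = []
    in_dir_entry = False
    for line in texinfo_source.splitlines(keepends=True):
        marker = line.strip()
        if marker == "START-INFO-DIR-ENTRY":
            in_dir_entry = True
        elif marker == "END-INFO-DIR-ENTRY":
            in_dir_entry = False
        elif not in_dir_entry:
            line = BARE_GNU.sub("@sc{gnu}", line)
        out.append(line)
    return "".join(out)


if __name__ == "__main__":
    sample = (
        "@settitle GNU text utilities\n"
        "START-INFO-DIR-ENTRY\n"
        "* Text utilities: (textutils). GNU text utilities.\n"
        "END-INFO-DIR-ENTRY\n"
        "the GNU @code{tac} command.\n"
    )
    print(smallcaps_gnu(sample), end="")
```

Run as a script, it converts the `@settitle` and `tac` lines and prints the dir entry untouched. Lower-case occurrences such as the `/pub/gnu/` path are not matched, so file names and URLs survive the pass.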