From 0d9807440368af42a310f7db555773022c2faa0a Mon Sep 17 00:00:00 2001
From: Paul Eggert
Date: Tue, 8 Aug 2006 22:11:49 +0000
Subject: (shuf invocation, Random sources): New sections. (Operating on sorted files): Add shuf. (sort invocation, shred invocation): New option --random-source. (sort invocation): Fix typo: -R -> -r.

---
 doc/coreutils.texi | 217 +++++++++++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 212 insertions(+), 5 deletions(-)

diff --git a/doc/coreutils.texi b/doc/coreutils.texi
index 8d6553fe2..5a497614f 100644
--- a/doc/coreutils.texi
+++ b/doc/coreutils.texi
@@ -97,6 +97,7 @@
 * sha1sum: (coreutils)sha1sum invocation. Print or check SHA-1 digests.
 * sha2: (coreutils)sha2 utilities. Print or check SHA-2 digests.
 * shred: (coreutils)shred invocation. Remove files more securely.
+* shuf: (coreutils)shuf invocation. Shuffling text files.
 * sleep: (coreutils)sleep invocation. Delay for a specified time.
 * sort: (coreutils)sort invocation. Sort text files.
 * split: (coreutils)split invocation. Split into fixed-size pieces.
@@ -174,7 +175,7 @@ Free Documentation License''.
 * Formatting file contents:: fmt pr fold
 * Output of parts of files:: head tail split csplit
 * Summarizing files:: wc sum cksum md5sum sha1sum sha2
-* Operating on sorted files:: sort uniq comm ptx tsort
+* Operating on sorted files:: sort shuf uniq comm ptx tsort
 * Operating on fields within a line:: cut paste join
 * Operating on characters:: tr expand unexpand
 * Directory listing:: ls dir vdir dircolors
@@ -207,6 +208,7 @@ Common Options
 * Exit status:: Indicating program success or failure.
 * Backup options:: Backup options
 * Block size:: Block size
+* Random sources:: Sources of random data
 * Target directory:: Target directory
 * Trailing slashes:: Trailing slashes
 * Traversing symlinks:: Traversing symlinks to directories
@@ -246,6 +248,7 @@ Summarizing files
 Operating on sorted files
 
 * sort invocation:: Sort text files.
+* shuf invocation:: Shuffle text files.
 * uniq invocation:: Uniquify files.
 * comm invocation:: Compare two sorted files line by line.
 * ptx invocation:: Produce a permuted index of file contents.
@@ -641,6 +644,7 @@ name.
 * Exit status:: Indicating program success or failure.
 * Backup options:: -b -S, in some programs.
 * Block size:: BLOCK_SIZE and --block-size, in some programs.
+* Random sources:: --random-source, in some programs.
 * Target directory:: Specifying a target directory, in some programs.
 * Trailing slashes:: --strip-trailing-slashes, in some programs.
 * Traversing symlinks:: -H, -L, or -P, in some programs.
@@ -920,6 +924,44 @@ set. The @option{-h} or @option{--human-readable} option is equivalent to
 @option{--block-size=human-readable}. The @option{--si} option is
 equivalent to @option{--block-size=si}.
 
+@node Random sources
+@section Sources of random data
+
+@cindex random sources
+
+The @command{shuf}, @command{shred}, and @command{sort} commands
+sometimes need random data to do their work. For example, @samp{sort
+-R} must choose a hash function at random, and it needs random data to
+make this selection.
+
+Normally these commands use the device file @file{/dev/urandom} as the
+source of random data. Typically, this device gathers environmental
+noise from device drivers and other sources into an entropy pool, and
+uses the pool to generate random bits. If the pool is short of data,
+the device reuses the internal pool to produce more bits, using a
+cryptographically secure pseudorandom number generator.
+
+@file{/dev/urandom} suffices for most practical uses, but applications
+requiring high-value or long-term protection of private data may
+require an alternate data source like @file{/dev/random} or
+@file{/dev/arandom}. The set of available sources depends on your
+operating system.
+
+To use such a source, specify the @option{--random-source=@var{file}}
+option, e.g., @samp{shuf --random-source=/dev/random}. The contents
+of @var{file} should be as random as possible. An error is reported
+if @var{file} does not contain enough bytes to randomize the input
+adequately.
+
+To reproduce the results of an earlier invocation of a command, you
+can save some random data into a file and then use that file as the
+random source in earlier and later invocations of the command.
+
+Some old-fashioned or stripped-down operating systems lack support for
+@command{/dev/urandom}. On these systems commands like @command{shuf}
+by default fall back on an internal pseudorandom generator initialized
+by a small amount of entropy.
+
 @node Target directory
 @section Target directory
 
@@ -3262,6 +3304,7 @@ These commands work with (or produce) sorted files.
 
 @menu
 * sort invocation:: Sort text files.
+* shuf invocation:: Shuffle text files.
 * uniq invocation:: Uniquify files.
 * comm invocation:: Compare two sorted files line by line.
 * ptx invocation:: Produce a permuted index of file contents.
@@ -3509,9 +3552,19 @@ appear earlier in the output instead of later.
 @opindex -R
 @opindex --random-sort
 @cindex random sort
-Sort by hashing the input keys and then sorting the hash values. This
-is much like a random shuffle of the inputs, except that keys with the
-same value sort together. The hash function is chosen at random.
+Sort by hashing the input keys and then sorting the hash values.
+Choose the hash function at random, ensuring that it is free of
+collisions so that differing keys have differing hash values. This is
+like a random permutation of the inputs (@pxref{shuf invocation}),
+except that keys with the same value sort together.
+
+If multiple random sort fields are specified, the same random hash
+function is used for all fields. To use different random hash
+functions for different fields, you can invoke @command{sort} more
+than once.
+
+The choice of hash function is affected by the
+@option{--random-source} option.
 
 @end table
 
@@ -3550,6 +3603,13 @@ On newer systems, @option{-o} cannot appear after an input file if
 scripts should specify @option{-o @var{output-file}} before any input
 files.
 
+@item --random-source=@var{file}
+@opindex --random-source
+@cindex random source for sorting
+Use @var{file} as a source of random data used to determine which
+random hash function to use with the @option{-R} option. @xref{Random
+sources}.
+
 @item -s
 @itemx --stable
 @opindex -s
@@ -3559,7 +3619,7 @@ files.
 
 Make @command{sort} stable by disabling its last-resort comparison.
 This option has no effect if no fields or global ordering options
-other than @option{--reverse} (@option{-R}) are specified.
+other than @option{--reverse} (@option{-r}) are specified.
 
 @item -S @var{size}
 @itemx --buffer-size=@var{size}
@@ -3835,6 +3895,147 @@ ls */* | sort -t / -k 1,1R -k 2,2
 @end itemize
 
 
+@node shuf invocation
+@section @command{shuf}: Shuffling text
+
+@pindex shuf
+@cindex shuffling files
+
+@command{shuf} shuffles its input by outputting a random permutation
+of its input lines. Each output permutation is equally likely.
+Synopses:
+
+@example
+shuf [@var{option}]@dots{} [@var{file}]
+shuf -e [@var{option}]@dots{} [@var{arg}]@dots{}
+shuf -i @var{lo}-@var{hi} [@var{option}]@dots{}
+@end example
+
+@command{shuf} has three modes of operation that affect where it
+obtains its input lines. By default, it reads lines from standard
+input. The following options change the operation mode:
+
+@table @samp
+
+@item -e
+@itemx --echo
+@opindex -c
+@opindex --echo
+@cindex command-line operands to shuffle
+Treat each command-line operand as an input line.
+
+@item -i @var{lo}-@var{hi}
+@itemx --input-range=@var{lo}-@var{hi}
+@opindex -i
+@opindex --input-range
+@cindex input range to shuffle
+Act as if input came from a file containing the range of unsigned
+decimal integers @var{lo}@dots{}@var{hi}, one per line.
+
+@end table
+
+@command{shuf}'s other options can affect its behavior in all
+operation modes:
+
+@table @samp
+
+@item -n @var{lines}
+@itemx --head-lines=@var{lines}
+@opindex -n
+@opindex --head-lines
+@cindex head of output
+Output at most @var{lines} lines. By default, all input lines are
+output.
+
+@item -o @var{output-file}
+@itemx --output=@var{output-file}
+@opindex -o
+@opindex --output
+@cindex overwriting of input, allowed
+Write output to @var{output-file} instead of standard output.
+@command{shuf} reads all input before opening
+@var{output-file}, so you can safely shuffle a file in place by using
+commands like @code{shuf -o F