author Paul Eggert <eggert@cs.ucla.edu> 2006-08-08 22:11:49 +0000
committer Paul Eggert <eggert@cs.ucla.edu> 2006-08-08 22:11:49 +0000
commit 0d9807440368af42a310f7db555773022c2faa0a
tree 9e145dff7a5d36f4af1847e4939e5f6f2392b78b
parent f0992c673c87746e20e07e23318be4e9a14ef1b2
(shuf invocation, Random sources): New sections.
(Operating on sorted files): Add shuf. (sort invocation, shred invocation): New option --random-source. (sort invocation): Fix typo: -R -> -r.
Diffstat (limited to 'doc')
-rw-r--r-- doc/coreutils.texi 217
1 file changed, 212 insertions, 5 deletions
diff --git a/doc/coreutils.texi b/doc/coreutils.texi
index 8d6553fe2..5a497614f 100644
--- a/doc/coreutils.texi
+++ b/doc/coreutils.texi
@@ -97,6 +97,7 @@
* sha1sum: (coreutils)sha1sum invocation. Print or check SHA-1 digests.
* sha2: (coreutils)sha2 utilities. Print or check SHA-2 digests.
* shred: (coreutils)shred invocation. Remove files more securely.
+* shuf: (coreutils)shuf invocation. Shuffling text files.
* sleep: (coreutils)sleep invocation. Delay for a specified time.
* sort: (coreutils)sort invocation. Sort text files.
* split: (coreutils)split invocation. Split into fixed-size pieces.
@@ -174,7 +175,7 @@ Free Documentation License''.
* Formatting file contents:: fmt pr fold
* Output of parts of files:: head tail split csplit
* Summarizing files:: wc sum cksum md5sum sha1sum sha2
-* Operating on sorted files:: sort uniq comm ptx tsort
+* Operating on sorted files:: sort shuf uniq comm ptx tsort
* Operating on fields within a line:: cut paste join
* Operating on characters:: tr expand unexpand
* Directory listing:: ls dir vdir dircolors
@@ -207,6 +208,7 @@ Common Options
* Exit status:: Indicating program success or failure.
* Backup options:: Backup options
* Block size:: Block size
+* Random sources:: Sources of random data
* Target directory:: Target directory
* Trailing slashes:: Trailing slashes
* Traversing symlinks:: Traversing symlinks to directories
@@ -246,6 +248,7 @@ Summarizing files
Operating on sorted files
* sort invocation:: Sort text files.
+* shuf invocation:: Shuffle text files.
* uniq invocation:: Uniquify files.
* comm invocation:: Compare two sorted files line by line.
* ptx invocation:: Produce a permuted index of file contents.
@@ -641,6 +644,7 @@ name.
* Exit status:: Indicating program success or failure.
* Backup options:: -b -S, in some programs.
* Block size:: BLOCK_SIZE and --block-size, in some programs.
+* Random sources:: --random-source, in some programs.
* Target directory:: Specifying a target directory, in some programs.
* Trailing slashes:: --strip-trailing-slashes, in some programs.
* Traversing symlinks:: -H, -L, or -P, in some programs.
@@ -920,6 +924,44 @@ set. The @option{-h} or @option{--human-readable} option is equivalent to
@option{--block-size=human-readable}. The @option{--si} option is
equivalent to @option{--block-size=si}.
+@node Random sources
+@section Sources of random data
+
+@cindex random sources
+
+The @command{shuf}, @command{shred}, and @command{sort} commands
+sometimes need random data to do their work. For example, @samp{sort
+-R} must choose a hash function at random, and it needs random data to
+make this selection.
+
+Normally these commands use the device file @file{/dev/urandom} as the
+source of random data. Typically, this device gathers environmental
+noise from device drivers and other sources into an entropy pool, and
+uses the pool to generate random bits. If the pool is short of data,
+the device reuses the internal pool to produce more bits, using a
+cryptographically secure pseudorandom number generator.
+
+@file{/dev/urandom} suffices for most practical uses, but applications
+requiring high-value or long-term protection of private data may
+require an alternate data source like @file{/dev/random} or
+@file{/dev/arandom}. The set of available sources depends on your
+operating system.
+
+To use such a source, specify the @option{--random-source=@var{file}}
+option, e.g., @samp{shuf --random-source=/dev/random}. The contents
+of @var{file} should be as random as possible. An error is reported
+if @var{file} does not contain enough bytes to randomize the input
+adequately.
+
+To reproduce the results of an earlier invocation of a command, you
+can save some random data into a file and then use that file as the
+random source in both the earlier and the later invocations.
+
+Some old-fashioned or stripped-down operating systems lack support for
+@file{/dev/urandom}. On these systems commands like @command{shuf}
+by default fall back on an internal pseudorandom generator initialized
+by a small amount of entropy.
+
@node Target directory
@section Target directory
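
The reproducibility paragraph in the hunk above can be illustrated with a minimal shell sketch. It assumes a system that provides `shuf` with `--random-source`, `mktemp -d`, `head -c`, and `/dev/urandom`; the file names `seed.bin` and `input.txt` and the 4096-byte seed size are arbitrary choices for illustration, not values taken from the manual.

```shell
#!/bin/sh
# Sketch: reproduce a shuffle by reusing saved random bytes.
# seed.bin and input.txt are hypothetical names used only here.
workdir=$(mktemp -d)
seed="$workdir/seed.bin"
input="$workdir/input.txt"

printf '%s\n' alpha beta gamma delta epsilon > "$input"

# Save some random data once. 4096 bytes is an arbitrary size that is
# far more than five input lines should need.
head -c 4096 /dev/urandom > "$seed"

# Both invocations read the same bytes from the start of the file,
# so they should choose the same permutation.
first=$(shuf --random-source="$seed" "$input")
second=$(shuf --random-source="$seed" "$input")

if [ "$first" = "$second" ]; then
    echo "same permutation both times"
fi

rm -rf "$workdir"
```

If the seed file were too short for the input, the text above says an error would be reported, so a larger seed is the safer choice for bigger inputs.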
@@ -3262,6 +3304,7 @@ These commands work with (or produce) sorted files.
@menu
* sort invocation:: Sort text files.
+* shuf invocation:: Shuffle text files.
* uniq invocation:: Uniquify files.
* comm invocation:: Compare two sorted files line by line.
* ptx invocation:: Produce a permuted index of file contents.
@@ -3509,9 +3552,19 @@ appear earlier in the output instead of later.
@opindex -R
@opindex --random-sort
@cindex random sort
-Sort by hashing the input keys and then sorting the hash values. This
-is much like a random shuffle of the inputs, except that keys with the
-same value sort together. The hash function is chosen at random.
+Sort by hashing the input keys and then sorting the hash values.
+Choose the hash function at random, ensuring that it is free of
+collisions so that differing keys have differing hash values. This is
+like a random permutation of the inputs (@pxref{shuf invocation}),
+except that keys with the same value sort together.
+
+If multiple random sort fields are specified, the same random hash
+function is used for all fields. To use different random hash
+functions for different fields, you can invoke @command{sort} more
+than once.
+
+The choice of hash function is affected by the
+@option{--random-source} option.
@end table
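
As a hedged illustration of the `-R` description in the hunk above, the following shell sketch checks two properties: equal keys end up adjacent, and reusing the same `--random-source` file yields the same order. The sample keys, the `seed.bin` name, and the seed size are made up for this example.

```shell
#!/bin/sh
# Sketch: sort -R permutes distinct keys but keeps equal keys together.
workdir=$(mktemp -d)
seed="$workdir/seed.bin"          # hypothetical file name
head -c 4096 /dev/urandom > "$seed"

keys='pear
apple
pear
fig
apple
pear'

# Equal keys hash to the same value, so each key forms one run.
shuffled=$(printf '%s\n' "$keys" | sort -R)
printf '%s\n' "$shuffled"
groups=$(printf '%s\n' "$shuffled" | uniq | wc -l)
echo "runs of equal keys: $groups"

# The same random source selects the same hash function, so the two
# orders below should match.
run1=$(printf '%s\n' "$keys" | sort -R --random-source="$seed")
run2=$(printf '%s\n' "$keys" | sort -R --random-source="$seed")
[ "$run1" = "$run2" ] && echo "reproducible order"

rm -rf "$workdir"
```

For a permutation in which duplicates are not grouped, `shuf` is the tool the new text points to.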
@@ -3550,6 +3603,13 @@ On newer systems, @option{-o} cannot appear after an input file if
scripts should specify @option{-o @var{output-file}} before any input
files.
+@item --random-source=@var{file}
+@opindex --random-source
+@cindex random source for sorting
+Use @var{file} as a source of random data used to determine which
+random hash function to use with the @option{-R} option. @xref{Random
+sources}.
+
@item -s
@itemx --stable
@opindex -s
@@ -3559,7 +3619,7 @@ files.
Make @command{sort} stable by disabling its last-resort comparison.
This option has no effect if no fields or global ordering options
-other than @option{--reverse} (@option{-R}) are specified.
+other than @option{--reverse} (@option{-r}) are specified.
@item -S @var{size}
@itemx --buffer-size=@var{size}
@@ -3835,6 +3895,147 @@ ls */* | sort -t / -k 1,1R -k 2,2
@end itemize
+@node shuf invocation
+@section @command{shuf}: Shuffling text
+
+@pindex shuf
+@cindex shuffling files
+
+@command{shuf} shuffles its input by outputting a random permutation
+of its input lines. Each output permutation is equally likely.
+Synopses:
+
+@example
+shuf [@var{option}]@dots{} [@var{file}]
+shuf -e [@var{option}]@dots{} [@var{arg}]@dots{}
+shuf -i @var{lo}-@var{hi} [@var{option}]@dots{}
+@end example
+
+@command{shuf} has three modes of operation that affect where it
+obtains its input lines. By default, it reads lines from @var{file},
+or from standard input if none is given. These options change the mode:
+
+@table @samp
+
+@item -e
+@itemx --echo
+@opindex -e
+@opindex --echo
+@cindex command-line operands to shuffle
+Treat each command-line operand as an input line.
+
+@item -i @var{lo}-@var{hi}
+@itemx --input-range=@var{lo}-@var{hi}
+@opindex -i
+@opindex --input-range
+@cindex input range to shuffle
+Act as if input came from a file containing the range of unsigned
+decimal integers @var{lo}@dots{}@var{hi}, one per line.
+
+@end table
+
+@command{shuf}'s other options can affect its behavior in all
+operation modes:
+
+@table @samp
+
+@item -n @var{lines}
+@itemx --head-lines=@var{lines}
+@opindex -n
+@opindex --head-lines
+@cindex head of output
+Output at most @var{lines} lines. By default, all input lines are
+output.
+
+@item -o @var{output-file}
+@itemx --output=@var{output-file}
+@opindex -o
+@opindex --output
+@cindex overwriting of input, allowed
+Write output to @var{output-file} instead of standard output.
+@command{shuf} reads all input before opening
+@var{output-file}, so you can safely shuffle a file in place by using
+commands like @code{shuf -o F <F} and @code{cat F | shuf -o F}.
+
+@item --random-source=@var{file}
+@opindex --random-source
+@cindex random source for shuffling
+Use @var{file} as a source of random data used to determine which
+permutation to generate. @xref{Random sources}.
+
+@item -z
+@itemx --zero-terminated
+@opindex -z
+@opindex --zero-terminated
+@cindex shuffle zero-terminated lines
+Treat the input and output as a set of lines, each terminated by a zero byte
+(@acronym{ASCII} @sc{nul} (Null) character) instead of an
+@acronym{ASCII} @sc{lf} (Line Feed).
+This option can be useful in conjunction with @samp{perl -0} or
+@samp{find -print0} and @samp{xargs -0} which do the same in order to
+reliably handle arbitrary file names (even those containing blanks
+or other special characters).
+
+@end table
+
+For example:
+
+@example
+shuf <<EOF
+A man,
+a plan,
+a canal:
+Panama!
+EOF
+@end example
+
+@noindent
+might produce the output
+
+@example
+Panama!
+A man,
+a canal:
+a plan,
+@end example
+
+@noindent
+Similarly, the command:
+
+@example
+shuf -e clubs hearts diamonds spades
+@end example
+
+@noindent
+might output:
+
+@example
+clubs
+diamonds
+spades
+hearts
+@end example
+
+@noindent
+and the command @samp{shuf -i 1-4} might output:
+
+@example
+4
+2
+1
+3
+@end example
+
+@noindent
+These examples all have four input lines, so @command{shuf} might
+produce any of the twenty-four possible permutations of the input. In
+general, if there are @var{N} input lines, there are @var{N}! (i.e.,
+@var{N} factorial, or @var{N} * (@var{N} - 1) * @dots{} * 1) possible
+output permutations.
+
+@exitstatus
+
+
@node uniq invocation
@section @command{uniq}: Uniquify files
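
The three input modes and the `-n`, `-o`, and `-z` options documented in the hunk above can be exercised together. This is a sketch under the assumption that an installed `shuf` accepts these short options; `deck.txt` is a hypothetical file name, and since the output order is random, only order-independent properties are checked.

```shell
#!/bin/sh
# Sketch of shuf's input modes and a few options; data is illustrative.
workdir=$(mktemp -d)
deck="$workdir/deck.txt"          # hypothetical file name
printf '%s\n' clubs hearts diamonds spades > "$deck"

from_file=$(shuf "$deck")                          # default: lines of a file
from_args=$(shuf -e clubs hearts diamonds spades)  # -e: operands as lines
from_range=$(shuf -i 1-4)                          # -i: integers lo through hi

# -n caps the number of output lines, here a single random pick.
one_card=$(shuf -n 1 "$deck")

# -o writes the result to a file. All input is read before the output
# file is opened, so shuffling a file in place is safe.
shuf -o "$deck" < "$deck"
in_place=$(LC_ALL=C sort "$deck")

# -z uses NUL-terminated records, so items may contain blanks.
nul_items=$(printf 'a b\0c d\0' | shuf -z | tr '\0' '\n' | wc -l)

printf '%s\n' "$from_file" "$from_args" "$from_range" "$one_card"
rm -rf "$workdir"
```

Sorting each result recovers the original lines, which is a quick way to confirm that `shuf` only reorders its input and never drops or duplicates lines unless `-n` is given.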
@@ -7746,6 +7947,12 @@ for all of the useful overwrite patterns to be used at least once.
You can reduce this to save time, or increase it if you have a lot of
time to waste.
+@item --random-source=@var{file}
+@opindex --random-source
+@cindex random source for shredding
+Use @var{file} as a source of random data used to overwrite and to
+choose pass ordering. @xref{Random sources}.
+
@item -s @var{BYTES}
@itemx --size=@var{BYTES}
@opindex -s @var{BYTES}