author Chen Guo <chen.guo.0625@gmail.com> 2010-01-08 03:42:27 -0800
committer Pádraig Brady <P@draigBrady.com> 2010-11-22 01:45:15 +0000
commit be107398e56e9f6ada8cd558b3f43bb1ed70fb84
tree d0fb06eeb230ba097da352d041443a27e5a01d74 /doc
parent dadca988afc4bbe132c4919fd669272ac6cb2566
split: add --number to generate a particular number of files
* src/split.c (usage, long_options, main): New options --number,
--unbuffered, --elide-empty-files.
(set_suffix_length): New function to auto increase suffix length
to handle a specified number of files.
(create): New function. Refactored from cwrite() and ofile_open().
(bytes_split): Add max_files argument to support byte chunking.
(lines_chunk_split): New function. Split file into chunks of lines.
(bytes_chunk_extract): New function. Extract a chunk of file.
(of_info): New struct. Used by functions lines_rr and ofile_open
to keep track of file descriptors associated with output files.
(ofile_open): New function. Shuffle file descriptors when there
are more output files than available file descriptors.
(lines_rr): New function to distribute lines round-robin to files.
(chunk_parse): New function. Parses K/N syntax.
* tests/misc/split-bchunk: New test for byte chunking.
* tests/misc/split-lchunk: New test for line delimited chunking.
* tests/misc/split-rchunk: New test for round-robin chunking.
* tests/Makefile.am: Reference new tests.
* tests/misc/split-fail: Add failure scenarios for new options.
* tests/misc/split-l: Fix a typo. s/ln/split/.
* doc/coreutils.texi (split invocation): Document --number.
* NEWS: Mention the new feature.
* .mailmap: Map new email address for shortlog.

Signed-off-by: Pádraig Brady <P@draigBrady.com>
Diffstat (limited to 'doc')
-rw-r--r-- doc/coreutils.texi 71
1 files changed, 61 insertions, 10 deletions
diff --git a/doc/coreutils.texi b/doc/coreutils.texi
index 1373f941c..34d9ff031 100644
--- a/doc/coreutils.texi
+++ b/doc/coreutils.texi
@@ -104,7 +104,7 @@
* shuf: (coreutils)shuf invocation. Shuffling text files.
* sleep: (coreutils)sleep invocation. Delay for a specified time.
* sort: (coreutils)sort invocation. Sort text files.
-* split: (coreutils)split invocation. Split into fixed-size pieces.
+* split: (coreutils)split invocation. Split into pieces.
* stat: (coreutils)stat invocation. Report file(system) status.
* stdbuf: (coreutils)stdbuf invocation. Modify stdio buffering.
* stty: (coreutils)stty invocation. Print/change terminal settings.
@@ -2624,7 +2624,7 @@ These commands output pieces of the input.
@menu
* head invocation:: Output the first part of files.
* tail invocation:: Output the last part of files.
-* split invocation:: Split a file into fixed-size pieces.
+* split invocation:: Split a file into pieces.
* csplit invocation:: Split a file into context-determined pieces.
@end menu
@@ -2920,15 +2920,15 @@ mean either @samp{tail ./+4} or @samp{tail -n +4}.
@node split invocation
-@section @command{split}: Split a file into fixed-size pieces
+@section @command{split}: Split a file into pieces.
@pindex split
@cindex splitting a file into pieces
@cindex pieces, splitting a file into
-@command{split} creates output files containing consecutive sections of
-@var{input} (standard input if none is given or @var{input} is
-@samp{-}). Synopsis:
+@command{split} creates output files containing consecutive or interleaved
+sections of @var{input} (standard input if none is given or @var{input}
+is @samp{-}). Synopsis:
@example
split [@var{option}] [@var{input} [@var{prefix}]]
@@ -2941,10 +2941,9 @@ left over for the last section), into each output file.
The output files' names consist of @var{prefix} (@samp{x} by default)
followed by a group of characters (@samp{aa}, @samp{ab}, @dots{} by
default), such that concatenating the output files in traditional
-sorted order by file name produces
-the original input file. If the output file names are exhausted,
-@command{split} reports an error without deleting the output files
-that it did create.
+sorted order by file name produces the original input file (except
+@option{-r}). If the output file names are exhausted, @command{split}
+reports an error without deleting the output files that it did create.
The program accepts the following options. Also see @ref{Common options}.
@@ -2976,6 +2975,41 @@ possible without exceeding @var{size} bytes. Individual lines longer than
@var{size} bytes are broken into multiple files.
@var{size} has the same format as for the @option{--bytes} option.
+@item -n @var{chunks}
+@itemx --number=@var{chunks}
+@opindex -n
+@opindex --number
+
+Split @var{input} to @var{chunks} output files where @var{chunks} may be:
+
+@example
+@var{n} generate @var{n} files based on current size of @var{input}
+@var{k}/@var{n} only output @var{k}th of @var{n} to stdout
+l/@var{n} generate @var{n} files without splitting lines
+l/@var{k}/@var{n} likewise but only output @var{k}th of @var{n} to stdout
+r/@var{n} like @samp{l} but use round robin distribution
+r/@var{k}/@var{n} likewise but only output @var{k}th of @var{n} to stdout
+@end example
+
+Any excess bytes remaining after dividing the @var{input}
+into @var{n} chunks are assigned to the last chunk.
+Any excess bytes appearing after the initial calculation are discarded
+(except when using @samp{r} mode).
+
+All @var{n} files are created even if there are fewer than @var{n} lines,
+or the @var{input} is truncated.
+
+For @samp{l} mode, chunks are approximately @var{input} size / @var{n}.
+The @var{input} is partitioned into @var{n} equal sized portions, with
+the last assigned any excess. If a line @emph{starts} within a partition
+it is written completely to the corresponding file. Since lines
+are not split even if they overlap a partition, the files written
+can be larger or smaller than the partition size, and even empty
+if a line is so long as to completely overlap the partition.
+
+For @samp{r} mode, the size of @var{input} is irrelevant,
+and so can be a pipe for example.
+
@item -a @var{length}
@itemx --suffix-length=@var{length}
@opindex -a
@@ -2988,6 +3022,23 @@ Use suffixes of length @var{length}. The default @var{length} is 2.
@opindex --numeric-suffixes
Use digits in suffixes rather than lower-case letters.
+@item -e
+@itemx --elide-empty-files
+@opindex -e
+@opindex --elide-empty-files
+Suppress the generation of zero-length output files. This can happen
+with the @option{--number} option if a file is (truncated to be) shorter
+than the number requested, or if a line is so long as to completely
+span a chunk. The output file sequence numbers always run consecutively
+even when this option is specified.
+
+@item -u
+@itemx --unbuffered
+@opindex -u
+@opindex --unbuffered
+Immediately copy input to output in @option{--number r/...} mode,
+which is a much slower mode of operation.
+
@itemx --verbose
@opindex --verbose
Write a diagnostic just before each output file is opened.
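
The line-preserving (l/N) and round-robin (r/N) distributions documented in this patch can be illustrated with a small Python sketch. This is not the code in src/split.c; the function names lines_chunk_split_sketch and lines_round_robin_sketch are hypothetical, the whole input is assumed to be in memory, and the handling of an input shorter than N bytes (everything goes to the last chunk) is an assumption based on the "last assigned any excess" wording rather than on the implementation.

```python
import io
from typing import Iterable, List


def lines_chunk_split_sketch(data: bytes, n: int) -> List[bytes]:
    """Model of l/N mode: n chunks, with no line split across chunks."""
    if n < 1:
        raise ValueError("n must be at least 1")
    # Nominal partition size; the last partition absorbs any excess bytes.
    chunk_size = len(data) // n
    chunks = [bytearray() for _ in range(n)]
    offset = 0  # byte offset at which the current line starts
    for line in io.BytesIO(data):  # yields lines, each keeping its b"\n"
        if chunk_size == 0:
            # Assumption: input shorter than n bytes lands in the last chunk.
            k = n - 1
        else:
            # A line belongs to the partition in which it starts.
            k = min(offset // chunk_size, n - 1)
        chunks[k] += line
        offset += len(line)
    return [bytes(c) for c in chunks]


def lines_round_robin_sketch(lines: Iterable[bytes], n: int) -> List[List[bytes]]:
    """Model of r/N mode: line i goes to output i mod n."""
    if n < 1:
        raise ValueError("n must be at least 1")
    outputs: List[List[bytes]] = [[] for _ in range(n)]
    # Only the line order matters, so the input size need not be known
    # in advance (the documentation notes the input can be a pipe).
    for i, line in enumerate(lines):
        outputs[i % n].append(line)
    return outputs


if __name__ == "__main__":
    print(lines_chunk_split_sketch(b"a\nbb\nccc\ndddd\n", 3))
    print(lines_round_robin_sketch([b"1\n", b"2\n", b"3\n"], 2))
```

Under these assumptions, splitting a 13-byte input made of one 11-byte line followed by a 2-byte line into 3 chunks gives a first chunk holding the whole long line, an empty second chunk, and a third chunk holding the short line. This is the situation in which --elide-empty-files would suppress an output file.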