split: new -t option to select record separator

* src/split.c (eolchar): A new variable to hold the separator
character (unibyte for now).  This is referenced throughout rather
than hardcoding '\n'.
(usage): Describe the new --separator option, and mention records
along with lines so it is unambiguous that all options treat lines
and records equivalently.
(main): Have -t update eolchar, or default to '\n'.
* tests/split/record-sep.sh: New test case.
* tests/local.mk: Reference the new test.
* doc/coreutils.texi (split invocation): Document the new option.
Adjust --lines, --line-bytes, --number=[lr]/... to mention they
pertain to records if --separator is specified.
* NEWS: Mention the new feature.
author: Assaf Gordon <assafgordon@gmail.com> 2015-01-07 18:30:28 -0500
committer: Pádraig Brady <P@draigBrady.com> 2015-01-19 23:22:37 +0000
commit: 4c795d543908ea4715b3e0bd6c6cf908315936d8 (patch)
tree: 74e9d10d130ce903bf9053508a42f9cb3f48858a /doc
parent: c4c2a09cc804afb338efa5ccedffa269888c4685 (diff)
download: coreutils-4c795d543908ea4715b3e0bd6c6cf908315936d8.tar.xz
1 files changed, 20 insertions, 5 deletions
diff --git a/doc/coreutils.texi b/doc/coreutils.texi
index 1cc65329c..5a3c31a15 100644
--- a/doc/coreutils.texi
+++ b/doc/coreutils.texi
@@ -3395,6 +3395,8 @@ The program accepts the following options.  Also see @ref{Common options}.
 @opindex -l
 @opindex --lines
 Put @var{lines} lines of @var{input} into each output file.
+If @option{--separator} is specified, then @var{lines} determines
+the number of records.
 
 For compatibility @command{split} also supports an obsolete
 option syntax @option{-@var{lines}}.  New scripts should use
@@ -3412,9 +3414,11 @@ Put @var{size} bytes of @var{input} into each output file.
 @opindex -C
 @opindex --line-bytes
 Put into each output file as many complete lines of @var{input} as
-possible without exceeding @var{size} bytes.  Individual lines longer than
-@var{size} bytes are broken into multiple files.
+possible without exceeding @var{size} bytes.  Individual lines or records
+longer than @var{size} bytes are broken into multiple files.
 @var{size} has the same format as for the @option{--bytes} option.
+If @option{--separator} is specified, then @var{lines} determines
+the number of records.
 
 @item --filter=@var{command}
 @opindex --filter
@@ -3445,7 +3449,7 @@ Split @var{input} to @var{chunks} output files where @var{chunks} may be:
 @example
 @var{n}      generate @var{n} files based on current size of @var{input}
 @var{k}/@var{n}    only output @var{k}th of @var{n} to stdout
-l/@var{n}    generate @var{n} files without splitting lines
+l/@var{n}    generate @var{n} files without splitting lines or records
 l/@var{k}/@var{n}  likewise but only output @var{k}th of @var{n} to stdout
 r/@var{n}    like @samp{l} but use round robin distribution
 r/@var{k}/@var{n}  likewise but only output @var{k}th of @var{n} to stdout
@@ -3462,10 +3466,10 @@ or the @var{input} is truncated.
 For @samp{l} mode, chunks are approximately @var{input} size / @var{n}.
 The @var{input} is partitioned into @var{n} equal sized portions, with
 the last assigned any excess.  If a line @emph{starts} within a partition
-it is written completely to the corresponding file.  Since lines
+it is written completely to the corresponding file.  Since lines or records
 are not split even if they overlap a partition, the files written
 can be larger or smaller than the partition size, and even empty
-if a line is so long as to completely overlap the partition.
+if a line/record is so long as to completely overlap the partition.
 
 For @samp{r} mode, the size of @var{input} is irrelevant,
 and so can be a pipe for example.
@@ -3505,6 +3509,17 @@ than the number requested, or if a line is so long as to completely
 span a chunk.  The output file sequence numbers, always run consecutively
 even when this option is specified.
 
+@item -t @var{separator}
+@itemx --separator=@var{separator}
+@opindex -t
+@opindex --separator
+@cindex line separator character
+@cindex record separator character
+Use character @var{separator} as the record separator instead of the default
+newline character (ASCII LF).
+To specify ASCII NUL as the separator, use the two-character string @samp{\0},
+e.g., @samp{split -t '\0'}.
+
 @item -u
 @itemx --unbuffered
 @opindex -u
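
The documented behaviour of -t combined with -l can be illustrated with a short shell session. This is a minimal sketch, assuming a GNU split build that includes this change; the file names (records.txt, the part_ prefix) and the ':' separator are illustrative choices, not taken from the patch or its test suite.

```shell
# Work in a scratch directory so the output files do not clutter anything.
tmpdir=$(mktemp -d)
cd "$tmpdir" || exit 1

# Four records terminated by ':' instead of newline (illustrative data).
printf 'a:b:c:d:' > records.txt

# With -t given, -l counts records rather than lines,
# so each output file receives two ':'-terminated records.
split -t ':' -l 2 records.txt part_

first=$(cat part_aa)
second=$(cat part_ab)
printf '%s\n%s\n' "$first" "$second"
```

Each output file keeps its trailing separator, just as newline-terminated lines keep their newline in the default mode. For NUL-separated input, the same invocation would use the two-character string described in the new documentation, e.g. split -t '\0' -l 2.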