Age | Commit message (Collapse) | Author |
|
Extract the functionality of parsing --field=LIST into a separate
module, to be used by other programs.
* src/cut.c: move field parsing code from here ...
* src/set-fields.{c,h}: ... to here.
(set_fields): generalize by supporting multiple parsing/reporting
options.
(struct range_pair): rename to field_range_pair.
* src/local.mk: link cut with set-field.
* po/POTFILES.in: add set-field.c
* tests/misc/cut.pl: update wording of error messages
|
|
Run "make update-copyright" and then...
* tests/sample-test: Adjust to use the single most recent year.
* tests/du/bind-mount-dir-cycle-v2.sh: Fix case in copyright message,
so that year is updated automatically in future.
|
|
commits v8.20-98-g51ce0bf and v8.20-99-gd302aed changed cut(1)
to process each line independently and thus promptly output
each line without buffering. As part of those changes we removed
the special handling of --delimiter=$'\n' --fields=... which
could be used to select arbitrary (ranges of) lines, so as to
simplify and optimize the implementation while also matching the
behavior of different cut(1) implementations.
However that GNU behavior was in place for a long time, and
could be useful in certain cases like making a separated list like
`seq 10 | cut -f1- -d$'\n' --output-delimiter=,` although other tools
like head(1) and paste(1) are more suited to this operation.
This patch reinstates that functionality but restricts the
"line behind" buffering behavior to only the -d$'\n' case.
We also fix the following related edge case to be more consistent:
before> printf "\n" | cut -s -d$'\n' -f1- | wc -l
2
before> printf "\n" | cut -d$'\n' -f1- | wc -l
1
after > printf "\n" | cut -s -d$'\n' -f1- | wc -l
1
after > printf "\n" | cut -d$'\n' -f1- | wc -l
1
* src/cut.c (cut_fields): Adjust as discussed above.
* tests/misc/cut.pl: Likewise.
* NEWS: Mention the change in behavior both for v8.21
and this effective revert.
* cfg.mk (old_NEWS_hash): Adjust for originally omitted v8.21 entry.
* src/paste.c: s/delimeter/delimiter/ comment typo fix.
|
|
Run "make update-copyright", but then also run this,
perl -pi -e 's/2\d\d\d-//' tests/sample-test
to make that one script use the single most recent year number.
|
|
The current implementation of cut, uses a bit array,
an array of `struct range_pair's, and (when --output-delimiter
is specified) a hash_table. The new implementation will use
only an array of `struct range_pair's.
The old implementation is memory inefficient because:
1. When -b with a big num is specified, it allocates a lot of
memory for `printable_field'.
2. When --output-delimiter is specified, it will allocate 31 buckets.
Even if only a few ranges are specified.
Note CPU overhead is increased to determine if an item is to be printed,
as shown by:
$ yes abcdfeg | head -n1MB > big-file
$ for c in with-bitarray without-bitarray; do
src/cut-$c 2>/dev/null
echo -ne "\n== $c =="
time src/cut-$c -b1,3 big-file > /dev/null
done
== with-bitarray ==
real 0m0.084s
user 0m0.078s
sys 0m0.006s
== without-bitarray ==
real 0m0.111s
user 0m0.108s
sys 0m0.002s
Subsequent patches will reduce this overhead.
* src/cut.c (set_fields): Set and initialize RP
instead of printable_field.
* src/cut.c (is_range_start_index): Use CURRENT_RP rather than a hash.
* tests/misc/cut.pl: Check if `eol_range_start' is set correctly.
* tests/misc/cut-huge-range.sh: Rename from cut-huge-to-eol-range.sh,
and add a test to verify large amounts of mem aren't allocated.
Fixes http://bugs.gnu.org/13127
|
|
Fixes the issue introduced in unreleased commit v8.20-60-gec48bea.
* src/cut.c (set_fields): Don't access the bit array if
we've an open ended range that's outside any finite range.
* tests/misc/cut.pl: Add tests for this case.
Reported by Marcel Böhme in http://bugs.gnu.org/13627
|
|
* src/cut.c (cut_fields): Handle the edge case where '\n' is
the delimiter, which could be used for example to suppress
the last line if it doesn't contain a '\n'.
* test/misc/cut.pl: Add tests for this edge case.
|
|
Previously line N+1 was inspected before line N was fully output,
which causes output ordering issues at the terminal or delays
from intermittent sources like tail -f.
* src/cut.c (cut_fields): Adjust so that we record the
previous output character so we can use that info to
determine wether to output a '\n' or not.
* tests/misc/cut.pl: Add tests to ensure existing
functionality isn't broken.
* NEWS: Mention the fix.
Fixes bug http://bugs.gnu.org/13498
|
|
Run "make update-copyright", but then also run this,
perl -pi -e 's/2\d\d\d-//' tests/sample-test
to make that one script use the single most recent year number.
|
|
* tests/misc/cut.pl: This particular bug existed up to v8.10.
|
|
* tests/misc/cut.pl: Since we now output the more
complete error message irrespective of running
in a multi-byte locale or not, adjust the test accordingly.
|
|
* src/cut.c (main): Treat a NUL delimiter (-d '') consistently
with non NUL delimiters, and disallow such a delimiter option,
unless a field is also specified.
(set_fields): Provide a more accurate error message when
a given list is invalid.
* tests/misc/cut.pl: Add a test case.
|
|
When printing output delimiters, and when a to-EOL range subsumes
at least one other range, cut would mistakenly print delimiters for
the subsumed range. This bug was probably introduced via commit
v5.2.1-639-g847e066.
* src/cut.c (set_fields): Ignore any range that is subsumed by a
to-EOL range. Also, move two declarations down.
* tests/misc/cut.pl: Add tests to exercise this.
* NEWS (Bug fixes): Mention it.
Reported by Marcel Böhme in http://bugs.gnu.org/12966
|
|
* src/cut.c (set_fields): When two right-open-ended ranges are
specified, don't blindly let the latter one take precedence over
the former. Instead, use the union of the ranges.
* tests/misc/cut.pl: Add tests to exercise this.
* NEWS (Bug fixes): Mention it.
Reported by Marcel Böhme in http://bugs.gnu.org/12966
Thanks to Berhard Voelker for catching log and NEWS typos.
|
|
The command "echo 12345 | cut -b 0-" prints an empty line while
it should fail with "fields and positions are numbered from 1".
* src/cut.c (set_fields): Add a diagnostic for the invalid open
range which starts with Zero, i.e., the range 0-.
* tests/misc/cut.pl: Add tests to ensure the range 0- fails for
fields (-f) and for positions (-b, -c).
* NEWS: Mention the fix.
Reported by Marcel Böhme in <http://bugs.gnu.org/12903>.
|
|
Not only this shrinks the size of the generated Makefile (from > 6300
lines to ~3000), but will allow further simplifications in future
changes.
* tests/Makefile.am (TEST_EXTENSIONS): Add '.sh' and '.pl'.
(PL_LOG_COMPILER, SH_LOG_COMPILER): New, still defined simply to
$(LOG_COMPILER) for the time being.
(TESTS, root_tests): Adjust as described.
* All tests: Rename as described.
|