author | Kristoffer Brånemyr <ztion1@yahoo.se> | 2015-03-18 15:32:19 +0000 |
---|---|---|
committer | Pádraig Brady <P@draigBrady.com> | 2015-03-20 00:48:52 +0000 |
commit | 1025243b6a0c8b8830b2d3676a97dae83c74d284 (patch) | |
tree | 94325b00bc4610e4af13fb6a88af7496ad683a04 /NEWS | |
parent | e2e11119e0ac653bd0bdab91c189b7803f8df1f0 (diff) | |
download | coreutils-1025243b6a0c8b8830b2d3676a97dae83c74d284.tar.xz |
wc: speedup counting of short lines
Using a test file generated with:
yes | head -n100M > 2x100M.txt
before> time wc -l 2x100M.txt
real 0.842s
user 0.810s
sys 0.033s
after> time wc -l 2x100M.txt
real 0.142s
user 0.111s
sys 0.031s
* src/wc.c (wc): Split the loop that deals with -l into 3.
The first is used at the start of the input to determine if
the average line length is < 15, and if so the second loop is
used to look for '\n' internally to wc. For longer lines,
memchr is used as before to take advantage of system specific
optimizations which outweigh any function call overhead.
Note the first 2 loops could be combined, though in testing
GCC 4.9.2, at least, wasn't sophisticated enough to separate
the loops based on the "check_len" invariant.
Note also __builtin_memchr() isn't significant here as
GCC currently only applies constant folding with that.
* NEWS: Mention the improvement.
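
The three-loop split described above can be sketched roughly as follows. This is an illustrative approximation, not the code from src/wc.c: the function names, the PROBE_BYTES sample size, and the single-buffer interface are all assumptions made for the example. Only the threshold of 15 bytes and the choice between an inline byte loop and memchr come from the commit message.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Threshold from the commit message: an average line length below
   15 bytes is treated as "short".  */
enum { SHORT_LINE_LEN = 15 };

/* Hypothetical sample size: how many leading bytes to probe.  */
enum { PROBE_BYTES = 4096 };

/* Inline loop: compare every byte, with no function call per line.  */
static uintmax_t
count_inline (const char *p, size_t len)
{
  const char *end = p + len;
  uintmax_t lines = 0;
  while (p != end)
    lines += *p++ == '\n';
  return lines;
}

/* memchr loop: one library call per newline found.  */
static uintmax_t
count_memchr (const char *p, size_t len)
{
  const char *end = p + len;
  uintmax_t lines = 0;
  while (p != end && (p = memchr (p, '\n', (size_t) (end - p))) != NULL)
    {
      ++p;
      ++lines;
    }
  return lines;
}

/* Hypothetical entry point: probe the start of the buffer, then pick
   one of the two loops above for the remainder.  */
uintmax_t
count_newlines (const char *buf, size_t len)
{
  assert (buf != NULL);

  /* Loop 1: count newlines in the leading sample.  */
  size_t probe = len < PROBE_BYTES ? len : PROBE_BYTES;
  uintmax_t lines = count_inline (buf, probe);

  /* Average line length < 15 means more than probe / 15 newlines.  */
  int short_lines = lines > probe / SHORT_LINE_LEN;

  /* Loop 2 (inline) for short lines, loop 3 (memchr) otherwise.  */
  const char *rest = buf + probe;
  size_t rest_len = len - probe;
  lines += short_lines ? count_inline (rest, rest_len)
                       : count_memchr (rest, rest_len);
  return lines;
}
```

Calling count_newlines on a buffer read from the test file above would take the inline path, since `yes` emits 2-byte lines. Keeping the two counting loops in separate functions mirrors the note about GCC not splitting a combined loop on its own.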
Diffstat (limited to 'NEWS')
-rw-r--r-- | NEWS | 2 |
1 files changed, 2 insertions, 0 deletions
@@ -94,6 +94,8 @@ GNU coreutils NEWS -*- outline -*-
   stat and tail now know about IBRIX.  stat -f --format=%T now reports the file
   system type, and tail -f uses polling for files on IBRIX file systems.
 
+  wc -l processes short lines much more efficiently.
+
   References from --help and the man pages of utilities have been corrected
   in various cases, and more direct links to the corresponding online
   documentation are provided.