summaryrefslogtreecommitdiff
path: root/tests/local.mk
diff options
context:
space:
mode:
authorAssaf Gordon <assafgordon@gmail.com>2013-03-06 18:25:49 -0500
committerPádraig Brady <P@draigBrady.com>2013-03-25 20:07:14 +0000
commit20d7bce0f7e57d9a98f0ee811e31c757e9fedfff (patch)
tree0c3a560e7bbd16762807e0b2fc28d32670549eff /tests/local.mk
parent4c49dc823ff7da589ae58d8d8313d38a75fc8f64 (diff)
downloadcoreutils-20d7bce0f7e57d9a98f0ee811e31c757e9fedfff.tar.xz
shuf: use reservoir-sampling for large or unknown sized inputs
Reservoir sampling optimizes selecting K random lines from large or unknown-sized input: http://en.wikipedia.org/wiki/Reservoir_sampling Note this also avoids reading any input when -n0 is specified. * src/shuf.c (main): Use reservoir-sampling when the number of output lines is known, and the input size is large or unknown. (input_size): A new function to get the input size for regular files. (read_input_reservoir_sampling): New function to read lines from input, keeping only K lines in memory, replacing lines with decreasing prob. (write_permuted_output_reservoir): New function to output reservoir. * tests/misc/shuf-reservoir.sh: An expensive_ test using valgrind to exercise the reservoir-sampling code. * tests/local.mk: Reference new test. * NEWS: Mention the improvement.
Diffstat (limited to 'tests/local.mk')
-rw-r--r--tests/local.mk1
1 files changed, 1 insertions, 0 deletions
diff --git a/tests/local.mk b/tests/local.mk
index 607ddc4d9..dc87ef491 100644
--- a/tests/local.mk
+++ b/tests/local.mk
@@ -313,6 +313,7 @@ all_tests = \
tests/misc/shred-passes.sh \
tests/misc/shred-remove.sh \
tests/misc/shuf.sh \
+ tests/misc/shuf-reservoir.sh \
tests/misc/sort.pl \
tests/misc/sort-benchmark-random.sh \
tests/misc/sort-compress.sh \