From eb3f5b3b3de8c6ca005a701f09bff43d778aece7 Mon Sep 17 00:00:00 2001
From: Jim Meyering
Date: Wed, 15 Aug 2012 12:30:44 +0200
Subject: sort: sort --unique (-u) could cause data loss

sort -u could omit one or more lines of expected output.
This bug arose because sort recorded the most recently printed line via
reference, and if you were unlucky, the storage for that line would be
reused (overwritten) as additional input was read into memory. If you
were doubly unlucky, the new value of the "saved" line would not only
match the very next line, but if that next line were also the first in
a series of identical, not-yet-printed lines, then the corrupted "saved"
line value would result in the omission of all matching lines.

* src/sort.c (saved_line): New static/global, renamed and moved from...
(write_unique): ...here. Old name was "saved", which was too generic
for its new role as file-scoped global.
(fillbuf): With --unique, when we're about to read into a buffer that
overlaps the saved "preceding" line (saved_line), copy the line's .text
member to a realloc'd-as-needed temporary buffer and adjust the line's
key-defining members if they're set.
(overlap): New function.
* tests/misc/sort: New tests.
* NEWS (Bug fixes): Mention it.
* THANKS.in: Update.
Bug introduced via commit v8.5-89-g9face83.
Reported by Rasmus Borup Hansen in
http://thread.gmane.org/gmane.comp.gnu.coreutils.bugs/23173/focus=24647
---
 tests/misc/sort | 11 +++++++++++
 1 file changed, 11 insertions(+)

(limited to 'tests/misc')

diff --git a/tests/misc/sort b/tests/misc/sort
index 5d15d7572..4e5116155 100755
--- a/tests/misc/sort
+++ b/tests/misc/sort
@@ -227,6 +227,17 @@ my @Tests =
 ["15d", '-i -u', {IN=>"\1a\na\n"}, {OUT=>"\1a\n"}],
 ["15e", '-i -u', {IN=>"a\n\1\1\1\1\1a\1\1\1\1\n"}, {OUT=>"a\n"}],
 
+# This would fail (printing only the 7) for 8.6..8.18.
+# Use --parallel=1 for reproducibility, and a small buffer size
+# to let us trigger the problem with a smaller input.
+["unique-1", '--p=1 -S32b -u', {IN=>"7\n"x11 . "1\n"}, {OUT=>"1\n7\n"}],
+# Demonstrate that 8.19's key-spec-adjusting code is required.
+# These are more finicky in that they are arch-dependent.
+["unique-key-i686", '-u -k2,2 --p=1 -S32b',
+ {IN=>"a 7\n"x10 . "b 1\n"}, {OUT=>"b 1\na 7\n"}],
+["unique-key-x86_64", '-u -k2,2 --p=1 -S32b',
+ {IN=>"a 7\n"x11 . "b 1\n"}, {OUT=>"b 1\na 7\n"}],
+
 # From Erick Branderhorst -- fixed around 1.19e
 ["16a", '-f',
  {IN=>"éminence\nüberhaupt\n's-Gravenhage\naëroclub\nAag\naagtappels\n"},
-- 
cgit v1.2.3-70-g09d2