summaryrefslogtreecommitdiff
path: root/src/sort.c
AgeCommit message (Collapse)Author
2011-05-06sort: fix a contradictory --debug warningPádraig Brady
* src/sort.c (key_warn): `sort -k2,1n --debug` would output warnings about being both "zero width" and "spanning multiple fields". Suppress the latter one. * tests/misc/sort-debug-warn: Add a couple of test cases.
2011-03-16sort: avoid memory pressure of 130MB/thread when reading from pipeJim Meyering
* src/sort.c (INPUT_FILE_SIZE_GUESS): Decrease initial allocation factor used to size buffer used when reading a non-regular file. For motivation, see discussion here: http://thread.gmane.org/gmane.comp.gnu.coreutils.general/878/focus=887
2011-03-13sort: spawn fewer threads for small inputsJim Meyering
* src/sort.c (SUBTHREAD_LINES_HEURISTIC): Do not spawn a new thread for every 4 lines. Increase this from 4 to 128K. 128K lines seems appropriate for a 5-year-old dual-core laptop, but it is too low for some common combinations of short lines and/or newer systems. * NEWS (Bug fixes): Mention it.
2011-02-03sort: fix --debug key highlighting when key start after key endPádraig Brady
This case was overlooked in commit bdde34f9, 2010-08-05, "sort: tune and refactor --debug code, and fix minor underlining bug" * src/sort.c (debug_key): Don't adjust the key end when it's before the key start. * tests/misc/sort-debug-keys: Add a test case.
2011-01-07maint: suppress some clang scan-build warningsPádraig Brady
* src/pr.c (char_to_clump): Remove a dead store. * src/remove.c (fts_skip_tree): Likewise. * src/sort.c (key_warnings): Likewise. (sort): Suppress an uninitialized pointer warning.
2011-01-01maint: update all copyright year number rangesJim Meyering
Run "make update-copyright".
2010-12-28coreutils: keep lines within 80-column limitsPaul Eggert
* cfg.mk (LINE_LEN_MAX, FILTER_LONG_LINES): New macros. (sc_long_lines): New rule. * HACKING: Use shorter URLs to the same material. * doc/Makefile.am, doc/coreutils.texi, m4/boottime.m4: * man/help2man, man/stdbuf.x, src/Makefile.am, src/cat.c, src/copy.c: * src/cp.c, src/dd.c, src/df.c, src/du.c, src/groups.c, src/install.c: * src/ls.c, src/md5sum.c, src/mv.c, src/od.c, src/pinky.c, src/ptx.c: * src/readlink.c, src/remove.c, src/rmdir.c, src/setuidgid.c: * src/sort.c, src/tail.c, src/touch.c, tests/Coreutils.pm: * tests/cp/existing-perm-race, tests/cp/perm, tests/cp/preserve-gid: * tests/du/2g, tests/du/long-from-unreadable, tests/init.sh: * tests/install/basic-1, tests/ls/nameless-uid: * tests/ls/readdir-mountpoint-inode, tests/misc/chroot-credentials: * tests/misc/cut, tests/misc/date, tests/misc/join, tests/misc/md5sum: * tests/misc/sha1sum, tests/misc/sha224sum, tests/misc/sort: * tests/misc/sort-continue, tests/misc/sort-files0-from: * tests/misc/sort-rand, tests/misc/stdbuf, tests/misc/tr: * tests/misc/uniq, tests/mv/atomic, tests/mv/part-fail: * tests/mv/part-symlink, tests/mv/sticky-to-xpart, tests/pr/pr-tests: * tests/rm/fail-2eperm, tests/rm/interactive-always: Reformat to fit within 80 columns. * doc/Makefile.am (BAD_POSIX_PERL): New macro. * doc/coreutils.texi: Reword slightly, to make menus and index lines shorter. * src/md5sum.c: Redo --help output so that it fits within 79 columns, since that's a bit more portable and all the other --help strings fit in 79 columns.
2010-12-22sort: minor performance tweak with num_processorsPaul Eggert
* src/sort.c (main): Don't invoke num_processors twice.
2010-12-20maint: fix a typo in sort --parallel help messagePádraig Brady
Also fix up Chen Guo's contacts * src/sort.c (usage): Add a missing "of" * THANKS: Add Chen Guo * .mailmap: Add Chen Guo's UCLA address
2010-12-19sort: use at most 8 threads by defaultPádraig Brady
* src/sort.c (main): If --parallel isn't specified, restrict the number of threads to 8 by default. If the --parallel option is specified, then allow any number of threads to be set, independent of the number of processors on the system. * doc/coreutils.texi (sort invocation): Document the changes to determining the number of threads to use. Mention the memory overhead when using multiple threads. * tests/misc/sort-spinlock-abuse: Allow single core systems that support pthreads. * tests/misc/sort-stale-thread-mem: Likewise. * tests/misc/sort-unique-segv: Likewise. * NEWS: Mention the change in behaviour.
2010-12-16sort: do not generate thousands of subprocesses for 16-way mergePaul Eggert
Without this change, tests/misc/sort-compress-hang would consume more than 10,000 process slots on my RHEL 5.5 x86-64 server, making it likely for other applications to fail due to lack of process slots. With this change, the same benchmark causes 'sort' to consume at most 19 process slots. The change also improved wall-clock time by 2% and user+system time by 14% on that benchmark. * NEWS: Document this. * src/sort.c (MAX_PROCS_BEFORE_REAP): Remove. (reap_exited): Renamed from reap_some; this is a more accurate name, since "some" incorrectly implies that it reaps at least one process. All uses changed. (reap_some): New function: it *does* reap at least one process. (pipe_fork): Do not allow more than NMERGE + 2 subprocesses. (mergefps, sort): Omit check for exited processes: no longer needed, and anyway the code consumed too much CPU per line when 2 < nprocs.
2010-12-16sort: fix hang with sort --compressPaul Eggert
* NEWS: Document this. * src/sort.c (UNCOMPRESSED, UNREAPED, REAPED): New constants. (struct tempnode): New member 'state', to hold these constants. The pid member is now undefined if state == UNCOMPRESSED. (struct sortfile): Replace member 'pid' with member 'temp'. (uintptr): Remove. (proctab_hasher, proctab_comparator, register_proc, delete_proc): Proctab entries are now struct tempnode *, not pid_t, to handle the case where multiple tempnode objects correspond to the same pid. This avoids a race condition that can cause a hang. (register_proc): Arg is now struct tempnode *, not pid_t. All callers changed. (delete_proc): Set tempnode state to REAPED. (create_temp_file): No need to set pid member here; it's now done when the pid is known. (maybe_create_temp, create_temp): Remove PPID arg. Return struct tempnode *, not char *. All callers changed. (maybe_create_temp): Set node state to UNCOMPRESSED or UNREAPED. No need to set node->pid to 0. (open_temp): Replace NAME and PID args with a single TEMP arg. All callers changed. Wait only for unreaped children. (zaptemp): Wait for decompressor to finish before removing its temporary-file input. This avoids .nfsXXXX hassles with NFS and fixes a race (leading to a hang) regardless of NFS. (open_input_files): Adjust to new way of dealing with temp files and their subprocesses. * tests/Makefile.am (TESTS): Add misc/sort-compress-hang. * tests/misc/sort-compress-hang: New file.
2010-12-16sort: don't dump core when merging from input twicePaul Eggert
* NEWS: Document this. * src/sort.c (avoid_trashing_input): The previous fix to this function didn't fix all the problems with this code. Replace it with something simpler: just copy the input file. This doesn't change the number of files, so return void instead of the updated file count. Caller changed. * tests/misc/sort-merge-fdlimit: Test for the bug.
2010-12-14sort: fix very-unlikely buffer overrun when merging to input filePaul Eggert
* src/sort.c (avoid_trashing_input): Fix a typo that could cause a buffer overrun in theory. In practice this is extremely unlikely, as it requires running out of file descriptors in a small merge, presumably because some other process is hogging all the OS's file descriptors.
2010-12-13sort: fix some --compress reaper bugsPaul Eggert
* src/sort.c (uintptr): New type. (enum procstate, struct procnode, update_proc): Remove. (proctab_hasher, proctab_comparator, register_proc, wait_proc): (reap_some): The proctab is now simply a hash of process-IDs rather than of pointers to objects with reference counts and states; this is smaller and faster and easier to understand. (nprocs): Now pid_t, not size_t, since one cannot have more than PID_MAX children. (reap): If the argument is -1, wait; if 0 (a new value), do not. Delete pid from proctab as needed. Ignore children that are not in proctab, as they are from the program that exec'ed us and are irrelevant to our success or failure. (delete_proc, reap_all): New functions. (open_temp): Register the child. (sort): Clean up all children afterwards; without this patch, 'sort' sometimes missed failures in children due to race conditions. * tests/Makefile.am (TESTS): Add misc/sort-compress-proc. * tests/misc/sort-compress-proc: New file, to test for the bugs fixed above.
2010-12-11sort: syntax cleanupJim Meyering
* src/sort.c (xfopen, debug_key, sortlines, sort, main): Adjust formatting: fix misplaced braces, use consistent spacing, split a 2-stmt line.
2010-12-11sort: integer overflow checks in thread counts, etc.Paul Eggert
* src/sort.c (specify_nthreads, merge_tree_init, init_node): (queue_init, sortlines, struct thread_args, sort, main): Use size_t, not unsigned long int, for thread counts, since thread counts are now used to compute sizes. (specify_nthreads): Check for size_t overflow. (merge_tree_init, sort): Shorten name of local variable, for readability. (merge_tree_init): Move constants next to each other in product, so that the constant folding is easier to see. (init_node): Now static. Add 'restrict' only where it might be helpful for compiler optimization. (queue_init): 2nd arg is now nthreads, not "reserve", which is a bit harder to follow. All uses changed. (struct thread_args): Rename lo_child to is_lo_child, so that it's obvious to the reader when we're talking about this boolean as opposed to the new lo_child member of the other structure. All uses changed. (sort): Remove unused local variable end_node. (main): Don't allow large thread counts to cause undefined behavior later, due to integer overflow.
2010-12-11sort: preallocate merge tree nodes to heap.Chen Guo
* src/sort.c: (merge_tree_init) New function. Allocates memory for merge tree nodes. (merge_tree_destory) New function. (init_node) New function. (sortlines) Refactor node creation code to init_node. Remove now superfluous arguments. All callers changed. (sort) Initialize/destory merge tree. Refactor root node creation to merge_tree_init.
2010-12-11sort: comment fixPaul Eggert
* src/sort.c: Comment fix re spin locks.
2010-12-11sort: use mutexes, not spinlocks (avoid busy loop on blocked output)Chen Guo
Running a command like this on a multi-core system sort < big-file | less would peg all processors at near 100% utilization. * src/sort.c: (struct merge_node) Change member lock to mutex. All uses changed. * tests/Makefile.am (XFAIL_TESTS): Remove definition, now that this test passes once again. I.e., the sort-spinlock-abuse test no longer fails. * NEWS (Bug reports): Mention this. Reported by DJ Lucas in http://debbugs.gnu.org/7489.
2010-12-03sort: merge_queue -> queuePaul Eggert
* src/sort.c (struct thread_args, sortlines_thread, sortlines, sort): Rename "merge_queue" to "queue", for consistency with other functions that just use the name "queue" for these things.
2010-12-03sort: clarify queue_check_insertPaul Eggert
* src/sort.c (queue_check_insert): Clarify body a bit, and remove no-longer-needed comment.
2010-12-03sort: fix problems with merge node dest pointerPaul Eggert
* src/sort.c (mergelines_node): Return void, not size_t. All callers changed. Change *node->dest here, not in caller. Do not change node->dest: it's not needed and could cause problems on (mostly theoretical) hosts that do not allow adding integers to null pointers. (queue_check_insert_parent): Omit MERGED parameter; no longer needed. All callers changed.
2010-12-03sort: simplify write_uniquePaul Eggert
* src/sort.c (write_unique): Simplify slightly so that there is just one call to write_line, not two.
2010-12-03sort: put queue arg firstPaul Eggert
* src/sort.c (queue_check_insert, queue_check_insert_parent): Make the queue arg first, for consistency with other functions such as queue_insert that put the queue arg first. Rename from check_insert and update_parent, respectively. All callers changed.
2010-12-03sort: tune struct_merge_node slightlyPaul Eggert
* src/sort.c (struct merge_node): 'lock' is now the actual lock, not a pointer to the lock; there's no need for indirection here. Make 'level' unsigned int instead of size_t, since it is a bit-shift count; also, move it next to a bool so that it's more likely to take less space. All uses changed. (sortlines, sort): Spell out initialization instead of using an initializer. This makes the initializer a bit easier to understand, and avoids unnecessary stores into the spin lock.
2010-12-03sort: Clarify commentsPaul Eggert
* src/sort.c: Improve comments a bit.
2010-12-01sort: fix bug on 64-bit hosts with at least 32768 processorsPaul Eggert
* src/sort.c (MAX_MERGE): Avoid integer overflow when on a machine with (say) 32-bit int and 64-bit size_t and when level == 15. Without this fix, on such a machine with 32768 or more processors, the level computation could overflow on large input, and this would result in division by zero.
2010-12-01sort -u: fix a thread-race pointer corruption bugPaul Eggert
* src/sort.c (write_unique): Save the entire "struct line", not just a pointer to one. Otherwise, with a multi-thread run, sometimes, with some inputs, fillbuf would would win a race and clobber a "saved->text" pointer in one thread just before it was dereferenced in a comparison in another thread. * NEWS (Bug fixes): Mention it.
2010-10-14bug#7213: [PATCH] sort: fix buffer overrun on 32-bit hosts when warning re ↵Paul Eggert
obsolete keys * src/sort.c (key_warnings): Local buffer should be of size INT_BUFSIZE_BOUND (uintmax_t), not INT_BUFSIZE_BOUND (sword). This bug was discovered by running 'make check' on a 32-bit Solaris 8 sparc host, using Sun cc. I saw several other instances of invoking umaxtostr on a buffer declared to be of size INT_BUFSIZE_BOUND (VAR), and these instances should at some point be replaced by INT_BUFSIZE_BOUND (uintmax_t) too, as that's a less error-prone style.
2010-10-13sort: fix unportable cast of unsigned char * -> char *Paul Eggert
* src/sort.c (fold_toupper): Change this back from char to unsigned char, fixing a portability issue introduced in commit 59e2e55d0f154a388adc9bac37d2b45f2ba971f8 dated February 26, as the C Standard doesn't let you convert from unsigned char * to char * without a cast, and the (in theory more portable) style here is to convert char values, not pointer values. (getmonth): Convert char to unsigned char when needed for comparison.
2010-09-20sort: destroy spin locks portablyPaul Eggert
* src/sort.c (sortlines, sort): Use pthread_spin_destroy when a spin lock is no longer used. This isn't needed on GNU/Linux or Solaris, but POSIX says it may free up resources on some platforms.
2010-08-10sort, who: prefer free+malloc to realloc when contents are irrelevantPaul Eggert
This change was prompted by the previous one: I audited the code looking for similar examples. Too bad valgrind doesn't catch this. * src/sort.c (check, mergefps): xrealloc -> free + xmalloc * src/who.c (print_user): Likewise.
2010-08-10sort: free/xmalloc rather than xreallocPaul Eggert
* src/sort.c (compare_random): Use free/xmalloc rather than xrealloc, since the old buffer contents need not be preserved. Also, don't fail if the guessed-sized malloc fails. Suggested by Bruno Haible.
2010-08-10sort: avoid gcc warning: explicitly ignore strtold resultJim Meyering
* src/sort.c: Include "ignore-value.h". (debug_key): Use ignore_value.
2010-08-08sort: speed up -R with long lines in hard localesPaul Eggert
* src/sort.c (compare_random): Guess that the output will be 3X the input. This avoids the overhead of calling strxfrm twice on typical implementations. Suggested by Bruno Haible.
2010-08-06sort: support all combinations of -d, -f, -i, -R, and -VPaul Eggert
* NEWS: Document this. * src/sort.c (getmonth): Omit LEN arg, as MONTH is now null-terminated. (compare_random): Don't null-terminate keys, as caller now does that. (compare_version): Remove. (debug_key): Null-terminate string for getmonth. (keycompare): Support combining -R with any of -d, -f, -i, -V. Also, support combining -V with any of -d, -i. (check_ordering_compatibility): Allow newly-supported combinations. * tests/misc/sort (02q, 02r, 02s): New tests, for new combinations. (incompat2): Now test -nR, since -fR are now compatible.
2010-08-05sort: tune and refactor --debug code, and fix minor underlining bugPaul Eggert
Formerly, the 'compare' function and some of its subroutines had a debugging flag, which caused them to output underlines. This change refactors the code so that debugging output is more-separated from the actual sorting. In the process, the change fixes a minor error in the debugging output. The change shortens the source code and executable size a tad, and improves CPU performance by 2.4% on my platform with a simple benchmark (C locale, line sorting, no debug). * src/sort.c (long_double, strtold): Move back to prelude, since they're now used by multiple functions again. (unit_order): Move to file scope, since it's now used by two functions. (find_unit_order, human_numcompare, numcompare, general_numcompare): Remove endptr parameter. All callers changed. (human_numcompare): Args are now const pointers. (getmonth): Endptr is now non-const. (key_numeric): Move up, since it's needed earlier. (debug_key): Take a line and a key as argument, instead of having the caller figure out where the field is. (debug_line): New function. (keycompare, compare): Omit debug parameter; debug output now done elsewhere. All callers changed. (write_line): Renamed from write_bytes; all callers changed. Use debug_line (not 'compare') to output debug info. Use a slightly faster check for whether output file is stdout. (check): Don't do debugging output; it's not that useful here, and it confuses the code. (main): Check for incompatibility between -c and --debug. Use standard diagnostic for incompatible options. * tests/misc/sort-debug-keys: Fix test case: "--Mi-1" is not a number, so its first character should not be underlined when debugging a numeric sort.
2010-08-04sort: -R now uses less memory on long lines with internal NULsPaul Eggert
* lib/Makefile.am (libcoreutils_a_SOURCES): Remove xmemxfrm.c, xmemxfrm.h. * lib/memxfrm.c, lib/memxfrm.h, lib/xmemxfrm.c, lib/xmemxfrm.h: Remove. * m4/memxfrm.m4: Likewise. * m4/prereq.m4 (gl_PREREQ): Remove gl_MEMXFRM. * po/POTFILES.in: Remove lib/xmemxfrm.c. * src/sort.c: Don't include xmemxfrm.h. (cmp_hashes): Remove. (xstrxfrm): New function. (compare_random): If a line contains NULs, don't create a big buffer that contains the strxfrm output of each string in the line. Instead, accumulate checksums and differences as we go, so that at any one time we have to store at most the output of a single strxfrm call when processing the line. This removes the need for an memxfrm function.
2010-08-03sort: fix bug in --debug when \0 is followed by \tPaul Eggert
* src/sort.c (debug_width): New function, which does not stop counting tabs at \0, and also invokes mbsnwidth. Stamp out strnlen! (count_tabs): Remove. (debug_key): Use debug_width instead of mbsnwidth and count_tabs. * tests/misc/sort-debug-keys: Check that \0 and \t intermix.
2010-08-02sort: revert recent -h changes and use a more-conservative approachPaul Eggert
* NEWS: Document changes to sort -h, which are now minor with respect to the pre-July-30th version. * doc/coreutils.texi (sort invocation): Likewise. The documentation now describes how -h comparison is done rather than being vague with border cases. * src/sort.c (long_double, strtold): Move back to general_numcompare. (LD, compute_human): Remove. (find_unit_order): Remove THOU_SEP parameter, since thousands separators are now allowed by all callers. Revert to previous behavior of sorting by suffix, and returning the order rather than 2 * order + binary, since we no longer care whether binary powers are being used. However, treat all zeros the same, instead of sorting 0M before 0G; this is more consistent with the desired behavior of sorting -1G before -1M. * tests/misc/sort (h1, h3, h6): Adjust to match mostly-reverted behavior. However, check that all zeros sort together. * tests/misc/sort-debug-keys: Omit a "_", since the trailing "i" in "1234Gi" is no longer part of the key.
2010-07-30sort: -h now handles comparisons such as 6000K vs 5M and 5MiB vs 5MBPaul Eggert
* NEWS: Document changes to sort -h. * doc/coreutils.texi (sort invocation): Likewise. * src/sort.c (long_double, strtold): Move to prelude, since they're now used by multiple functions. (LD): New macro. (struct keyfield.iec_present): Remove this member. All uses removed. (check_mixed_SI_IEC): Remove. This code was busted in the presence of multiple threads, as it had a race condition. (find_unit_order): Remove arg KEY; add arg THOU_SEP; arg ENDPTR is now char ** rather than char const **. Return an integer that distinguishes decimal from binary powers. Parse the number consistently with the intersection of strtold and strnumcmp. Set *ENDPTR unconditionally. (compute_human): New static function. (human_numcompare): Remove arg KEY. Remove 'const' from other args. Use strnumcmp if possible, but fall back on floating point if not. (numcompare, general_numcompare): Arg EA is now char ** rather than char const **. (numcompare): Adjust to new find_unit_order signature and behavior. (keycompare): Adjus to new human_numcompare signature. * tests/misc/sort (h1, h3, h4, h6): Adjust to new behavior. * tests/misc/sort-debug-keys: Likewise.
2010-07-27sort: fix --debug display with very large offsetsPaul Eggert
* src/sort.c (mark_key): Don't assume offset <= INT_MAX. Make the code a bit clearer when width != 0.
2010-07-26sort: fix bug with EOF at buffer refillPaul Eggert
* src/sort.c (fillbuf): Don't append eol unless the line is nonempty. This fixes a bug that was partly but not completely fixed by the aadc67dfdb47f28bb8d1fa5e0fe0f52e2a8c51bf commit (dated July 15). * tests/misc/sort (realloc-buf-2): New test, which catches this bug on 64-bit hosts.
2010-07-26sort: omit some "inline"sPaul Eggert
* src/sort.c (mergelines, queue_destroy, queue_init, queue_insert): (queue_pop, write_unique, mergelines_node, check_insert): (update_parent): No longer inline; these uses of "inline" seemed unlikely to help performance much.
2010-07-26sort: don't assume ASCII when parsing K, M, G suffixesPaul Eggert
* src/sort.c (find_unit_order): Don't assume ASCII.
2010-07-25sort: omit unnecessary mutex unlock+lock; simplify heap accessPaul R. Eggert
* src/sort.c (queue_pop): Omit unnecessary unlock+lock after pthread_cond_wait returns. Don't access "count" member of the heap; any efficiency gains should be quite minor, the access complicates this code, and "count" should be private anyway.
2010-07-25sort: omit 'restrict' in doubtful casesPaul R. Eggert
* src/sort.c (lock_node, unlock_node, queue_destroy, queue_init): (queue_pop): Omit 'restrict'; it shouldn't help here, as these functions have just one pointer parameter and don't access static storage. (queue_insert, check_insert, update_parent): Omit 'restrict', as the pointer types differ, and are not char * or unsigned char *, and therefore can't alias. (write_unique): Omit 'restrict', as the pointer types are all read-only. (merge_loop, sortlines): Omit 'restrict', as any performance advantages are extremely unlikely and it's not worth cluttering the code for that. (struct thread_args): Omit 'restrict': this seems to be incorrect. It's unlikely for 'restrict' to be correct inside a typedef.
2010-07-25sort: omit unnecessary castsPaul R. Eggert
* src/sort.c (inittables, general_numcompare, compare_nodes): (queue_init, queue_pop): Omit casts that are not needed, typically because they are between void * and some other pointer type.
2010-07-25sort: use more-consistent style with constPaul R. Eggert
* src/sort.c (proctab_hasher, proctab_comparator, stream_open, xfopen): (open_temp, zaptemp, struct_month_cmp, begfield, limfield): (find_unit_order, human_numcompare, numcompare, general_numcompare): (count_tabs, keycompare, compare, compare_nodes, lock_node): (unlock_node, queue_destroy, queue_init, queue_insert, queue_pop): (write_unique, mergelines_node, check_insert, update_parent): (merge_loop, sortlines, struct thread_args, set_ordering): Prefer the style "T const" to "const T". * gl/lib/heap.h (struct heap, heap_alloc): Likewise. * gl/lib/heap.c (heap_default_compare, heapify_down, heapify_up): (heap_alloc): Likewise.