summaryrefslogtreecommitdiff
path: root/src/sort.c
AgeCommit message (Collapse)Author
2012-07-10sort: by default, do not exceed 3/4 of physical memoryPaul Eggert
* src/sort.c (default_sort_size): Do not exceed 3/4 of total memory. See Jeff Janes's bug report in <http://lists.gnu.org/archive/html/coreutils/2012-06/msg00018.html>.
2012-07-02sort: fix exit-status typoPaul Eggert
* src/sort.c (stream_open): EXIT_FAILURE -> SORT_FAILURE. Suggested by Pádraig Brady in <http://bugs.gnu.org/11816#34>.
2012-07-02sort: simplify -o handling to avoid fdopen, assertPaul Eggert
* src/sort.c (outfd): Remove. All uses replaced by STDOUT_FILENO. (stream_open): When writing, use stdout rather than fdopen. (move_fd_or_die): Renamed from dup2_or_die, with the added functionality of closing its first argument. All uses changed. (avoid_trashing_input): Special case for !outfile no longer needed. (check_output): Arrange for standard output to go to the file, rather than storing the fd in outfd.
2012-07-02sort: avoid redundant processing with inaccessible inputs or outputPádraig Brady
* src/sort.c (check_inputs): A new function to verify all inputs are accessible before further processing. (check_output): A new function to open or create a specified output file, before futher processing. (stream_open): Adjust to truncating the previously opened output file rather than opening directly. (avoid_trashing_input): Optimize to stat the output file descriptor, rather than the file name. (main): Call the new functions to check accessibility of inputs and output, before processing starts. * tests/misc/sort: Adjust to the changed error message. * tests/misc/sort-merge-fdlimit: Account for the earlier opened file descriptor of the specified output file. * tests/misc/sort-exit-early: A new test to exercise the improvements. * tests/Makefile.am: Reference the new test. * NEWS: Mention the improvement. Suggested-by: Bernhard Voelker
2012-06-20maint: sort: style adjustment to help clarify size determinationBernhard Voelker
* src/sort.c (default_sort_size): Move physmem code "down" to first use.
2012-05-16maint: add assertions to placate static analysis toolsJim Meyering
A static analysis tool (http://labs.oracle.com/projects/parfait/) produced some false positive diagnostics. Add assertions to help it understand that the code is correct. * src/stty.c: Include <assert.h>. (display_changed): Add an assertion to placate parfait. (display_all): Likewise. * src/sort.c: Include <assert.h>. (main): Add an assertion to placate parfait. * src/fmt.c: Include <assert.h>. (get_paragraph): Add an assertion to placate parfait.
2012-05-04maint: rely on gnulib's new sys_resource moduleJim Meyering
* bootstrap.conf (gnulib_modules): Add sys_resource. * src/sort.c: Remove #if HAVE_SYS_RESOURCE_H guard around inclusion of <sys/resource.h> and move the inclusion "up" into the alphabetized list of its peers. This also avoids a failure of the sc_prohibit_always_true_header_tests syntax-check rule. * m4/jm-macros.m4 (gl_CHECK_ALL_HEADERS): Remove sys/resource.h.
2012-02-25sort: default to physmem/8, not physmem/16Paul Eggert
* src/sort.c (default_sort_size): Don't divide advice by 2. Just divide the hard limits by 2. This matches the comments. Reported by Rogier Wolff in http://bugs.gnu.org/10877
2012-01-30maint: sort: remove the last uses of "'%s'" in diagnosticsJim Meyering
* src/sort.c (key_warnings): Use quote (quote_n, since there are two) rather than literal single quotes ('%s') in diagnostic.
2012-01-27maint: use single copyright year rangeJim Meyering
Run "make update-copyright".
2012-01-22maint: quote 'like this' or "like this", not `like this'Paul Eggert
* doc/coreutils.texi (Formatting the file names): coreutils now quotes 'like this'. * man/help2man: * src/timeout.c (usage): Quote 'like this' in diagnostics. * HACKING, Makefile.am, NEWS, README, README-hacking, TODO, cfg.mk: * doc/Makefile.am, doc/coreutils.texi, m4/jm-macros.m4: * man/Makefile.am, man/help2man, src/Makefile.am, src/copy.h: * src/extract-magic, src/ls.c, src/pinky.c, src/pr.c, src/sort.c: * src/split.c, src/timeout.c, src/who.c, tests/dd/skip-seek-past-file: * tests/pr/pr-tests: Quote 'like this' in commentary. * cfg.mk (old_NEWS_hash): Update due to changed old NEWS.
2012-01-09maint: src/*.[ch]: convert more `...' to '...'Jim Meyering
Run this (twice): git grep -E -l '`.+'\' src/*.[ch] \ |xargs perl -pi -e 's/`(.+?'\'')/'\''$1/'
2012-01-09maint: src/*.c: change remaining quotes (without embedded spaces)Jim Meyering
Run this (twice): git grep -E -l '`[^ ]+'\' src/*.c \ |xargs perl -pi -e 's/`([^ ]+'\'')/'\''$1/'
2012-01-09maint: convert `...' to '...' in --help outputJim Meyering
All affected lines end with \ or \n\, so run this command until it produces no new changes (4 times): git grep -E -l '`[^ ]+'\''.*\\' src \ |xargs perl -pi -e 's/`([^ ]+'\''.*\\)/'\''$1/'
2012-01-09maint: adjust quoting: emit '...', not `...' in diagnosticsJim Meyering
* src/csplit.c (parse_repeat_count, extract_regexp): As above. * src/date.c (main): Likewise. * src/ls.c (decode_switches): Likewise. * src/od.c (decode_one_format, main): Likewise. * src/pathchk.c (no_leading_hyphen): Likewise. * src/pr.c (main, getoptarg): Likewise. * src/rm.c (diagnose_leading_hyphen): Likewise. * src/sort.c (key_warnings, incompatible_options, main): Likewise. * src/stat.c (print_esc_char): Print '\x', not `\x' in diagnostic. * src/test.c (main): Likewise. * src/touch.c (main): Likewise. * src/tr.c (build_spec_list, validate, append_range): Likewise. * tests/misc/mktemp: This is an unusual case, since the affected string contains only the ` of an `...' string. So we change the long ` to a lone '. * tests/pr/pr-tests: Manual quote adapting fix-up. * tests/ln/hard-to-sym: Likewise. * tests/split/suffix-length: Likewise. * tests/mv/part-fail: Likewise. * tests/misc/chcon: Likewise. * tests/misc/stat-printf: Likewise.
2012-01-07maint: use new emit_try_help in place of equivalent fprintfJim Meyering
Run this command: perl -0777 -pi -e \ 's/fprintf \(stderr, _\("Try `%s --help.*\n.*;/emit_try_help ();/m'\ src/*.c
2012-01-01maint: update all copyright year number rangesJim Meyering
Run "make update-copyright".
2011-12-05maint: sort, stat: remove unused parametersJim Meyering
* src/sort.c (struct thread_args) [is_lo_child]: Remove member. (sortlines): Remove unused parameter, "is_lo_child". Update callers. * src/stat.c (out_epoch_sec): Mark unused parameter. (do_statfs, do_stat): Remove unused parameter, "terse". Update callers.
2011-11-16sort: clarify wording on -k syntaxEric Blake
* src/sort.c (usage): Use KEYDEF instead of POS, and call out the specific OPTS that can occur in KEYDEF. Based on a report by Lars Noodén, http://bugs.gnu.org/10019
2011-09-27sort: avoid a NaN-induced infloopJim Meyering
These commands would fail to terminate: yes -- -nan | head -156903 | sort -g > /dev/null echo nan > F; sort -m -g F F That can happen with any strtold implementation that includes uninitialized data in its return value. The problem arises in the mergefps function when bubble-sorting the two or more lines, each from one of the input streams being merged: compare(a,b) returns 64, yet compare(b,a) also returns a positive value. With a broken comparison function like that, the bubble sort never terminates. Why do the long-double bit strings corresponding to two identical "nan" strings not compare equal? Because some parts of the result are uninitialized and thus depend on the state of the stack. For more details, see http://bugs.gnu.org/9612. * src/sort.c (nan_compare): New function. (general_numcompare): Use it rather than bare memcmp. Reported by Aaron Denney in http://bugs.debian.org/642557. * NEWS (Bug fixes): Mention it. * tests/misc/sort-NaN-infloop: New file. * tests/Makefile.am (TESTS): Add it.
2011-07-02maint: use "const" and "pure" function attributes where possibleJim Meyering
* configure.ac (WARN_CFLAGS): Add -Wsuggest-attribute=const, -Wsuggest-attribute=pure and -Wsuggest-attribute=noreturn. (GNULIB_WARN_CFLAGS): But do not add them here... yet. * src/chown-core.h (chopt_free, uid_to_name): Add function attribute(s). * src/copy.c (is_ancestor, valid_options): Likewise. * src/copy.h (chown_failure_ok): Likewise. * src/dd.c (operand_matches, operand_is): Likewise. * src/df.c (selected_fstype, excluded_fstype): Likewise. * src/expr.c (null looks_like_integer): Likewise. * src/md5sum.c (hex_digits): Likewise. * src/od.c (get_lcm): Likewise. * src/pathchk.c (component_start, component_len): Likewise. * src/pinky.c (count_ampersands): Likewise. * src/pr.c (cols_ready_to_print): Likewise. * src/ptx.c (search_table): Likewise. * src/sort.c (find_unit_order): Likewise. * src/stty.c (mode_type_flag, string_to_baud, baud_to_value): Likewise. * src/system.h (gcd, lcm): Likewise. * src/tr.c (is_char_class_member, look_up_char_class): Likewise. (star_digits_closebracket): Likewise. * src/uniq.c (find_field): Likewise. * src/wc.c (compute_number_width): Likewise. * lib/xfts.h (cycle_warning_required): Likewise. * gl/lib/randint.h (randint_get_source): Likewise. * gl/lib/randperm.c (ceil_lg): Likewise. * gl/lib/randperm.h (randperm_bound): Likewise. * lib/strnumcmp.h (strintcmp): Likewise.
2011-05-19maint: correct typos involving misuse of "a" and "an"Jim Meyering
* NEWS: "an misleading" * src/expr.c: "a integer * src/ptx.c (find_occurs_in_text): "a end" * src/shred.c (do_wipefd): "a infinite" * src/sort.c (SUBTHREAD_LINES_HEURISTIC): "an dual-core" (compare_random): "an checksum" * cfg.mk (old_NEWS_hash): Update, since the typo was in old news.
2011-05-06sort: fix a contradictory --debug warningPádraig Brady
* src/sort.c (key_warn): `sort -k2,1n --debug` would output warnings about being both "zero width" and "spanning multiple fields". Suppress the latter one. * tests/misc/sort-debug-warn: Add a couple of test cases.
2011-03-16sort: avoid memory pressure of 130MB/thread when reading from pipeJim Meyering
* src/sort.c (INPUT_FILE_SIZE_GUESS): Decrease initial allocation factor used to size buffer used when reading a non-regular file. For motivation, see discussion here: http://thread.gmane.org/gmane.comp.gnu.coreutils.general/878/focus=887
2011-03-13sort: spawn fewer threads for small inputsJim Meyering
* src/sort.c (SUBTHREAD_LINES_HEURISTIC): Do not spawn a new thread for every 4 lines. Increase this from 4 to 128K. 128K lines seems appropriate for a 5-year-old dual-core laptop, but it is too low for some common combinations of short lines and/or newer systems. * NEWS (Bug fixes): Mention it.
2011-02-03sort: fix --debug key highlighting when key start after key endPádraig Brady
This case was overlooked in commit bdde34f9, 2010-08-05, "sort: tune and refactor --debug code, and fix minor underlining bug" * src/sort.c (debug_key): Don't adjust the key end when it's before the key start. * tests/misc/sort-debug-keys: Add a test case.
2011-01-07maint: suppress some clang scan-build warningsPádraig Brady
* src/pr.c (char_to_clump): Remove a dead store. * src/remove.c (fts_skip_tree): Likewise. * src/sort.c (key_warnings): Likewise. (sort): Suppress an uninitialized pointer warning.
2011-01-01maint: update all copyright year number rangesJim Meyering
Run "make update-copyright".
2010-12-28coreutils: keep lines within 80-column limitsPaul Eggert
* cfg.mk (LINE_LEN_MAX, FILTER_LONG_LINES): New macros. (sc_long_lines): New rule. * HACKING: Use shorter URLs to the same material. * doc/Makefile.am, doc/coreutils.texi, m4/boottime.m4: * man/help2man, man/stdbuf.x, src/Makefile.am, src/cat.c, src/copy.c: * src/cp.c, src/dd.c, src/df.c, src/du.c, src/groups.c, src/install.c: * src/ls.c, src/md5sum.c, src/mv.c, src/od.c, src/pinky.c, src/ptx.c: * src/readlink.c, src/remove.c, src/rmdir.c, src/setuidgid.c: * src/sort.c, src/tail.c, src/touch.c, tests/Coreutils.pm: * tests/cp/existing-perm-race, tests/cp/perm, tests/cp/preserve-gid: * tests/du/2g, tests/du/long-from-unreadable, tests/init.sh: * tests/install/basic-1, tests/ls/nameless-uid: * tests/ls/readdir-mountpoint-inode, tests/misc/chroot-credentials: * tests/misc/cut, tests/misc/date, tests/misc/join, tests/misc/md5sum: * tests/misc/sha1sum, tests/misc/sha224sum, tests/misc/sort: * tests/misc/sort-continue, tests/misc/sort-files0-from: * tests/misc/sort-rand, tests/misc/stdbuf, tests/misc/tr: * tests/misc/uniq, tests/mv/atomic, tests/mv/part-fail: * tests/mv/part-symlink, tests/mv/sticky-to-xpart, tests/pr/pr-tests: * tests/rm/fail-2eperm, tests/rm/interactive-always: Reformat to fit within 80 columns. * doc/Makefile.am (BAD_POSIX_PERL): New macro. * doc/coreutils.texi: Reword slightly, to make menus and index lines shorter. * src/md5sum.c: Redo --help output so that it fits within 79 columns, since that's a bit more portable and all the other --help strings fit in 79 columns.
2010-12-22sort: minor performance tweak with num_processorsPaul Eggert
* src/sort.c (main): Don't invoke num_processors twice.
2010-12-20maint: fix a typo in sort --parallel help messagePádraig Brady
Also fix up Chen Guo's contacts * src/sort.c (usage): Add a missing "of" * THANKS: Add Chen Guo * .mailmap: Add Chen Guo's UCLA address
2010-12-19sort: use at most 8 threads by defaultPádraig Brady
* src/sort.c (main): If --parallel isn't specified, restrict the number of threads to 8 by default. If the --parallel option is specified, then allow any number of threads to be set, independent of the number of processors on the system. * doc/coreutils.texi (sort invocation): Document the changes to determining the number of threads to use. Mention the memory overhead when using multiple threads. * tests/misc/sort-spinlock-abuse: Allow single core systems that support pthreads. * tests/misc/sort-stale-thread-mem: Likewise. * tests/misc/sort-unique-segv: Likewise. * NEWS: Mention the change in behaviour.
2010-12-16sort: do not generate thousands of subprocesses for 16-way mergePaul Eggert
Without this change, tests/misc/sort-compress-hang would consume more than 10,000 process slots on my RHEL 5.5 x86-64 server, making it likely for other applications to fail due to lack of process slots. With this change, the same benchmark causes 'sort' to consume at most 19 process slots. The change also improved wall-clock time by 2% and user+system time by 14% on that benchmark. * NEWS: Document this. * src/sort.c (MAX_PROCS_BEFORE_REAP): Remove. (reap_exited): Renamed from reap_some; this is a more accurate name, since "some" incorrectly implies that it reaps at least one process. All uses changed. (reap_some): New function: it *does* reap at least one process. (pipe_fork): Do not allow more than NMERGE + 2 subprocesses. (mergefps, sort): Omit check for exited processes: no longer needed, and anyway the code consumed too much CPU per line when 2 < nprocs.
2010-12-16sort: fix hang with sort --compressPaul Eggert
* NEWS: Document this. * src/sort.c (UNCOMPRESSED, UNREAPED, REAPED): New constants. (struct tempnode): New member 'state', to hold these constants. The pid member is now undefined if state == UNCOMPRESSED. (struct sortfile): Replace member 'pid' with member 'temp'. (uintptr): Remove. (proctab_hasher, proctab_comparator, register_proc, delete_proc): Proctab entries are now struct tempnode *, not pid_t, to handle the case where multiple tempnode objects correspond to the same pid. This avoids a race condition that can cause a hang. (register_proc): Arg is now struct tempnode *, not pid_t. All callers changed. (delete_proc): Set tempnode state to REAPED. (create_temp_file): No need to set pid member here; it's now done when the pid is known. (maybe_create_temp, create_temp): Remove PPID arg. Return struct tempnode *, not char *. All callers changed. (maybe_create_temp): Set node state to UNCOMPRESSED or UNREAPED. No need to set node->pid to 0. (open_temp): Replace NAME and PID args with a single TEMP arg. All callers changed. Wait only for unreaped children. (zaptemp): Wait for decompressor to finish before removing its temporary-file input. This avoids .nfsXXXX hassles with NFS and fixes a race (leading to a hang) regardless of NFS. (open_input_files): Adjust to new way of dealing with temp files and their subprocesses. * tests/Makefile.am (TESTS): Add misc/sort-compress-hang. * tests/misc/sort-compress-hang: New file.
2010-12-16sort: don't dump core when merging from input twicePaul Eggert
* NEWS: Document this. * src/sort.c (avoid_trashing_input): The previous fix to this function didn't fix all the problems with this code. Replace it with something simpler: just copy the input file. This doesn't change the number of files, so return void instead of the updated file count. Caller changed. * tests/misc/sort-merge-fdlimit: Test for the bug.
2010-12-14sort: fix very-unlikely buffer overrun when merging to input filePaul Eggert
* src/sort.c (avoid_trashing_input): Fix a typo that could cause a buffer overrun in theory. In practice this is extremely unlikely, as it requires running out of file descriptors in a small merge, presumably because some other process is hogging all the OS's file descriptors.
2010-12-13sort: fix some --compress reaper bugsPaul Eggert
* src/sort.c (uintptr): New type. (enum procstate, struct procnode, update_proc): Remove. (proctab_hasher, proctab_comparator, register_proc, wait_proc): (reap_some): The proctab is now simply a hash of process-IDs rather than of pointers to objects with reference counts and states; this is smaller and faster and easier to understand. (nprocs): Now pid_t, not size_t, since one cannot have more than PID_MAX children. (reap): If the argument is -1, wait; if 0 (a new value), do not. Delete pid from proctab as needed. Ignore children that are not in proctab, as they are from the program that exec'ed us and are irrelevant to our success or failure. (delete_proc, reap_all): New functions. (open_temp): Register the child. (sort): Clean up all children afterwards; without this patch, 'sort' sometimes missed failures in children due to race conditions. * tests/Makefile.am (TESTS): Add misc/sort-compress-proc. * tests/misc/sort-compress-proc: New file, to test for the bugs fixed above.
2010-12-11sort: syntax cleanupJim Meyering
* src/sort.c (xfopen, debug_key, sortlines, sort, main): Adjust formatting: fix misplaced braces, use consistent spacing, split a 2-stmt line.
2010-12-11sort: integer overflow checks in thread counts, etc.Paul Eggert
* src/sort.c (specify_nthreads, merge_tree_init, init_node): (queue_init, sortlines, struct thread_args, sort, main): Use size_t, not unsigned long int, for thread counts, since thread counts are now used to compute sizes. (specify_nthreads): Check for size_t overflow. (merge_tree_init, sort): Shorten name of local variable, for readability. (merge_tree_init): Move constants next to each other in product, so that the constant folding is easier to see. (init_node): Now static. Add 'restrict' only where it might be helpful for compiler optimization. (queue_init): 2nd arg is now nthreads, not "reserve", which is a bit harder to follow. All uses changed. (struct thread_args): Rename lo_child to is_lo_child, so that it's obvious to the reader when we're talking about this boolean as opposed to the new lo_child member of the other structure. All uses changed. (sort): Remove unused local variable end_node. (main): Don't allow large thread counts to cause undefined behavior later, due to integer overflow.
2010-12-11sort: preallocate merge tree nodes to heap.Chen Guo
* src/sort.c: (merge_tree_init) New function. Allocates memory for merge tree nodes. (merge_tree_destory) New function. (init_node) New function. (sortlines) Refactor node creation code to init_node. Remove now superfluous arguments. All callers changed. (sort) Initialize/destory merge tree. Refactor root node creation to merge_tree_init.
2010-12-11sort: comment fixPaul Eggert
* src/sort.c: Comment fix re spin locks.
2010-12-11sort: use mutexes, not spinlocks (avoid busy loop on blocked output)Chen Guo
Running a command like this on a multi-core system sort < big-file | less would peg all processors at near 100% utilization. * src/sort.c: (struct merge_node) Change member lock to mutex. All uses changed. * tests/Makefile.am (XFAIL_TESTS): Remove definition, now that this test passes once again. I.e., the sort-spinlock-abuse test no longer fails. * NEWS (Bug reports): Mention this. Reported by DJ Lucas in http://debbugs.gnu.org/7489.
2010-12-03sort: merge_queue -> queuePaul Eggert
* src/sort.c (struct thread_args, sortlines_thread, sortlines, sort): Rename "merge_queue" to "queue", for consistency with other functions that just use the name "queue" for these things.
2010-12-03sort: clarify queue_check_insertPaul Eggert
* src/sort.c (queue_check_insert): Clarify body a bit, and remove no-longer-needed comment.
2010-12-03sort: fix problems with merge node dest pointerPaul Eggert
* src/sort.c (mergelines_node): Return void, not size_t. All callers changed. Change *node->dest here, not in caller. Do not change node->dest: it's not needed and could cause problems on (mostly theoretical) hosts that do not allow adding integers to null pointers. (queue_check_insert_parent): Omit MERGED parameter; no longer needed. All callers changed.
2010-12-03sort: simplify write_uniquePaul Eggert
* src/sort.c (write_unique): Simplify slightly so that there is just one call to write_line, not two.
2010-12-03sort: put queue arg firstPaul Eggert
* src/sort.c (queue_check_insert, queue_check_insert_parent): Make the queue arg first, for consistency with other functions such as queue_insert that put the queue arg first. Rename from check_insert and update_parent, respectively. All callers changed.
2010-12-03sort: tune struct_merge_node slightlyPaul Eggert
* src/sort.c (struct merge_node): 'lock' is now the actual lock, not a pointer to the lock; there's no need for indirection here. Make 'level' unsigned int instead of size_t, since it is a bit-shift count; also, move it next to a bool so that it's more likely to take less space. All uses changed. (sortlines, sort): Spell out initialization instead of using an initializer. This makes the initializer a bit easier to understand, and avoids unnecessary stores into the spin lock.
2010-12-03sort: Clarify commentsPaul Eggert
* src/sort.c: Improve comments a bit.
2010-12-01sort: fix bug on 64-bit hosts with at least 32768 processorsPaul Eggert
* src/sort.c (MAX_MERGE): Avoid integer overflow when on a machine with (say) 32-bit int and 64-bit size_t and when level == 15. Without this fix, on such a machine with 32768 or more processors, the level computation could overflow on large input, and this would result in division by zero.