summaryrefslogtreecommitdiff
path: root/src
AgeCommit message (Collapse)Author
2010-12-16sort: do not generate thousands of subprocesses for 16-way mergePaul Eggert
Without this change, tests/misc/sort-compress-hang would consume more than 10,000 process slots on my RHEL 5.5 x86-64 server, making it likely for other applications to fail due to lack of process slots. With this change, the same benchmark causes 'sort' to consume at most 19 process slots. The change also improved wall-clock time by 2% and user+system time by 14% on that benchmark. * NEWS: Document this. * src/sort.c (MAX_PROCS_BEFORE_REAP): Remove. (reap_exited): Renamed from reap_some; this is a more accurate name, since "some" incorrectly implies that it reaps at least one process. All uses changed. (reap_some): New function: it *does* reap at least one process. (pipe_fork): Do not allow more than NMERGE + 2 subprocesses. (mergefps, sort): Omit check for exited processes: no longer needed, and anyway the code consumed too much CPU per line when 2 < nprocs.
2010-12-16sort: fix hang with sort --compressPaul Eggert
* NEWS: Document this. * src/sort.c (UNCOMPRESSED, UNREAPED, REAPED): New constants. (struct tempnode): New member 'state', to hold these constants. The pid member is now undefined if state == UNCOMPRESSED. (struct sortfile): Replace member 'pid' with member 'temp'. (uintptr): Remove. (proctab_hasher, proctab_comparator, register_proc, delete_proc): Proctab entries are now struct tempnode *, not pid_t, to handle the case where multiple tempnode objects correspond to the same pid. This avoids a race condition that can cause a hang. (register_proc): Arg is now struct tempnode *, not pid_t. All callers changed. (delete_proc): Set tempnode state to REAPED. (create_temp_file): No need to set pid member here; it's now done when the pid is known. (maybe_create_temp, create_temp): Remove PPID arg. Return struct tempnode *, not char *. All callers changed. (maybe_create_temp): Set node state to UNCOMPRESSED or UNREAPED. No need to set node->pid to 0. (open_temp): Replace NAME and PID args with a single TEMP arg. All callers changed. Wait only for unreaped children. (zaptemp): Wait for decompressor to finish before removing its temporary-file input. This avoids .nfsXXXX hassles with NFS and fixes a race (leading to a hang) regardless of NFS. (open_input_files): Adjust to new way of dealing with temp files and their subprocesses. * tests/Makefile.am (TESTS): Add misc/sort-compress-hang. * tests/misc/sort-compress-hang: New file.
2010-12-16sort: don't dump core when merging from input twicePaul Eggert
* NEWS: Document this. * src/sort.c (avoid_trashing_input): The previous fix to this function didn't fix all the problems with this code. Replace it with something simpler: just copy the input file. This doesn't change the number of files, so return void instead of the updated file count. Caller changed. * tests/misc/sort-merge-fdlimit: Test for the bug.
2010-12-14doc: tail: semi-deprecate --sleep-interval and --max-unchanged-statsJim Meyering
Those options are useful only on systems that lack inotify support and in the unusual event that a system with inotify support must resort to polling. * src/tail.c (usage): Note that the --max-unchanged-stats=N and --sleep-interval=N options are rarely useful on systems with inotify support. * doc/coreutils.texi (tail invocation): Likewise.
2010-12-14sort: fix very-unlikely buffer overrun when merging to input filePaul Eggert
* src/sort.c (avoid_trashing_input): Fix a typo that could cause a buffer overrun in theory. In practice this is extremely unlikely, as it requires running out of file descriptors in a small merge, presumably because some other process is hogging all the OS's file descriptors.
2010-12-13sort: fix some --compress reaper bugsPaul Eggert
* src/sort.c (uintptr): New type. (enum procstate, struct procnode, update_proc): Remove. (proctab_hasher, proctab_comparator, register_proc, wait_proc): (reap_some): The proctab is now simply a hash of process-IDs rather than of pointers to objects with reference counts and states; this is smaller and faster and easier to understand. (nprocs): Now pid_t, not size_t, since one cannot have more than PID_MAX children. (reap): If the argument is -1, wait; if 0 (a new value), do not. Delete pid from proctab as needed. Ignore children that are not in proctab, as they are from the program that exec'ed us and are irrelevant to our success or failure. (delete_proc, reap_all): New functions. (open_temp): Register the child. (sort): Clean up all children afterwards; without this patch, 'sort' sometimes missed failures in children due to race conditions. * tests/Makefile.am (TESTS): Add misc/sort-compress-proc. * tests/misc/sort-compress-proc: New file, to test for the bugs fixed above.
2010-12-11sort: syntax cleanupJim Meyering
* src/sort.c (xfopen, debug_key, sortlines, sort, main): Adjust formatting: fix misplaced braces, use consistent spacing, split a 2-stmt line.
2010-12-11sort: integer overflow checks in thread counts, etc.Paul Eggert
* src/sort.c (specify_nthreads, merge_tree_init, init_node): (queue_init, sortlines, struct thread_args, sort, main): Use size_t, not unsigned long int, for thread counts, since thread counts are now used to compute sizes. (specify_nthreads): Check for size_t overflow. (merge_tree_init, sort): Shorten name of local variable, for readability. (merge_tree_init): Move constants next to each other in product, so that the constant folding is easier to see. (init_node): Now static. Add 'restrict' only where it might be helpful for compiler optimization. (queue_init): 2nd arg is now nthreads, not "reserve", which is a bit harder to follow. All uses changed. (struct thread_args): Rename lo_child to is_lo_child, so that it's obvious to the reader when we're talking about this boolean as opposed to the new lo_child member of the other structure. All uses changed. (sort): Remove unused local variable end_node. (main): Don't allow large thread counts to cause undefined behavior later, due to integer overflow.
2010-12-11sort: preallocate merge tree nodes to heap.Chen Guo
* src/sort.c: (merge_tree_init) New function. Allocates memory for merge tree nodes. (merge_tree_destory) New function. (init_node) New function. (sortlines) Refactor node creation code to init_node. Remove now superfluous arguments. All callers changed. (sort) Initialize/destory merge tree. Refactor root node creation to merge_tree_init.
2010-12-11sort: comment fixPaul Eggert
* src/sort.c: Comment fix re spin locks.
2010-12-11sort: use mutexes, not spinlocks (avoid busy loop on blocked output)Chen Guo
Running a command like this on a multi-core system sort < big-file | less would peg all processors at near 100% utilization. * src/sort.c: (struct merge_node) Change member lock to mutex. All uses changed. * tests/Makefile.am (XFAIL_TESTS): Remove definition, now that this test passes once again. I.e., the sort-spinlock-abuse test no longer fails. * NEWS (Bug reports): Mention this. Reported by DJ Lucas in http://debbugs.gnu.org/7489.
2010-12-08split: fix a case where --elide-empty causes invalid chunkingPádraig Brady
When -n l/N is used and long lines are present that both span partitions and multiple buffers, one would get inconsistent chunk sizes. * src/split.c (main): Add a new undocumented ---io-blksize option to support full testing with varied buffer sizes. (cwrite): Refactor most handling of --elide-empty to here. (bytes_split): Remove handling of --elide-empty. (lines_chunk_split): Likewise. The specific issue here was the first handling of elide_empty_files interfered with the replenishing of the input buffer. * test/misc/split-lchunk: Add -e and the new ---io-blksize combinations to the test.
2010-12-03sort: merge_queue -> queuePaul Eggert
* src/sort.c (struct thread_args, sortlines_thread, sortlines, sort): Rename "merge_queue" to "queue", for consistency with other functions that just use the name "queue" for these things.
2010-12-03sort: clarify queue_check_insertPaul Eggert
* src/sort.c (queue_check_insert): Clarify body a bit, and remove no-longer-needed comment.
2010-12-03sort: fix problems with merge node dest pointerPaul Eggert
* src/sort.c (mergelines_node): Return void, not size_t. All callers changed. Change *node->dest here, not in caller. Do not change node->dest: it's not needed and could cause problems on (mostly theoretical) hosts that do not allow adding integers to null pointers. (queue_check_insert_parent): Omit MERGED parameter; no longer needed. All callers changed.
2010-12-03sort: simplify write_uniquePaul Eggert
* src/sort.c (write_unique): Simplify slightly so that there is just one call to write_line, not two.
2010-12-03sort: put queue arg firstPaul Eggert
* src/sort.c (queue_check_insert, queue_check_insert_parent): Make the queue arg first, for consistency with other functions such as queue_insert that put the queue arg first. Rename from check_insert and update_parent, respectively. All callers changed.
2010-12-03sort: tune struct_merge_node slightlyPaul Eggert
* src/sort.c (struct merge_node): 'lock' is now the actual lock, not a pointer to the lock; there's no need for indirection here. Make 'level' unsigned int instead of size_t, since it is a bit-shift count; also, move it next to a bool so that it's more likely to take less space. All uses changed. (sortlines, sort): Spell out initialization instead of using an initializer. This makes the initializer a bit easier to understand, and avoids unnecessary stores into the spin lock.
2010-12-03sort: Clarify commentsPaul Eggert
* src/sort.c: Improve comments a bit.
2010-12-01sort: fix bug on 64-bit hosts with at least 32768 processorsPaul Eggert
* src/sort.c (MAX_MERGE): Avoid integer overflow when on a machine with (say) 32-bit int and 64-bit size_t and when level == 15. Without this fix, on such a machine with 32768 or more processors, the level computation could overflow on large input, and this would result in division by zero.
2010-12-01sort -u: fix a thread-race pointer corruption bugPaul Eggert
* src/sort.c (write_unique): Save the entire "struct line", not just a pointer to one. Otherwise, with a multi-thread run, sometimes, with some inputs, fillbuf would would win a race and clobber a "saved->text" pointer in one thread just before it was dereferenced in a comparison in another thread. * NEWS (Bug fixes): Mention it.
2010-11-27tsort: suppress a valgrind memory leak warningPádraig Brady
* src/tsort.c (tsort): Unconditionally invoking the free() doesn't increase scalability, so do it only with -Dlint
2010-11-22cp: give a better diagnostic for nonexistent dest/Paul Eggert
This patch was written by Jim Meyering and myself. * src/copy.c (copy_reg): Turn EISDIR to ENOTDIR to improve the quality of diagnostics for commands like "cp a nosuch/". Reported by Марк Коренберг and Alan Curry in the thread starting at: http://lists.gnu.org/archive/html/bug-coreutils/2010-11/msg00178.html * THANKS: Update. * tests/mv/trailing-slash: Add a test.
2010-11-22split: add --number to generate a particular number of filesChen Guo
* src/split.c (usage, long_options, main): New options --number, --unbuffered, --elide-empty-files. (set_suffix_length): New function to auto increase suffix length to handle a specified number of files. (create): New function. Refactored from cwrite() and ofile_open(). (bytes_split): Add max_files argument to support byte chunking. (lines_chunk_split): New function. Split file into chunks of lines. (bytes_chunk_extract): New function. Extract a chunk of file. (of_info): New struct. Used by functions lines_rr and ofile_open to keep track of file descriptors associated with output files. (ofile_open): New function. Shuffle file descriptors when there are more output files than available file descriptors. (lines_rr): New function to distribute lines round-robin to files. (chunk_parse): New function. Parses K/N syntax. * tests/misc/split-bchunk: New test for byte chunking. * tests/misc/split-lchunk: New test for line delimited chunking. * tests/misc/split-rchunk: New test for round-robin chunking. * tests/Makefile.am: Reference new tests. * tests/misc/split-fail: Add failure scenarios for new options. * tests/misc/split-l: Fix a typo. s/ln/split/. * doc/coreutils.texi (split invocation): Document --number. * NEWS: Mention the new feature. * .mailmap: Map new email address for shortlog. Signed-off-by: Pádraig Brady <P@draigBrady.com>
2010-11-18od: fix bugs in displaying floating-point valuesPaul Eggert
* NEWS: Describe patch. * bootstrap.conf (gnulib_modules): Add ftoastr. * src/od.c: Include ftoastr.h, not float.h. (FLT_DIG, DBL_DIG): Remove. No need to verify LDBL_DIG. (FMT_BYTES_ALLOCATED): No need to worry about floating point now, since this format is no longer used for floating point. (PRINT_FIELDS): New macro, with most of the guts of the old PRINT_TYPE. (PRINT_TYPE): Rewrite to use PRINT_FIELDS. (PRINT_FLOATTYPE): New macro. This uses the new functions from ftoastr. (print_float, print_double, print_long_double): Reimplement using PRINT_FLOATTYPE. (decode_one_format): Calculate field widths based on ftoastr-supplied macros. * tests/Makefile.am (TESTS): Add misc/od-float. * tests/misc/od-float: New file.
2010-11-16split: fail immediately if impossible to create a large filePádraig Brady
* src/split.c (main): Error if -[bC] value > OFF_T_MAX * tests/misc/split-fail: Adjust for the new lower limits
2010-11-16truncate: fix a very unlikely case for undiagnosed errorsPádraig Brady
src/truncate.c (main): Use a bool to store if an error occurred, rather than an int, to protect against overflow. (do_ftruncate): Likewise. Also change 0/false to mean failure rather than success.
2010-11-16maint: fix a new -Wpointer-sign gcc warningPádraig Brady
* src/csplit.c (max_out): Fix a new warning introduced with commit 6568b173, 2010-11-10, "csplit: do not rely on..."
2010-11-13stat: do not provide variable precision time stampsJim Meyering
* src/stat.c: Don't include fstimeprec.c. (out_epoch_sec): Don't call fstimeprec. * NEWS: Update description. * doc/coreutils.texi: Likewise.
2010-11-11csplit: do not rely on undefined behavior in printf formatsPaul Eggert
* doc/coreutils.texi (csplit invocation): Say that %d and %i are aliases for %u. * src/csplit.c (FLAG_THOUSANDS, FLAG_ALTERNATIVE): New constants. (get_format_flags): Now take char const * and int * and return size_t. It now stores info about the flags instead of merely scanning them. Also, it handles '0' correctly. Drop support for the undocumented '+' and ' ' flags since the value is unsigned. Add support for the (undocumented) "'" flag. All uses changed. (get_format_width, get_format_prec): Remove. (check_format_conv_type): Renamed from get_format_conv_type, with a different signature. It now converts the format to one that is compatible with unsigned int, and checks flags. All uses changed. (max_out): Have snprintf compute the number of bytes needed rather than attempting to do it ourselves (which doesn't work portably with outlandish formats such as %4294967296d). (check_format_conv_type, main): Check for overflow in size calculations. Don't assume size_t fits in unsigned int. * tests/misc/csplit: Check for proper handling of flags, with %0#6.3x. Coreutils 8.6 mishandles this somewhat-weird example.
2010-11-11csplit: fix a memory leak per input bufferPádraig Brady
* src/csplit.c (free_buffer): Also free the line offsets buffers (remove_line): Also free the containing structure * tests/misc/csplit-heap: A new test to trigger with leaks of this magnitude. * tests/Makefile.am: Reference the new test * NEWS: Mention the fix Reported by David Hofstee
2010-11-10csplit: avoid buffer overrun when writing more than 999 filesJim Meyering
Without this fix, seq 1000 | csplit - /./ '{*}' would write the NUL-terminated file name, xx1000, into a buffer of size 6. * src/csplit.c (main): Use properly sized file name buffer. * NEWS (Bug fixes): Mention it. * tests/misc/csplit-1000: New test to trigger the bug. * tests/Makefile.am (TESTS): Add misc/csplit-1000.
2010-11-06stat: do not rely on undefined behavior in printf formatsPaul Eggert
* src/stat.c (digits, printf_flags): New static vars. (make_format): New function. (out_string, out_int, out_uint, out_uint_o, out_uint_x): (out_minus_zero): Use it to avoid undefined behavior when invoking printf. (print_it): Check for invalid conversion specifications such as %..X and %1-X, which would otherwise rely on undefined behavior when invoking printf. * tests/misc/stat-nanoseconds: Check that the "I" printf flag doesn't mess up in the C locale, as it formerly did on non-GNU hosts.
2010-11-06stat: use e.g. %.3X instead of %X.%3:X for sub-second precisionPaul Eggert
* NEWS: Document this. * doc/coreutils.texi (stat invocation): Likewise. * gl/lib/fstimeprec.c, gl/lib/fstimeprec.h, gl/modules/fstimeprec: * gl/modules/fstimeprec-tests, gl/tests/test-fstimeprec.c: New files. * bootstrap.conf (gnulib_modules): Add fstimeprec. * src/stat.c: Include fstimeprec.h. Don't include xstrtol.h. (decimal_point, decimal_point_len): New static vars. (main): Initialize them. (epoch_sec, out_ns): Remove. (out_int, out_uint): Now returns whatever printf returned. (out_minus_zero, out_epoch_secs): New functions. (print_stat): Use out_epoch_sec instead of out_ns and epoch_sec. (print_stat, print_it, usage): Remove the %:X-style formats. * tests/misc/stat-nanoseconds: Set TZ=UTC0 to avoid problems with weird time zones. Use a time stamp near the Epoch so that we don't have to worry about leap seconds. Redo test cases to match new behavior. * tests/touch/60-seconds: Change %Y.%:Y to %.9Y, to adjust to new behavior.
2010-11-03stat: revert %X-%Y-%Z change; use e.g., %:X to print fractional secondsJim Meyering
This reverts part of the recent commit 9069af45, "stat: print timestamps to full resolution", which made %X, %Y, %Z print floating point numbers. We prefer to retain portability of %X, %Y and %Z uses, while still providing access to full-resolution time stamps via modified format strings. Also make the new %W consistent. * src/stat.c: Include "xstrtol.h". (print_it): Accept a new %...:[XYZ] format directive, e.g., %:X, to print the nanoseconds portion of the corresponding time. For example, %3.3:Y prints the zero-padded, truncated, milliseconds part of the time of last modification. (print_it): Update print_func signature to match. (neg_to_zero): New helper function. (epoch_time): Remove function; replace with... (epoch_sec): New function; use timetostr. (out_ns): New function. Use "09" only when no other modifier is specified. (print_statfs): Change type of "m" to unsigned int, now that it must accommodate values larger than 255. (print_stat): Likewise. Map :X to a code of 'X' + 256. Likewise for Y, Z and W. (usage): Update. * tests/touch/60-seconds: Use %Y.%:Y in place of %Y. * tests/misc/stat-nanoseconds: New file. * tests/Makefile.am (TESTS): Add it. * NEWS (Changes in behavior): Mention this. With improvements by Pádraig Brady. Thanks to Andreas Schwab for raising the issue.
2010-10-28maint: remove an unnecessary FIXME commentPatrick W. Plusnick II
* src/seq.c (terminator): This does not need to be specifiable via an option. Remove the FIXME comment.
2010-10-27cp: make --attributes-only override --reflink completelyPádraig Brady
* doc/coreutils.texi (cp invocation): Change the description slightly so as users might not immediately discount using this option. Mention that --reflink is overridden by the other linking options and --attributes-only, and give an example where this might be useful. * src/copy.c (copy_internal): Bypass the reflink if --attributes-only is specifed. * tests/cp/reflink-perm: Ensure both --reflink modes are overridden by --attributes-only. * NEWS: Mention the change in behavior. Reported by Jim Meyering.
2010-10-25date: correct typos in date --helpTobias Quathamer
* src/date.c (usage): Use "e.g." correctly.
2010-10-25tail: support rechecking currently missing remote dirsPádraig Brady
src/tail.c (main): As an optimization, don't bother checking for stdin or remote files, when ---disable-inotify is specified. To improve the fix in commit 61b77891, set the disable_inotify flag when we fall back to polling, so that we recheck remote files. NEWS: Mention the fix
2010-10-23du: don't print junk when diagnosing out-of-range time stampsPaul Eggert
* src/du.c (show_date): Fix call to fputs with a buffer that contains some uninitialized data. * tests/Makefile.am (TESTS): Add du/big-timestamp. * tests/du/bigtime: New file, which checks for the bug.
2010-10-19md5sum: print a summary warning for improperly formatted linesBenno Schulenberg
And remove the now-superfluous totals from the other two warnings, so the plurals will also work in other languages than English. * src/md5sum.c (digest_check): Change as above. * tests/misc/md5sum (check-quiet2): Adjust accordingly.
2010-10-16fold: fix fadvise hintAndreas Schwab
* src/fold.c (fold_file): Apply fadvise to istream, not stdin. This bug would have inhibited the fadvise optimization when not reading from standard input.
2010-10-14bug#7213: [PATCH] sort: fix buffer overrun on 32-bit hosts when warning re ↵Paul Eggert
obsolete keys * src/sort.c (key_warnings): Local buffer should be of size INT_BUFSIZE_BOUND (uintmax_t), not INT_BUFSIZE_BOUND (sword). This bug was discovered by running 'make check' on a 32-bit Solaris 8 sparc host, using Sun cc. I saw several other instances of invoking umaxtostr on a buffer declared to be of size INT_BUFSIZE_BOUND (VAR), and these instances should at some point be replaced by INT_BUFSIZE_BOUND (uintmax_t) too, as that's a less error-prone style.
2010-10-13install: avoid warning with Solaris 10 ccPaul Eggert
* src/install.c (extra_mode): Don't assign ~S_IRWXUGO & ~S_IFMT to a mode_t variable, as the number might be too big to fit. Solaris 10 cc warns about this, and the C standard says it has undefined behavior.
2010-10-13sort: fix unportable cast of unsigned char * -> char *Paul Eggert
* src/sort.c (fold_toupper): Change this back from char to unsigned char, fixing a portability issue introduced in commit 59e2e55d0f154a388adc9bac37d2b45f2ba971f8 dated February 26, as the C Standard doesn't let you convert from unsigned char * to char * without a cast, and the (in theory more portable) style here is to convert char values, not pointer values. (getmonth): Convert char to unsigned char when needed for comparison.
2010-10-12tail: fix checking of currently unavailable directoriesPádraig Brady
* src/tail.c (tail_forever_inotify): Handle the case where tail --follow=name with inotify, is not able to add a watch on a specified directory. This may happen due to inotify resource limits or if the directory is currently missing or inaccessible. In all these cases, revert to polling which will try to reopen the file later. Note inotify returns ENOSPC when it runs out of resources, and instead we report a particular error message, lest users think one of their file systems is full. (main): Document another caveat with using inotify, where we currently don't recheck directories recreated after the initial watch is setup. * tests/tail-2/F-vs-rename: Fix the endless loop triggered by the above issue. * tests/tail-2/inotify-hash-abuse: Likewise. * tests/tail-2/wait: Don't fail in the resource exhaustion case. * tests/tail-2/F-vs-missing: A new test for this failure mode which was until now just triggered on older buggy linux kernels which returned ENOSPC constantly from inotify_add_watch(). * NEWS: Mention the fix.
2010-10-07split: fix reporting of read errorsPádraig Brady
The bug was introduced with commit 23f6d41f, 19-02-2003. * src/split.c (bytes_split, lines_split, line_bytes_split): Correctly check the return from full_read(). * tests/misc/split-fail: Ensure split fails when it can't read its input. * NEWS: Mention the fix.
2010-10-05build: complete the rename of get_dateEric Blake
* gnulib: Update to latest. * src/date.c (includes, batch_convert, main): Track rename. * src/touch.c (includes, get_reldate): Likewise. * doc/coreutils.texi (Top, Date input formats): Likewise. * bootstrap.conf (gnulib_modules): Likewise. * doc/Makefile.am (EXTRA_DIST): Likewise. * doc/.gitignore: Likewise. * bootstrap: Synchronize from upstream.
2010-10-05stat: drop %C support when printing file system detailsEric Blake
* src/stat.c (print_statfs, usage): Drop %C, since it applies to files, not file systems. (out_file_context): Match style of other out_* functions. (print_stat): Update caller. * doc/coreutils.texi (stat invocation): Document %C. * NEWS: Document the change.
2010-10-05stat: adjust the printing of SELinux contextPádraig Brady
* src/stat.c (default_format): Don't print SELinux context when in file system (-f) mode, as the context is associated with the file, not the file system. Fix logic inversion, so that in terse mode, %C is included only when is_selinux_enabled and not vice versa.