summaryrefslogtreecommitdiff
path: root/src
AgeCommit message (Collapse)Author
2010-09-29tr: fix various issues with case conversionPádraig Brady
This valid translation spec aborted: LC_ALL=en_US.iso-8859-1 tr '[:upper:]- ' '[:lower:]_' This invalid translation spec aborted: LC_ALL=en_US.iso-8859-1 tr '[:upper:] ' '[:lower:]' This was caused by commit 6efd1046, 05-01-2008, "Avoid tr case-conversion failure in some locales" This misaligned conversion spec was allowed: LC_ALL=C tr 'A-Y[:lower:]' 'a-z[:upper:]' This was caused by commit af5d0c36, 21-10-2007, "tr: do not reject an unmatched [:lower:] or [:upper:] in SET1" This misaligned spec was allowed by extending the class: LC_ALL=C tr '[:upper:] ' '[:lower:]' * src/tr.c (validate_case_classes): A new function to check alignment of case conversion classes. Also it adjusts the length of the sets so that locales with different numbers of upper and lower case characters, don't cause issues. (string2_extend): Disallow extending the case conversion class as in the above example. That is locale dependent and most likely not what the user wants. (validate): Do the simple test for "restricted" char classes earlier, so we don't redundantly do more expensive validation. (main): Remove the case class validation, and simplify. * tests/misc/tr-case-class: A new test to test the various alignment and locale issues, associated with case conversion. * tests/misc/tr: Move case conversion tests to new tr-case-class. * tests/Makefile.am: Reference the new test. * NEWS: Mention the fixes.
2010-09-20sort: destroy spin locks portablyPaul Eggert
* src/sort.c (sortlines, sort): Use pthread_spin_destroy when a spin lock is no longer used. This isn't needed on GNU/Linux or Solaris, but POSIX says it may free up resources on some platforms.
2010-09-18build: use gnulib's new termios moduleJim Meyering
With it, we can remove the two sole tests of HAVE_TERMIOS_H. * bootstrap.conf (gnulib_modules): Add termios. * src/ls.c: Don't test HAVE_TERMIOS_H. * src/stty.c: Likewise. * m4/jm-macros.m4 (gl_CHECK_ALL_TYPES): Remove configure-time test for termios.h.
2010-09-17maint: update to latest gnulibEric Blake
* gnulib: Update to latest. * src/copy.c (copy_reg): Use fdutimens instead of gl_futimens. * src/touch.c (touch): Adjust parameter order. * tests/init.sh: Resync from upstream.
2010-09-17rm: remove no-op -d optionEric Blake
* src/rm.c (long_opts, main): Resolve a fixme. * NEWS: Document the change. Based on a report by William Plusnick.
2010-09-17maint: update to latest gnulibEric Blake
* gnulib: Update to latest. * bootstrap.conf (gnulib_modules): Add fdutimensat. * src/touch.c (touch): Use fdutimensat instead of gl_futimens.
2010-09-13dircolors: add rxvt-unicode-256color terminal typeDmitry V. Levin
rxvt-unicode introduced new terminal type: http://cvs.schmorp.de/rxvt-unicode/src/rxvt.h?r1=1.398&r2=1.399 * src/dircolors.hin: Add rxvt-unicode-256color terminal type. Reported by Alexey I. Froloff in <http://bugzilla.altlinux.org/24052>. Signed-off-by: Dmitry V. Levin <ldv@altlinux.org>
2010-09-01tac: suppress technically unneeded "free"Jim Meyering
* src/tac.c (main): Guard final free with #ifdef lint. Suggested by Paul Eggert.
2010-09-01join: improve performance when operating on whole linesPádraig Brady
Following on from commit f86bb696, 01-02-2010, "join: make -t '' operate on the whole line". Bypassing the delimiter search in this case, gives about an 8% performance boost. * src/join (xfields): Don't bother looking for '\n' in the data, which we know won't be present.
2010-08-28tac: avoid double freeJim Meyering
* src/tac.c (main): Reading a line longer than 16KiB would cause tac to realloc its primary buffer. Then, just before exit, tac would mistakenly free the original (now free'd) buffer. This bug was introduced by commit be6c13e7, "maint: always free a buffer, to avoid even semblance of a leak". * NEWS (Bug fixes): Mention it. * tests/misc/tac (double-free): New test, to exercise this. Reported by Salvo Tomaselli in <http://bugs.debian.org/594666>.
2010-08-27stat: add %m to output the mount point for a fileAaron Burgemeister
* src/find-mount-point.c: A new file refactoring find_mount_point() out from df.c * src/find-mount-point.h: Likewise. * src/df.c: Use the new find-mount-point module. * src/stat.c (print_stat): Handle the new %m format. (find_bind_mount): A new function to return the bind mount for a file if any. (out_mount_mount): Print the bind mount for a file, or else the standard mount point given by the find-mount-point module. (usage): Document the %m format directive. * src/Makefile.am: Reference the refactored find-mount-point.c * po/POTFILES.in: Add find_mount_point.c to the translation list * doc/coreutils.texi (stat invocation): Document %m, and how it may differ from the mount point that df outputs. * test/misc/stat-mount: A new test to correlate mount points output from df and stat. * tests/Makefile.am: Reference the new test. * NEWS: Mention the new feature * THANKS: Add the author Signed-off-by: Pádraig Brady <P@draigBrady.com>
2010-08-25df: always print the device name for bind mounted filesPádraig Brady
* src/df (show_point): Remove the optimization for comparing the specified path with the device name, as this produces inconsistent results in the presence of bind mounts. For bind mounts, the device name is populated with the bind mount target. * NEWS: Mention the change in behavior.
2010-08-23stat: fix a small memory leak with %NPádraig Brady
* src/stat.c (print_stat): Free the buffer returned from areadlink_with_size().
2010-08-10sort, who: prefer free+malloc to realloc when contents are irrelevantPaul Eggert
This change was prompted by the previous one: I audited the code looking for similar examples. Too bad valgrind doesn't catch this. * src/sort.c (check, mergefps): xrealloc -> free + xmalloc * src/who.c (print_user): Likewise.
2010-08-10sort: free/xmalloc rather than xreallocPaul Eggert
* src/sort.c (compare_random): Use free/xmalloc rather than xrealloc, since the old buffer contents need not be preserved. Also, don't fail if the guessed-sized malloc fails. Suggested by Bruno Haible.
2010-08-10sort: avoid gcc warning: explicitly ignore strtold resultJim Meyering
* src/sort.c: Include "ignore-value.h". (debug_key): Use ignore_value.
2010-08-09dircolors: add screen.rxvt & terminator to TERM listMike Frysinger
Resolves bug#6793. * src/dircolors.hin: Add screen.rxvt & terminator. Signed-off-by: Mike Frysinger <vapier@gentoo.org>
2010-08-08sort: speed up -R with long lines in hard localesPaul Eggert
* src/sort.c (compare_random): Guess that the output will be 3X the input. This avoids the overhead of calling strxfrm twice on typical implementations. Suggested by Bruno Haible.
2010-08-06sort: support all combinations of -d, -f, -i, -R, and -VPaul Eggert
* NEWS: Document this. * src/sort.c (getmonth): Omit LEN arg, as MONTH is now null-terminated. (compare_random): Don't null-terminate keys, as caller now does that. (compare_version): Remove. (debug_key): Null-terminate string for getmonth. (keycompare): Support combining -R with any of -d, -f, -i, -V. Also, support combining -V with any of -d, -i. (check_ordering_compatibility): Allow newly-supported combinations. * tests/misc/sort (02q, 02r, 02s): New tests, for new combinations. (incompat2): Now test -nR, since -fR are now compatible.
2010-08-05sort: tune and refactor --debug code, and fix minor underlining bugPaul Eggert
Formerly, the 'compare' function and some of its subroutines had a debugging flag, which caused them to output underlines. This change refactors the code so that debugging output is more-separated from the actual sorting. In the process, the change fixes a minor error in the debugging output. The change shortens the source code and executable size a tad, and improves CPU performance by 2.4% on my platform with a simple benchmark (C locale, line sorting, no debug). * src/sort.c (long_double, strtold): Move back to prelude, since they're now used by multiple functions again. (unit_order): Move to file scope, since it's now used by two functions. (find_unit_order, human_numcompare, numcompare, general_numcompare): Remove endptr parameter. All callers changed. (human_numcompare): Args are now const pointers. (getmonth): Endptr is now non-const. (key_numeric): Move up, since it's needed earlier. (debug_key): Take a line and a key as argument, instead of having the caller figure out where the field is. (debug_line): New function. (keycompare, compare): Omit debug parameter; debug output now done elsewhere. All callers changed. (write_line): Renamed from write_bytes; all callers changed. Use debug_line (not 'compare') to output debug info. Use a slightly faster check for whether output file is stdout. (check): Don't do debugging output; it's not that useful here, and it confuses the code. (main): Check for incompatibility between -c and --debug. Use standard diagnostic for incompatible options. * tests/misc/sort-debug-keys: Fix test case: "--Mi-1" is not a number, so its first character should not be underlined when debugging a numeric sort.
2010-08-04sort: -R now uses less memory on long lines with internal NULsPaul Eggert
* lib/Makefile.am (libcoreutils_a_SOURCES): Remove xmemxfrm.c, xmemxfrm.h. * lib/memxfrm.c, lib/memxfrm.h, lib/xmemxfrm.c, lib/xmemxfrm.h: Remove. * m4/memxfrm.m4: Likewise. * m4/prereq.m4 (gl_PREREQ): Remove gl_MEMXFRM. * po/POTFILES.in: Remove lib/xmemxfrm.c. * src/sort.c: Don't include xmemxfrm.h. (cmp_hashes): Remove. (xstrxfrm): New function. (compare_random): If a line contains NULs, don't create a big buffer that contains the strxfrm output of each string in the line. Instead, accumulate checksums and differences as we go, so that at any one time we have to store at most the output of a single strxfrm call when processing the line. This removes the need for an memxfrm function.
2010-08-03sort: fix bug in --debug when \0 is followed by \tPaul Eggert
* src/sort.c (debug_width): New function, which does not stop counting tabs at \0, and also invokes mbsnwidth. Stamp out strnlen! (count_tabs): Remove. (debug_key): Use debug_width instead of mbsnwidth and count_tabs. * tests/misc/sort-debug-keys: Check that \0 and \t intermix.
2010-08-02sort: revert recent -h changes and use a more-conservative approachPaul Eggert
* NEWS: Document changes to sort -h, which are now minor with respect to the pre-July-30th version. * doc/coreutils.texi (sort invocation): Likewise. The documentation now describes how -h comparison is done rather than being vague with border cases. * src/sort.c (long_double, strtold): Move back to general_numcompare. (LD, compute_human): Remove. (find_unit_order): Remove THOU_SEP parameter, since thousands separators are now allowed by all callers. Revert to previous behavior of sorting by suffix, and returning the order rather than 2 * order + binary, since we no longer care whether binary powers are being used. However, treat all zeros the same, instead of sorting 0M before 0G; this is more consistent with the desired behavior of sorting -1G before -1M. * tests/misc/sort (h1, h3, h6): Adjust to match mostly-reverted behavior. However, check that all zeros sort together. * tests/misc/sort-debug-keys: Omit a "_", since the trailing "i" in "1234Gi" is no longer part of the key.
2010-07-30sort: -h now handles comparisons such as 6000K vs 5M and 5MiB vs 5MBPaul Eggert
* NEWS: Document changes to sort -h. * doc/coreutils.texi (sort invocation): Likewise. * src/sort.c (long_double, strtold): Move to prelude, since they're now used by multiple functions. (LD): New macro. (struct keyfield.iec_present): Remove this member. All uses removed. (check_mixed_SI_IEC): Remove. This code was busted in the presence of multiple threads, as it had a race condition. (find_unit_order): Remove arg KEY; add arg THOU_SEP; arg ENDPTR is now char ** rather than char const **. Return an integer that distinguishes decimal from binary powers. Parse the number consistently with the intersection of strtold and strnumcmp. Set *ENDPTR unconditionally. (compute_human): New static function. (human_numcompare): Remove arg KEY. Remove 'const' from other args. Use strnumcmp if possible, but fall back on floating point if not. (numcompare, general_numcompare): Arg EA is now char ** rather than char const **. (numcompare): Adjust to new find_unit_order signature and behavior. (keycompare): Adjus to new human_numcompare signature. * tests/misc/sort (h1, h3, h4, h6): Adjust to new behavior. * tests/misc/sort-debug-keys: Likewise.
2010-07-27sort: fix --debug display with very large offsetsPaul Eggert
* src/sort.c (mark_key): Don't assume offset <= INT_MAX. Make the code a bit clearer when width != 0.
2010-07-26sort: fix bug with EOF at buffer refillPaul Eggert
* src/sort.c (fillbuf): Don't append eol unless the line is nonempty. This fixes a bug that was partly but not completely fixed by the aadc67dfdb47f28bb8d1fa5e0fe0f52e2a8c51bf commit (dated July 15). * tests/misc/sort (realloc-buf-2): New test, which catches this bug on 64-bit hosts.
2010-07-26sort: omit some "inline"sPaul Eggert
* src/sort.c (mergelines, queue_destroy, queue_init, queue_insert): (queue_pop, write_unique, mergelines_node, check_insert): (update_parent): No longer inline; these uses of "inline" seemed unlikely to help performance much.
2010-07-26sort: don't assume ASCII when parsing K, M, G suffixesPaul Eggert
* src/sort.c (find_unit_order): Don't assume ASCII.
2010-07-25sort: omit unnecessary mutex unlock+lock; simplify heap accessPaul R. Eggert
* src/sort.c (queue_pop): Omit unnecessary unlock+lock after pthread_cond_wait returns. Don't access "count" member of the heap; any efficiency gains should be quite minor, the access complicates this code, and "count" should be private anyway.
2010-07-25sort: omit 'restrict' in doubtful casesPaul R. Eggert
* src/sort.c (lock_node, unlock_node, queue_destroy, queue_init): (queue_pop): Omit 'restrict'; it shouldn't help here, as these functions have just one pointer parameter and don't access static storage. (queue_insert, check_insert, update_parent): Omit 'restrict', as the pointer types differ, and are not char * or unsigned char *, and therefore can't alias. (write_unique): Omit 'restrict', as the pointer types are all read-only. (merge_loop, sortlines): Omit 'restrict', as any performance advantages are extremely unlikely and it's not worth cluttering the code for that. (struct thread_args): Omit 'restrict': this seems to be incorrect. It's unlikely for 'restrict' to be correct inside a typedef.
2010-07-25sort: omit unnecessary castsPaul R. Eggert
* src/sort.c (inittables, general_numcompare, compare_nodes): (queue_init, queue_pop): Omit casts that are not needed, typically because they are between void * and some other pointer type.
2010-07-25sort: use more-consistent style with constPaul R. Eggert
* src/sort.c (proctab_hasher, proctab_comparator, stream_open, xfopen): (open_temp, zaptemp, struct_month_cmp, begfield, limfield): (find_unit_order, human_numcompare, numcompare, general_numcompare): (count_tabs, keycompare, compare, compare_nodes, lock_node): (unlock_node, queue_destroy, queue_init, queue_insert, queue_pop): (write_unique, mergelines_node, check_insert, update_parent): (merge_loop, sortlines, struct thread_args, set_ordering): Prefer the style "T const" to "const T". * gl/lib/heap.h (struct heap, heap_alloc): Likewise. * gl/lib/heap.c (heap_default_compare, heapify_down, heapify_up): (heap_alloc): Likewise.
2010-07-24du: tune, and fix some -L bugs with dangling or cyclic symlinksPaul R. Eggert
* src/du.c (process_file): Avoid recalculation of hashes and of file-exclusion for directories. Do not descend into the same directory more than once, unless -l is given; this is faster. Calculate stat buffer lazily, since it need not be computed at all for excluded files. Count space if FTS_ERR, since stat buffer is always valid then. No need for 'print' local variable. (main): Use FTS_NOSTAT. Use FTS_TIGHT_CYCLE_CHECK only when not hashing everything, since process_file finds cycles on its own when hashing everything. * tests/du/deref: Add test cases for -L bugs.
2010-07-22provide POSIX_FADV_SEQUENTIAL hint to appropriate utilsPádraig Brady
Following on from commit dae35bac, 01-03-2010, "sort: inform the system about our input access pattern" apply the same hint to all appropriate utils. This currently gives around a 5% speedup for reading large files from fast flash devices on GNU/Linux. * src/base64.c: Call fadvise (..., FADVISE_SEQUENTIAL); * src/cat.c: Likewise. * src/cksum.c: Likewise. * src/comm.c: Likewise. * src/cut.c: Likewise. * src/expand.c: Likewise. * src/fmt.c: Likewise. * src/fold.c: Likewise. * src/join.c: Likewise. * src/md5sum.c: Likewise. * src/nl.c: Likewise. * src/paste.c: Likewise. * src/pr.c: Likewise. * src/ptx.c: Likewise. * src/shuf.c: Likewise. * src/sum.c: Likewise. * src/tee.c: Likewise. * src/tr.c: Likewise. * src/tsort.c: Likewise. * src/unexpand.c: Likewise. * src/uniq.c: Likewise. * src/wc.c: Likewise, unless we don't actually read().
2010-07-22fadvise: new module providing a simpler interface to posix_fadvisePádraig Brady
* bootstrap.conf: Include the new module * gl/lib/fadvise.c: Provide a simpler interface to posix_fadvise. (fadvise): Provide hint to the whole file associated with a stream. (fdadvise): Provide hint to the specific portion of a file associated with a file descriptor. * gl/lib/fadvise.h: Redefine POSIX_FADV_* to FADVISE_* enums. * gl/modules/fadvise: New file. * m4/jm-macros.m4: Remove the no longer needed posix_fadvise check. * .x-sc_program_name: Exclude test-fadvise.c from this check. * gl/tests/test-fadvise (main): New test program. * gl/modules/fadvise-testss: A new index to reference the tests. * src/sort.c (stream_open): Use the new interface. * src/dd.c (iwrite): Likewise.
2010-07-19sort: -R no longer disables multithreadingPaul R. Eggert
* src/sort.c (random_md5_state): New static var. (random_md5_state_init): New function, to initialize random_md5_state. (random_state, randread_source): Remove. (cmp_hashes): Use random_md5_state rather than random_state. Break ties using memcmp, not by getting more randomness. If MD5 collisions turn into a problem in practice, we should simply use a better checksum. (main): If -R is given, call random_md5_state_init rather than going single-threaded.
2010-07-16randread: don't require -lrtPaul R. Eggert
Programs like 'sort' were linking to -lrt in order to get clock_gettime, but this was misguided: it wasted considerable resources while gaining at most 10 bits of entropy. Almost nobody needs the entropy, and there are better ways to get much better entropy for people who do need it. * gl/lib/rand-isaac.c (isaac_seed): Include <sys/time.h> not "gethrxtime.h". (isaac_seed): Use gettimeofday rather than gethrxtime. * gl/modules/randread (Depends-on): Depend on gettimeofday and not gethrxtime. * src/Makefile.am (mktemp_LDADD, shred_LDADD, shuf_LDADD, sort_LDADD): (tac_LDADD): Omit $(LIB_GETHRXTIME); no longer needed.
2010-07-15sort: fix a bug with sort -u and xmemcoll0, and tune keycomparePaul R. Eggert
* src/sort.c (keycompare): Use xmemcoll0, as it avoids a couple of stores. (write_bytes): Leave the buffer the way we found it, as it might be used again for a later comparison, if -u is used.
2010-07-15sort: fix a bug when sorting unterminated linesPádraig Brady
* src/sort.c (fillbuf): The previous commit incorrectly terminated the buffer when the last line of input didn't contain a terminating character.
2010-07-15sort: speed up default full line sortingChen Guo
Don't write NUL after the comparison buffers on each compare, which increases performance by about 3% for short lines on a pentium-m with gcc-4.4.1 * src/sort.c: (fillbuf): Delimit input items with NUL. (write_bytes): Restore the item delimiter char which was replaced with NUL in fillbuf().
2010-07-14maint: omit $(POW_LIB) when linking, as this is no longer neededPaul R. Eggert
* src/Makefile.am (printf_LDADD, seq_LDADD, sleep_LDADD, sort_LDADD): (tail_LDADD, uptime_LDADD): Omit $(POW_LIB), as it's no longer needed due to recent gnulib changes, where the strtod module no longer uses the pow function. strtold needs pow only because it's sometimes aliased to strtod. See http://lists.gnu.org/archive/html/bug-gnulib/2010-07/msg00076.html
2010-07-13sort: parallelize internal sortChen Guo
This patch is by Gene Auyeung, Chris Dickens, Chen Guo, and Mike Nichols, based off of a patch by Paul Eggert, Glen Lenker, et. al., with a basic heap implementation based off of the GDSL heap, originally by Nicolas Darnis. The number of sorts done in parallel is limited to the number of available processors by default, or can be further restricted with the --parallel option. On a dual-die, 8 core Intel Xeon, results show sorting with 8 threads is almost 4 times faster than using a single thread. Timings when sorting a 96MB file: THREADS TIME (s) 1 5.10 2 2.87 4 1.75 8 1.31 Single threaded sorting has also been improved, especially for cheaper comparison operations: COMMAND BEFORE (s) AFTER (s) sort 8.822 8.716 sort -g 10.336 10.222 sort -n 3.077 2.961 LANG=C sort 2.169 2.066 * bootstrap.conf: Add heap, pthread. * coreutils.texi (sort): Describe the new --parallel option. * gl/lib/heap.c: New file. Very basic heap implementation. * gl/lib/heap.h: New file. * gl/modules/heap: New file. * src/Makefile.am: Add LIB_PTHREAD. * src/sort.c: Include heap.h, nproc.h, pthread.h. (MAX_MERGE): New macro. (SUBTHREAD_LINES_HEURISTIC, PARALLEL_OPTION): New constants. (MERGE_END, MERGE_ROOT): New constants. (struct merge_node): New struct. (struct merge_node_queue): New struct. (sortlines temp): Remove declaration. (usage, long_options, main): New option, --parallel. (specify_nthreads): New function. (mergelines): New signature, to emphasize the fact that the HI area must be part of the destination. All callers changed. (sequential_sort): New function, renamed from sortlines. Merge in the functionality of sortlines_temp. (compare_nodes): New function. (lock_node, unlock_node): New functions. (queue_destroy): New function. (queue_init): New function. (queue_insert): New function. (queue_pop): New function. (write_unique): New function. (mergelines_node): New function. (check_insert): New function. (update_parent): New function. (merge_loop): New function. (sortlines): Rewrite to support and use parallelism, with a new signature. All callers changed. (struct thread_args): New struct. (sortlines_thread): New function. (sortlines_temp): Remove. (sort): New argument NTHREADS. All uses changed. Output moved to mergelines_node. (main): disable threading if we are sorting at random. * tests/Makefile.am (TESTS): Add misc/sort-benchmark-random. * tests/misc/sort-benchmark-random: New file. Signed-off-by: Pádraig Brady <P@draigBrady.com>
2010-07-12dd: also spell out size on memory exhaustionPaul R. Eggert
* src/dd.c (dd_copy): Use requested blocksize (not adjusted) in diagnostic, to forestall user complaints that the numbers don't match exactly. Report both exact and human-readable sizes, using a message format that is consistent with both "BBBB bytes (N XB) copied" in dd.c and "memory exhausted" in lib/xmalloc.c.
2010-07-09chcon, chmod, chown, du: don't translate "%s"Paul Eggert
* src/chcon.c (process_file): Replace _("%s") with "%s". * src/chmod.c (process_file): Likewise. * src/chown-core.c (change_file_owner): Likewise. * src/du.c (process_file): Likewise.
2010-07-06du: Hash with a mechanism that's simpler and takes less memory.Paul Eggert
* gl/lib/dev-map.c, gl/lib/dev-map.h, gl/modules/dev-map: Remove. * gl/lib/ino-map.c, gl/lib/ino-map.h, gl/modules/ino-map: New files. * gl/modules/dev-map-tests, gl/tests/test-dev-map.c: Remove. * gl/modules/ino-map-tests, gl/tests/test-ino-map.c: New files. * gl/lib/di-set.h (struct di_set): Renamed from struct di_set_state, and now private. All uses changed. (_ATTRIBUTE_NONNULL_): Don't assume C99. (di_set_alloc): Renamed from di_set_init, with no size arg. Now allocates the object rather than initializing it. For now, this no longer takes an initial size; we can put this back later if it is needed. * gl/lib/di-set.c: Include hash.h, ino-map.h, and limits.h instead of stdio.h, assert.h, stdint.h, sys/types.h (di-set.h includes that now), sys/stat.h, and verify.h. (N_DEV_BITS_4, N_INO_BITS_4, N_DEV_BITS_8, N_INO_BITS_8): Remove. (struct dev_ino_4, struct dev_ino_8, struct dev_ino_full): Remove. (enum di_mode): Remove. (hashint): New typedef. (HASHINT_MAX, LARGE_INO_MIN): New macros. (struct di_ent): Now maps a dev_t to a inode set, instead of containing a union. (struct dev_map_ent): Remove. (struct di_set): New type. (is_encoded_ptr, decode_ptr, di_ent_create): Remove. (di_ent_hash, di_ent_compare, di_ent_free, di_set_alloc, di_set_free): (di_set_insert): Adjust to new representation. (di_ino_hash, map_device, map_inode_number): New functions. * gl/modules/di-set (Depends-on): Replace dev-map with ino-map. Remove 'verify'. * gl/tests/test-di-set.c: Adjust to the above changes to API. * src/du.c (INITIAL_DI_SET_SIZE): Remove. (hash_ins, main): Adjust to new di-set API.
2010-07-05stat: getfilecon failure now evokes nonzero exit statusJim Meyering
Add comments and adjust interfaces to allow low-level failure to propagate out to callers. * src/stat.c (out_file_context): Return bool, not void, so we can tell callers about failure. (print_statfs, print_stat, print_it): Propagate failure to caller. (do_statfs): Propagate print_it failure to caller. (do_stat): Likewise. I nearly forgot to update do_stat to propagate print_it failure, and it compiled just fine in spite of that. To prevent possibility of a repeat, I've marked each function that returns non-void with ATTRIBUTE_WARN_UNUSED_RESULT.
2010-07-05system.h: define ATTRIBUTE_WARN_UNUSED_RESULTJim Meyering
* src/system.h (ATTRIBUTE_WARN_UNUSED_RESULT): Define.
2010-07-04du: increase the initial dev-inode set sizeJim Meyering
* src/du.c (INITIAL_DI_SET_SIZE): Increase to the prime just under 1024. This gives a speed-up of about 2% when processing a tree containing 100,000 files, each with a link count greater than 1, all pointing to files in some other tree.
2010-07-04du: use less than half as much memory when tracking hard linksJim Meyering
When processing a hard-linked file, du must keep track of the file's device and inode numbers in order to avoid counting its storage more than once. When du would process many hard linked files -- as are created by some backup tools -- the amount of memory required for the supporting data structure could become prohibitively large. This patch takes advantage of the fact that the amount of information in the numbers of the typical dev,inode pair is far less than even 32 bits, and hence usually fits in the space of a pointer, be it 32 or 64 bits wide. A typical du traversal examines files on no more than a handful of distinct devices, so the device number can be encoded in just a few bits. Similarly, few inode numbers use all of the high bits in an ino_t. Before, we would represent the dev,inode pair using a naive struct, and allocate space for each. Thus, an entry in the hash table consisted of a pointer (to that struct) and a "next" pointer. With this change, we encode the dev,inode information and put those bits in place of the pointer, and thus do away with the need to allocate additional space for each dev,inode pair. * src/du.c: Include "di-set.h". Don't include "hash.h"; it's no longer used. (INITIAL_DI_SET_SIZE): Define. (di_set): New global, to replace "htab". (entry_hash, entry_compare, hash_init): Remove functions. (hash_ins): Use di-set functions, rather than ones from the hash module. (main): Likewise. * bootstrap.conf (gnulib_modules): Add the new di-set module. * NEWS (New features): Mention it.
2010-07-03du: don't miscount duplicate directories or link-count-1 filesPaul Eggert
* NEWS: Mention this. * src/du.c (hash_all): New static var. (process_file): Use it. (main): Set it. * tests/du/hard-link: Add a couple of test cases to help make sure this bug stays squashed. * tests/du/files0-from: Adjust existing tests to reflect change in semantics with duplicate arguments.