sort: parallelize internal sort

This patch is by Gene Auyeung, Chris Dickens, Chen Guo, and Mike Nichols, based off of a patch by Paul Eggert, Glen Lenker, et. al., with a basic heap implementation based off of the GDSL heap, originally by Nicolas Darnis. The number of sorts done in parallel is limited to the number of available processors by default, or can be further restricted with the --parallel option. On a dual-die, 8 core Intel Xeon, results show sorting with 8 threads is almost 4 times faster than using a single thread. Timings when sorting a 96MB file: THREADS TIME (s) 1 5.10 2 2.87 4 1.75 8 1.31 Single threaded sorting has also been improved, especially for cheaper comparison operations: COMMAND BEFORE (s) AFTER (s) sort 8.822 8.716 sort -g 10.336 10.222 sort -n 3.077 2.961 LANG=C sort 2.169 2.066 * bootstrap.conf: Add heap, pthread. * coreutils.texi (sort): Describe the new --parallel option. * gl/lib/heap.c: New file. Very basic heap implementation. * gl/lib/heap.h: New file. * gl/modules/heap: New file. * src/Makefile.am: Add LIB_PTHREAD. * src/sort.c: Include heap.h, nproc.h, pthread.h. (MAX_MERGE): New macro. (SUBTHREAD_LINES_HEURISTIC, PARALLEL_OPTION): New constants. (MERGE_END, MERGE_ROOT): New constants. (struct merge_node): New struct. (struct merge_node_queue): New struct. (sortlines temp): Remove declaration. (usage, long_options, main): New option, --parallel. (specify_nthreads): New function. (mergelines): New signature, to emphasize the fact that the HI area must be part of the destination. All callers changed. (sequential_sort): New function, renamed from sortlines. Merge in the functionality of sortlines_temp. (compare_nodes): New function. (lock_node, unlock_node): New functions. (queue_destroy): New function. (queue_init): New function. (queue_insert): New function. (queue_pop): New function. (write_unique): New function. (mergelines_node): New function. (check_insert): New function. (update_parent): New function. (merge_loop): New function. (sortlines): Rewrite to support and use parallelism, with a new signature. All callers changed. (struct thread_args): New struct. (sortlines_thread): New function. (sortlines_temp): Remove. (sort): New argument NTHREADS. All uses changed. Output moved to mergelines_node. (main): disable threading if we are sorting at random. * tests/Makefile.am (TESTS): Add misc/sort-benchmark-random. * tests/misc/sort-benchmark-random: New file. Signed-off-by: Pádraig Brady <P@draigBrady.com>
author: Chen Guo <chenguo4@yahoo.com> 2010-07-09 08:03:50 +0100
committer: Pádraig Brady <P@draigBrady.com> 2010-07-13 01:44:46 +0100
commit: 9face836f36c507f01a7d7a33138c5a303e3b1df (patch)
tree: e71865d7bb7882ba3b8d06a8cff2c38ad9511baf /doc/coreutils.texi
parent: f6e8c18f8d2077ea84aef679852622c8bd73789c (diff)
download: coreutils-9face836f36c507f01a7d7a33138c5a303e3b1df.tar.xz
1 files changed, 15 insertions, 0 deletions
diff --git a/doc/coreutils.texi b/doc/coreutils.texi
index 21cf36d1d..942978f33 100644
--- a/doc/coreutils.texi
+++ b/doc/coreutils.texi
@@ -4068,6 +4068,14 @@ have a large sort or merge that is I/O-bound, you can often improve
 performance by using this option to specify directories on different
 disks and controllers.
 
+@item --parallel=@var{n}
+@opindex --parallel
+@cindex multithreaded sort
+Limit the number of sorts run in parallel to @var{n}. By default,
+@var{n} is set to the number of available processors, and values
+greater than that are reduced to that limit. Also see
+@ref{nproc invocation}.
+
 @item -u
 @itemx --unique
 @opindex -u
@@ -4163,6 +4171,13 @@ sort -n -r
 @end example
 
 @item
+Run no more that 4 sorts concurrently, using a buffer size of 10M.
+
+@example
+sort --parallel=4 -S 10M
+@end example
+
+@item
 Sort alphabetically, omitting the first and second fields
 and the blanks at the start of the third field.
 This uses a single key composed of the characters beginning
author	Chen Guo <chenguo4@yahoo.com>	2010-07-09 08:03:50 +0100
committer	Pádraig Brady <P@draigBrady.com>	2010-07-13 01:44:46 +0100
commit	9face836f36c507f01a7d7a33138c5a303e3b1df (patch)
tree	e71865d7bb7882ba3b8d06a8cff2c38ad9511baf /doc/coreutils.texi
parent	f6e8c18f8d2077ea84aef679852622c8bd73789c (diff)
download	coreutils-9face836f36c507f01a7d7a33138c5a303e3b1df.tar.xz