summaryrefslogtreecommitdiff
diff options
context:
space:
mode:
authorJim Meyering <jim@meyering.net>2003-08-02 06:27:13 +0000
committerJim Meyering <jim@meyering.net>2003-08-02 06:27:13 +0000
commit93f9ffc6141ad028223a33e5ab0614aebbd4aa2e (patch)
treef3975b185821197ff277c7709293a1ade0dbf238
parent054819d79102d6ffd64165b7596e07ec9f58c689 (diff)
downloadcoreutils-93f9ffc6141ad028223a33e5ab0614aebbd4aa2e.tar.xz
Document in TODO Paul's desire to make sort faster (and how he
was foiled this time around). from Paul Eggert.
-rw-r--r--TODO18
1 files changed, 18 insertions, 0 deletions
diff --git a/TODO b/TODO
index 30ccb08a6..06aaddba5 100644
--- a/TODO
+++ b/TODO
@@ -106,3 +106,21 @@ Let GNU su use the `wheel' group if appropriate.
cut: when stdin is a tty, one has to hit EOF *twice* to get it to stop
look at sort patches from http://www.math.cas.cz/~kasal/sw/gnu/coreutils/
+
+sort: Investigate better sorting algorithms; see Knuth vol. 3.
+
+ We tried list merge sort, but it was about 50% slower than the
+ recursive algorithm currently used by sortlines, and it used more
+ comparisons. We're not sure why this was, as the theory suggests it
+ should do fewer comparisons, so perhaps this should be revisited.
+ List merge sort was implemented in the style of Knuth algorithm
+ 5.2.4L, with the optimization suggested by exercise 5.2.4-22. The
+ test case was 140,213,394 bytes, 426,4424 lines, text taken from the
+ GCC 3.3 distribution, sort.c compiled with GCC 2.95.4 and running on
+ Debian 3.0r1 GNU/Linux, 2.4GHz Pentium 4, single pass with no
+ temporary files and plenty of RAM.
+
+ Since comparisons seem to be the bottleneck, perhaps the best
+ algorithm to try next should be merge insertion. See Knuth section
+ 5.3.1, who credits Lester Ford, Jr. and Selmer Johnson, American
+ Mathematical Monthly 66 (1959), 387-389.