author Jim Meyering <jim@meyering.net> 2000-08-11 09:11:20 +0000
committer Jim Meyering <jim@meyering.net> 2000-08-11 09:11:20 +0000
commit 2ed0078725f672ef530d7d2fa71300571061c2d0
tree edb5ad987a4a8020af473f25571c81141e09bc22 /doc
parent f64320db7bf9d4c2c84c0a50ab39314b77bb6cd1
Recommend against the System V syntax for tr ranges, and don't use it in examples. Use POSIX classes rather than ranges, for portability.
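
The pitfall this commit documents can be reproduced with a minimal sketch. It assumes a POSIX shell and a tr that, like GNU tr, treats square brackets as ordinary characters; the sample input strings are made up for illustration and do not come from the manual.

```shell
# Brackets in a System V style range are ordinary characters to GNU tr,
# so -d deletes them along with the digits.
sysv=$(printf '%s\n' 'a[1]b2' | tr -d '[0-9]')
echo "$sysv"     # prints: ab

# Without the enclosing brackets, only the digits are deleted.
plain=$(printf '%s\n' 'a[1]b2' | tr -d '0-9')
echo "$plain"    # prints: a[]b

# POSIX character classes avoid ranges altogether, which also sidesteps
# the collation-order problems that ranges have on some hosts.
lower=$(printf '%s\n' 'ThIs ExAmPlE' | tr '[:upper:]' '[:lower:]')
echo "$lower"    # prints: this example
```

The first command shows why the bracketed form is risky with -d: the brackets only appear harmless in translations where each one happens to map to itself.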
Diffstat (limited to 'doc')
-rw-r--r-- doc/textutils.texi 39
1 files changed, 21 insertions, 18 deletions
diff --git a/doc/textutils.texi b/doc/textutils.texi
index 4ead6f371..b29543e09 100644
--- a/doc/textutils.texi
+++ b/doc/textutils.texi
@@ -3425,11 +3425,14 @@ A backslash.
The notation @samp{@var{m}-@var{n}} expands to all of the characters
from @var{m} through @var{n}, in ascending order. @var{m} should
collate before @var{n}; if it doesn't, an error results. As an example,
-@samp{0-9} is the same as @samp{0123456789}. Although GNU @code{tr}
-does not support the System V syntax that uses square brackets to
-enclose ranges, translations specified in that format will still work as
-long as the brackets in @var{string1} correspond to identical brackets
-in @var{string2}.
+@samp{0-9} is the same as @samp{0123456789}.
+
+GNU @code{tr} does not support the System V syntax that uses square
+brackets to enclose ranges. Translations specified in that format
+sometimes work as expected, since the brackets are often transliterated
+to themselves. However, they should be avoided because they sometimes
+behave unexpectedly. For example, @samp{tr -d '[0-9]'} deletes brackets
+as well as digits.
Many historically common and even accepted uses of ranges are not
portable. For example, on @sc{ebcdic} hosts using the @samp{A-Z}
@@ -4110,7 +4113,7 @@ characters. Normally it is used for things like mapping upper case to
lower case:
@example
-$ echo ThIs ExAmPlE HaS MIXED case! | tr '[A-Z]' '[a-z]'
+$ echo ThIs ExAmPlE HaS MIXED case! | tr '[:upper:]' '[:lower:]'
this example has mixed case!
@end example
@@ -4169,7 +4172,7 @@ The first step is to change the case of all the letters in our input file
to one case. ``The'' and ``the'' are the same word when doing counting.
@example
-$ tr '[A-Z]' '[a-z]' < whats.gnu | ...
+$ tr '[:upper:]' '[:lower:]' < whats.gnu | ...
@end example
The next step is to get rid of punctuation. Quoted words and unquoted words
@@ -4177,7 +4180,7 @@ should be treated identically; it's easiest to just get the punctuation out of
the way.
@smallexample
-$ tr '[A-Z]' '[a-z]' < whats.gnu | tr -cd '[A-Za-z0-9_ \012]' | ...
+$ tr '[:upper:]' '[:lower:]' < whats.gnu | tr -cd '[:alnum:]_ \012' | ...
@end smallexample
The second @code{tr} command operates on the complement of the listed
@@ -4192,8 +4195,8 @@ next step is break the data apart so that we have one word per line. This
makes the counting operation much easier, as we will see shortly.
@smallexample
-$ tr '[A-Z]' '[a-z]' < whats.gnu | tr -cd '[A-Za-z0-9_ \012]' |
-> tr -s '[ ]' '\012' | ...
+$ tr '[:upper:]' '[:lower:]' < whats.gnu | tr -cd '[:alnum:]_ \012' |
+> tr -s ' ' '\012' | ...
@end smallexample
This command turns blanks into newlines. The @samp{-s} option squeezes
@@ -4206,8 +4209,8 @@ We now have data consisting of one word per line, no punctuation, all one
case. We're ready to count each word:
@smallexample
-$ tr '[A-Z]' '[a-z]' < whats.gnu | tr -cd '[A-Za-z0-9_ \012]' |
-> tr -s '[ ]' '\012' | sort | uniq -c | ...
+$ tr '[:upper:]' '[:lower:]' < whats.gnu | tr -cd '[:alnum:]_ \012' |
+> tr -s ' ' '\012' | sort | uniq -c | ...
@end smallexample
At this point, the data might look something like this:
@@ -4238,8 +4241,8 @@ reverse the order of the sort
The final pipeline looks like this:
@smallexample
-$ tr '[A-Z]' '[a-z]' < whats.gnu | tr -cd '[A-Za-z0-9_ \012]' |
-> tr -s '[ ]' '\012' | sort | uniq -c | sort -nr
+$ tr '[:upper:]' '[:lower:]' < whats.gnu | tr -cd '[:alnum:]_ \012' |
+> tr -s ' ' '\012' | sort | uniq -c | sort -nr
156 the
60 a
58 to
@@ -4265,16 +4268,16 @@ Now, how to compare our file with the dictionary? As before, we generate
a sorted list of words, one per line:
@smallexample
-$ tr '[A-Z]' '[a-z]' < whats.gnu | tr -cd '[A-Za-z0-9_ \012]' |
-> tr -s '[ ]' '\012' | sort -u | ...
+$ tr '[:upper:]' '[:lower:]' < whats.gnu | tr -cd '[:alnum:]_ \012' |
+> tr -s ' ' '\012' | sort -u | ...
@end smallexample
Now, all we need is a list of words that are @emph{not} in the
dictionary. Here is where the @code{comm} command comes in.
@smallexample
-$ tr '[A-Z]' '[a-z]' < whats.gnu | tr -cd '[A-Za-z0-9_ \012]' |
-> tr -s '[ ]' '\012' | sort -u |
+$ tr '[:upper:]' '[:lower:]' < whats.gnu | tr -cd '[:alnum:]_ \012' |
+> tr -s ' ' '\012' | sort -u |
> comm -23 - /usr/lib/ispell/ispell.words
@end smallexample