(printf invocation): Describe new unicode syntax.

From Bruno Haible.
author: Jim Meyering <jim@meyering.net> 2000-03-02 12:24:00 +0000
committer: Jim Meyering <jim@meyering.net> 2000-03-02 12:24:00 +0000
commit: b1307f5aff61788898697da537ed8b6e8fcdabc6 (patch)
tree: c46ae1e36b93ba1fc8ca17052af96ce8e84c836f /doc
parent: 3af9591bb824b314b0dd1a4251bb6ceb08fa1594 (diff)
download: coreutils-b1307f5aff61788898697da537ed8b6e8fcdabc6.tar.xz
1 files changed, 47 insertions, 0 deletions
diff --git a/doc/sh-utils.texi b/doc/sh-utils.texi
index 5cbf5abe3..60b70448a 100644
--- a/doc/sh-utils.texi
+++ b/doc/sh-utils.texi
@@ -320,6 +320,21 @@ the @var{format} string.
 and @samp{\xhhh} as a hexadecimal number (if @var{hhh} is 1 to 3 hex
 digits) specifying a character to print.
 
+@kindex \uhhhh
+@kindex \Uhhhhhhhh
+@code{printf} interprets two character syntaxes introduced in ISO C 99:
+@samp{\u} for 16-bit Unicode characters, specified as 4 hex digits
+@var{hhhh}, and @samp{\U} for 32-bit Unicode characters, specified as 8 hex
+digits @var{hhhhhhhh}. @code{printf} outputs the Unicode characters
+according to the @code{LC_CTYPE} part of the current locale, i.e., depending
+on the values of the environment variables @code{LC_ALL}, @code{LC_CTYPE},
+and @code{LANG}.
+
+The processing of @samp{\u} and @samp{\U} requires a full-featured
+@code{iconv} facility. It is activated on systems with glibc 2.2 (or newer),
+or when @code{libiconv} is installed before sh-utils is built. Otherwise,
+using @samp{\u} or @samp{\U} produces an error message.
+
 @kindex \c
 An additional escape, @samp{\c}, causes @code{printf} to produce no
 further output.
@@ -327,6 +342,38 @@ further output.
 The only options are a lone @samp{--help} or
 @samp{--version}.  @xref{Common options}.
 
+The Unicode character syntaxes are useful for writing strings in a
+locale-independent way. For example, a string containing the Euro currency symbol
+
+@example
+$ /usr/local/bin/printf '\u20AC 14.95'
+@end example
+
+will be output correctly in all locales supporting the Euro symbol
+(ISO-8859-15, UTF-8, and others). Similarly, a Chinese string
+
+@example
+$ /usr/local/bin/printf '\u4e2d\u6587'
+@end example
+
+will be output correctly in all Chinese locales (GB2312, BIG5, UTF-8, etc.).
+
+Note that in these examples, the full pathname of @code{printf} has been
+given to distinguish it from the @code{printf} builtin of GNU Bash.
+
+For larger strings, you don't need to look up the hexadecimal code values of
+each character one by one. ASCII text mixed with @samp{\u} escape sequences
+is also known as the Java source file encoding. You can use GNU recode 3.5c
+(or newer) to convert strings to this encoding. Here is how to convert a
+piece of text into a shell script that outputs this text in a
+locale-independent way:
+
+@example
+$ LC_CTYPE=zh_CN.big5 /usr/local/bin/printf '\u4e2d\u6587\n' > sample.txt
+$ recode BIG5..JAVA < sample.txt | \
+  sed -e "s|^|/usr/local/bin/printf '|" -e "s|$|\\\\n'|" > sample.sh
+@end example
+
 
 @node yes invocation
 @section @code{yes}: Print a string until interrupted