From b1307f5aff61788898697da537ed8b6e8fcdabc6 Mon Sep 17 00:00:00 2001
From: Jim Meyering
Date: Thu, 2 Mar 2000 12:24:00 +0000
Subject: (printf invocation): Describe new unicode syntax. From Bruno Haible.

---
 doc/sh-utils.texi | 47 +++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 47 insertions(+)

(limited to 'doc')

diff --git a/doc/sh-utils.texi b/doc/sh-utils.texi
index 5cbf5abe3..60b70448a 100644
--- a/doc/sh-utils.texi
+++ b/doc/sh-utils.texi
@@ -320,6 +320,21 @@ the @var{format} string.
 and @samp{\xhhh} as a hexadecimal number (if @var{hhh} is 1 to 3 hex
 digits) specifying a character to print.
 
+@kindex \uhhhh
+@kindex \Uhhhhhhhh
+@code{printf} interprets two character syntaxes introduced in ISO C 99:
+@samp{\u} for 16-bit Unicode characters, specified as 4 hex digits
+@var{hhhh}, and @samp{\U} for 32-bit Unicode characters, specified as 8 hex
+digits @var{hhhhhhhh}. @code{printf} outputs the Unicode characters
+according to the LC_CTYPE part of the current locale, i.e. depending
+on the values of the environment variables @code{LC_ALL}, @code{LC_CTYPE},
+@code{LANG}.
+
+The processing of @samp{\u} and @samp{\U} requires a full-featured
+@code{iconv} facility. It is activated on systems with glibc 2.2 (or newer),
+or when @code{libiconv} is installed prior to the sh-utils. Otherwise the
+use of @samp{\u} and @samp{\U} will give an error message.
+
 @kindex \c
 An additional escape, @samp{\c}, causes @code{printf} to produce no
 further output.
@@ -327,6 +342,38 @@ further output.
 The only options are a lone @samp{--help} or @samp{--version}. @xref{Common
 options}.
 
+The Unicode character syntaxes are useful for writing strings in a locale
+independent way. For example, a string containing the Euro currency symbol
+
+@example
+$ /usr/local/bin/printf '\u20AC 14.95'
+@end example
+
+will be output correctly in all locales supporting the Euro symbol
+(ISO-8859-15, UTF-8, and others). Similarly, a Chinese string
+
+@example
+$ /usr/local/bin/printf '\u4e2d\u6587'
+@end example
+
+will be output correctly in all chinese locales (GB2312, BIG5, UTF-8, etc).
+
+Note that in these examples, the full pathname of @code{printf} has been
+given, to distinguish it from the GNU bash builtin function @code{printf}.
+
+For larger strings, you don't need to look up the hexadecimal code values of
+each character one by one. ASCII characters mixed with \u escape sequences
+is also known as the JAVA source file encoding. You can use GNU recode 3.5c
+(or newer) to convert strings to this encoding. Here is how to convert a
+piece of text into a shell script which will output this text in a locale
+independent way:
+
+@example
+$ LC_CTYPE=zh_CN.big5 /usr/local/bin/printf '\u4e2d\u6587\n' > sample.txt
+$ recode BIG5..JAVA < sample.txt | \
+  sed -e "s|^|/usr/local/bin/printf '|" -e "s|$|\\\\n'|" > sample.sh
+@end example
+
 @node yes invocation
 @section @code{yes}: Print a string until interrupted
 
-- 
cgit v1.2.3-54-g00ecf
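
The patch states that printf emits \u and \U characters according to the
LC_CTYPE charset of the current locale. The following is only a rough sketch
of that idea, assuming Python's built-in codecs as a stand-in for the iconv
conversion; it is not the sh-utils implementation, and the helper name
encode_for is hypothetical. It shows that the same code points become
different byte sequences depending on the target charset.

```python
# Illustration only: Python's built-in codecs stand in for the iconv
# conversion that printf relies on; this is not the sh-utils code.


def encode_for(text, charset):
    """Return the bytes representing `text` in `charset` (hypothetical helper)."""
    return text.encode(charset)


euro = "\u20AC"           # same code point as in the patch's Euro example
chinese = "\u4e2d\u6587"  # same code points as in the patch's Chinese example

# One code point, different bytes per charset.
for charset in ("iso-8859-15", "utf-8"):
    print(charset, encode_for(euro, charset).hex())

for charset in ("gb2312", "big5", "utf-8"):
    print(charset, encode_for(chinese, charset).hex())

# A charset that lacks the character cannot encode it at all, loosely
# analogous to a locale that does not support the Euro symbol.
try:
    encode_for(euro, "latin-1")
except UnicodeEncodeError as exc:
    print("latin-1 cannot encode U+20AC:", exc.reason)
```

The escape sequences in the source stay pure ASCII, and only the final
conversion step depends on the charset. That is why the examples in the patch
can be written once and still produce correct output in several locales.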