Diffstat (limited to 'imap/docs/rfc/rfc4790.txt')
-rw-r--r-- imap/docs/rfc/rfc4790.txt 1459
1 files changed, 0 insertions, 1459 deletions
diff --git a/imap/docs/rfc/rfc4790.txt b/imap/docs/rfc/rfc4790.txt
deleted file mode 100644
index d58191c0..00000000
--- a/imap/docs/rfc/rfc4790.txt
+++ /dev/null
@@ -1,1459 +0,0 @@
-
-
-
-
-
-
-Network Working Group C. Newman
-Request for Comments: 4790 Sun Microsystems
-Category: Standards Track M. Duerst
- Aoyama Gakuin University
- A. Gulbrandsen
- Oryx
- March 2007
-
-
- Internet Application Protocol Collation Registry
-
-Status of This Memo
-
- This document specifies an Internet standards track protocol for the
- Internet community, and requests discussion and suggestions for
- improvements. Please refer to the current edition of the "Internet
- Official Protocol Standards" (STD 1) for the standardization state
- and status of this protocol. Distribution of this memo is unlimited.
-
-Copyright Notice
-
- Copyright (C) The IETF Trust (2007).
-
-Abstract
-
- Many Internet application protocols include string-based lookup,
- searching, or sorting operations. However, the problem space for
- searching and sorting international strings is large, not fully
- explored, and is outside the area of expertise for the Internet
- Engineering Task Force (IETF). Rather than attempt to solve such a
- large problem, this specification creates an abstraction framework so
- that application protocols can precisely identify a comparison
- function, and the repertoire of comparison functions can be extended
- in the future.
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-Newman, et al. Standards Track [Page 1]
-
-RFC 4790 Collation Registry March 2007
-
-
-Table of Contents
-
- 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 4
- 1.1. Conventions Used in This Document . . . . . . . . . . . . 4
- 2. Collation Definition and Purpose . . . . . . . . . . . . . . . 4
- 2.1. Definition . . . . . . . . . . . . . . . . . . . . . . . . 4
- 2.2. Purpose . . . . . . . . . . . . . . . . . . . . . . . . . 4
- 2.3. Some Other Terms Used in this Document . . . . . . . . . . 5
- 2.4. Sort Keys . . . . . . . . . . . . . . . . . . . . . . . . 5
- 3. Collation Identifier Syntax . . . . . . . . . . . . . . . . . 6
- 3.1. Basic Syntax . . . . . . . . . . . . . . . . . . . . . . . 6
- 3.2. Wildcards . . . . . . . . . . . . . . . . . . . . . . . . 6
- 3.3. Ordering Direction . . . . . . . . . . . . . . . . . . . . 7
- 3.4. URIs . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
- 3.5. Naming Guidelines . . . . . . . . . . . . . . . . . . . . 7
- 4. Collation Specification Requirements . . . . . . . . . . . . . 8
- 4.1. Collation/Server Interface . . . . . . . . . . . . . . . . 8
- 4.2. Operations Supported . . . . . . . . . . . . . . . . . . . 8
- 4.2.1. Validity . . . . . . . . . . . . . . . . . . . . . . . 9
- 4.2.2. Equality . . . . . . . . . . . . . . . . . . . . . . . 9
- 4.2.3. Substring . . . . . . . . . . . . . . . . . . . . . . 9
- 4.2.4. Ordering . . . . . . . . . . . . . . . . . . . . . . . 10
- 4.3. Sort Keys . . . . . . . . . . . . . . . . . . . . . . . . 10
- 4.4. Use of Lookup Tables . . . . . . . . . . . . . . . . . . . 11
- 5. Application Protocol Requirements . . . . . . . . . . . . . . 11
- 5.1. Character Encoding . . . . . . . . . . . . . . . . . . . . 11
- 5.2. Operations . . . . . . . . . . . . . . . . . . . . . . . . 11
- 5.3. Wildcards . . . . . . . . . . . . . . . . . . . . . . . . 12
- 5.4. String Comparison . . . . . . . . . . . . . . . . . . . . 12
- 5.5. Disconnected Clients . . . . . . . . . . . . . . . . . . . 12
- 5.6. Error Codes . . . . . . . . . . . . . . . . . . . . . . . 13
- 5.7. Octet Collation . . . . . . . . . . . . . . . . . . . . . 13
- 6. Use by Existing Protocols . . . . . . . . . . . . . . . . . . 13
- 7. Collation Registration . . . . . . . . . . . . . . . . . . . . 14
- 7.1. Collation Registration Procedure . . . . . . . . . . . . . 14
- 7.2. Collation Registration Format . . . . . . . . . . . . . . 15
- 7.2.1. Registration Template . . . . . . . . . . . . . . . . 15
- 7.2.2. The Collation Element . . . . . . . . . . . . . . . . 15
- 7.2.3. The Identifier Element . . . . . . . . . . . . . . . . 16
- 7.2.4. The Title Element . . . . . . . . . . . . . . . . . . 16
- 7.2.5. The Operations Element . . . . . . . . . . . . . . . . 16
- 7.2.6. The Specification Element . . . . . . . . . . . . . . 16
- 7.2.7. The Submitter Element . . . . . . . . . . . . . . . . 16
- 7.2.8. The Owner Element . . . . . . . . . . . . . . . . . . 16
- 7.2.9. The Version Element . . . . . . . . . . . . . . . . . 17
- 7.2.10. The Variable Element . . . . . . . . . . . . . . . . . 17
- 7.3. Structure of Collation Registry . . . . . . . . . . . . . 17
- 7.4. Example Initial Registry Summary . . . . . . . . . . . . . 18
- 8. Guidelines for Expert Reviewer . . . . . . . . . . . . . . . . 18
- 9. Initial Collations . . . . . . . . . . . . . . . . . . . . . . 19
- 9.1. ASCII Numeric Collation . . . . . . . . . . . . . . . . . 20
- 9.1.1. ASCII Numeric Collation Description . . . . . . . . . 20
- 9.1.2. ASCII Numeric Collation Registration . . . . . . . . . 20
- 9.2. ASCII Casemap Collation . . . . . . . . . . . . . . . . . 21
- 9.2.1. ASCII Casemap Collation Description . . . . . . . . . 21
- 9.2.2. ASCII Casemap Collation Registration . . . . . . . . . 22
- 9.3. Octet Collation . . . . . . . . . . . . . . . . . . . . . 22
- 9.3.1. Octet Collation Description . . . . . . . . . . . . . 22
- 9.3.2. Octet Collation Registration . . . . . . . . . . . . . 23
- 10. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 23
- 11. Security Considerations . . . . . . . . . . . . . . . . . . . 23
- 12. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . 23
- 13. References . . . . . . . . . . . . . . . . . . . . . . . . . . 24
- 13.1. Normative References . . . . . . . . . . . . . . . . . . . 24
- 13.2. Informative References . . . . . . . . . . . . . . . . . . 24
-
-1. Introduction
-
- The Application Configuration Access Protocol (ACAP) [11] specification
- introduced the concept of a comparator (which we call collation in
- this document), but failed to create an IANA registry. With the
- introduction of stringprep [6] and the Unicode Collation Algorithm
- [7], it is now time to create that registry and populate it with some
- initial values appropriate for an international community. This
- specification replaces and generalizes the definition of a comparator
- in ACAP, and creates a collation registry.
-
-1.1. Conventions Used in This Document
-
- The key words "MUST", "MUST NOT", "SHOULD", "SHOULD NOT", and "MAY"
- in this document are to be interpreted as defined in "Key words for
- use in RFCs to Indicate Requirement Levels" [1].
-
- The attribute syntax specifications use the Augmented Backus-Naur
- Form (ABNF) [2] notation, including the core rules defined in
- Appendix A. The ABNF production "Language-tag" is imported from
- Language Tags [5] and "reg-name" from URI: Generic Syntax [4].
-
-2. Collation Definition and Purpose
-
-2.1. Definition
-
- A collation is a named function which takes two arbitrary length
- strings as input and can be used to perform one or more of three
- basic comparison operations: equality test, substring match, and
- ordering test.
-
-2.2. Purpose
-
- Collations are an abstraction for comparison functions so that these
- comparison functions can be used in multiple protocols. The details
- of a particular comparison operation can be specified by someone with
- appropriate expertise, independent of the application protocols that
- use that collation. This is similar to the way a charset [13]
- separates the details of octet to character mapping from a protocol
- specification, such as MIME [9], or the way SASL [10] separates the
- details of an authentication mechanism from a protocol specification,
- such as ACAP [11].
-
- Here is a small diagram to help illustrate the value of this
- abstraction:
-
-   +-------------------+                         +-----------------+
-   | IMAP i18n SEARCH  |--+                      | Basic           |
-   +-------------------+  |                   +--| Collation Spec  |
-                          |                   |  +-----------------+
-   +-------------------+  |  +-------------+  |  +-----------------+
-   | ACAP i18n SEARCH  |--+--| Collation   |--+--| A stringprep    |
-   +-------------------+  |  | Registry    |  |  | Collation Spec  |
-                          |  +-------------+  |  +-----------------+
-   +-------------------+  |                   |  +-----------------+
-   | ...other protocol |--+                   |  | locale-specific |
-   +-------------------+                      +--| Collation Spec  |
-                                                 +-----------------+
-
- Thus IMAP, ACAP, and future application protocols with international
- search capability simply specify how to interface to the collation
- registry instead of each protocol specification having to specify all
- the collations it supports.
-
-2.3. Some Other Terms Used in this Document
-
- The terms client, server, and protocol are used in somewhat unusual
- senses.
-
- Client means a user, or a program acting directly on behalf of a
- user. This may be a mail reader acting as an IMAP client, or it may
- be an interactive shell, where the user can type protocol commands/
- requests directly, or it may be a script or program written by the
- user.
-
- Server means a program that performs services requested by the
- client. This may be a traditional server such as an HTTP server, or
- it may be a Sieve [14] interpreter running a Sieve script written by
- a user. A server needs to use the operations provided by collations
- in order to fulfill the client's requests.
-
- The protocol describes how the client tells the server what it wants
- done, and (if applicable) how the server tells the client about the
- results. IMAP is a protocol by this definition, and so is the Sieve
- language.
-
-2.4. Sort Keys
-
- One component of a collation is a transformation, which turns a
- string into a sort key, which is then used while sorting.
-
- The transformation can range from an identity mapping (e.g., the
- i;octet collation Section 9.3) to a mapping that makes the string
- unreadable to a human.
-
- This is an implementation detail of collations or servers. A
- protocol SHOULD NOT expose it to clients, since some collations leave
- the sort key's format up to the implementation, and current
- conformant implementations are known to use different formats.
-
-3. Collation Identifier Syntax
-
-3.1. Basic Syntax
-
- The collation identifier itself is a single US-ASCII string. The
- identifier MUST NOT be longer than 254 characters, and obeys the
- following grammar:
-
- collation-char = ALPHA / DIGIT / "-" / ";" / "=" / "."
-
- collation-id = collation-prefix ";" collation-core-name
- *collation-arg
-
- collation-scope = Language-tag / "vnd-" reg-name
-
- collation-core-name = ALPHA *( ALPHA / DIGIT / "-" )
-
- collation-arg = ";" ALPHA *( ALPHA / DIGIT ) "="
- 1*( ALPHA / DIGIT / "." )
-
-
- Note: the ABNF production "Language-tag" is imported from Language
- Tags [5] and "reg-name" from URI: Generic Syntax [4].
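As a rough illustration of the grammar above, the following Python sketch
checks whether a string has the shape of a collation-id. It rests on
assumptions that are not part of this specification: the grammar shown does
not define "collation-prefix", so the sketch accepts any non-empty run of
letters, digits, "-", and "." before the first ";" instead of validating
"Language-tag" or "reg-name". The name is_collation_id is hypothetical.

```python
import re

# collation-core-name = ALPHA *( ALPHA / DIGIT / "-" )
CORE_NAME = re.compile(r"[A-Za-z][A-Za-z0-9-]*\Z")
# collation-arg without its leading ";":
#   ALPHA *( ALPHA / DIGIT ) "=" 1*( ALPHA / DIGIT / "." )
ARG = re.compile(r"[A-Za-z][A-Za-z0-9]*=[A-Za-z0-9.]+\Z")
# Assumption: the prefix is only checked against the collation-char
# repertoire (minus ";" and "="), not against Language-tag or reg-name.
PREFIX = re.compile(r"[A-Za-z0-9.-]+\Z")


def is_collation_id(text: str) -> bool:
    """Return True if text has the shape of a collation-id (simplified)."""
    if len(text) > 254:  # identifiers MUST NOT be longer than 254 characters
        return False
    parts = text.split(";")
    if len(parts) < 2:
        return False
    prefix, core, args = parts[0], parts[1], parts[2:]
    return (
        PREFIX.match(prefix) is not None
        and CORE_NAME.match(core) is not None
        and all(ARG.match(arg) is not None for arg in args)
    )
```

A stricter implementation would replace the PREFIX pattern with real
Language-tag and reg-name validation.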
-
- There is a special identifier called "default". For protocols that
- have a default collation, "default" refers to that collation. For
- other protocols, the identifier "default" MUST match no collations,
- and servers SHOULD treat it in the same way as they treat nonexistent
- collations.
-
-3.2. Wildcards
-
- The string a client uses to select a collation MAY contain one or
- more wildcard ("*") characters that match zero or more collation-
- chars. Wildcard characters MUST NOT be adjacent. If the wildcard
- string matches multiple collations, the server SHOULD attempt to
- select a widely useful collation in preference to a narrowly useful
- one.
-
- collation-wild = ("*" / (ALPHA ["*"])) *(collation-char ["*"])
- ; MUST NOT exceed 254 characters total
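One way a server might apply such a pattern to its list of implemented
collations is sketched below in Python. The function names and the sample
list of collations are hypothetical, and the sketch only enforces the
length limit and the rule against adjacent wildcards; it does not check the
full collation-wild production.

```python
import re

# Character class for collation-char: ALPHA / DIGIT / "-" / ";" / "=" / "."
COLLATION_CHAR = r"[A-Za-z0-9;=.-]"


def wildcard_to_regex(pattern: str) -> re.Pattern:
    """Translate a wildcard pattern into a compiled regular expression."""
    if len(pattern) > 254:
        raise ValueError("pattern exceeds 254 characters")
    if "**" in pattern:
        raise ValueError("wildcard characters must not be adjacent")
    # Each "*" matches zero or more collation-chars; the rest is literal.
    literal_parts = [re.escape(part) for part in pattern.split("*")]
    return re.compile((COLLATION_CHAR + "*").join(literal_parts) + r"\Z")


def matching_collations(pattern: str, available: list) -> list:
    """Return the available collation identifiers that match the pattern."""
    regex = wildcard_to_regex(pattern)
    return [name for name in available if regex.match(name)]
```

Choosing a "widely useful" collation among several matches is left to the
server; the sketch simply returns every match in the order given.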
-
-3.3. Ordering Direction
-
- When used as a protocol element for ordering, the collation
- identifier MAY be prefixed by either "+" or "-" to explicitly specify
- an ordering direction. "+" has no effect on the ordering operation,
- while "-" inverts the result of the ordering operation. In general,
- collation-order is used when a client requests a collation, and
- collation-selected is used when the server informs the client of the
- selected collation.
-
- collation-selected = ["+" / "-"] collation-id
-
- collation-order = ["+" / "-"] collation-wild
-
-3.4. URIs
-
- Some protocols are designed to use URIs [4] to refer to collations
- rather than simple tokens. A special section of the IANA URL space
- is reserved for such usage. The "collation-uri" form is used to
- refer to a specific named collation (the collation registration may
- not actually be present). The "collation-auri" form is an abstract
- name for an ordering, a collation pattern or a vendor private
- collator.
-
- collation-uri = "http://www.iana.org/assignments/collation/"
- collation-id ".xml"
-
- collation-auri = ( "http://www.iana.org/assignments/collation/"
- collation-order ".xml" ) / other-uri
-
- other-uri = <absoluteURI>
- ; excluding the IANA collation namespace.
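The collation-uri form can be produced and taken apart with plain string
operations, as in the Python sketch below. The helper names are
hypothetical, and the sketch assumes the identifier is used verbatim with
no percent-encoding applied.

```python
from typing import Optional

IANA_BASE = "http://www.iana.org/assignments/collation/"


def collation_uri(collation_id: str) -> str:
    """Build the collation-uri for a collation identifier."""
    return f"{IANA_BASE}{collation_id}.xml"


def collation_id_from_uri(uri: str) -> Optional[str]:
    """Return the identifier if uri is in the IANA collation namespace."""
    if uri.startswith(IANA_BASE) and uri.endswith(".xml"):
        return uri[len(IANA_BASE):-len(".xml")]
    return None  # an other-uri, outside the IANA collation namespace
```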
-
-3.5. Naming Guidelines
-
- While this specification makes no absolute requirements on the
- structure of collation identifiers, naming consistency is important,
- so the following initial guidelines are provided.
-
- Collation identifiers with an international audience typically begin
- with "i;". Collation identifiers intended for a particular language
- or locale typically begin with a language tag [5] followed by a ";".
- After the first ";" is normally the name of the general collation
- algorithm, followed by a series of algorithm modifications separated
- by the ";" delimiter. Parameterized modifications will use "=" to
- delimit the parameter from the value. The version numbers of any
- lookup tables used by the algorithm SHOULD be present as
- parameterized modifications.
-
- Collation identifiers of the form *;vnd-hostname;* are reserved for
- vendor-specific collations created by the owner of the hostname
- following the "vnd-" prefix (e.g., vnd-example.com for the vendor
- example.com). Registration of such collations (or the name space as
- a whole), with an intended use of "vendor", is encouraged when a
- public specification or open-source implementation is available, but
- is not required.
-
-4. Collation Specification Requirements
-
-4.1. Collation/Server Interface
-
- The collation itself defines what it operates on. Most collations
- are expected to operate on character strings. The i;octet
- (Section 9.3) collation operates on octet strings. The
- i;ascii-numeric (Section 9.1) collation operates on numbers.
-
- This specification defines the collation interface in terms of octet
- strings. However, implementations may choose to use character
- strings instead. Such implementations may not be able to implement
- e.g., i;octet. Since i;octet is not currently mandatory to implement
- for any protocol, this should not be a problem.
-
-4.2. Operations Supported
-
- A collation specification MUST state which of the three basic
- operations are supported (equality, substring, ordering) and how to
- perform each of the supported operations on any two input character
- strings, including empty strings. Collations must be deterministic,
- i.e., given a collation with a specific identifier, and any two fixed
- input strings, the result MUST be the same for the same operation.
-
- In general, collation operations should behave as their names
- suggest. While a collation may be new, the operations are not, so
- the new collation's operations should be similar to those of older
- collations. For example, a date/time collation should not provide a
- "substring" operation that would morph IMAP substring SEARCH into
- e.g., a date-range search.
-
- A non-obvious consequence of the rules for each collation operation
- is that, for any single collation, either none or all of the
- operations can return "undefined". For example, it is not possible
- to have an equality operation that never returns "undefined", and a
- substring operation that occasionally does.
-
-4.2.1. Validity
-
- The validity test takes one string as argument. It returns valid if
- its input string is a valid input to the collation's other
- operations, and invalid if not. (In other words, a string is valid
- if it is equal to itself according to the collation's equality
- operation.)
-
- The validity test is provided by all collations. It MUST NOT be
- listed separately in the collation registration.
-
-4.2.2. Equality
-
- The equality test always returns "match" or "no-match" when it is
- supplied valid input, and MAY return "undefined" if one or both input
- strings are not valid.
-
- The equality test MUST be reflexive and symmetric. For valid input,
- it MUST be transitive.
-
- If a collation provides either a substring or an ordering test, it
- MUST also provide an equality test. The substring and/or ordering
- tests MUST be consistent with the equality test.
-
- The return values of the equality test are called "match", "no-match"
- and "undefined" in this document.
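The three-valued result and its link to validity can be sketched in Python
as follows. The collation used here is a hypothetical toy, not one of the
registered collations: it treats any octet above 127 as invalid and
compares the remaining strings after mapping ASCII lower case to upper
case.

```python
def is_valid(s: bytes) -> bool:
    """Validity test of the toy collation: only US-ASCII octets are valid."""
    return all(octet < 128 for octet in s)


def equality(a: bytes, b: bytes) -> str:
    """Return "match", "no-match", or "undefined"."""
    if not (is_valid(a) and is_valid(b)):
        return "undefined"
    # bytes.upper() only changes the ASCII letters a-z.
    return "match" if a.upper() == b.upper() else "no-match"
```

With this definition a string is valid exactly when equality(s, s) returns
"match", and the test is symmetric because both arguments are treated the
same way.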
-
-4.2.3. Substring
-
- The substring matching operation determines if the first string is a
- substring of the second string, i.e., if one or more substrings of
- the second string is equal to the first, as defined by the
- collation's equality operation.
-
- A collation that supports substring matching will automatically
- support two special cases of substring matching: prefix and suffix
- matching, if those special cases are supported by the application
- protocol. It returns "match" or "no-match" when it is supplied valid
- input and returns "undefined" when supplied invalid input.
-
- Application protocols MAY return position information for substring
- matches. If this is done, the position information SHOULD include
- both the starting offset and the ending offset for each match. This
- is important because more sophisticated collations can match strings
- of unequal length (for example, a pre-composed accented character can
- match a decomposed accented character). In general, overlapping
- matches SHOULD be reported (as when "ana" occurs twice within
- "banana"), although there are cases where a collation may decide not
- to. For example, in a collation which treats all whitespace
- sequences as identical, the substring operation could be defined such
- that " 1 " (SP "1" SP) is reported just once within " 1 " (SP SP
- "1" SP SP), not four times (SP SP "1" SP, SP "1" SP, SP "1" SP SP and
- SP SP "1" SP SP), since the four matches are, in a sense, the same
- match.
-
- A string is a substring of itself. The empty string is a substring
- of all strings.
-
- Note that the substring operation of some collations can match
- strings of unequal length. For example, a pre-composed accented
- character can match a decomposed accented character. The Unicode
- Collation Algorithm [7] discusses this in more detail.
-
- The return values of the substring operation are called "match", "no-
- match", and "undefined" in this document.
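A minimal Python sketch of a substring operation that reports overlapping
matches with both offsets is given below. It assumes the simplest possible
equality (code point by code point), so every match has the same length as
the search string; a collation that equates pre-composed and decomposed
characters would need to try windows of varying length. The function name
is hypothetical.

```python
def substring_matches(needle: str, haystack: str) -> list:
    """Return (start, end) offsets of every match, overlaps included.

    The end offset is exclusive. Equality is plain code point equality.
    """
    matches = []
    for start in range(len(haystack) - len(needle) + 1):
        end = start + len(needle)
        if haystack[start:end] == needle:
            matches.append((start, end))
    return matches
```

Under these assumptions the empty string matches at every position of any
string, and a string matches itself once.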
-
-4.2.4. Ordering
-
- The ordering operation determines how two strings are ordered. It
- MUST be reflexive. For valid input, it MUST be transitive and
- trichotomous.
-
- Ordering returns "less" if the first string is listed before the
- second string, according to the collation; "greater", if the second
- string is listed before the first string; and "equal", if the two
- strings are equal, as defined by the collation's equality operation.
- If one or both strings are invalid, the result of ordering is
- "undefined".
-
- When the collation is used with a "+" prefix, the behavior is the
- same as when used with no prefix. When the collation is used with a
- "-" prefix, the result of the ordering operation of the collation
- MUST be reversed.
-
- The return values of the ordering operation are called "less",
- "equal", "greater", and "undefined" in this document.
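The effect of the "+" and "-" prefixes can be sketched in Python as below.
The ordering function is a hypothetical toy collation (octets above 127
are invalid, ASCII case is ignored), and the sketch assumes that reversing
leaves "equal" and "undefined" unchanged. The collation name after the
prefix is ignored here; a real server would use it to select the
collation.

```python
def ordering(a: bytes, b: bytes) -> str:
    """Toy ordering: "less", "equal", "greater", or "undefined"."""
    if not all(octet < 128 for octet in a + b):
        return "undefined"
    key_a, key_b = a.upper(), b.upper()
    if key_a < key_b:
        return "less"
    if key_a > key_b:
        return "greater"
    return "equal"


def ordering_with_direction(collation_order: str, a: bytes, b: bytes) -> str:
    """Apply an optional "+" or "-" direction prefix to the toy ordering."""
    result = ordering(a, b)
    if collation_order.startswith("-"):
        # Swap "less" and "greater"; other results pass through unchanged.
        return {"less": "greater", "greater": "less"}.get(result, result)
    return result
```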
-
-4.3. Sort Keys
-
- A collation specification SHOULD describe the internal transformation
- algorithm to generate sort keys. This algorithm can be applied to
- individual strings, and the result can be stored to potentially
- optimize future comparison operations. A collation MAY specify that
- the sort key is generated by the identity function. The sort key may
- have no meaning to a human. The sort key may not be valid input to
- the collation.
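The idea of computing a sort key once and reusing it can be sketched in
Python as follows. The transformation shown (UTF-8 encoding followed by
ASCII upper-casing) is a hypothetical example, not the sort key of any
registered collation.

```python
def sort_key(text: str) -> bytes:
    """Hypothetical transformation from a string to its sort key."""
    return text.encode("utf-8").upper()


def sort_with_stored_keys(strings: list) -> list:
    """Sort strings by their sort keys, computing each key only once."""
    stored = {s: sort_key(s) for s in strings}  # could be persisted
    return sorted(strings, key=stored.__getitem__)
```

The stored keys are opaque octet strings: they are compared, never shown
to the user or fed back into the collation.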
-
-4.4. Use of Lookup Tables
-
- Some collations use customizable lookup tables, e.g., because the
- tables depend on locale, and may be modified after shipping the
- software. Collations that use more than one customizable lookup
- table in a documented format MUST assign numbers to the tables they
- use. This permits an application protocol command to access the
- tables used by a server collation, so that clients and servers use
- the same tables.
-
-5. Application Protocol Requirements
-
- This section describes the requirements and issues that an
- application protocol needs to consider if it offers searching,
- substring matching and/or sorting, and permits the use of characters
- outside the US-ASCII charset.
-
-5.1. Character Encoding
-
- The protocol specification has to make sure that it is clear on which
- characters (rather than just octets) the collations are used. This
- can be done by specifying the protocol itself in terms of characters
- (e.g., in the case of a query language), by specifying a single
- character encoding for the protocol (e.g., UTF-8 [3]), or by
- carefully describing the relevant issues of character encoding
- labeling and conversion. In the latter case, details to consider
- include how to handle unknown charsets, any charsets that are
- mandatory-to-implement, any issues with byte-order that might apply,
- and any transfer encodings that need to be supported.
-
-5.2. Operations
-
- The protocol must specify which of the operations defined in this
- specification (equality matching, substring matching, and ordering)
- can be invoked in the protocol, and how they are invoked. There may
- be more than one way to invoke an operation.
-
- The protocol MUST provide a mechanism for the client to select the
- collation to use with equality matching, substring matching, and
- ordering.
-
- If a protocol needs a total ordering and the collation chosen does
- not provide it because the ordering operation returns "undefined" at
- least once, the recommended fallback is to sort all invalid strings
- after the valid ones, and use i;octet to order the invalid strings.
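The recommended fallback can be expressed as a two-part sort key, as in
the Python sketch below. The validity rule (well-formed UTF-8) and the key
used for valid strings (case-folded text) are assumptions standing in for
whatever collation the protocol selected. Invalid strings are ordered by
their raw octets, on the assumption that this corresponds to i;octet
ordering.

```python
def is_valid(s: bytes) -> bool:
    """Assumed validity rule: the octets are well-formed UTF-8."""
    try:
        s.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def total_order_key(s: bytes) -> tuple:
    """Key that sorts valid strings first, then invalid ones by octets."""
    if is_valid(s):
        # Group 0: ordered by the (hypothetical) collation's sort key.
        return (0, s.decode("utf-8").casefold().encode("utf-8"))
    # Group 1: ordered octet by octet.
    return (1, s)
```

For example, sorted(items, key=total_order_key) yields a total ordering
even when some items are not valid input to the collation.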
-
- Although the collation's substring function provides a list of
- matches, a protocol need not provide all that to the client. It may
- provide only the first matching substring, or even just the
- information that the substring search matched. In this way,
- collations can be used with protocols that are defined such that "x
- is a substring of y" returns true-false.
-
- If the protocol provides positional information for the results of a
- substring match, that positional information SHOULD fully specify the
- substring(s) in the result that matches, independent of the length of
- the search string. For example, returning both the starting and
- ending offset of the match would suffice, as would the starting
- offset and a length. Returning just the starting offset is not
- acceptable. This rule is necessary because advanced collations can
- treat strings of different lengths as equal (for example, pre-
- composed and decomposed accented characters).
-
-5.3. Wildcards
-
- The protocol MUST specify whether it allows the use of wildcards in
- collation identifiers. If the protocol allows wildcards, then:
- The protocol MUST specify how comparisons behave in the absence of
- explicit collation negotiation, or when a collation of "default"
- is requested. The protocol MAY specify that the default collation
- used in such circumstances is sensitive to server configuration.
-
- The protocol SHOULD provide a way to list available collations
- matching a given wildcard pattern, or patterns.
-
-5.4. String Comparison
-
- If a protocol compares strings in any nontrivial way, using a
- collation may be appropriate. As an example, many protocols use
- case-independent strings. In many cases, a simple ASCII mapping to
- upper/lower case works well. In other cases, it may be better to use
- a specifiable collation; for example, so that a server can treat "i"
- and "I" as equivalent in Italy, and different in Turkey (Turkish also
- has a dotted upper-case "I" and a dotless lower-case "i").
-
- Protocol designers should consider, in each case, whether to use a
- specifiable collation. Keywords often have other needs than user
- variables, and search arguments may be different again.
-
-5.5. Disconnected Clients
-
- If the protocol supports disconnected clients, and a collation is
- used that can use configurable tables (e.g., to support
- locale-specific extensions), then the client may not be able to
- reproduce the server's collation operations while offline.
-
- A mechanism to download such tables has been discussed. Such a
- mechanism is not included in the present specification, since the
- problem is not yet well understood.
-
-5.6. Error Codes
-
- The protocol specification should consider assigning protocol error
- codes for the following circumstances:
-
- o The client requests the use of a collation by identifier or
- pattern, but no implemented collation matches that pattern.
-
- o The client attempts to use a collation for an operation that is
- not supported by that collation -- for example, attempting to use
- the "i;ascii-numeric" collation for substring matching.
-
- o The client uses an equality or substring matching collation, and
- the result is an error. It may be appropriate to distinguish
- between the two input strings, particularly when one is supplied
- by the client and the other is stored by the server. It might
- also be appropriate to distinguish the specific case of an invalid
- UTF-8 string.
-
-5.7. Octet Collation
-
- The i;octet (Section 9.3) collation is only usable with protocols
- based on octet-strings. Clients and servers MUST NOT use i;octet
- with other protocols.
-
- If the protocol permits the use of collations with data structures
- other than strings, the protocol MUST describe the default behavior
- for a collation with those data structures.
-
-6. Use by Existing Protocols
-
- This section is informative.
-
- Both ACAP [11] and Sieve [14] are standards track specifications that
- used collations prior to the creation of this specification and
- registry. Those standards do not meet all the application protocol
- requirements described in Section 5.
-
- These protocols allow the use of the i;octet (Section 9.3) collation
- working directly on UTF-8 data, as used in these protocols.
-
- In Sieve, all matches are either true or false. Accordingly, Sieve
- servers must treat "undefined" and "no-match" results of the equality
- and substring operations as false, and only "match" as true.
-
- In ACAP and Sieve, there are no invalid strings. In this document's
- terms, invalid strings sort after valid strings.
-
- IMAP [15] also collates, although that is explicit only when the
- COMPARATOR [17] extension is used. The built-in IMAP substring
- operation and the ordering provided by the SORT [16] extension may
- not meet the requirements made in this document.
-
- Other protocols may be in a similar position.
-
- In IMAP, the default collation is i;ascii-casemap, because its
- operations are understood to match IMAP's built-in operations.
-
-7. Collation Registration
-
-7.1. Collation Registration Procedure
-
- The IETF will create a mailing list, collation@ietf.org, which can be
- used for public discussion of collation proposals prior to
- registration. Use of the mailing list is strongly encouraged. The
- IESG will appoint a designated expert who will monitor the
- collation@ietf.org mailing list and review registrations.
-
- The registration procedure begins when a completed registration
- template is sent to iana@iana.org and collation@ietf.org. The
- designated expert is expected to tell IANA and the submitter of the
- registration within two weeks whether the registration is approved,
- approved with minor changes, or rejected with cause. When a
- registration is rejected with cause, it can be re-submitted if the
- concerns listed in the cause are addressed. Decisions made by the
- designated expert can be appealed to the IESG Applications Area
- Director, then to the IESG. They follow the normal appeals procedure
- for IESG decisions.
-
- Collation registrations in a standards track, BCP, or IESG-approved
- experimental RFC are owned by the IETF, and changes to the
- registration follow normal procedures for updating such documents.
- Collation registrations in other RFCs are owned by the RFC author(s).
- Other collation registrations are owned by the individual(s) listed
- in the contact field of the registration, and IANA will preserve this
- information.
-
- If the registration is a change of an existing collation, it MUST be
- approved by the owner. In the event the owner cannot be contacted
- for a period of one month, and the designated expert deems the change
- necessary, the IESG MAY re-assign ownership to an appropriate party.
-
-7.2. Collation Registration Format
-
- Registration of a collation is done by sending a well-formed XML
- document to collation@ietf.org and iana@iana.org.
-
-7.2.1. Registration Template
-
- Here is a template for the registration:
-
- <?xml version='1.0'?>
- <!DOCTYPE collation SYSTEM 'collationreg.dtd'>
- <collation rfc="YYYY" scope="global" intendedUse="common">
- <identifier>collation identifier</identifier>
- <title>technical title for collation</title>
- <operations>equality order substring</operations>
- <specification>specification reference</specification>
- <owner>email address of owner or IETF</owner>
- <submitter>email address of submitter</submitter>
- <version>1</version>
- </collation>
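A registration in this format could be read with a generic XML parser, as
in the Python sketch below, which uses the standard library's
xml.etree.ElementTree module. The function name and the shape of the
returned dictionary are hypothetical. The sketch looks up only the
elements and attributes it knows about and ignores everything else.

```python
import xml.etree.ElementTree as ET

TEMPLATE = """<?xml version='1.0'?>
<!DOCTYPE collation SYSTEM 'collationreg.dtd'>
<collation rfc="YYYY" scope="global" intendedUse="common">
  <identifier>collation identifier</identifier>
  <title>technical title for collation</title>
  <operations>equality order substring</operations>
  <specification>specification reference</specification>
  <owner>email address of owner or IETF</owner>
  <submitter>email address of submitter</submitter>
  <version>1</version>
</collation>"""


def parse_registration(xml_text: str) -> dict:
    """Extract a few fields from a collation registration document."""
    root = ET.fromstring(xml_text)
    if root.tag != "collation":
        raise ValueError("root element must be <collation>")
    return {
        "identifier": root.findtext("identifier"),
        "title": root.findtext("title"),
        # Operation names are separated by single spaces.
        "operations": (root.findtext("operations") or "").split(),
        "scope": root.get("scope"),
        "intendedUse": root.get("intendedUse"),
    }
```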
-
-7.2.2. The Collation Element
-
- The root of the registration document MUST be a <collation> element.
- The collation element contains the other elements in the
- registration, which are described in the following sub-subsections,
- in the order given here.
-
- The <collation> element MAY include an "rfc=" attribute if the
- specification is in an RFC. The "rfc=" attribute gives only the
- number of the RFC, without any prefix, such as "RFC", or suffix, such
- as ".txt".
-
- The <collation> element MUST include a "scope=" attribute, which MUST
- have one of the values "global", "local", or "other".
-
- The <collation> element MUST include an "intendedUse=" attribute,
- which MUST have one of the values "common", "limited", "vendor", or
- "deprecated". Collation specifications intended for "common" use are
- expected to reference standards from standards bodies with
- significant experience dealing with the details of international
- character sets.
-
- Be aware that future revisions of this specification may add
- additional function types, as well as additional XML attributes,
- values, and elements. Any system that automatically parses these XML
- documents MUST take this into account to preserve future
- compatibility.
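
   The forward-compatibility requirement above can be illustrated with
   a minimal sketch in Python, assuming the standard library's
   xml.etree.ElementTree module. The function name parse_registration
   and the shape of the returned dictionary are hypothetical and are
   not part of this specification. The sketch reads only the elements
   and attributes it knows about and skips anything else rather than
   failing.

```python
import xml.etree.ElementTree as ET

# Operation names defined by this revision of the specification.
KNOWN_OPERATIONS = {"equality", "order", "substring"}


def parse_registration(xml_text):
    """Extract the fields this sketch understands and ignore the rest.

    Hypothetical helper: unknown attributes and child elements are
    skipped, and unknown operation names are reported separately
    instead of causing a failure.
    """
    root = ET.fromstring(xml_text)
    if root.tag != "collation":
        raise ValueError("root element must be <collation>")
    ops = (root.findtext("operations") or "").split()
    return {
        "identifier": root.findtext("identifier"),
        "rfc": root.get("rfc"),  # None when the attribute is absent
        "scope": root.get("scope"),
        "intendedUse": root.get("intendedUse"),
        "operations": [op for op in ops if op in KNOWN_OPERATIONS],
        "unknown_operations": [op for op in ops if op not in KNOWN_OPERATIONS],
        "owners": [e.text for e in root.findall("owner")],
    }
```

   A stricter consumer could choose to log the unknown names, but
   rejecting the whole document would defeat the purpose of the rule.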
-
-7.2.3. The Identifier Element
-
- The <identifier> element gives the precise identifier of the
- collation, e.g., i;ascii-casemap. The <identifier> element is
- mandatory.
-
-7.2.4. The Title Element
-
- The <title> element gives the title of the collation. The <title>
- element is mandatory.
-
-7.2.5. The Operations Element
-
- The <operations> element lists which of the three operations
- ("equality", "order" or "substring") the collation provides,
- separated by single spaces. The <operations> element is mandatory.
-
-7.2.6. The Specification Element
-
- The <specification> element describes where to find the
- specification. The <specification> element is mandatory. It MAY
- have a URI attribute. There may be more than one <specification>
- element, in which case, they together form the specification.
-
- If it is discovered that parts of a collation specification conflict,
- a new revision of the collation is necessary, and the
- collation@ietf.org mailing list should be notified.
-
-7.2.7. The Submitter Element
-
- The <submitter> element provides an RFC 2822 [12] email address for
- the person who submitted the registration. It is optional if the
- <owner> element contains an email address.
-
- There may be more than one <submitter> element.
-
-7.2.8. The Owner Element
-
- The <owner> element contains either the four letters "IETF" or an
- email address of the owner of the registration. The <owner> element
- is mandatory. There may be more than one <owner> element. If so,
- all owners are equal. Each owner can speak for all.
-
-7.2.9. The Version Element
-
- The <version> element MUST be included when the registration is
- likely to be revised, or has been revised in such a way that the
- results change for one or more input strings. The <version> element
- is optional.
-
-7.2.10. The Variable Element
-
- The <variable> element specifies an optional variable to control the
- collation's behaviour, for example whether it is case sensitive. The
- <variable> element is optional. When <variable> is used, it must
- contain <name> and <default> elements, and it may contain one or more
- <value> elements.
-
-7.2.10.1. The Name Element
-
- The <name> element specifies the name of a variable. The
- <name> element is mandatory.
-
-7.2.10.2. The Default Element
-
- The <default> element specifies the default value of a variable. The
- <default> element is mandatory.
-
-7.2.10.3. The Value Element
-
- The <value> element specifies a legal value of a variable. The
- <value> element is optional. If one or more <value> elements are
- present, only those values are legal. If none are, then the
- variable's legal values do not form an enumerated set, and the rules
- MUST be specified in an RFC accompanying the registration.
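
   The rule for <value> elements can be sketched as follows, again
   assuming Python's xml.etree.ElementTree. The function name
   is_legal_value is hypothetical. When no <value> elements are
   present, the sketch returns None to signal that legality cannot be
   decided from the registration alone and depends on the accompanying
   RFC.

```python
import xml.etree.ElementTree as ET


def is_legal_value(variable_xml, candidate):
    """Check a candidate against the <value> children of a <variable>.

    Returns True or False when the legal values are enumerated, and
    None when there are no <value> elements (the rules then live in
    the RFC accompanying the registration).
    """
    variable = ET.fromstring(variable_xml)
    enumerated = [v.text for v in variable.findall("value")]
    if not enumerated:
        return None
    return candidate in enumerated
```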
-
-7.3. Structure of Collation Registry
-
- Once the registration is approved, IANA will store each XML
- registration document at a URL of the form
- http://www.iana.org/assignments/collation/collation-id.xml, where
- collation-id is the content of the identifier element in the
- registration. Both the submitter and the designated expert are
- responsible for verifying that the XML is well-formed. The
- registration document should avoid using new elements. If any are
- necessary, it is important to be consistent with other registrations.
-
- IANA will also maintain a text summary of the registry under the name
- http://www.iana.org/assignments/collation/collation-index.html. This
- summary is divided into four sections. The first section is for
- collations intended for common use. This section is intended for
- collation registrations published in IESG-approved RFCs, or for
- locally scoped collations from the primary standards body for that
- locale. The designated expert is encouraged to reject collation
- registrations with an intended use of "common" if the expert believes
- it should be "limited", as it is desirable to keep the number of
- "common" registrations small and of high quality. The second section
- is reserved for limited-use collations. The third section is
- reserved for registered vendor-specific collations. The final
- section is reserved for deprecated collations.
-
-7.4. Example Initial Registry Summary
-
- The following is an example of how IANA might structure the initial
- registry summary (the collation-index.html file):
-
- Collation                 Functions   Scope   Reference
- ---------                 ---------   -----   ---------
- Common Use Collations:
-    i;ascii-casemap        e, o, s     Local   [RFC 4790]
-
- Limited Use Collations:
-    i;octet                e, o, s     Other   [RFC 4790]
-    i;ascii-numeric        e, o        Other   [RFC 4790]
-
- Vendor Collations:
-
- Deprecated Collations:
-
-
- References
- ----------
- [RFC 4790] Newman, C., Duerst, M., Gulbrandsen, A., "Internet
- Application Protocol Collation Registry", RFC 4790,
- Sun Microsystems, March 2007.
-
-8. Guidelines for Expert Reviewer
-
- The expert reviewer appointed by the IESG has fairly broad latitude
- for this registry. While a number of collations are expected
- (particularly customizations of the UCA for localized use), an
- explosion of collations (particularly common-use collations) is not
- desirable for widespread interoperability. However, it is important
- for the expert reviewer to provide cause when rejecting a
- registration, and, when possible, to describe corrective action to
- permit the registration to proceed. The following list includes
- some example reasons to reject a registration with cause:
-
- o The registration is not a well-formed XML document.
-
- o The registration has an intended use of "common", but there is no
- evidence the collation will be widely deployed, so it should be
- listed as "limited".
-
- o The registration has an intended use of "common", but it is
- redundant with the functionality of a previously registered
- "common" collation.
-
- o The registration has an intended use of "common", but the
- specification is not detailed enough to allow interoperable
- implementations by others.
-
- o The collation identifier fails to precisely identify the version
- numbers of relevant tables to use.
-
- o The registration fails to meet one of the "MUST" requirements in
- Section 4.
-
- o The collation identifier fails to meet the syntax in Section 3.
-
- o The collation specification referenced in the registration is
- vague or has optional features without a clear behavior specified.
-
- o The referenced specification does not adequately address security
- considerations specific to that collation.
-
- o The registration's operations are needlessly different from those
- of traditional operations.
-
- o The registration's XML is needlessly different from that of
- already registered collations.
-
-9. Initial Collations
-
- This section registers the three collations that were originally
- defined in ACAP [11], and are implemented in most Sieve [14] engines.
- Some of the behavior of these collations is perhaps not ideal, such as
- i;ascii-casemap accepting non-ASCII input. Compatibility with widely
- deployed code was judged more important than fixing the collations.
-
-9.1. ASCII Numeric Collation
-
-9.1.1. ASCII Numeric Collation Description
-
- The "i;ascii-numeric" collation is a simple collation intended for
- use with arbitrarily-sized, unsigned decimal integer numbers stored
- as octet strings. US-ASCII digits (0x30 to 0x39) represent digits of
- the numbers. Before converting from string to integer, the input
- string is truncated at the first non-digit character. All input is
- valid; strings that do not start with a digit represent positive
- infinity.
-
- The collation supports equality and ordering, but does not support
- the substring operation.
-
- The equality operation returns "match" if the two strings represent
- the same number (i.e., leading zeroes and trailing non-digits are
- disregarded), and "no-match" if the two strings represent different
- numbers.
-
- The ordering operation returns "less" if the first string represents
- a smaller number than the second, "equal" if they represent the same
- number, and "greater" if the first string represents a larger number
- than the second.
-
- Some examples: "0" is less than "1", and "1" is less than
- "4294967298". "4294967298", "04294967298", and "4294967298b" are all
- equal. "04294967298" is less than "". "", "x", and "y" are equal.
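
   The description above can be sketched in Python as follows. The
   function names are hypothetical. To stay faithful to
   "arbitrarily-sized" numbers, the sketch never converts the digits
   to a machine integer: it strips leading zeroes and then compares
   first by length and then octet by octet. Positive infinity is
   represented by None.

```python
def _leading_digits(octets):
    """Return the leading digit run without leading zeroes.

    None means the string does not start with a digit (positive
    infinity). An empty result means the number zero.
    """
    n = 0
    while n < len(octets) and 0x30 <= octets[n] <= 0x39:
        n += 1
    if n == 0:
        return None
    return octets[:n].lstrip(b"0")


def ascii_numeric_order(a, b):
    """Ordering operation: returns "less", "equal", or "greater"."""
    da, db = _leading_digits(a), _leading_digits(b)
    if da is None or db is None:
        if da is None and db is None:
            return "equal"  # infinity compared with infinity
        return "greater" if da is None else "less"
    # Without leading zeroes, a longer digit run is a larger number;
    # runs of equal length compare octet by octet.
    ka, kb = (len(da), da), (len(db), db)
    if ka == kb:
        return "equal"
    return "less" if ka < kb else "greater"


def ascii_numeric_equal(a, b):
    """Equality operation: returns "match" or "no-match"."""
    return "match" if ascii_numeric_order(a, b) == "equal" else "no-match"
```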
-
-9.1.2. ASCII Numeric Collation Registration
-
- <?xml version='1.0'?>
- <!DOCTYPE collation SYSTEM 'collationreg.dtd'>
- <collation rfc="4790" scope="other" intendedUse="limited">
- <identifier>i;ascii-numeric</identifier>
- <title>ASCII Numeric</title>
- <operations>equality order</operations>
- <specification>RFC 4790</specification>
- <owner>IETF</owner>
- <submitter>chris.newman@sun.com</submitter>
- </collation>
-
-9.2. ASCII Casemap Collation
-
-9.2.1. ASCII Casemap Collation Description
-
- The "i;ascii-casemap" collation is a simple collation that operates
- on octet strings and treats US-ASCII letters case-insensitively. It
- provides equality, substring, and ordering operations. All input is
- valid. Note that letters outside ASCII are not treated case-
- insensitively.
-
- Its equality, ordering, and substring operations are as for i;octet,
- except that, as a first step, the lower-case letters (octet values
- 97-122) in each input string are converted to upper case (octet
- values 65-90).
-
- Care should be taken when using OS-supplied functions to implement
- this collation, as it is not locale sensitive. Functions, such as
- strcasecmp and toupper, are sometimes locale sensitive, and may
- inappropriately map lower-case letters other than a-z to upper case.
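
   One way to avoid locale-sensitive library functions is to map the
   26 octets explicitly. The following Python sketch does that; the
   function names are hypothetical. It relies on the fact that Python
   compares bytes objects octet by octet as unsigned values, with a
   proper prefix sorting first, which is the i;octet ordering.

```python
def ascii_upcase(octets):
    """Map octets 97-122 (a-z) to 65-90 (A-Z); leave the rest alone."""
    return bytes(o - 32 if 97 <= o <= 122 else o for o in octets)


def casemap_order(a, b):
    """Ordering operation: returns "less", "equal", or "greater"."""
    ua, ub = ascii_upcase(a), ascii_upcase(b)
    if ua == ub:
        return "equal"
    return "less" if ua < ub else "greater"


def casemap_equal(a, b):
    """Equality operation: returns "match" or "no-match"."""
    return "match" if ascii_upcase(a) == ascii_upcase(b) else "no-match"


def casemap_substring(needle, haystack):
    """Substring operation: returns "match" or "no-match"."""
    found = ascii_upcase(needle) in ascii_upcase(haystack)
    return "match" if found else "no-match"
```

   Because only octets 97-122 are touched, the UTF-8 encodings of
   letters outside ASCII pass through unchanged and are compared
   case-sensitively, as the description requires.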
-
- The i;ascii-casemap collation is well-suited for use with many
- Internet protocols and computer languages. Use with natural language
- is often inappropriate; even though the collation apparently supports
- languages such as Swahili and English, in real-world use, it tends to
- mis-sort a number of types of string:
-
- o people and place names containing non-ASCII,
-
- o words such as "naive" (if spelled with an accent, the accented
- character could push the word to the wrong spot in a sorted list),
-
- o names such as "Lloyd" (which, in Welsh, sorts after "Lyon", unlike
- in English),
-
- o strings containing euro and pound sterling symbols, quotation
- marks other than '"', dashes/hyphens, etc.
-
-9.2.2. ASCII Casemap Collation Registration
-
- <?xml version='1.0'?>
- <!DOCTYPE collation SYSTEM 'collationreg.dtd'>
- <collation rfc="4790" scope="local" intendedUse="common">
- <identifier>i;ascii-casemap</identifier>
- <title>ASCII Casemap</title>
- <operations>equality order substring</operations>
- <specification>RFC 4790</specification>
- <owner>IETF</owner>
- <submitter>chris.newman@sun.com</submitter>
- </collation>
-
-9.3. Octet Collation
-
-9.3.1. Octet Collation Description
-
- The "i;octet" collation is a simple and fast collation intended for
- use on binary octet strings rather than on character data. Protocols
- that want to make this collation available have to do so by
- explicitly allowing it. If not explicitly allowed, it MUST NOT be
- used. It never returns an "undefined" result. It provides equality,
- substring, and ordering operations.
-
- The ordering algorithm is as follows:
-
- 1. If both strings are the empty string, return the result "equal".
-
- 2. If the first string is empty and the second is not, return the
- result "less".
-
- 3. If the second string is empty and the first is not, return the
- result "greater".
-
- 4. If both strings begin with the same octet value, remove the first
- octet from both strings and repeat this algorithm from step 1.
-
- 5. If the unsigned value (0 to 255) of the first octet of the first
- string is less than the unsigned value of the first octet of the
- second string, then return "less".
-
- 6. If this step is reached, return "greater".
-
- This algorithm is roughly equivalent to the C library function
- memcmp, with appropriate length checks added.
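
   The six steps above can be sketched in Python as a loop rather than
   by repeatedly removing the first octet. The function name
   octet_order is hypothetical. Iterating over a bytes object yields
   integers from 0 to 255, so the comparison is unsigned.

```python
def octet_order(a, b):
    """Ordering operation: returns "less", "equal", or "greater"."""
    # Steps 4 to 6: walk the common prefix until two octets differ.
    for x, y in zip(a, b):
        if x != y:
            return "less" if x < y else "greater"
    # Steps 1 to 3: one or both strings are exhausted.
    if len(a) == len(b):
        return "equal"
    return "less" if len(a) < len(b) else "greater"
```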
-
- The equality operation returns "match" if the ordering algorithm
- would return "equal". Otherwise, the equality operation returns
- "no-match".
-
- The substring operation returns "match" if the first string is the
- empty string, or if there exists a substring of the second string of
- length equal to the length of the first string, which would result in
- a "match" result from the equality operation. Otherwise, the
- substring operation returns "no-match".
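
   A minimal Python sketch of the i;octet equality and substring
   operations follows. The function names are hypothetical. The
   substring search is written as an explicit scan over every window
   of the second string so that it mirrors the definition; a real
   implementation would more likely use the language's built-in
   search.

```python
def octet_equal(a, b):
    """Equality operation: returns "match" or "no-match"."""
    return "match" if a == b else "no-match"


def octet_substring(needle, haystack):
    """Substring operation: returns "match" or "no-match".

    needle is the first string and haystack is the second string.
    """
    n = len(needle)
    if n == 0:
        return "match"  # the empty string matches everything
    # Try every window of the second string with the needle's length.
    for i in range(len(haystack) - n + 1):
        if octet_equal(needle, haystack[i:i + n]) == "match":
            return "match"
    return "no-match"
```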
-
-9.3.2. Octet Collation Registration
-
- This collation is defined with intendedUse="limited" because it can
- only be used by protocols that explicitly allow it.
-
- <?xml version='1.0'?>
- <!DOCTYPE collation SYSTEM 'collationreg.dtd'>
- <collation rfc="4790" scope="global" intendedUse="limited">
- <identifier>i;octet</identifier>
- <title>Octet</title>
- <operations>equality order substring</operations>
- <specification>RFC 4790</specification>
- <owner>IETF</owner>
- <submitter>chris.newman@sun.com</submitter>
- </collation>
-
-10. IANA Considerations
-
- Section 7 defines how to register collations with IANA. Section 9
- defines a list of predefined collations that have been registered
- with IANA.
-
-11. Security Considerations
-
- Collations will normally be used with UTF-8 strings. Thus, the
- security considerations for UTF-8 [3], stringprep [6], and Unicode
- TR-36 [8] also apply, and are normative to this specification.
-
-12. Acknowledgements
-
- The authors want to thank all who have contributed to this document,
- including Brian Carpenter, John Cowan, Dave Cridland, Mark Davis,
- Spencer Dawkins, Lisa Dusseault, Lars Eggert, Frank Ellermann, Philip
- Guenther, Tony Hansen, Ted Hardie, Sam Hartman, Kjetil Torgrim Homme,
- Michael Kay, John Klensin, Alexey Melnikov, Jim Melton, and Abhijit
- Menon-Sen.
-
-13. References
-
-13.1. Normative References
-
- [1] Bradner, S., "Key words for use in RFCs to Indicate Requirement
- Levels", BCP 14, RFC 2119, March 1997.
-
- [2] Crocker, D. and P. Overell, "Augmented BNF for Syntax
- Specifications: ABNF", RFC 4234, October 2005.
-
- [3] Yergeau, F., "UTF-8, a transformation format of ISO 10646",
- STD 63, RFC 3629, November 2003.
-
- [4] Berners-Lee, T., Fielding, R., and L. Masinter, "Uniform
- Resource Identifier (URI): Generic Syntax", RFC 3986,
- January 2005.
-
- [5] Phillips, A. and M. Davis, "Tags for Identifying Languages",
- BCP 47, RFC 4646, September 2006.
-
- [6] Hoffman, P. and M. Blanchet, "Preparation of Internationalized
- Strings ("stringprep")", RFC 3454, December 2002.
-
- [7] Davis, M. and K. Whistler, "Unicode Collation Algorithm version
- 14", May 2005,
- <http://www.unicode.org/reports/tr10/tr10-14.html>.
-
- [8] Davis, M. and M. Suignard, "Unicode Security Considerations",
- February 2006, <http://www.unicode.org/reports/tr36/>.
-
-13.2. Informative References
-
- [9] Freed, N. and N. Borenstein, "Multipurpose Internet Mail
- Extensions (MIME) Part One: Format of Internet Message Bodies",
- RFC 2045, November 1996.
-
- [10] Melnikov, A., "Simple Authentication and Security Layer
- (SASL)", RFC 4422, June 2006.
-
- [11] Newman, C. and J. Myers, "ACAP -- Application Configuration
- Access Protocol", RFC 2244, November 1997.
-
- [12] Resnick, P., "Internet Message Format", RFC 2822, April 2001.
-
- [13] Freed, N. and J. Postel, "IANA Charset Registration
- Procedures", BCP 19, RFC 2978, October 2000.
-
- [14] Showalter, T., "Sieve: A Mail Filtering Language", RFC 3028,
- January 2001.
-
- [15] Crispin, M., "Internet Message Access Protocol - Version
- 4rev1", RFC 3501, March 2003.
-
- [16] Crispin, M. and K. Murchison, "Internet Message Access Protocol
- - Sort and Thread Extensions", Work in Progress, May 2004.
-
- [17] Newman, C. and A. Gulbrandsen, "Internet Message Access
- Protocol Internationalization", Work in Progress, January 2006.
-
-Authors' Addresses
-
- Chris Newman
- Sun Microsystems
- 1050 Lakes Drive
- West Covina, CA 91790
- USA
-
- EMail: chris.newman@sun.com
-
-
- Martin Duerst
- Aoyama Gakuin University
- 5-10-1 Fuchinobe
- Sagamihara, Kanagawa 229-8558
- Japan
-
- Phone: +81 42 759 6329
- Fax: +81 42 759 6495
- EMail: duerst@it.aoyama.ac.jp
- URI: http://www.sw.it.aoyama.ac.jp/D%C3%BCrst/
-
- Note: Please write "Duerst" with u-umlaut wherever possible, for
- example as "D&#252;rst" in XML and HTML.
-
-
- Arnt Gulbrandsen
- Oryx Mail Systems GmbH
- Schweppermannstr. 8
- 81671 Munich
- Germany
-
- Fax: +49 89 4502 9758
- EMail: arnt@oryx.com
- URI: http://www.oryx.com/arnt/
-
-Full Copyright Statement
-
- Copyright (C) The IETF Trust (2007).
-
- This document is subject to the rights, licenses and restrictions
- contained in BCP 78, and except as set forth therein, the authors
- retain all their rights.
-
- This document and the information contained herein are provided on an
- "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS
- OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY, THE IETF TRUST AND
- THE INTERNET ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS
- OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF
- THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
- WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.
-
-Intellectual Property
-
- The IETF takes no position regarding the validity or scope of any
- Intellectual Property Rights or other rights that might be claimed to
- pertain to the implementation or use of the technology described in
- this document or the extent to which any license under such rights
- might or might not be available; nor does it represent that it has
- made any independent effort to identify any such rights. Information
- on the procedures with respect to rights in RFC documents can be
- found in BCP 78 and BCP 79.
-
- Copies of IPR disclosures made to the IETF Secretariat and any
- assurances of licenses to be made available, or the result of an
- attempt made to obtain a general license or permission for the use of
- such proprietary rights by implementers or users of this
- specification can be obtained from the IETF on-line IPR repository at
- http://www.ietf.org/ipr.
-
- The IETF invites any interested party to bring to its attention any
- copyrights, patents or patent applications, or other proprietary
- rights that may cover technology that may be required to implement
- this standard. Please address the information to the IETF at
- ietf-ipr@ietf.org.
-
-Acknowledgement
-
- Funding for the RFC Editor function is currently provided by the
- Internet Society.
-