rfc2231.txt (19280B)


      1 
      2 
      3 
      4 
      5 
      6 
      7 Network Working Group                                         N. Freed
      8 Request for Comments: 2231                                    Innosoft
      9 Updates: 2045, 2047, 2183                                     K. Moore
     10 Obsoletes: 2184                                University of Tennessee
     11 Category: Standards Track                                November 1997
     12 
     13 
     14            MIME Parameter Value and Encoded Word Extensions:
     15               Character Sets, Languages, and Continuations
     16 
     17 
     18 Status of this Memo
     19 
     20    This document specifies an Internet standards track protocol for the
     21    Internet community, and requests discussion and suggestions for
     22    improvements.  Please refer to the current edition of the "Internet
     23    Official Protocol Standards" (STD 1) for the standardization state
     24    and status of this protocol.  Distribution of this memo is unlimited.
     25 
     26 Copyright Notice
     27 
     28    Copyright (C) The Internet Society (1997).  All Rights Reserved.
     29 
     30 1.  Abstract
     31 
     32    This memo defines extensions to the RFC 2045 media type and RFC 2183
     33    disposition parameter value mechanisms to provide
     34 
     35     (1)   a means to specify parameter values in character sets
     36           other than US-ASCII,
     37 
     38     (2)   a means to specify the language to be used should the
     39           value be displayed, and
     40 
     41     (3)   a continuation mechanism for long parameter values to
     42           avoid problems with header line wrapping.
     43 
     44    This memo also defines an extension to the encoded words defined in
     45    RFC 2047 to allow the specification of the language to be used for
     46    display as well as the character set.
     47 
     48 2.  Introduction
     49 
     50    The Multipurpose Internet Mail Extensions, or MIME [RFC-2045, RFC-
     51    2046, RFC-2047, RFC-2048, RFC-2049], define a message format that
     52    allows for:
     53 
     54 
     55 
     56 
     57 
     58 Freed & Moore               Standards Track                     [Page 1]
     59 
     60 RFC 2231         MIME Value and Encoded Word Extensions    November 1997
     61 
     62 
     63     (1)   textual message bodies in character sets other than
     64           US-ASCII,
     65 
     66     (2)   non-textual message bodies,
     67 
     68     (3)   multi-part message bodies, and
     69 
     70     (4)   textual header information in character sets other than
     71           US-ASCII.
     72 
     73    MIME is now widely deployed and is used by a variety of Internet
     74    protocols, including, of course, Internet email.  However, MIME's
     75    success has resulted in the need for additional mechanisms that were
     76    not provided in the original protocol specification.
     77 
     78    In particular, existing MIME mechanisms provide for named media type
     79    (content-type field) parameters as well as named disposition
     80    (content-disposition field) parameters.  A MIME media type may specify any
     81    number of parameters associated with all of its subtypes, and any
     82    specific subtype may specify additional parameters for its own use. A
     83    MIME disposition value may specify any number of associated
     84    parameters, the most important of which is probably the attachment
     85    disposition's filename parameter.
     86 
     87    These parameter names and values end up appearing in the content-type
     88    and content-disposition header fields in Internet email.  This
     89    inherently imposes three crucial limitations:
     90 
     91     (1)   Lines in Internet email header fields are folded
     92           according to RFC 822 folding rules.  This makes long
     93           parameter values problematic.
     94 
     95     (2)   MIME headers, like the RFC 822 headers they often
     96           appear in, are limited to 7bit US-ASCII, and the
     97           encoded-word mechanisms of RFC 2047 are not available
     98           to parameter values.  This makes it impossible to have
     99           parameter values in character sets other than US-ASCII
    100           without specifying some sort of private per-parameter
    101           encoding.
    102 
    103     (3)   It has recently become clear that character set
    104           information is not sufficient to properly display some
    105           sorts of information -- language information is also
    106           needed [RFC-2130].  For example, support for
    107           handicapped users may require reading text strings
    119           aloud. The language the text is written in is needed
    120           for this to be done correctly.  Some parameter values
    121           may need to be displayed, hence there is a need to
    122           allow for the inclusion of language information.
    123 
    124    The last problem on this list is also an issue for the encoded words
    125    defined by RFC 2047, as encoded words are intended primarily for
    126    display purposes.
    127 
    128    This document defines extensions that address all of these
    129    limitations. All of these extensions are implemented in a fashion
    130    that is completely compatible at a syntactic level with existing MIME
    131    implementations. In addition, the extensions are designed to have as
    132    little impact as possible on existing uses of MIME.
    133 
    134    IMPORTANT NOTE:  These mechanisms end up being somewhat awkward when
    135    they actually are used. As such, these mechanisms should not be used
    136    lightly; they should be reserved for situations where a real need for
    137    them exists.
    138 
    139 2.1.  Requirements notation
    140 
    141    This document occasionally uses terms that appear in capital letters.
    142    When the terms "MUST", "SHOULD", "MUST NOT", "SHOULD NOT", and "MAY"
    143    appear capitalized, they are being used to indicate particular
    144    requirements of this specification. A discussion of the meanings of
    145    these terms appears in [RFC-2119].
    146 
    147 3.  Parameter Value Continuations
    148 
    149    Long MIME media type or disposition parameter values do not interact
    150    well with header line wrapping conventions.  In particular, proper
    151    header line wrapping depends on there being places where linear
    152    whitespace (LWSP) is allowed, which may or may not be present in a
    153    parameter value, and even if present may not be recognizable as such
    154    since specific knowledge of parameter value syntax may not be
    155    available to the agent doing the line wrapping. The result is that
    156    long parameter values may end up getting truncated or otherwise
    157    damaged by incorrect line wrapping implementations.
    158 
    159    A mechanism is therefore needed to break up parameter values into
    160    smaller units that are amenable to line wrapping. Any such mechanism
    161    MUST be compatible with existing MIME processors. This means that
    162 
    163     (1)   the mechanism MUST NOT change the syntax of MIME media
    164           type and disposition lines, and
    165 
    175     (2)   the mechanism MUST NOT depend on parameter ordering
    176           since MIME states that parameters are not order
    177           sensitive.  Note that while MIME does prohibit
    178           modification of MIME headers during transport, it is
    179           still possible that parameters will be reordered when
    180           user agent level processing is done.
    181 
    182    The obvious solution, then, is to use multiple parameters to contain
    183    a single parameter value and to use some kind of distinguished name
    184    to indicate when this is being done.  And this obvious solution is
    185    exactly what is specified here: The asterisk character ("*") followed
    186    by a decimal count is employed to indicate that multiple parameters
    187    are being used to encapsulate a single parameter value.  The count
    188    starts at 0 and increments by 1 for each subsequent section of the
    189    parameter value.  Decimal values are used and neither leading zeroes
    190    nor gaps in the sequence are allowed.
    191 
    192    The original parameter value is recovered by concatenating the
    193    various sections of the parameter, in order.  For example, the
    194    content-type field
    195 
    196         Content-Type: message/external-body; access-type=URL;
    197          URL*0="ftp://";
    198          URL*1="cs.utk.edu/pub/moore/bulk-mailer/bulk-mailer.tar"
    199 
    200    is semantically identical to
    201 
    202         Content-Type: message/external-body; access-type=URL;
    203           URL="ftp://cs.utk.edu/pub/moore/bulk-mailer/bulk-mailer.tar"
    204 
    205    Note that quotes around parameter values are part of the value
    206    syntax; they are NOT part of the value itself.  Furthermore, it is
    207    explicitly permitted to have a mixture of quoted and unquoted
    208    continuation fields.
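
The reassembly rule above can be sketched in Python. This is a minimal illustration, not a reference implementation: the function name join_continuations and the input shape (a dict of parameter names to values whose quotes have already been removed) are assumptions made for the example.

```python
import re

# Matches names such as "URL*0" or "URL*12". A leading zero ("URL*01")
# does not match, since leading zeroes are not allowed.
_SECTION = re.compile(r"^(?P<name>[^*]+)\*(?P<num>0|[1-9][0-9]*)$")


def join_continuations(params):
    """Merge NAME*0, NAME*1, ... entries of an already-unquoted dict.

    Hypothetical helper: parameter names are lower-cased because
    attribute matching is case-insensitive.
    """
    plain = {}
    sections = {}
    for key, value in params.items():
        m = _SECTION.match(key)
        if m is None:
            plain[key.lower()] = value
        else:
            name = m.group("name").lower()
            sections.setdefault(name, {})[int(m.group("num"))] = value
    for name, parts in sections.items():
        # Sections must be numbered 0, 1, 2, ... with no gaps.
        if sorted(parts) != list(range(len(parts))):
            raise ValueError(f"gap in continuation sequence for {name!r}")
        plain[name] = "".join(parts[i] for i in range(len(parts)))
    return plain


# The sections are deliberately listed out of order: the result must not
# depend on parameter ordering.
params = {
    "access-type": "URL",
    "URL*1": "cs.utk.edu/pub/moore/bulk-mailer/bulk-mailer.tar",
    "URL*0": "ftp://",
}
result = join_continuations(params)
print(result["url"])
```

Keying the sections by their decimal count, rather than by the order in which they were seen, is what keeps the sketch independent of parameter ordering.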
    209 
    210 4.  Parameter Value Character Set and Language Information
    211 
    212    Some parameter values may need to be qualified with character set or
    213    language information.  It is clear that a distinguished parameter
    214    name is needed to identify when this information is present along
    215    with a specific syntax for the information in the value itself.  In
    216    addition, a lightweight encoding mechanism is needed to accommodate 8
    217    bit information in parameter values.
    218 
    231    Asterisks ("*") are reused to provide the indicator that language and
    232    character set information is present and encoding is being used. A
    233    single quote ("'") is used to delimit the character set and language
    234    information at the beginning of the parameter value. Percent signs
    235    ("%") are used as the encoding flag (RFC 2047 "Q" encoding uses "=").
    236 
    237    Specifically, an asterisk at the end of a parameter name acts as an
    238    indicator that character set and language information may appear at
    239    the beginning of the parameter value. A single quote is used to
    240    separate the character set, language, and actual value information in
    241    the parameter value string, and a percent sign is used to flag
    242    octets encoded in hexadecimal.  For example:
    243 
    244         Content-Type: application/x-stuff;
    245          title*=us-ascii'en-us'This%20is%20%2A%2A%2Afun%2A%2A%2A
    246 
    247    Note that it is perfectly permissible to leave either the character
    248    set or language field blank.  Note also that the single quote
    249    delimiters MUST be present even when one of the field values is
    250    omitted.  This is done when either character set, language, or both
    251    are not relevant to the parameter value at hand.  This MUST NOT be
    252    done in order to indicate a default character set or language --
    253    parameter field definitions MUST NOT assign a default character set
    254    or language.
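
As a rough sketch of how a receiver might decode such a value, the Python below splits on the two single quotes and then undoes the percent encoding. The function name decode_extended_value is hypothetical, and falling back to US-ASCII when the character set field is empty is an assumption of this example only, since no default is assigned by this specification.

```python
from urllib.parse import unquote_to_bytes


def decode_extended_value(raw):
    """Split charset'language'percent-encoded-text and decode the text."""
    # Both single quote delimiters must be present, so exactly three
    # fields are expected; a missing delimiter raises ValueError here.
    charset, language, encoded = raw.split("'", 2)
    octets = unquote_to_bytes(encoded)
    # Assumption for this sketch only: an empty charset field is decoded
    # as US-ASCII. The specification itself defines no default.
    text = octets.decode(charset or "us-ascii")
    return charset, language, text


charset, language, text = decode_extended_value(
    "us-ascii'en-us'This%20is%20%2A%2A%2Afun%2A%2A%2A"
)
print(charset, language, text)
```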
    255 
    256 4.1.  Combining Character Set, Language, and Parameter Continuations
    257 
    258    Character set and language information may be combined with the
    259    parameter continuation mechanism. For example:
    260 
    261    Content-Type: application/x-stuff;
    262     title*0*=us-ascii'en'This%20is%20even%20more%20;
    263     title*1*=%2A%2A%2Afun%2A%2A%2A%20;
    264     title*2="isn't it!"
    265 
    266    Note that:
    267 
    268     (1)   Language and character set information only appear at
    269           the beginning of a given parameter value.
    270 
    271     (2)   Continuations do not provide a facility for using more
    272           than one character set or language in the same
    273           parameter value.
    274 
    275     (3)   A value presented using multiple continuations may
    276           contain a mixture of encoded and unencoded segments.
    277 
    287     (4)   The first segment of a continuation MUST be encoded if
    288           language and character set information are given.
    289 
    290     (5)   If the first segment of a continued parameter value is
    291           encoded the language and character set field delimiters
    292           MUST be present even when the fields are left blank.
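
The notes above can be combined into one decoding routine. The following Python is a sketch under stated assumptions: decode_parameter is a hypothetical name, the input is a dict holding the segments of a single parameter with any quotes already removed, unencoded segments are taken to be plain US-ASCII octets, and an empty character set field is decoded as US-ASCII for convenience.

```python
import re
from urllib.parse import unquote_to_bytes

# attribute, optional "*N" section number, optional trailing "*" flag
_NAME = re.compile(
    r"^(?P<attr>[^*]+)(?:\*(?P<num>0|[1-9][0-9]*))?(?P<ext>\*)?$"
)


def decode_parameter(segments):
    """Decode the segments of ONE parameter into (charset, language, text)."""
    parts = {}
    for name, value in segments.items():
        m = _NAME.match(name)
        if m is None:
            raise ValueError(f"malformed parameter name: {name!r}")
        num = int(m.group("num") or 0)
        parts[num] = (value, m.group("ext") is not None)
    if sorted(parts) != list(range(len(parts))):
        raise ValueError("gap in continuation sequence")

    charset = language = ""
    octets = b""
    for num in range(len(parts)):
        value, extended = parts[num]
        if num == 0 and extended:
            # Only the first segment carries charset and language.
            charset, language, value = value.split("'", 2)
        if extended:
            octets += unquote_to_bytes(value)
        else:
            # Assumption: unencoded segments are plain ASCII octets.
            octets += value.encode("ascii")
    return charset, language, octets.decode(charset or "us-ascii")


decoded = decode_parameter({
    "title*0*": "us-ascii'en'This%20is%20even%20more%20",
    "title*1*": "%2A%2A%2Afun%2A%2A%2A%20",
    "title*2": "isn't it!",
})
print(decoded)
```

Concatenating octets first and decoding once at the end means a multi-octet character split across two encoded segments is still decoded correctly.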
    293 
    294 5.  Language specification in Encoded Words
    295 
    296    RFC 2047 provides support for non-US-ASCII character sets in RFC 822
    297    message header comments, phrases, and any unstructured text field.
    298    This is done by defining an encoded word construct which can appear
    299    in any of these places.  Given that these are fields intended for
    300    display, it is sometimes necessary to associate language information
    301    with encoded words as well as just the character set.  This
    302    specification extends the definition of an encoded word to allow the
    303    inclusion of such information.  This is simply done by suffixing the
    304    character set specification with an asterisk followed by the language
    305    tag.  For example:
    306 
    307           From: =?US-ASCII*EN?Q?Keith_Moore?= <moore@cs.utk.edu>
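
A receiver that wants the language tag has to split it off the character set before decoding. The Python below is an illustrative sketch only: the names decode_encoded_word and _q_decode are hypothetical, and the regular expression is a simplified reading of the encoded-word syntax rather than a full RFC 2047 parser.

```python
import base64
import re

# charset, optional "*language", encoding letter, encoded text
_ENCODED_WORD = re.compile(
    r"=\?(?P<charset>[^?*]+)(?:\*(?P<language>[^?]+))?"
    r"\?(?P<encoding>[BbQq])\?(?P<text>[^?]*)\?="
)


def _q_decode(text):
    """Decode RFC 2047 "Q" text: "_" is a space, "=XX" is a hex octet."""
    out = bytearray()
    i = 0
    while i < len(text):
        if text[i] == "_":
            out.append(0x20)
            i += 1
        elif text[i] == "=":
            out.append(int(text[i + 1:i + 3], 16))
            i += 3
        else:
            out.append(ord(text[i]))
            i += 1
    return bytes(out)


def decode_encoded_word(word):
    """Return (charset, language, text); language is "" when absent."""
    m = _ENCODED_WORD.fullmatch(word)
    if m is None:
        raise ValueError(f"not an encoded word: {word!r}")
    if m.group("encoding").upper() == "B":
        octets = base64.b64decode(m.group("text"))
    else:
        octets = _q_decode(m.group("text"))
    charset = m.group("charset")
    return charset, m.group("language") or "", octets.decode(charset)


word = decode_encoded_word("=?US-ASCII*EN?Q?Keith_Moore?=")
print(word)
```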
    308 
    309 6.  IMAP4 Handling of Parameter Values
    310 
    311    IMAP4 [RFC-2060] servers SHOULD decode parameter value continuations
    312    when generating the BODY and BODYSTRUCTURE fetch attributes.
    313 
    314 7.  Modifications to MIME ABNF
    315 
    316    The ABNF for MIME parameter values given in RFC 2045 is:
    317 
    318    parameter := attribute "=" value
    319 
    320    attribute := token
    321                 ; Matching of attributes
    322                 ; is ALWAYS case-insensitive.
    323 
    324    This specification changes this ABNF to:
    325 
    326    parameter := regular-parameter / extended-parameter
    327 
    328    regular-parameter := regular-parameter-name "=" value
    329 
    330    regular-parameter-name := attribute [section]
    331 
    332    attribute := 1*attribute-char
    333 
    343    attribute-char := <any (US-ASCII) CHAR except SPACE, CTLs,
    344                      "*", "'", "%", or tspecials>
    345 
    346    section := initial-section / other-sections
    347 
    348    initial-section := "*0"
    349 
    350    other-sections := "*" ("1" / "2" / "3" / "4" / "5" /
    351                           "6" / "7" / "8" / "9") *DIGIT
    352 
    353    extended-parameter := (extended-initial-name "="
    354                           extended-initial-value) /
    355                          (extended-other-names "="
    356                           extended-other-values)
    357 
    358    extended-initial-name := attribute [initial-section] "*"
    359 
    360    extended-other-names := attribute other-sections "*"
    361 
    362    extended-initial-value := [charset] "'" [language] "'"
    363                              extended-other-values
    364 
    365    extended-other-values := *(ext-octet / attribute-char)
    366 
    367    ext-octet := "%" 2(DIGIT / "A" / "B" / "C" / "D" / "E" / "F")
    368 
    369    charset := <registered character set name>
    370 
    371    language := <registered language tag [RFC-1766]>
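
To illustrate the generating side of this grammar, the Python sketch below produces an extended-initial-value: octets that are attribute-chars are emitted as is, and every other octet becomes an ext-octet. The name encode_extended_value and the UTF-8 default are assumptions of the example; the tspecials list is taken from RFC 2045.

```python
# tspecials as listed in RFC 2045
TSPECIALS = '()<>@,;:\\"/[]?='

# attribute-char: any US-ASCII CHAR except SPACE, CTLs, "*", "'", "%",
# or tspecials. The range 0x21..0x7E already excludes SPACE, CTLs and DEL.
ATTRIBUTE_CHARS = frozenset(chr(c) for c in range(0x21, 0x7F)) - frozenset(
    TSPECIALS + "*'%"
)


def encode_extended_value(text, charset="utf-8", language=""):
    """Build charset'language'value with non-attribute-chars as %XX."""
    octets = text.encode(charset)
    body = "".join(
        chr(b) if chr(b) in ATTRIBUTE_CHARS else f"%{b:02X}"
        for b in octets
    )
    # Both single quote delimiters are always emitted, even when the
    # language (or charset) field is left empty.
    return f"{charset}'{language}'{body}"


encoded = encode_extended_value("This is ***fun***", "us-ascii", "en-us")
print(encoded)
```

Upper-case hexadecimal digits are used because ext-octet as written above lists only "A" through "F".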
    372 
    373    The ABNF given in RFC 2047 for encoded-words is:
    374 
    375    encoded-word := "=?" charset "?" encoding "?" encoded-text "?="
    376 
    377    This specification changes this ABNF to:
    378 
    379    encoded-word := "=?" charset ["*" language] "?" encoding "?" encoded-text "?="
    380 
    381 8.  Character sets which allow specification of language
    382 
    383    In the future it is likely that some character sets will provide
    384    facilities for inline language labeling. Such facilities are
    385    inherently more flexible than those defined here as they allow for
    386    language switching in the middle of a string.
    387 
    399    If and when such facilities are developed they SHOULD be used in
    400    preference to the language labeling facilities specified here. Note
    401    that all the mechanisms defined here allow for the omission of
    402    language labels so as to be able to accommodate this possible future
    403    usage.
    404 
    405 9.  Security Considerations
    406 
    407    This RFC does not discuss security issues and is not believed to
    408    raise any security issues not already endemic in electronic mail and
    409    present in fully conforming implementations of MIME.
    410 
    411 10.  References
    412 
    413    [RFC-822]
    414         Crocker, D., "Standard for the Format of ARPA Internet
    415         Text Messages", STD 11, RFC 822, August 1982.
    416 
    417    [RFC-1766]
    418         Alvestrand, H., "Tags for the Identification of
    419         Languages", RFC 1766, March 1995.
    420 
    421    [RFC-2045]
    422         Freed, N. and N. Borenstein, "Multipurpose Internet Mail
    423         Extensions (MIME) Part One: Format of Internet Message
    424         Bodies", RFC 2045, December 1996.
    425 
    426    [RFC-2046]
    427         Freed, N. and N. Borenstein, "Multipurpose Internet Mail
    428         Extensions (MIME) Part Two: Media Types", RFC 2046,
    429         December 1996.
    430 
    431    [RFC-2047]
    432         Moore, K., "Multipurpose Internet Mail Extensions (MIME)
    433         Part Three: Representation of Non-ASCII Text in Internet
    434         Message Headers", RFC 2047, December 1996.
    435 
    436    [RFC-2048]
    437         Freed, N., Klensin, J. and J. Postel, "Multipurpose
    438         Internet Mail Extensions (MIME) Part Four: MIME
    439         Registration Procedures", RFC 2048, December 1996.
    440 
    441    [RFC-2049]
    442         Freed, N. and N. Borenstein, "Multipurpose Internet Mail
    443         Extensions (MIME) Part Five: Conformance Criteria and
    444         Examples", RFC 2049, December 1996.
    445 
    455    [RFC-2060]
    456         Crispin, M., "Internet Message Access Protocol - Version
    457         4rev1", RFC 2060, December 1996.
    458 
    459    [RFC-2119]
    460         Bradner, S., "Key words for use in RFCs to Indicate
    461         Requirement Levels", RFC 2119, March 1997.
    462 
    463    [RFC-2130]
    464         Weider, C., Preston, C., Simonsen, K., Alvestrand, H.,
    465         Atkinson, R., Crispin, M., and P. Svanberg, "Report from the
    466         IAB Character Set Workshop", RFC 2130, April 1997.
    467 
    468    [RFC-2183]
    469         Troost, R., Dorner, S. and K. Moore, "Communicating
    470         Presentation Information in Internet Messages:  The
    471         Content-Disposition Header", RFC 2183, August 1997.
    472 
    473 11.  Authors' Addresses
    474 
    475    Ned Freed
    476    Innosoft International, Inc.
    477    1050 Lakes Drive
    478    West Covina, CA 91790
    479    USA
    480 
    481    Phone: +1 626 919 3600
    482    Fax:   +1 626 919 3614
    483    EMail: ned.freed@innosoft.com
    484 
    485 
    486    Keith Moore
    487    Computer Science Dept.
    488    University of Tennessee
    489    107 Ayres Hall
    490    Knoxville, TN 37996-1301
    491    USA
    492 
    493    EMail: moore@cs.utk.edu
    494 
    511 12.  Full Copyright Statement
    512 
    513    Copyright (C) The Internet Society (1997).  All Rights Reserved.
    514 
    515    This document and translations of it may be copied and furnished to
    516    others, and derivative works that comment on or otherwise explain it
    517    or assist in its implementation may be prepared, copied, published
    518    and distributed, in whole or in part, without restriction of any
    519    kind, provided that the above copyright notice and this paragraph are
    520    included on all such copies and derivative works.  However, this
    521    document itself may not be modified in any way, such as by removing
    522    the copyright notice or references to the Internet Society or other
    523    Internet organizations, except as needed for the purpose of
    524    developing Internet standards in which case the procedures for
    525    copyrights defined in the Internet Standards process must be
    526    followed, or as required to translate it into languages other than
    527    English.
    528 
    529    The limited permissions granted above are perpetual and will not be
    530    revoked by the Internet Society or its successors or assigns.
    531 
    532    This document and the information contained herein is provided on an
    533    "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING
    534    TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING
    535    BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION
    536    HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF
    537    MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.
    538 