Network Working Group                                           K. Moore
Request for Comments: 2047                       University of Tennessee
Obsoletes: 1521, 1522, 1590                                November 1996
Category: Standards Track


        MIME (Multipurpose Internet Mail Extensions) Part Three:
              Message Header Extensions for Non-ASCII Text

Status of this Memo

   This document specifies an Internet standards track protocol for the
   Internet community, and requests discussion and suggestions for
   improvements.  Please refer to the current edition of the "Internet
   Official Protocol Standards" (STD 1) for the standardization state
   and status of this protocol.  Distribution of this memo is unlimited.

Abstract

   STD 11, RFC 822, defines a message representation protocol specifying
   considerable detail about US-ASCII message headers, and leaves the
   message content, or message body, as flat US-ASCII text.  This set of
   documents, collectively called the Multipurpose Internet Mail
   Extensions, or MIME, redefines the format of messages to allow for

   (1) textual message bodies in character sets other than US-ASCII,

   (2) an extensible set of different formats for non-textual message
       bodies,

   (3) multi-part message bodies, and

   (4) textual header information in character sets other than US-ASCII.

   These documents are based on earlier work documented in RFC 934, STD
   11, and RFC 1049, but extend and revise them.  Because RFC 822 said
   so little about message bodies, these documents are largely
   orthogonal to (rather than a revision of) RFC 822.

   This particular document is the third document in the series.  It
   describes extensions to RFC 822 to allow non-US-ASCII text data in
   Internet mail header fields.

Moore                       Standards Track                     [Page 1]

RFC 2047               Message Header Extensions           November 1996

   Other documents in this series include:

   + RFC 2045, which specifies the various headers used to describe
     the structure of MIME messages.

   + RFC 2046, which defines the general structure of the MIME media
     typing system and defines an initial set of media types,

   + RFC 2048, which specifies various IANA registration procedures
     for MIME-related facilities, and

   + RFC 2049, which describes MIME conformance criteria and
     provides some illustrative examples of MIME message formats,
     acknowledgements, and the bibliography.

   These documents are revisions of RFCs 1521, 1522, and 1590, which
   themselves were revisions of RFCs 1341 and 1342.  An appendix in RFC
   2049 describes differences and changes from previous versions.

1. Introduction

   RFC 2045 describes a mechanism for denoting textual body parts which
   are coded in various character sets, as well as methods for encoding
   such body parts as sequences of printable US-ASCII characters.  This
   memo describes similar techniques to allow the encoding of non-ASCII
   text in various portions of an RFC 822 [2] message header, in a
   manner which is unlikely to confuse existing message handling
   software.

   Like the encoding techniques described in RFC 2045, the techniques
   outlined here were designed to allow the use of non-ASCII characters
   in message headers in a way which is unlikely to be disturbed by the
   quirks of existing Internet mail handling programs.  In particular,
   some mail relaying programs are known to (a) delete some message
   header fields while retaining others, (b) rearrange the order of
   addresses in To or Cc fields, (c) rearrange the (vertical) order of
   header fields, and/or (d) "wrap" message headers at different places
   than those in the original message.  In addition, some mail reading
   programs are known to have difficulty correctly parsing message
   headers which, while legal according to RFC 822, make use of
   backslash-quoting to "hide" special characters such as "<", ",", or
   ":", or which exploit other infrequently-used features of that
   specification.

   While it is unfortunate that these programs do not correctly
   interpret RFC 822 headers, to "break" these programs would cause
   severe operational problems for the Internet mail system.  The
   extensions described in this memo therefore do not rely on
   little-used features of RFC 822.

   Instead, certain sequences of "ordinary" printable ASCII characters
   (known as "encoded-words") are reserved for use as encoded data.  The
   syntax of encoded-words is such that they are unlikely to
   "accidentally" appear as normal text in message headers.
   Furthermore, the characters used in encoded-words are restricted to
   those which do not have special meanings in the context in which the
   encoded-word appears.

   Generally, an "encoded-word" is a sequence of printable ASCII
   characters that begins with "=?", ends with "?=", and has two "?"s in
   between.  It specifies a character set and an encoding method, and
   also includes the original text encoded as graphic ASCII characters,
   according to the rules for that encoding method.

   A mail composer that implements this specification will provide a
   means of inputting non-ASCII text in header fields, but will
   translate these fields (or appropriate portions of these fields) into
   encoded-words before inserting them into the message header.

   A mail reader that implements this specification will recognize
   encoded-words when they appear in certain portions of the message
   header.  Instead of displaying the encoded-word "as is", it will
   reverse the encoding and display the original text in the designated
   character set.

NOTES

   This memo relies heavily on notation and terms defined in RFC 822 and
   RFC 2045.  In particular, the ABNF notation used in this memo is
   defined in RFC 822, and many of the terminal or nonterminal symbols
   from RFC 822 are used in the grammar for the header extensions
   defined here.  Among the symbols defined in RFC 822 and referenced in
   this memo are: 'addr-spec', 'atom', 'CHAR', 'comment', 'CTLs',
   'ctext', 'linear-white-space', 'phrase', 'quoted-pair',
   'quoted-string', 'SPACE', and 'word'.  Successful implementation of
   this protocol extension requires careful attention to the RFC 822
   definitions of these terms.

   When the term "ASCII" appears in this memo, it refers to the "7-Bit
   American Standard Code for Information Interchange", ANSI X3.4-1986.
   The MIME charset name for this character set is "US-ASCII".  When not
   specifically referring to the MIME charset name, this document uses
   the term "ASCII", both for brevity and for consistency with RFC 822.
   However, implementors are warned that the character set name must be
   spelled "US-ASCII" in MIME message and body part headers.

   This memo specifies a protocol for the representation of non-ASCII
   text in message headers.  It specifically DOES NOT define any
   translation between "8-bit headers" and pure ASCII headers, nor is
   any such translation assumed to be possible.

2. Syntax of encoded-words

   An 'encoded-word' is defined by the following ABNF grammar.  The
   notation of RFC 822 is used, with the exception that white space
   characters MUST NOT appear between components of an 'encoded-word'.

   encoded-word = "=?" charset "?" encoding "?" encoded-text "?="

   charset = token    ; see section 3

   encoding = token   ; see section 4

   token = 1*<Any CHAR except SPACE, CTLs, and especials>

   especials = "(" / ")" / "<" / ">" / "@" / "," / ";" / ":" / "\" /
               <"> / "/" / "[" / "]" / "?" / "." / "="

   encoded-text = 1*<Any printable ASCII character other than "?"
                     or SPACE>
                  ; (but see "Use of encoded-words in message
                  ; headers", section 5)

   Both 'encoding' and 'charset' names are case-independent.  Thus the
   charset name "ISO-8859-1" is equivalent to "iso-8859-1", and the
   encoding named "Q" may be spelled either "Q" or "q".

   An 'encoded-word' may not be more than 75 characters long, including
   'charset', 'encoding', 'encoded-text', and delimiters.  If it is
   desirable to encode more text than will fit in an 'encoded-word' of
   75 characters, multiple 'encoded-word's (separated by CRLF SPACE) may
   be used.

   While there is no limit to the length of a multiple-line header
   field, each line of a header field that contains one or more
   'encoded-word's is limited to 76 characters.

   The length restrictions are included both to ease interoperability
   through internetwork mail gateways, and to impose a limit on the
   amount of lookahead a header parser must employ (while looking for a
   final "?=" delimiter) before it can decide whether a token is an
   "encoded-word" or something else.
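
   The grammar above can be checked mechanically.  The following Python
   sketch is one possible reading of it, not a normative implementation.
   The names parse_encoded_word, ESPECIALS and ENCODED_WORD are our own.
   The sketch assumes the candidate string has already been isolated,
   with no surrounding white space.  It checks only the section 2
   syntax and the 75-character limit, not the further restrictions of
   section 5.

```python
import re

# 'especials' exactly as listed in the grammar above.
ESPECIALS = set('()<>@,;:\\"/[]?.=')

# token = 1*<Any CHAR except SPACE, CTLs, and especials>.  Restricting to
# 0x21-0x7E already excludes SPACE, the CTLs and DEL.
TOKEN_CHARS = "".join(
    chr(c) for c in range(0x21, 0x7F) if chr(c) not in ESPECIALS
)
TOKEN = "[" + re.escape(TOKEN_CHARS) + "]+"

# encoded-text = 1*<Any printable ASCII character other than "?" or SPACE>
ENCODED_TEXT = r"[\x21-\x3E\x40-\x7E]+"

ENCODED_WORD = re.compile(
    r"=\?(" + TOKEN + r")\?(" + TOKEN + r")\?(" + ENCODED_TEXT + r")\?="
)


def parse_encoded_word(candidate):
    """Return (charset, encoding, encoded_text) if candidate is a single
    syntactically valid encoded-word of at most 75 characters, else None."""
    if len(candidate) > 75:
        return None
    match = ENCODED_WORD.fullmatch(candidate)
    return match.groups() if match else None
```

   For example, parse_encoded_word("=?iso-8859-1?q?this=20is=20some=20text?=")
   returns ("iso-8859-1", "q", "this=20is=20some=20text").  The variant
   with literal spaces, "=?iso-8859-1?q?this is some text?=", yields
   None.  Because 'charset' and 'encoding' are case-independent, a
   caller would normally compare the returned names after upper- or
   lower-casing them.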

   IMPORTANT: 'encoded-word's are designed to be recognized as 'atom's
   by an RFC 822 parser.  As a consequence, unencoded white space
   characters (such as SPACE and HTAB) are FORBIDDEN within an
   'encoded-word'.  For example, the character sequence

      =?iso-8859-1?q?this is some text?=

   would be parsed as four 'atom's, rather than as a single 'atom' (by
   an RFC 822 parser) or 'encoded-word' (by a parser which understands
   'encoded-word's).  The correct way to encode the string "this is some
   text" is to encode the SPACE characters as well, e.g.

      =?iso-8859-1?q?this=20is=20some=20text?=

   The characters which may appear in 'encoded-text' are further
   restricted by the rules in section 5.

3. Character sets

   The 'charset' portion of an 'encoded-word' specifies the character
   set associated with the unencoded text.  A 'charset' can be any of
   the character set names allowed in a MIME "charset" parameter of a
   "text/plain" body part, or any character set name registered with
   IANA for use with the MIME text/plain content-type.

   Some character sets use code-switching techniques to switch between
   "ASCII mode" and other modes.  If unencoded text in an 'encoded-word'
   contains a sequence which causes the charset interpreter to switch
   out of ASCII mode, it MUST contain additional control codes such that
   ASCII mode is again selected at the end of the 'encoded-word'.  (This
   rule applies separately to each 'encoded-word', including adjacent
   'encoded-word's within a single header field.)
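
   Python's standard "iso2022_jp" codec can illustrate the code-switching
   rule.  It is used here as a stand-in for the MIME charset
   "ISO-2022-JP".  At the end of every complete encode() call, the codec
   returns to ASCII mode using the escape sequence ESC "(" "B".  Encoding
   each chunk separately therefore yields octet strings that each end in
   ASCII mode, so each chunk can become its own 'encoded-word'.  Slicing
   the octets of a single encoding does not have this property.  The
   helper names below are hypothetical.

```python
ESC_ASCII = b"\x1b(B"  # ISO-2022-JP escape sequence designating ASCII


def encode_chunks(chunks, codec="iso2022_jp"):
    """Encode each chunk with its own encode() call; each call flushes the
    codec state, so every result ends in ASCII mode."""
    return [chunk.encode(codec) for chunk in chunks]


def ends_in_ascii_mode(data):
    """True if data has no escape sequence, or its last one selects ASCII."""
    last_escape = data.rfind(b"\x1b")
    return last_escape == -1 or data[last_escape:last_escape + 3] == ESC_ASCII
```

   By contrast, consider the first seven octets of
   "日本語".encode("iso2022_jp").  They stop while the interpreter is
   still in two-byte mode, so they could not legally form the end of an
   'encoded-word'.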

   When there is a possibility of using more than one character set to
   represent the text in an 'encoded-word', and in the absence of
   private agreements between sender and recipients of a message, it is
   recommended that members of the ISO-8859-* series be used in
   preference to other character sets.

4. Encodings

   Initially, the legal values for "encoding" are "Q" and "B".  These
   encodings are described below.  The "Q" encoding is recommended for
   use when most of the characters to be encoded are in the ASCII
   character set; otherwise, the "B" encoding should be used.
   Nevertheless, a mail reader which claims to recognize 'encoded-word's
   MUST be able to accept either encoding for any character set which it
   supports.
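
   The recommendation above leaves "most of the characters" to the
   implementor.  One possible reading, sketched here in Python, counts
   how many octets of the charset-encoded text are ASCII.  Both the
   "more than half" threshold and the function name choose_encoding are
   assumptions of this sketch, not values taken from this memo.

```python
def choose_encoding(text, charset):
    """Return "Q" if most octets of text (in charset) are ASCII, else "B"."""
    octets = text.encode(charset)
    ascii_octets = sum(1 for b in octets if b < 0x80)
    # Assumed threshold: "most" is read as "more than half".
    return "Q" if ascii_octets * 2 > len(octets) else "B"
```

   An alternative with a similar effect is to produce both encodings and
   keep the shorter one.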

   Only a subset of the printable ASCII characters may be used in
   'encoded-text'.  Space and tab characters are not allowed, so that
   the beginning and end of an 'encoded-word' are obvious.  The "?"
   character is used within an 'encoded-word' to separate the various
   portions of the 'encoded-word' from one another, and thus cannot
   appear in the 'encoded-text' portion.  Other characters are also
   illegal in certain contexts.  For example, an 'encoded-word' in a
   'phrase' preceding an address in a From header field may not contain
   any of the "specials" defined in RFC 822.  Finally, certain other
   characters are disallowed in some contexts, to ensure reliability for
   messages that pass through internetwork mail gateways.

   The "B" encoding automatically meets these requirements.  The "Q"
   encoding allows a wide range of printable characters to be used in
   non-critical locations in the message header (e.g., Subject), with
   fewer characters available for use in other locations.

4.1. The "B" encoding

   The "B" encoding is identical to the "BASE64" encoding defined by RFC
   2045.
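
   Because "B" is plain base64, an 'encoded-word' can be built with any
   base64 implementation.  A minimal Python sketch using the standard
   base64 module follows.  The name b_encode_word is hypothetical.  The
   sketch assumes that the MIME charset name is also accepted as a
   Python codec name, which holds for the ISO-8859-* names used here.
   It leaves the 75-character limit to the caller.

```python
import base64


def b_encode_word(text, charset):
    """Build one "B" encoded-word for text in the given charset."""
    octets = text.encode(charset)
    encoded_text = base64.b64encode(octets).decode("ascii")
    return "=?" + charset + "?B?" + encoded_text + "?="
```

   Applied to "If you can read this yo" (ISO-8859-1) and "u understand
   the example." (ISO-8859-2), this reproduces the two halves of the
   Subject example in section 8:
   "=?ISO-8859-1?B?SWYgeW91IGNhbiByZWFkIHRoaXMgeW8=?=" and
   "=?ISO-8859-2?B?dSB1bmRlcnN0YW5kIHRoZSBleGFtcGxlLg==?=".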

4.2. The "Q" encoding

   The "Q" encoding is similar to the "Quoted-Printable"
   content-transfer-encoding defined in RFC 2045.  It is designed to
   allow text containing mostly ASCII characters to be decipherable on
   an ASCII terminal without decoding.

   (1) Any 8-bit value may be represented by a "=" followed by two
       hexadecimal digits.  For example, if the character set in use
       were ISO-8859-1, the "=" character would thus be encoded as
       "=3D", and a SPACE by "=20".  (Upper case should be used for
       hexadecimal digits "A" through "F".)

   (2) The 8-bit hexadecimal value 20 (e.g., ISO-8859-1 SPACE) may be
       represented as "_" (underscore, ASCII 95.).  (This character may
       not pass through some internetwork mail gateways, but its use
       will greatly enhance readability of "Q" encoded data with mail
       readers that do not support this encoding.)  Note that the "_"
       always represents hexadecimal 20, even if the SPACE character
       occupies a different code position in the character set in use.

   (3) 8-bit values which correspond to printable ASCII characters other
       than "=", "?", and "_" (underscore) MAY be represented as those
       characters.  (But see section 5 for restrictions.)  In
       particular, SPACE and TAB MUST NOT be represented as themselves
       within encoded words.
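
   Rules (1) to (3) translate into a short encoder.  The Python sketch
   below uses the hypothetical name q_encode_text.  It targets '*text'
   fields such as Subject, where all printable ASCII characters except
   "=", "?" and "_" may stay literal.  The stricter character sets that
   section 5 requires in comments and phrases are not applied.

```python
def q_encode_text(text, charset, use_underscore=True):
    """Q-encode text for a '*text' context, following rules (1)-(3)."""
    out = []
    for b in text.encode(charset):
        if b == 0x20:
            # Rule (2): "_" always stands for hex 20; "=20" is also legal.
            out.append("_" if use_underscore else "=20")
        elif 0x21 <= b <= 0x7E and chr(b) not in "=?_":
            out.append(chr(b))  # Rule (3): printable ASCII kept as-is
        else:
            out.append("=%02X" % b)  # Rule (1): upper-case hex digits
    return "".join(out)
```

   With use_underscore=False, "this is some text" becomes
   "this=20is=20some=20text", as in the section 2 example.  With the
   default, "Keld Jørn Simonsen" in ISO-8859-1 becomes
   "Keld_J=F8rn_Simonsen".  TAB always becomes "=09".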

5. Use of encoded-words in message headers

   An 'encoded-word' may appear in a message header or body part header
   according to the following rules:

(1) An 'encoded-word' may replace a 'text' token (as defined by RFC 822)
    in any Subject or Comments header field, any extension message
    header field, or any MIME body part field for which the field body
    is defined as '*text'.  An 'encoded-word' may also appear in any
    user-defined ("X-") message or body part header field.

    Ordinary ASCII text and 'encoded-word's may appear together in the
    same header field.  However, an 'encoded-word' that appears in a
    header field defined as '*text' MUST be separated from any adjacent
    'encoded-word' or 'text' by 'linear-white-space'.

(2) An 'encoded-word' may appear within a 'comment' delimited by "(" and
    ")", i.e., wherever a 'ctext' is allowed.  More precisely, the RFC
    822 ABNF definition for 'comment' is amended as follows:

    comment = "(" *(ctext / quoted-pair / comment / encoded-word) ")"

    A "Q"-encoded 'encoded-word' which appears in a 'comment' MUST NOT
    contain the characters "(", ")" or "\".  An 'encoded-word' that
    appears in a 'comment' MUST be separated from any adjacent
    'encoded-word' or 'ctext' by 'linear-white-space'.

    It is important to note that 'comment's are only recognized inside
    "structured" field bodies.  In fields whose bodies are defined as
    '*text', "(" and ")" are treated as ordinary characters rather than
    comment delimiters, and rule (1) of this section applies.  (See RFC
    822, sections 3.1.2 and 3.1.3.)

(3) An 'encoded-word' may appear as a replacement for a 'word' entity
    within a 'phrase', for example, one that precedes an address in a
    From, To, or Cc header.  The ABNF definition for 'phrase' from RFC
    822 thus becomes:

    phrase = 1*( encoded-word / word )

    In this case the set of characters that may be used in a "Q"-encoded
    'encoded-word' is restricted to: <upper and lower case ASCII
    letters, decimal digits, "!", "*", "+", "-", "/", "=", and "_"
    (underscore, ASCII 95.)>.  An 'encoded-word' that appears within a
    'phrase' MUST be separated from any adjacent 'word', 'text' or
    'special' by 'linear-white-space'.
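
   The extra restrictions in rules (2) and (3) can be checked on an
   'encoded-text' that has already been Q-encoded.  In the Python sketch
   below, the context names "text", "comment" and "phrase" and the
   function name are assumptions.  The phrase set includes "=" and "_"
   because the list in rule (3) includes them.

```python
import string

# Characters allowed in Q encoded-text inside a 'phrase', per rule (3).
PHRASE_CHARS = set(string.ascii_letters + string.digits + "!*+-/=_")
# Characters forbidden in Q encoded-text inside a 'comment', per rule (2).
COMMENT_FORBIDDEN = set("()\\")


def q_text_allowed(encoded_text, context):
    """Check Q encoded-text against the section 5 limits for a context."""
    if context == "phrase":
        return all(c in PHRASE_CHARS for c in encoded_text)
    if context == "comment":
        return not any(c in COMMENT_FORBIDDEN for c in encoded_text)
    if context == "text":
        return True  # '*text' adds no limits beyond sections 2 and 4.2
    raise ValueError("unknown context: " + context)
```

   For instance, "Keld_J=F8rn_Simonsen" is acceptable in a From phrase.
   "J.Smith" is not, because "." would have to be written as "=2E"
   there.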

   These are the ONLY locations where an 'encoded-word' may appear.  In
   particular:

   + An 'encoded-word' MUST NOT appear in any portion of an 'addr-spec'.

   + An 'encoded-word' MUST NOT appear within a 'quoted-string'.

   + An 'encoded-word' MUST NOT be used in a Received header field.

   + An 'encoded-word' MUST NOT be used in a parameter of a MIME
     Content-Type or Content-Disposition field, or in any structured
     field body except within a 'comment' or 'phrase'.

   The 'encoded-text' in an 'encoded-word' must be self-contained;
   'encoded-text' MUST NOT be continued from one 'encoded-word' to
   another.  This implies that the 'encoded-text' portion of a "B"
   'encoded-word' will be a multiple of 4 characters long; for a "Q"
   'encoded-word', any "=" character that appears in the 'encoded-text'
   portion will be followed by two hexadecimal characters.

   Each 'encoded-word' MUST encode an integral number of octets.  The
   'encoded-text' in each 'encoded-word' must be well-formed according
   to the encoding specified; the 'encoded-text' may not be continued in
   the next 'encoded-word'.  (For example, "=?charset?Q?=?=
   =?charset?Q?AB?=" would be illegal, because the two hex digits "AB"
   must follow the "=" in the same 'encoded-word'.)
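
   A reader or composer can test each 'encoded-text' for these
   properties on its own.  The following Python sketch uses the
   hypothetical name is_self_contained.  It checks that "B" text
   consists of complete 4-character base64 groups, and that every "="
   in "Q" text is followed by two hexadecimal digits.  Lower-case hex
   digits are accepted here, although section 4.2 asks composers to use
   upper case.

```python
import re

B_TEXT = re.compile(r"[A-Za-z0-9+/]*={0,2}")
Q_BROKEN_ESCAPE = re.compile(r"=(?![0-9A-Fa-f]{2})")


def is_self_contained(encoding, encoded_text):
    """Check one encoded-text for complete "B" groups or "Q" escapes."""
    kind = encoding.upper()
    if kind == "B":
        return (len(encoded_text) % 4 == 0
                and B_TEXT.fullmatch(encoded_text) is not None)
    if kind == "Q":
        return Q_BROKEN_ESCAPE.search(encoded_text) is None
    raise ValueError("unsupported encoding: " + encoding)
```

   In the illegal example above, the first word's text "=" fails the "Q"
   check, even though the following word's "AB" would be valid on its
   own.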

   Each 'encoded-word' MUST represent an integral number of characters.
   A multi-octet character may not be split across adjacent
   'encoded-word's.
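
   One way to respect both this rule and the 75-character limit is to
   add whole characters to a word until the next character would no
   longer fit.  The Python sketch below assumes UTF-8 (a registered
   charset whose characters take one to four octets) and the "B"
   encoding.  The name split_b_words is hypothetical.

```python
import base64


def split_b_words(text, charset="UTF-8", limit=75):
    """Split text into "B" encoded-words of at most `limit` characters,
    never dividing the octets of one character between two words."""
    overhead = len("=?" + charset + "?B?" + "?=")
    # Each 3 octets become 4 base64 characters, so this many octets fit.
    max_octets = (limit - overhead) // 4 * 3
    groups, current = [], b""
    for ch in text:
        octets = ch.encode(charset)
        if current and len(current) + len(octets) > max_octets:
            groups.append(current)
            current = b""
        current += octets
    if current:
        groups.append(current)
    return ["=?%s?B?%s?=" % (charset, base64.b64encode(g).decode("ascii"))
            for g in groups]
```

   When the header is written, the words would be joined with CRLF
   SPACE.  Under section 6.2, readers then ignore the white space
   between them.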

   Only printable and white space character data should be encoded using
   this scheme.  However, since these encoding schemes allow the
   encoding of arbitrary octet values, mail readers that implement this
   decoding should also ensure that display of the decoded data on the
   recipient's terminal will not cause unwanted side-effects.

   Use of these methods to encode non-textual data (e.g., pictures or
   sounds) is not defined by this memo.  Use of 'encoded-word's to
   represent strings of purely ASCII characters is allowed, but
   discouraged.  In rare cases it may be necessary to encode ordinary
   text that looks like an 'encoded-word'.

6. Support of 'encoded-word's by mail readers

6.1. Recognition of 'encoded-word's in message headers

   A mail reader must parse the message and body part headers according
   to the rules in RFC 822 to correctly recognize 'encoded-word's.

   'encoded-word's are to be recognized as follows:

   (1) Any message or body part header field defined as '*text', or any
       user-defined header field, should be parsed as follows: Beginning
       at the start of the field-body and immediately following each
       occurrence of 'linear-white-space', each sequence of up to 75
       printable characters (not containing any 'linear-white-space')
       should be examined to see if it is an 'encoded-word' according to
       the syntax rules in section 2.  Any other sequence of printable
       characters should be treated as ordinary ASCII text.

   (2) Any header field not defined as '*text' should be parsed
       according to the syntax rules for that header field.  However,
       any 'word' that appears within a 'phrase' should be treated as an
       'encoded-word' if it meets the syntax rules in section 2.
       Otherwise it should be treated as an ordinary 'word'.

   (3) Within a 'comment', any sequence of up to 75 printable characters
       (not containing 'linear-white-space') that meets the syntax rules
       in section 2 should be treated as an 'encoded-word'.  Otherwise
       it should be treated as normal comment text.

   (4) A MIME-Version header field is NOT required to be present for
       'encoded-word's to be interpreted according to this
       specification.  One reason for this is that the mail reader is
       not expected to parse the entire message header before displaying
       lines that may contain 'encoded-word's.
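
   Rule (1) amounts to splitting the field body at white space and
   testing each piece.  The Python sketch below assumes the header has
   already been unfolded into a single line.  Its token and
   'encoded-text' character classes follow section 2, and the name
   classify_text_field is hypothetical.

```python
import re

# CHAR except SPACE, CTLs and especials (section 2), written out in full.
TOKEN = r"[!#$%&'*+\-0-9A-Z^_`a-z{|}~]+"
ENCODED_WORD = re.compile(
    r"=\?" + TOKEN + r"\?" + TOKEN + r"\?[\x21-\x3E\x40-\x7E]+\?="
)


def classify_text_field(body):
    """Label each white-space-separated piece of a '*text' field body as
    an encoded-word (True) or ordinary text (False)."""
    pieces = [p for p in re.split(r"[ \t]+", body) if p]
    return [(p, len(p) <= 75 and ENCODED_WORD.fullmatch(p) is not None)
            for p in pieces]
```

   In a '*text' field, "(" and ")" are ordinary characters.  A piece
   such as "(=?ISO-8859-1?Q?a?=)" is therefore classified as ordinary
   text here; rule (3) only applies inside structured fields.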

6.2. Display of 'encoded-word's

   Any 'encoded-word's so recognized are decoded, and if possible, the
   resulting unencoded text is displayed in the original character set.

   NOTE: Decoding and display of encoded-words occurs *after* a
   structured field body is parsed into tokens.  It is therefore
   possible to hide 'special' characters in encoded-words which, when
   displayed, will be indistinguishable from 'special' characters in the
   surrounding text.  For this and other reasons, it is NOT generally
   possible to translate a message header containing 'encoded-word's to
   an unencoded form which can be parsed by an RFC 822 mail reader.

   When displaying a particular header field that contains multiple
   'encoded-word's, any 'linear-white-space' that separates a pair of
   adjacent 'encoded-word's is ignored.  (This is to allow the use of
   multiple 'encoded-word's to represent long strings of unencoded text,
   without having to separate 'encoded-word's where spaces occur in the
   unencoded text.)
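
   The display rule can be sketched in Python for a '*text' field such
   as Subject.  The sketch rests on several assumptions: the header is
   already unfolded, only "B" and "Q" are handled, every word is well
   formed, and the MIME charset names are also valid Python codec names.
   The function names are hypothetical.

```python
import base64
import re

ENCODED_WORD = re.compile(
    r"=\?([!#$%&'*+\-0-9A-Z^_`a-z{|}~]+)\?([BbQq])\?([\x21-\x3E\x40-\x7E]+)\?="
)


def decode_word(charset, encoding, text):
    """Decode one well-formed "B" or "Q" encoded-word body to a str."""
    if encoding in "Bb":
        octets = base64.b64decode(text, validate=True)
    else:  # "Q": "_" is hex 20, "=XX" is one octet, anything else literal
        raw = text.replace("_", " ").encode("ascii")
        octets = re.sub(rb"=([0-9A-Fa-f]{2})",
                        lambda m: bytes([int(m.group(1), 16)]), raw)
    return octets.decode(charset)


def display_text_field(body):
    """Decode the encoded-words of an unfolded '*text' field body, dropping
    linear-white-space that separates two adjacent encoded-words."""
    out, pending_ws, prev_encoded = [], "", False
    for piece in re.split(r"([ \t]+)", body):
        if not piece:
            continue
        if piece[0] in " \t":
            pending_ws += piece
            continue
        m = ENCODED_WORD.fullmatch(piece) if len(piece) <= 75 else None
        if not (m and prev_encoded):
            out.append(pending_ws)  # keep white space next to ordinary text
        pending_ws = ""
        out.append(decode_word(*m.groups()) if m else piece)
        prev_encoded = m is not None
    out.append(pending_ws)
    return "".join(out)
```

   Applied to the unfolded Subject example in section 8, this yields "If
   you can read this you understand the example."  For
   "=?ISO-8859-1?Q?a?= b", the space next to ordinary text is kept, and
   the result is "a b".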

   In the event other encodings are defined in the future, and the mail
   reader does not support the encoding used, it may either (a) display
   the 'encoded-word' as ordinary text, or (b) substitute an appropriate
   message indicating that the text could not be decoded.

   If the mail reader does not support the character set used, it may
   (a) display the 'encoded-word' as ordinary text (i.e., as it appears
   in the header), (b) make a "best effort" to display using such
   characters as are available, or (c) substitute an appropriate message
   indicating that the decoded text could not be displayed.

   If the character set being used employs code-switching techniques,
   display of the encoded text implicitly begins in "ASCII mode".  In
   addition, the mail reader must ensure that the output device is once
   again in "ASCII mode" after the 'encoded-word' is displayed.

6.3. Mail reader handling of incorrectly formed 'encoded-word's

   It is possible that an 'encoded-word' that is legal according to the
   syntax defined in section 2 is incorrectly formed according to the
   rules for the encoding being used.  For example:

   (1) An 'encoded-word' which contains characters which are not legal
       for a particular encoding (for example, a "-" in the "B"
       encoding, or a SPACE or HTAB in either the "B" or "Q" encoding)
       is incorrectly formed.

   (2) Any 'encoded-word' which encodes a non-integral number of
       characters or octets is incorrectly formed.

   A mail reader need not attempt to display the text associated with an
   'encoded-word' that is incorrectly formed.  However, a mail reader
   MUST NOT prevent the display or handling of a message because an
   'encoded-word' is incorrectly formed.
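
   A reader can meet the requirement that a malformed word must not
   block display by decoding defensively.  When decoding fails, it falls
   back to the raw 'encoded-word', which is option (a) of section 6.2.
   The function names in the Python sketch below are hypothetical.  The
   exception handling maps onto the failure modes as follows:

      binascii.Error  bad base64, for example "-" or missing padding
      LookupError     an unknown charset
      UnicodeError    octets that do not form whole characters

```python
import base64
import binascii
import re


def decode_word_or_none(charset, encoding, text):
    """Decode one encoded-word body, or return None if it cannot be
    decoded (unsupported encoding or charset, or incorrectly formed)."""
    try:
        kind = encoding.upper()
        if kind == "B":
            octets = base64.b64decode(text, validate=True)
        elif kind == "Q":
            if re.search(r"=(?![0-9A-Fa-f]{2})", text):
                return None  # "=" not followed by two hex digits
            raw = text.replace("_", " ").encode("ascii")
            octets = re.sub(rb"=([0-9A-Fa-f]{2})",
                            lambda m: bytes([int(m.group(1), 16)]), raw)
        else:
            return None  # an encoding this reader does not support
        return octets.decode(charset)
    except (binascii.Error, LookupError, UnicodeError):
        return None


def display_word(charset, encoding, text):
    """Never fail: show the word as it appeared if it cannot be decoded."""
    decoded = decode_word_or_none(charset, encoding, text)
    if decoded is None:
        return "=?%s?%s?%s?=" % (charset, encoding, text)
    return decoded
```

   For example, "ww==" in UTF-8 decodes to a lone lead octet (0xC3).
   That is a non-integral number of characters, so it is displayed
   unchanged rather than raising an error.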

7. Conformance

   A mail composing program claiming compliance with this specification
   MUST ensure that any string of non-white-space printable ASCII
   characters within a '*text' or '*ctext' that begins with "=?" and
   ends with "?=" be a valid 'encoded-word'.  ("begins" means: at the
   start of the field-body, immediately following 'linear-white-space',
   or immediately following a "(" for an 'encoded-word' within '*ctext';
   "ends" means: at the end of the field-body, immediately preceding
   'linear-white-space', or immediately preceding a ")" for an
   'encoded-word' within '*ctext'.)  In addition, any 'word' within a
   'phrase' that begins with "=?" and ends with "?=" must be a valid
   'encoded-word'.

   A mail reading program claiming compliance with this specification
   must be able to distinguish 'encoded-word's from 'text', 'ctext', or
   'word's, according to the rules in section 6, anytime they appear in
   appropriate places in message headers.  It must support both the "B"
   and "Q" encodings for any character set which it supports.  The
   program must be able to display the unencoded text if the character
   set is "US-ASCII".  For the ISO-8859-* character sets, the mail
   reading program must at least be able to display the characters which
   are also in the ASCII set.
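
   A composer can check the first requirement before writing out a
   '*text' value.  Any white-space-separated piece that starts with "=?"
   and ends with "?=" but is not a valid 'encoded-word' must be encoded
   rather than sent literally.  This Python sketch checks only the
   section 2 syntax and length, ignores the '*ctext' and 'phrase'
   contexts, and uses a hypothetical function name.

```python
import re

ENCODED_WORD = re.compile(
    r"=\?[!#$%&'*+\-0-9A-Z^_`a-z{|}~]+\?[!#$%&'*+\-0-9A-Z^_`a-z{|}~]+"
    r"\?[\x21-\x3E\x40-\x7E]+\?="
)


def pieces_needing_encoding(text):
    """Return the pieces of a '*text' value that look like encoded-words
    ("=?" ... "?=") but are not valid ones."""
    bad = []
    for piece in text.split():
        looks_like = piece.startswith("=?") and piece.endswith("?=")
        valid = len(piece) <= 75 and ENCODED_WORD.fullmatch(piece) is not None
        if looks_like and not valid:
            bad.append(piece)
    return bad
```

   A piece that happens to be a valid 'encoded-word' but is meant
   literally would also be decoded by readers.  A composer may therefore
   choose to encode it as well; this is one of the rare cases mentioned
   in section 5.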

8. Examples

   The following are examples of message headers containing
   'encoded-word's:

   From: =?US-ASCII?Q?Keith_Moore?= <moore@cs.utk.edu>
   To: =?ISO-8859-1?Q?Keld_J=F8rn_Simonsen?= <keld@dkuug.dk>
   CC: =?ISO-8859-1?Q?Andr=E9?= Pirard <PIRARD@vm1.ulg.ac.be>
   Subject: =?ISO-8859-1?B?SWYgeW91IGNhbiByZWFkIHRoaXMgeW8=?=
    =?ISO-8859-2?B?dSB1bmRlcnN0YW5kIHRoZSBleGFtcGxlLg==?=

      Note: In the first 'encoded-word' of the Subject field above, the
      last "=" at the end of the 'encoded-text' is necessary because each
      'encoded-word' must be self-contained (the "=" character completes a
      group of 4 base64 characters representing 2 octets).  An additional
      octet could have been encoded in the first 'encoded-word' (so that
      the encoded-word would contain an exact multiple of 3 encoded
      octets), except that the second 'encoded-word' uses a different
      'charset' than the first one.
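
   Python's standard base64 module can confirm the octet counts in this
   note:

```python
import base64

first = base64.b64decode("SWYgeW91IGNhbiByZWFkIHRoaXMgeW8=")
second = base64.b64decode("dSB1bmRlcnN0YW5kIHRoZSBleGFtcGxlLg==")

# 23 octets: seven complete 3-octet groups plus 2 octets, hence one "=".
print(first, divmod(len(first), 3))

# Moving the next octet ("u") into the first word would give 24 octets and
# no padding, but only if both words used the same charset.
print(base64.b64encode(first + second[:1]))
```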
    604 
    605    From: =?ISO-8859-1?Q?Olle_J=E4rnefors?= <ojarnef@admin.kth.se>
    606    To: ietf-822@dimacs.rutgers.edu, ojarnef@admin.kth.se
    607    Subject: Time for ISO 10646?
    608 
    609    To: Dave Crocker <dcrocker@mordor.stanford.edu>
    610    Cc: ietf-822@dimacs.rutgers.edu, paf@comsol.se
    611    From: =?ISO-8859-1?Q?Patrik_F=E4ltstr=F6m?= <paf@nada.kth.se>
    612    Subject: Re: RFC-HDR care and feeding
    613 
    614 
    615 
    616 
    617 
    618 Moore                       Standards Track                    [Page 11]
    619 
    620 RFC 2047               Message Header Extensions           November 1996
    621 
    622 
    623    From: Nathaniel Borenstein <nsb@thumper.bellcore.com>
    624          (=?iso-8859-8?b?7eXs+SDv4SDp7Oj08A==?=)
    625    To: Greg Vaudreuil <gvaudre@NRI.Reston.VA.US>, Ned Freed
    626       <ned@innosoft.com>, Keith Moore <moore@cs.utk.edu>
    627    Subject: Test of new header generator
    628    MIME-Version: 1.0
    629    Content-type: text/plain; charset=ISO-8859-1
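The "Q"-encoded display names in the examples above can be reproduced mechanically. The Python sketch below assumes the 'encoded-word' replaces a 'phrase' (such as the name before an address), where Section 5, paragraph (3) limits the characters that may appear unencoded to letters, digits, "!", "*", "+", "-" and "/". The name q_encode_phrase is hypothetical, the default charset ISO-8859-1 is an assumption chosen to match the examples, and the sketch never splits long input.

```python
import string

# Characters that may stand for themselves in a "Q" encoded-word used as
# a 'phrase' (Section 5, paragraph (3)). Literal "=", "?" and "_" are
# always written as =XX because they have special meaning.
PHRASE_SAFE = frozenset(string.ascii_letters + string.digits + "!*+-/")


def q_encode_phrase(text: str, charset: str = "ISO-8859-1") -> str:
    """Encode `text` as one "Q" encoded-word suitable for a 'phrase'.

    Sketch only: long input is not split, so the caller must keep the
    result within the 75-character limit.
    """
    encoded = []
    for octet in text.encode(charset):
        if octet == 0x20:
            encoded.append("_")               # SPACE may be written as "_"
        elif chr(octet) in PHRASE_SAFE:
            encoded.append(chr(octet))        # safe ASCII stands for itself
        else:
            encoded.append(f"={octet:02X}")   # everything else as =XX
    return f"=?{charset}?Q?{''.join(encoded)}?="


if __name__ == "__main__":
    print(q_encode_phrase("Keld J\u00f8rn Simonsen"))
    # -> =?ISO-8859-1?Q?Keld_J=F8rn_Simonsen?=
```

As the CC example shows, a 'phrase' may mix 'encoded-word's with ordinary words, so only "André" needs encoding there while "Pirard" stays plain.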
    630 
    631    The following examples illustrate how text containing 'encoded-word's
    632    is displayed when it appears in a structured field body.  The rules
    633    differ slightly for fields defined as '*text', where "(" and ")" are
    634    not recognized as 'comment' delimiters [Section 5, paragraph (1)].
    635 
    636    In each of the following examples, if the same sequence were to occur
    637    in a '*text' field, it would NOT be treated as containing
    638    'encoded-word's, and the "displayed as" form would be identical to the
    639    "encoded form".  This is because each 'encoded-word' in the following
    640    examples is adjacent to a "(" or ")" character.
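The difference between a 'comment' and a '*text' field can be sketched as a check on the characters immediately surrounding each candidate 'encoded-word'. This Python sketch is a simplification under stated assumptions: the caller says whether `body` is a 'comment' (nested comments and quoted-pairs are ignored), recognized_words is a hypothetical name, and candidates are found with a loose regular expression rather than the full grammar.

```python
import re

# Candidate encoded-word: charset, "Q"/"B", encoded-text (not validated).
ENCODED_WORD = re.compile(r"=\?[^?\s()]+\?[QqBb]\?[^?\s]*\?=")


def recognized_words(body: str, in_comment: bool) -> list:
    """Return the encoded-words a reader would decode in `body`.

    In '*text' an encoded-word must be bounded by white space or by the
    ends of the field body; inside a 'comment', "(" and ")" also bound it.
    """
    delimiters = " \t\r\n" + ("()" if in_comment else "")
    found = []
    for m in ENCODED_WORD.finditer(body):
        # Treat the start and end of `body` as white space.
        before = body[m.start() - 1] if m.start() > 0 else " "
        after = body[m.end()] if m.end() < len(body) else " "
        if before in delimiters and after in delimiters:
            found.append(m.group())
    return found
```

With in_comment=True the sketch finds the word in "(=?ISO-8859-1?Q?a?= b)"; with in_comment=False it finds nothing, so the text would be displayed unchanged, as described for '*text' fields.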
    641 
    642    encoded form                                displayed as
    643    ---------------------------------------------------------------------
    644    (=?ISO-8859-1?Q?a?=)                        (a)
    645 
    646    (=?ISO-8859-1?Q?a?= b)                      (a b)
    647 
    648            Within a 'comment', white space MUST appear between an
    649            'encoded-word' and surrounding text.  [Section 5,
    650            paragraph (2)].  However, white space is not needed between
    651            the initial "(" that begins the 'comment' and the
    652            'encoded-word'.
    653 
    654 
    655    (=?ISO-8859-1?Q?a?= =?ISO-8859-1?Q?b?=)     (ab)
    656 
    657            White space between adjacent 'encoded-word's is not
    658            displayed.
    659 
    660    (=?ISO-8859-1?Q?a?=  =?ISO-8859-1?Q?b?=)    (ab)
    661 
    662            Even multiple SPACEs between 'encoded-word's are ignored
    663            for the purpose of display.
    664 
    665    (=?ISO-8859-1?Q?a?=                         (ab)
    666        =?ISO-8859-1?Q?b?=)
    667 
    668            Any amount of linear-white-space between 'encoded-word's,
    669            even if it includes a CRLF followed by one or more SPACEs,
    670            is ignored for the purposes of display.
    671 
    679    (=?ISO-8859-1?Q?a_b?=)                      (a b)
    680 
    681            In order to cause a SPACE to be displayed within a portion
    682            of encoded text, the SPACE MUST be encoded as part of the
    683            'encoded-word'.
    684 
    685    (=?ISO-8859-1?Q?a?= =?ISO-8859-2?Q?_b?=)    (a b)
    686 
    687            In order to cause a SPACE to be displayed between two strings
    688            of encoded text, the SPACE MAY be encoded as part of one of
    689            the 'encoded-word's.
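The display rules in the table above amount to: decode each 'encoded-word', then suppress any 'linear-white-space' that lies between two adjacent 'encoded-word's, and keep all other white space. The following Python sketch applies that rule to a single 'comment'. It assumes the comment has already been isolated from the structured field with its parentheses, that it contains no nested comments or quoted-pairs, and that its charset names are known to Python; display_comment and decode_word are hypothetical names.

```python
import base64
import re

ENCODED_WORD = re.compile(r"=\?([^?\s()]+)\?([QqBb])\?([^?\s]*)\?=")
# A comment splits into "(" / ")", runs of white space, and other runs.
TOKEN = re.compile(r"[()]|\s+|[^()\s]+")


def decode_word(m: re.Match) -> str:
    """Decode a matched encoded-word ("B" = base64, "Q" = quoted)."""
    charset, encoding, text = m.groups()
    if encoding in "Bb":
        raw = base64.b64decode(text)
    else:
        # "_" stands for SPACE; "=XX" is a hexadecimal octet.
        raw = re.sub(
            rb"=([0-9A-Fa-f]{2})",
            lambda h: bytes([int(h.group(1), 16)]),
            text.replace("_", " ").encode("ascii"),
        )
    return raw.decode(charset)


def display_comment(comment: str) -> str:
    """Render one 'comment' (parentheses included) for display."""
    # Unfold: a CRLF followed by white space is just white space.
    comment = re.sub(r"\r\n(?=[ \t])", "", comment)
    parts = []  # (text, came_from_encoded_word)
    for tok in TOKEN.findall(comment):
        m = ENCODED_WORD.fullmatch(tok)
        parts.append((decode_word(m), True) if m else (tok, False))
    shown = []
    for i, (text, encoded) in enumerate(parts):
        between_words = (
            not encoded
            and text.isspace()
            and 0 < i < len(parts) - 1
            and parts[i - 1][1]
            and parts[i + 1][1]
        )
        if not between_words:  # drop white space between encoded-words
            shown.append(text)
    return "".join(shown)
```

Applied to the seven encoded forms in the table, display_comment returns (a), (a b), (ab), (ab), (ab), (a b) and (a b), matching the "displayed as" column.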
    690 
    691 9. References
    692 
    693    [RFC 822] Crocker, D., "Standard for the Format of ARPA Internet Text
    694        Messages", STD 11, RFC 822, UDEL, August 1982.
    695 
    696    [RFC 2049] Borenstein, N., and N. Freed, "Multipurpose Internet Mail
    697        Extensions (MIME) Part Five: Conformance Criteria and Examples",
    698        RFC 2049, November 1996.
    699 
    700    [RFC 2045] Borenstein, N., and N. Freed, "Multipurpose Internet Mail
    701        Extensions (MIME) Part One: Format of Internet Message Bodies",
    702        RFC 2045, November 1996.
    703 
    704    [RFC 2046] Borenstein, N., and N. Freed, "Multipurpose Internet Mail
    705        Extensions (MIME) Part Two: Media Types", RFC 2046,
    706        November 1996.
    707 
    708    [RFC 2048] Freed, N., Klensin, J., and J. Postel, "Multipurpose
    709        Internet Mail Extensions (MIME) Part Four: Registration
    710        Procedures", RFC 2048, November 1996.
    711 
    735 10. Security Considerations
    736 
    737    Security issues are not discussed in this memo.
    738 
    739 11. Acknowledgements
    740 
    741    The author wishes to thank Nathaniel Borenstein, Issac Chan, Lutz
    742    Donnerhacke, Paul Eggert, Ned Freed, Andreas M. Kirchwitz, Olle
    743    Jarnefors, Mike Rosin, Yutaka Sato, Bart Schaefer, and Kazuhiko
    744    Yamamoto, for their helpful advice, insightful comments, and
    745    illuminating questions in response to earlier versions of this
    746    specification.
    747 
    748 12. Author's Address
    749 
    750    Keith Moore
    751    University of Tennessee
    752    107 Ayres Hall
    753    Knoxville TN 37996-1301
    754 
    755    EMail: moore@cs.utk.edu
    756 
    791 Appendix - changes since RFC 1522 (in no particular order)
    792 
    793    + explicitly state that the MIME-Version header field is not required
    794      in order to use 'encoded-word's.
    795 
    796    + add explicit note that SPACEs and TABs are not allowed within
    797      'encoded-word's, explaining that an 'encoded-word' must look like an
    798      'atom' to an RFC822 parser.
    799 
    800    + add examples from Olle Jarnefors (thanks!) which illustrate how
    801      encoded-words with adjacent linear-white-space are displayed.
    802 
    803    + explicitly list terms defined in RFC822 and referenced in this memo
    804 
    805    + fix transcription typos that caused one or two lines and a couple of
    806      characters to disappear in the resulting text, due to nroff quirks.
    807 
    808    + clarify that encoded-words are allowed in '*text' fields in both
    809      RFC822 headers and MIME body part headers, but NOT as parameter
    810      values.
    811 
    812    + clarify the requirement to switch back to ASCII within the encoded
    813      portion of an 'encoded-word', for any charset that uses code switching
    814      sequences.
    815 
    816    + add a note about 'encoded-word's being delimited by "(" and ")"
    817      within a comment, but not in a '*text' field (how bizarre!).
    818 
    819    + fix the Andre Pirard example to get rid of the trailing "_" after
    820      the =E9 (no longer needed after RFC 1342).
    821 
    822    + clarification: an 'encoded-word' may appear immediately following
    823      the initial "(" or immediately before the final ")" that delimits a
    824      comment, not just adjacent to "(" and ")" *within* *ctext.
    825 
    826    + add a note to explain that a "B" 'encoded-word' will always have a
    827      multiple of 4 characters in the 'encoded-text' portion.
    828 
    829    + add note about the "=" in the examples
    830 
    831    + note that processing of 'encoded-word's occurs *after* parsing, and
    832      some of the implications thereof.
    833 
    834    + explicitly state that you can't expect to translate between
    835      RFC 1522 and either vanilla RFC 822 or so-called "8-bit headers".
    836 
    837    + explicitly state that 'encoded-word's are not valid within a
    838      'quoted-string'.
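Two of the rules noted in this list can be checked mechanically by a receiver: no SPACE or TAB may appear inside an 'encoded-word' (so that it still parses as an 'atom'), and a "B" 'encoded-text' is always a multiple of 4 characters long. The Python sketch below checks those, plus the 75-character limit on an 'encoded-word' from Section 2 of this memo. The name encoded_word_problems is hypothetical, and the sketch does not verify the 'charset' or the decoded octets.

```python
import re

# Overall shape of an encoded-word; the checks below explain failures.
SHAPE = re.compile(r"=\?([^?]*)\?([^?]*)\?([^?]*)\?=")


def encoded_word_problems(word: str) -> list:
    """Return reasons why `word` is not a well-formed encoded-word.

    Sketch: checks only the rules noted above plus the 75-character
    limit; it does not validate the charset or the encoded octets.
    """
    m = SHAPE.fullmatch(word)
    if m is None:
        return ["not of the form =?charset?encoding?encoded-text?="]
    _charset, encoding, text = m.groups()
    problems = []
    if len(word) > 75:
        problems.append("longer than 75 characters")
    if " " in word or "\t" in word:
        problems.append("contains SPACE or TAB, so it is not an 'atom'")
    if encoding.upper() not in ("B", "Q"):
        problems.append(f"unknown encoding {encoding!r}")
    elif encoding.upper() == "B" and len(text) % 4 != 0:
        problems.append("'B' encoded-text length is not a multiple of 4")
    return problems
```

For example, dropping the trailing "=" from the first Subject word in Section 8 leaves 31 characters of 'encoded-text', which the sketch reports as a length problem.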
    839 