rohrpost

A commandline mail client to change the world as we see it.
git clone git://r-36.net/rohrpost

commit a070db77245acdab7cee5aa2d67e145959aed08c
parent c1a076f8bfe69c7c5f741798f8831de09f9f102a
Author: Christoph Lohmann <20h@r-36.net>
Date:   Fri, 21 Dec 2012 18:01:59 +0100

Moving the RFCs to their own rfc folder.

Diffstat:
add.c | 2+-
proto/rfc1341.txt | 5265-------------------------------------------------------------------------------
proto/rfc2045.txt | 1739-------------------------------------------------------------------------------
proto/rfc2046.txt | 2467------------------------------------------------------------------------------
proto/rfc2047.txt | 843-------------------------------------------------------------------------------
proto/rfc2048.txt | 1180-------------------------------------------------------------------------------
proto/rfc2049.txt | 1347-------------------------------------------------------------------------------
proto/rfc2183.txt | 675-------------------------------------------------------------------------------
proto/rfc2231.txt | 563-------------------------------------------------------------------------------
proto/rfc2387.txt | 563-------------------------------------------------------------------------------
proto/rfc2425.txt | 1851-------------------------------------------------------------------------------
proto/rfc2426.txt | 2355-------------------------------------------------------------------------------
proto/rfc2595.txt | 843-------------------------------------------------------------------------------
proto/rfc2646.txt | 787-------------------------------------------------------------------------------
proto/rfc2822.txt | 2859-------------------------------------------------------------------------------
proto/rfc3501.txt | 6051-------------------------------------------------------------------------------
proto/rfc4616.txt | 619-------------------------------------------------------------------------------
proto/rfc5256.txt | 1067-------------------------------------------------------------------------------
proto/rfc5322.txt | 3195-------------------------------------------------------------------------------
proto/rfc5804.txt | 2747-------------------------------------------------------------------------------
proto/rfc822.txt | 2901-------------------------------------------------------------------------------
proto/sieve/rfc3028.txt | 2019-------------------------------------------------------------------------------
proto/sieve/rfc3431.txt | 451-------------------------------------------------------------------------------
proto/sieve/rfc5231.txt | 507-------------------------------------------------------------------------------
proto/sieve/rfc5260.txt | 731-------------------------------------------------------------------------------
proto/sieve/rfc5437.txt | 787-------------------------------------------------------------------------------
proto/sieve/rfc5804.txt | 2747-------------------------------------------------------------------------------
rfc/rfc1341.txt | 5265+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
rfc/rfc2045.txt | 1739+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
rfc/rfc2046.txt | 2467++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
rfc/rfc2047.txt | 843+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
rfc/rfc2048.txt | 1180+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
rfc/rfc2049.txt | 1347+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
rfc/rfc2183.txt | 675+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
rfc/rfc2231.txt | 563+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
rfc/rfc2387.txt | 563+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
rfc/rfc2425.txt | 1851+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
rfc/rfc2426.txt | 2355+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
rfc/rfc2595.txt | 843+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
rfc/rfc2646.txt | 787+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
rfc/rfc2821.txt | 4427+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
rfc/rfc2822.txt | 2859+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
rfc/rfc3501.txt | 6051+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
rfc/rfc4616.txt | 619+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
rfc/rfc5256.txt | 1067+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
rfc/rfc5322.txt | 3195+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
rfc/rfc5804.txt | 2747+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
rfc/rfc822.txt | 2901+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
rfc/sieve/rfc3028.txt | 2019+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
rfc/sieve/rfc3431.txt | 451+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
rfc/sieve/rfc5231.txt | 507+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
rfc/sieve/rfc5260.txt | 731+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
rfc/sieve/rfc5437.txt | 787+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
rfc/sieve/rfc5804.txt | 2747+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
54 files changed, 51587 insertions(+), 47160 deletions(-)
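
Note on the add.c change: the only code change in this commit appends "\n" to a die() message. The sketch below illustrates why that could matter. It assumes that die() writes its format string to stderr verbatim and exits without adding a newline. That behaviour is an assumption, not something this diff shows. The die() here is a hypothetical stand-in rather than rohrpost's actual implementation, and ends_with_newline() is an illustrative helper that does not exist in the repository.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical stand-in for rohrpost's die(): prints the formatted
 * message to stderr exactly as given and exits. It is assumed here
 * that no newline is appended, so the caller has to supply one.
 */
void
die(const char *fmt, ...)
{
	va_list ap;

	assert(fmt != NULL);
	va_start(ap, fmt);
	vfprintf(stderr, fmt, ap);
	va_end(ap);
	exit(EXIT_FAILURE);
}

/* Illustrative helper: returns 1 if msg ends with '\n', else 0. */
int
ends_with_newline(const char *msg)
{
	size_t len = strlen(msg);

	return len > 0 && msg[len - 1] == '\n';
}
```

Under that assumption, the old string would leave the cursor at the end of the error text, so the next shell prompt would follow it on the same line. The new string terminates the line itself. A check like ends_with_newline() could be used to audit other die() call sites for the same omission.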

diff --git a/add.c b/add.c
@@ -88,7 +88,7 @@ addmain(int argc, char *argv[])
 	if (flags != NULL) {
 		flagl = flag_sanitize(flags);
 		if (flagl == NULL)
-			die("Flag parameter seems to be invalid.");
+			die("Flag parameter seems to be invalid.\n");
 	}
 
 	cfg = config_init(cfgn);
diff --git a/proto/rfc1341.txt b/proto/rfc1341.txt
@@ -1,5265 +0,0 @@
-
-
-
-
-
-
- Network Working Group               N. Borenstein, Bellcore
- Request for Comments: 1341          N. Freed, Innosoft
-                                     June 1992
-
-
-
- MIME (Multipurpose Internet Mail Extensions):
-
-
- Mechanisms for Specifying and Describing
- the Format of Internet Message Bodies
-
-
- Status of this Memo
-
- This RFC specifies an IAB standards track protocol for the
- Internet community, and requests discussion and suggestions
- for improvements. Please refer to the current edition of
- the "IAB Official Protocol Standards" for the
- standardization state and status of this protocol.
- Distribution of this memo is unlimited.
-
- Abstract
-
- RFC 822 defines a message representation protocol which
- specifies considerable detail about message headers, but
- which leaves the message content, or message body, as flat
- ASCII text. This document redefines the format of message
- bodies to allow multi-part textual and non-textual message
- bodies to be represented and exchanged without loss of
- information. This is based on earlier work documented in
- RFC 934 and RFC 1049, but extends and revises that work.
- Because RFC 822 said so little about message bodies, this
- document is largely orthogonal to (rather than a revision
- of) RFC 822.
- - In particular, this document is designed to provide - facilities to include multiple objects in a single message, - to represent body text in character sets other than US- - ASCII, to represent formatted multi-font text messages, to - represent non-textual material such as images and audio - fragments, and generally to facilitate later extensions - defining new types of Internet mail for use by cooperating - mail agents. - - This document does NOT extend Internet mail header fields to - permit anything other than US-ASCII text data. It is - recognized that such extensions are necessary, and they are - the subject of a companion document [RFC -1342]. - - A table of contents appears at the end of this document. - - - - - - - Borenstein & Freed [Page i] - - - - - - - - 1 Introduction - - Since its publication in 1982, RFC 822 [RFC-822] has defined - the standard format of textual mail messages on the - Internet. Its success has been such that the RFC 822 format - has been adopted, wholly or partially, well beyond the - confines of the Internet and the Internet SMTP transport - defined by RFC 821 [RFC-821]. As the format has seen wider - use, a number of limitations have proven increasingly - restrictive for the user community. - - RFC 822 was intended to specify a format for text messages. - As such, non-text messages, such as multimedia messages that - might include audio or images, are simply not mentioned. - Even in the case of text, however, RFC 822 is inadequate for - the needs of mail users whose languages require the use of - character sets richer than US ASCII [US-ASCII]. Since RFC - 822 does not specify mechanisms for mail containing audio, - video, Asian language text, or even text in most European - languages, additional specifications are needed - - One of the notable limitations of RFC 821/822 based mail - systems is the fact that they limit the contents of - electronic mail messages to relatively short lines of - seven-bit ASCII. 
This forces users to convert any non- - textual data that they may wish to send into seven-bit bytes - representable as printable ASCII characters before invoking - a local mail UA (User Agent, a program with which human - users send and receive mail). Examples of such encodings - currently used in the Internet include pure hexadecimal, - uuencode, the 3-in-4 base 64 scheme specified in RFC 1113, - the Andrew Toolkit Representation [ATK], and many others. - - The limitations of RFC 822 mail become even more apparent as - gateways are designed to allow for the exchange of mail - messages between RFC 822 hosts and X.400 hosts. X.400 [X400] - specifies mechanisms for the inclusion of non-textual body - parts within electronic mail messages. The current - standards for the mapping of X.400 messages to RFC 822 - messages specify that either X.400 non-textual body parts - should be converted to (not encoded in) an ASCII format, or - that they should be discarded, notifying the RFC 822 user - that discarding has occurred. This is clearly undesirable, - as information that a user may wish to receive is lost. - Even though a user's UA may not have the capability of - dealing with the non-textual body part, the user might have - some mechanism external to the UA that can extract useful - information from the body part. Moreover, it does not allow - for the fact that the message may eventually be gatewayed - back into an X.400 message handling system (i.e., the X.400 - message is "tunneled" through Internet mail), where the - non-textual information would definitely become useful - again. - - - - - Borenstein & Freed [Page 1] - - - - - RFC 1341MIME: Multipurpose Internet Mail ExtensionsJune 1992 - - - This document describes several mechanisms that combine to - solve most of these problems without introducing any serious - incompatibilities with the existing world of RFC 822 mail. - In particular, it describes: - - 1. 
A MIME-Version header field, which uses a version number - to declare a message to be conformant with this - specification and allows mail processing agents to - distinguish between such messages and those generated - by older or non-conformant software, which is presumed - to lack such a field. - - 2. A Content-Type header field, generalized from RFC 1049 - [RFC-1049], which can be used to specify the type and - subtype of data in the body of a message and to fully - specify the native representation (encoding) of such - data. - - 2.a. A "text" Content-Type value, which can be used to - represent textual information in a number of - character sets and formatted text description - languages in a standardized manner. - - 2.b. A "multipart" Content-Type value, which can be - used to combine several body parts, possibly of - differing types of data, into a single message. - - 2.c. An "application" Content-Type value, which can be - used to transmit application data or binary data, - and hence, among other uses, to implement an - electronic mail file transfer service. - - 2.d. A "message" Content-Type value, for encapsulating - a mail message. - - 2.e An "image" Content-Type value, for transmitting - still image (picture) data. - - 2.f. An "audio" Content-Type value, for transmitting - audio or voice data. - - 2.g. A "video" Content-Type value, for transmitting - video or moving image data, possibly with audio as - part of the composite video data format. - - 3. A Content-Transfer-Encoding header field, which can be - used to specify an auxiliary encoding that was applied - to the data in order to allow it to pass through mail - transport mechanisms which may have data or character - set limitations. - - 4. Two optional header fields that can be used to further - describe the data in a message body, the Content-ID and - Content-Description header fields. 
- - - - Borenstein & Freed [Page 2] - - - - - RFC 1341MIME: Multipurpose Internet Mail ExtensionsJune 1992 - - - MIME has been carefully designed as an extensible mechanism, - and it is expected that the set of content-type/subtype - pairs and their associated parameters will grow - significantly with time. Several other MIME fields, notably - including character set names, are likely to have new values - defined over time. In order to ensure that the set of such - values is developed in an orderly, well-specified, and - public manner, MIME defines a registration process which - uses the Internet Assigned Numbers Authority (IANA) as a - central registry for such values. Appendix F provides - details about how IANA registration is accomplished. - - Finally, to specify and promote interoperability, Appendix A - of this document provides a basic applicability statement - for a subset of the above mechanisms that defines a minimal - level of "conformance" with this document. - - HISTORICAL NOTE: Several of the mechanisms described in - this document may seem somewhat strange or even baroque at - first reading. It is important to note that compatibility - with existing standards AND robustness across existing - practice were two of the highest priorities of the working - group that developed this document. In particular, - compatibility was always favored over elegance. - - 2 Notations, Conventions, and Generic BNF Grammar - - This document is being published in two versions, one as - plain ASCII text and one as PostScript. The latter is - recommended, though the textual contents are identical. An - Andrew-format copy of this document is also available from - the first author (Borenstein). - - Although the mechanisms specified in this document are all - described in prose, most are also described formally in the - modified BNF notation of RFC 822. 
Implementors will need to - be familiar with this notation in order to understand this - specification, and are referred to RFC 822 for a complete - explanation of the modified BNF notation. - - Some of the modified BNF in this document makes reference to - syntactic entities that are defined in RFC 822 and not in - this document. A complete formal grammar, then, is obtained - by combining the collected grammar appendix of this document - with that of RFC 822. - - The term CRLF, in this document, refers to the sequence of - the two ASCII characters CR (13) and LF (10) which, taken - together, in this order, denote a line break in RFC 822 - mail. - - The term "character set", wherever it is used in this - document, refers to a coded character set, in the sense of - ISO character set standardization work, and must not be - - - - Borenstein & Freed [Page 3] - - - - - RFC 1341MIME: Multipurpose Internet Mail ExtensionsJune 1992 - - - misinterpreted as meaning "a set of characters." - - The term "message", when not further qualified, means either - the (complete or "top-level") message being transferred on a - network, or a message encapsulated in a body of type - "message". - - The term "body part", in this document, means one of the - parts of the body of a multipart entity. A body part has a - header and a body, so it makes sense to speak about the body - of a body part. - - The term "entity", in this document, means either a message - or a body part. All kinds of entities share the property - that they have a header and a body. - - The term "body", when not further qualified, means the body - of an entity, that is the body of either a message or of a - body part. - - Note : the previous four definitions are clearly circular. - This is unavoidable, since the overal structure of a MIME - message is indeed recursive. - - In this document, all numeric and octet values are given in - decimal notation. 
- - It must be noted that Content-Type values, subtypes, and - parameter names as defined in this document are case- - insensitive. However, parameter values are case-sensitive - unless otherwise specified for the specific parameter. - - FORMATTING NOTE: This document has been carefully formatted - for ease of reading. The PostScript version of this - document, in particular, places notes like this one, which - may be skipped by the reader, in a smaller, italicized, - font, and indents it as well. In the text version, only the - indentation is preserved, so if you are reading the text - version of this you might consider using the PostScript - version instead. However, all such notes will be indented - and preceded by "NOTE:" or some similar introduction, even - in the text version. - - The primary purpose of these non-essential notes is to - convey information about the rationale of this document, or - to place this document in the proper historical or - evolutionary context. Such information may be skipped by - those who are focused entirely on building a compliant - implementation, but may be of use to those who wish to - understand why this document is written as it is. - - For ease of recognition, all BNF definitions have been - placed in a fixed-width font in the PostScript version of - this document. - - - - Borenstein & Freed [Page 4] - - - - - RFC 1341MIME: Multipurpose Internet Mail ExtensionsJune 1992 - - - 3 The MIME-Version Header Field - - Since RFC 822 was published in 1982, there has really been - only one format standard for Internet messages, and there - has been little perceived need to declare the format - standard in use. This document is an independent document - that complements RFC 822. 
Although the extensions in this - document have been defined in such a way as to be compatible - with RFC 822, there are still circumstances in which it - might be desirable for a mail-processing agent to know - whether a message was composed with the new standard in - mind. - - Therefore, this document defines a new header field, "MIME- - Version", which is to be used to declare the version of the - Internet message body format standard in use. - - Messages composed in accordance with this document MUST - include such a header field, with the following verbatim - text: - - MIME-Version: 1.0 - - The presence of this header field is an assertion that the - message has been composed in compliance with this document. - - Since it is possible that a future document might extend the - message format standard again, a formal BNF is given for the - content of the MIME-Version field: - - MIME-Version := text - - Thus, future format specifiers, which might replace or - extend "1.0", are (minimally) constrained by the definition - of "text", which appears in RFC 822. - - Note that the MIME-Version header field is required at the - top level of a message. It is not required for each body - part of a multipart entity. It is required for the embedded - headers of a body of type "message" if and only if the - embedded message is itself claimed to be MIME-compliant. - - - - - - - - - - - - - - - - - Borenstein & Freed [Page 5] - - - - - RFC 1341MIME: Multipurpose Internet Mail ExtensionsJune 1992 - - - 4 The Content-Type Header Field - - The purpose of the Content-Type field is to describe the - data contained in the body fully enough that the receiving - user agent can pick an appropriate agent or mechanism to - present the data to the user, or otherwise deal with the - data in an appropriate manner. - - HISTORICAL NOTE: The Content-Type header field was first - defined in RFC 1049. 
RFC 1049 Content-types used a simpler - and less powerful syntax, but one that is largely compatible - with the mechanism given here. - - The Content-Type header field is used to specify the nature - of the data in the body of an entity, by giving type and - subtype identifiers, and by providing auxiliary information - that may be required for certain types. After the type and - subtype names, the remainder of the header field is simply a - set of parameters, specified in an attribute/value notation. - The set of meaningful parameters differs for the different - types. The ordering of parameters is not significant. - Among the defined parameters is a "charset" parameter by - which the character set used in the body may be declared. - Comments are allowed in accordance with RFC 822 rules for - structured header fields. - - In general, the top-level Content-Type is used to declare - the general type of data, while the subtype specifies a - specific format for that type of data. Thus, a Content-Type - of "image/xyz" is enough to tell a user agent that the data - is an image, even if the user agent has no knowledge of the - specific image format "xyz". Such information can be used, - for example, to decide whether or not to show a user the raw - data from an unrecognized subtype -- such an action might be - reasonable for unrecognized subtypes of text, but not for - unrecognized subtypes of image or audio. For this reason, - registered subtypes of audio, image, text, and video, should - not contain embedded information that is really of a - different type. Such compound types should be represented - using the "multipart" or "application" types. - - Parameters are modifiers of the content-subtype, and do not - fundamentally affect the requirements of the host system. - Although most parameters make sense only with certain - content-types, others are "global" in the sense that they - might apply to any subtype. 
For example, the "boundary"
- parameter makes sense only for the "multipart" content-type,
- but the "charset" parameter might make sense with several
- content-types.
-
- An initial set of seven Content-Types is defined by this
- document. This set of top-level names is intended to be
- substantially complete. It is expected that additions to
- the larger set of supported types can generally be
-
-
-
- Borenstein & Freed [Page 6]
-
-
-
-
- RFC 1341MIME: Multipurpose Internet Mail ExtensionsJune 1992
-
-
- accomplished by the creation of new subtypes of these
- initial types. In the future, more top-level types may be
- defined only by an extension to this standard. If another
- primary type is to be used for any reason, it must be given
- a name starting with "X-" to indicate its non-standard
- status and to avoid a potential conflict with a future
- official name.
-
- In the Extended BNF notation of RFC 822, a Content-Type
- header field value is defined as follows:
-
- Content-Type := type "/" subtype *[";" parameter]
-
- type := "application" / "audio"
-       / "image" / "message"
-       / "multipart" / "text"
-       / "video" / x-token
-
- x-token := <The two characters "X-" followed, with no
-             intervening white space, by any token>
-
- subtype := token
-
- parameter := attribute "=" value
-
- attribute := token
-
- value := token / quoted-string
-
- token := 1*<any CHAR except SPACE, CTLs, or tspecials>
-
- tspecials := "(" / ")" / "<" / ">" / "@"  ; Must be in
-            / "," / ";" / ":" / "\" / <">  ; quoted-string,
-            / "/" / "[" / "]" / "?" / "."  ; to use within
-            / "="                          ; parameter values
-
- Note that the definition of "tspecials" is the same as the
- RFC 822 definition of "specials" with the addition of the
- three characters "/", "?", and "=".
-
- Note also that a subtype specification is MANDATORY. There
- are no default subtypes.
-
- The type, subtype, and parameter names are not case
- sensitive. For example, TEXT, Text, and TeXt are all
- equivalent.
Parameter values are normally case sensitive, - but certain parameters are interpreted to be case- - insensitive, depending on the intended use. (For example, - multipart boundaries are case-sensitive, but the "access- - type" for message/External-body is not case-sensitive.) - - Beyond this syntax, the only constraint on the definition of - subtype names is the desire that their uses must not - conflict. That is, it would be undesirable to have two - - - - Borenstein & Freed [Page 7] - - - - - RFC 1341MIME: Multipurpose Internet Mail ExtensionsJune 1992 - - - different communities using "Content-Type: - application/foobar" to mean two different things. The - process of defining new content-subtypes, then, is not - intended to be a mechanism for imposing restrictions, but - simply a mechanism for publicizing the usages. There are, - therefore, two acceptable mechanisms for defining new - Content-Type subtypes: - - 1. Private values (starting with "X-") may be - defined bilaterally between two cooperating - agents without outside registration or - standardization. - - 2. New standard values must be documented, - registered with, and approved by IANA, as - described in Appendix F. Where intended for - public use, the formats they refer to must - also be defined by a published specification, - and possibly offered for standardization. - - The seven standard initial predefined Content-Types are - detailed in the bulk of this document. They are: - - text -- textual information. The primary subtype, - "plain", indicates plain (unformatted) text. No - special software is required to get the full - meaning of the text, aside from support for the - indicated character set. Subtypes are to be used - for enriched text in forms where application - software may enhance the appearance of the text, - but such software must not be required in order to - get the general idea of the content. Possible - subtypes thus include any readable word processor - format. 
A very simple and portable subtype, - richtext, is defined in this document. - multipart -- data consisting of multiple parts of - independent data types. Four initial subtypes - are defined, including the primary "mixed" - subtype, "alternative" for representing the same - data in multiple formats, "parallel" for parts - intended to be viewed simultaneously, and "digest" - for multipart entities in which each part is of - type "message". - message -- an encapsulated message. A body of - Content-Type "message" is itself a fully formatted - RFC 822 conformant message which may contain its - own different Content-Type header field. The - primary subtype is "rfc822". The "partial" - subtype is defined for partial messages, to permit - the fragmented transmission of bodies that are - thought to be too large to be passed through mail - transport facilities. Another subtype, - "External-body", is defined for specifying large - bodies by reference to an external data source. - - - - Borenstein & Freed [Page 8] - - - - - RFC 1341MIME: Multipurpose Internet Mail ExtensionsJune 1992 - - - image -- image data. Image requires a display device - (such as a graphical display, a printer, or a FAX - machine) to view the information. Initial - subtypes are defined for two widely-used image - formats, jpeg and gif. - audio -- audio data, with initial subtype "basic". - Audio requires an audio output device (such as a - speaker or a telephone) to "display" the contents. - video -- video data. Video requires the capability to - display moving images, typically including - specialized hardware and software. The initial - subtype is "mpeg". - application -- some other kind of data, typically - either uninterpreted binary data or information to - be processed by a mail-based application. 
The - primary subtype, "octet-stream", is to be used in - the case of uninterpreted binary data, in which - case the simplest recommended action is to offer - to write the information into a file for the user. - Two additional subtypes, "ODA" and "PostScript", - are defined for transporting ODA and PostScript - documents in bodies. Other expected uses for - "application" include spreadsheets, data for - mail-based scheduling systems, and languages for - "active" (computational) email. (Note that active - email entails several securityconsiderations, - which are discussed later in this memo, - particularly in the context of - application/PostScript.) - - Default RFC 822 messages are typed by this protocol as plain - text in the US-ASCII character set, which can be explicitly - specified as "Content-type: text/plain; charset=us-ascii". - If no Content-Type is specified, either by error or by an - older user agent, this default is assumed. In the presence - of a MIME-Version header field, a receiving User Agent can - also assume that plain US-ASCII text was the sender's - intent. In the absence of a MIME-Version specification, - plain US-ASCII text must still be assumed, but the sender's - intent might have been otherwise. - - RATIONALE: In the absence of any Content-Type header field - or MIME-Version header field, it is impossible to be certain - that a message is actually text in the US-ASCII character - set, since it might well be a message that, using the - conventions that predate this document, includes text in - another character set or non-textual data in a manner that - cannot be automatically recognized (e.g., a uuencoded - compressed UNIX tar file). Although there is no fully - acceptable alternative to treating such untyped messages as - "text/plain; charset=us-ascii", implementors should remain - aware that if a message lacks both the MIME-Version and the - Content-Type header fields, it may in practice contain - almost anything. 
- - - - Borenstein & Freed [Page 9] - - - - - RFC 1341MIME: Multipurpose Internet Mail ExtensionsJune 1992 - - - It should be noted that the list of Content-Type values - given here may be augmented in time, via the mechanisms - described above, and that the set of subtypes is expected to - grow substantially. - - When a mail reader encounters mail with an unknown Content- - type value, it should generally treat it as equivalent to - "application/octet-stream", as described later in this - document. - - 5 The Content-Transfer-Encoding Header Field - - Many Content-Types which could usefully be transported via - email are represented, in their "natural" format, as 8-bit - character or binary data. Such data cannot be transmitted - over some transport protocols. For example, RFC 821 - restricts mail messages to 7-bit US-ASCII data with 1000 - character lines. - - It is necessary, therefore, to define a standard mechanism - for re-encoding such data into a 7-bit short-line format. - This document specifies that such encodings will be - indicated by a new "Content-Transfer-Encoding" header field. - The Content-Transfer-Encoding field is used to indicate the - type of transformation that has been used in order to - represent the body in an acceptable manner for transport. - - Unlike Content-Types, a proliferation of Content-Transfer- - Encoding values is undesirable and unnecessary. However, - establishing only a single Content-Transfer-Encoding - mechanism does not seem possible. There is a tradeoff - between the desire for a compact and efficient encoding of - largely-binary data and the desire for a readable encoding - of data that is mostly, but not entirely, 7-bit data. For - this reason, at least two encoding mechanisms are necessary: - a "readable" encoding and a "dense" encoding. 
- - The Content-Transfer-Encoding field is designed to specify - an invertible mapping between the "native" representation of - a type of data and a representation that can be readily - exchanged using 7 bit mail transport protocols, such as - those defined by RFC 821 (SMTP). This field has not been - defined by any previous standard. The field's value is a - single token specifying the type of encoding, as enumerated - below. Formally: - - Content-Transfer-Encoding := "BASE64" / "QUOTED-PRINTABLE" / - "8BIT" / "7BIT" / - "BINARY" / x-token - - These values are not case sensitive. That is, Base64 and - BASE64 and bAsE64 are all equivalent. An encoding type of - 7BIT requires that the body is already in a seven-bit mail- - ready representation. This is the default value -- that is, - - - - Borenstein & Freed [Page 10] - - - - - RFC 1341MIME: Multipurpose Internet Mail ExtensionsJune 1992 - - - "Content-Transfer-Encoding: 7BIT" is assumed if the - Content-Transfer-Encoding header field is not present. - - The values "8bit", "7bit", and "binary" all imply that NO - encoding has been performed. However, they are potentially - useful as indications of the kind of data contained in the - object, and therefore of the kind of encoding that might - need to be performed for transmission in a given transport - system. "7bit" means that the data is all represented as - short lines of US-ASCII data. "8bit" means that the lines - are short, but there may be non-ASCII characters (octets - with the high-order bit set). "Binary" means that not only - may non-ASCII characters be present, but also that the lines - are not necessarily short enough for SMTP transport. - - The difference between "8bit" (or any other conceivable - bit-width token) and the "binary" token is that "binary" - does not require adherence to any limits on line length or - to the SMTP CRLF semantics, while the bit-width tokens do - require such adherence. 
If the body contains data in any - bit-width other than 7-bit, the appropriate bit-width - Content-Transfer-Encoding token must be used (e.g., "8bit" - for unencoded 8 bit wide data). If the body contains binary - data, the "binary" Content-Transfer-Encoding token must be - used. - - NOTE: The distinction between the Content-Transfer-Encoding - values of "binary," "8bit," etc. may seem unimportant, in - that all of them really mean "none" -- that is, there has - been no encoding of the data for transport. However, clear - labeling will be of enormous value to gateways between - future mail transport systems with differing capabilities in - transporting data that do not meet the restrictions of RFC - 821 transport. - - As of the publication of this document, there are no - standardized Internet transports for which it is legitimate - to include unencoded 8-bit or binary data in mail bodies. - Thus there are no circumstances in which the "8bit" or - "binary" Content-Transfer-Encoding is actually legal on the - Internet. However, in the event that 8-bit or binary mail - transport becomes a reality in Internet mail, or when this - document is used in conjunction with any other 8-bit or - binary-capable transport mechanism, 8-bit or binary bodies - should be labeled as such using this mechanism. - - NOTE: The five values defined for the Content-Transfer- - Encoding field imply nothing about the Content-Type other - than the algorithm by which it was encoded or the transport - system requirements if unencoded. - - Implementors may, if necessary, define new Content- - Transfer-Encoding values, but must use an x-token, which is - a name prefixed by "X-" to indicate its non-standard status, - - - - Borenstein & Freed [Page 11] - - - - - RFC 1341MIME: Multipurpose Internet Mail ExtensionsJune 1992 - - - e.g., "Content-Transfer-Encoding: x-my-new-encoding". 
- However, unlike Content-Types and subtypes, the creation of - new Content-Transfer-Encoding values is explicitly and - strongly discouraged, as it seems likely to hinder - interoperability with little potential benefit. Their use - is allowed only as the result of an agreement between - cooperating user agents. - - If a Content-Transfer-Encoding header field appears as part - of a message header, it applies to the entire body of that - message. If a Content-Transfer-Encoding header field - appears as part of a body part's headers, it applies only to - the body of that body part. If an entity is of type - "multipart" or "message", the Content-Transfer-Encoding is - not permitted to have any value other than a bit width - (e.g., "7bit", "8bit", etc.) or "binary". - - It should be noted that email is character-oriented, so that - the mechanisms described here are mechanisms for encoding - arbitrary byte streams, not bit streams. If a bit stream is - to be encoded via one of these mechanisms, it must first be - converted to an 8-bit byte stream using the network standard - bit order ("big-endian"), in which the earlier bits in a - stream become the higher-order bits in a byte. A bit stream - not ending at an 8-bit boundary must be padded with zeroes. - This document provides a mechanism for noting the addition - of such padding in the case of the application Content-Type, - which has a "padding" parameter. - - The encoding mechanisms defined here explicitly encode all - data in ASCII. Thus, for example, suppose an entity has - header fields such as: - - Content-Type: text/plain; charset=ISO-8859-1 - Content-transfer-encoding: base64 - - This should be interpreted to mean that the body is a base64 - ASCII encoding of data that was originally in ISO-8859-1, - and will be in that character set again after decoding. - - The following sections will define the two standard encoding - mechanisms. 
The definition of new content-transfer- - encodings is explicitly discouraged and should only occur - when absolutely necessary. All content-transfer-encoding - namespace except that beginning with "X-" is explicitly - reserved to the IANA for future use. Private agreements - about content-transfer-encodings are also explicitly - discouraged. - - Certain Content-Transfer-Encoding values may only be used on - certain Content-Types. In particular, it is expressly - forbidden to use any encodings other than "7bit", "8bit", or - "binary" with any Content-Type that recursively includes - other Content-Type fields, notably the "multipart" and - - - - Borenstein & Freed [Page 12] - - - - - RFC 1341MIME: Multipurpose Internet Mail ExtensionsJune 1992 - - - "message" Content-Types. All encodings that are desired for - bodies of type multipart or message must be done at the - innermost level, by encoding the actual body that needs to - be encoded. - - NOTE ON ENCODING RESTRICTIONS: Though the prohibition - against using content-transfer-encodings on data of type - multipart or message may seem overly restrictive, it is - necessary to prevent nested encodings, in which data are - passed through an encoding algorithm multiple times, and - must be decoded multiple times in order to be properly - viewed. Nested encodings add considerable complexity to - user agents: aside from the obvious efficiency problems - with such multiple encodings, they can obscure the basic - structure of a message. In particular, they can imply that - several decoding operations are necessary simply to find out - what types of objects a message contains. Banning nested - encodings may complicate the job of certain mail gateways, - but this seems less of a problem than the effect of nested - encodings on user agents. 
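The rules for the Content-Transfer-Encoding field given so far (five standard tokens plus x-tokens, case-insensitive matching, "7BIT" as the default, and no real encoding on "multipart" or "message" entities) can be sketched as follows. The names `normalize_encoding` and `encoding_allowed` are hypothetical and chosen only for this example.

```python
STANDARD_ENCODINGS = {"base64", "quoted-printable", "8bit", "7bit", "binary"}
# Tokens that imply that NO encoding has been performed.
IDENTITY_ENCODINGS = {"7bit", "8bit", "binary"}


def normalize_encoding(value=None):
    """Return the lower-cased encoding token, applying the 7BIT default."""
    if value is None:
        return "7bit"  # assumed when the header field is not present
    token = value.strip().lower()  # values are not case sensitive
    if token in STANDARD_ENCODINGS:
        return token
    if token.startswith("x-") and len(token) > 2:
        return token  # private x-token, allowed but discouraged
    raise ValueError("not a valid Content-Transfer-Encoding: %r" % value)


def encoding_allowed(content_type, encoding=None):
    """Check the restriction on entities that contain other entities."""
    major = content_type.split("/", 1)[0].strip().lower()
    token = normalize_encoding(encoding)
    if major in ("multipart", "message"):
        return token in IDENTITY_ENCODINGS
    return True
```

For instance, `encoding_allowed("multipart/mixed", "base64")` is rejected, because any encoding has to be applied at the innermost level instead.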
- - NOTE ON THE RELATIONSHIP BETWEEN CONTENT-TYPE AND CONTENT- - TRANSFER-ENCODING: It may seem that the Content-Transfer- - Encoding could be inferred from the characteristics of the - Content-Type that is to be encoded, or, at the very least, - that certain Content-Transfer-Encodings could be mandated - for use with specific Content-Types. There are several - reasons why this is not the case. First, given the varying - types of transports used for mail, some encodings may be - appropriate for some Content-Type/transport combinations and - not for others. (For example, in an 8-bit transport, no - encoding would be required for text in certain character - sets, while such encodings are clearly required for 7-bit - SMTP.) Second, certain Content-Types may require different - types of transfer encoding under different circumstances. - For example, many PostScript bodies might consist entirely - of short lines of 7-bit data and hence require little or no - encoding. Other PostScript bodies (especially those using - Level 2 PostScript's binary encoding mechanism) may only be - reasonably represented using a binary transport encoding. - Finally, since Content-Type is intended to be an open-ended - specification mechanism, strict specification of an - association between Content-Types and encodings effectively - couples the specification of an application protocol with a - specific lower-level transport. This is not desirable since - the developers of a Content-Type should not have to be aware - of all the transports in use and what their limitations are. - - NOTE ON TRANSLATING ENCODINGS: The quoted-printable and - base64 encodings are designed so that conversion between - them is possible. The only issue that arises in such a - conversion is the handling of line breaks. When converting - from quoted-printable to base64 a line break must be - converted into a CRLF sequence. 
Similarly, a CRLF sequence - - - - Borenstein & Freed [Page 13] - - - - - RFC 1341MIME: Multipurpose Internet Mail ExtensionsJune 1992 - - - in base64 data should be converted to a quoted-printable - line break, but ONLY when converting text data. - - NOTE ON CANONICAL ENCODING MODEL: There was some - confusion, in earlier drafts of this memo, regarding the - model for when email data was to be converted to canonical - form and encoded, and in particular how this process would - affect the treatment of CRLFs, given that the representation - of newlines varies greatly from system to system. For this - reason, a canonical model for encoding is presented as - Appendix H. - - 5.1 Quoted-Printable Content-Transfer-Encoding - - The Quoted-Printable encoding is intended to represent data - that largely consists of octets that correspond to printable - characters in the ASCII character set. It encodes the data - in such a way that the resulting octets are unlikely to be - modified by mail transport. If the data being encoded are - mostly ASCII text, the encoded form of the data remains - largely recognizable by humans. A body which is entirely - ASCII may also be encoded in Quoted-Printable to ensure the - integrity of the data should the message pass through a - character-translating, and/or line-wrapping gateway. - - In this encoding, octets are to be represented as determined - by the following rules: - - Rule #1: (General 8-bit representation) Any octet, - except those indicating a line break according to the - newline convention of the canonical form of the data - being encoded, may be represented by an "=" followed by - a two digit hexadecimal representation of the octet's - value. The digits of the hexadecimal alphabet, for this - purpose, are "0123456789ABCDEF". Uppercase letters must - be - used when sending hexadecimal data, though a robust - implementation may choose to recognize lowercase - letters on receipt. 
Thus, for example, the value 12 - (ASCII form feed) can be represented by "=0C", and the - value 61 (ASCII EQUAL SIGN) can be represented by - "=3D". Except when the following rules allow an - alternative encoding, this rule is mandatory. - - Rule #2: (Literal representation) Octets with decimal - values of 33 through 60 inclusive, and 62 through 126, - inclusive, MAY be represented as the ASCII characters - which correspond to those octets (EXCLAMATION POINT - through LESS THAN, and GREATER THAN through TILDE, - respectively). - - Rule #3: (White Space): Octets with values of 9 and 32 - MAY be represented as ASCII TAB (HT) and SPACE - characters, respectively, but MUST NOT be so - - - - Borenstein & Freed [Page 14] - - - - - RFC 1341MIME: Multipurpose Internet Mail ExtensionsJune 1992 - - - represented at the end of an encoded line. Any TAB (HT) - or SPACE characters on an encoded line MUST thus be - followed on that line by a printable character. In - particular, an "=" at the end of an encoded line, - indicating a soft line break (see rule #5) may follow - one or more TAB (HT) or SPACE characters. It follows - that an octet with value 9 or 32 appearing at the end - of an encoded line must be represented according to - Rule #1. This rule is necessary because some MTAs - (Message Transport Agents, programs which transport - messages from one user to another, or perform a part of - such transfers) are known to pad lines of text with - SPACEs, and others are known to remove "white space" - characters from the end of a line. Therefore, when - decoding a Quoted-Printable body, any trailing white - space on a line must be deleted, as it will necessarily - have been added by intermediate transport agents. 
-
- Rule #4 (Line Breaks): A line break in a text body
- part, independent of what its representation is
- following the canonical representation of the data
- being encoded, must be represented by a (RFC 822) line
- break, which is a CRLF sequence, in the Quoted-
- Printable encoding. If isolated CRs and LFs, or LF CR
- and CR LF sequences are allowed to appear in binary
- data according to the canonical form, they must be
- represented using the "=0D", "=0A", "=0A=0D" and
- "=0D=0A" notations respectively.
-
- Note that many implementations may elect to encode the
- local representation of various content types directly.
- In particular, this may apply to plain text material on
- systems that use newline conventions other than CRLF
- delimiters. Such an implementation is permissible, but
- the generation of line breaks must be generalized to
- account for the case where alternate representations of
- newline sequences are used.
-
- Rule #5 (Soft Line Breaks): The Quoted-Printable
- encoding REQUIRES that encoded lines be no more than 76
- characters long. If longer lines are to be encoded with
- the Quoted-Printable encoding, 'soft' line breaks must
- be used. An equal sign as the last character on an
- encoded line indicates such a non-significant ('soft')
- line break in the encoded text. Thus if the "raw" form
- of the line is a single unencoded line that says:
-
- Now's the time for all folk to come to the aid of
- their country.
-
- This can be represented, in the Quoted-Printable
- encoding, as
-
- Now's the time =
- for all folk to come=
- to the aid of their country.
-
- This provides a mechanism with which long lines are
- encoded in such a way as to be restored by the user
- agent. The 76 character limit does not count the
- trailing CRLF, but counts all other characters,
- including any equal signs.
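Rules #1 through #5 can be combined into a short encoder. What follows is a minimal sketch rather than a reference implementation: it assumes the input is already in canonical form with CRLF line breaks, and the names `qp_encode` and `_encode_octet` are made up for the example.

```python
HEX = "0123456789ABCDEF"  # uppercase digits are required when sending


def _encode_octet(octet, at_line_end):
    """Encode one octet according to Rules #1 to #3."""
    if 33 <= octet <= 126 and octet != 61:
        return chr(octet)  # Rule #2: literal representation
    if octet in (9, 32) and not at_line_end:
        return chr(octet)  # Rule #3: TAB/SPACE, except at the end of a line
    return "=" + HEX[octet >> 4] + HEX[octet & 15]  # Rule #1


def qp_encode(data, max_len=76):
    """Quoted-printable encode canonical (CRLF-delimited) bytes."""
    out_lines = []
    for line in data.split(b"\r\n"):  # Rule #4: CRLF is a hard line break
        current = ""
        for i, octet in enumerate(line):
            unit = _encode_octet(octet, at_line_end=(i == len(line) - 1))
            # Rule #5: always leave one column free for the "=" of a soft
            # break (slightly conservative for the last piece of a line).
            if len(current) + len(unit) > max_len - 1:
                out_lines.append(current + "=")
                current = ""
            current += unit
        out_lines.append(current)
    return "\r\n".join(out_lines)
```

Isolated CR or LF octets that are not part of a CRLF pair fall through to Rule #1 and come out as "=0D" or "=0A", as Rule #4 asks for binary data. A trailing space, as in `qp_encode(b"end \r\nnext")`, is emitted as "=20" so that it survives gateways that strip white space.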
- - Since the hyphen character ("-") is represented as itself in - the Quoted-Printable encoding, care must be taken, when - encapsulating a quoted-printable encoded body in a multipart - entity, to ensure that the encapsulation boundary does not - appear anywhere in the encoded body. (A good strategy is to - choose a boundary that includes a character sequence such as - "=_" which can never appear in a quoted-printable body. See - the definition of multipart messages later in this - document.) - - NOTE: The quoted-printable encoding represents something of - a compromise between readability and reliability in - transport. Bodies encoded with the quoted-printable - encoding will work reliably over most mail gateways, but may - not work perfectly over a few gateways, notably those - involving translation into EBCDIC. (In theory, an EBCDIC - gateway could decode a quoted-printable body and re-encode - it using base64, but such gateways do not yet exist.) A - higher level of confidence is offered by the base64 - Content-Transfer-Encoding. A way to get reasonably reliable - transport through EBCDIC gateways is to also quote the ASCII - characters - - !"#$@[\]^`{|}~ - - according to rule #1. See Appendix B for more information. - - Because quoted-printable data is generally assumed to be - line-oriented, it is to be expected that the breaks between - the lines of quoted printable data may be altered in - transport, in the same manner that plain text mail has - always been altered in Internet mail when passing between - systems with differing newline conventions. If such - alterations are likely to constitute a corruption of the - data, it is probably more sensible to use the base64 - encoding rather than the quoted-printable encoding. 
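The receiving side of the Quoted-Printable rules can be sketched in the same spirit. The decoder below is an illustration under stated assumptions: the input is ASCII text with CRLF line breaks, the name `qp_decode` is hypothetical, and malformed "=" sequences are simply rejected with an exception, a choice the memo does not prescribe.

```python
HEX_DIGITS = "0123456789ABCDEFabcdef"  # lowercase tolerated on receipt


def qp_decode(encoded):
    """Decode quoted-printable text (CRLF line breaks) back into bytes."""
    out = bytearray()
    lines = encoded.split("\r\n")
    for index, line in enumerate(lines):
        # Trailing white space can only have been added in transit.
        line = line.rstrip(" \t")
        soft = line.endswith("=")  # "=" as last character: soft line break
        if soft:
            line = line[:-1]
        i = 0
        while i < len(line):
            if line[i] == "=":
                pair = line[i + 1:i + 3]
                if len(pair) != 2 or any(c not in HEX_DIGITS for c in pair):
                    raise ValueError("malformed escape in %r" % line)
                out.append(int(pair, 16))
                i += 3
            else:
                out.append(ord(line[i]))
                i += 1
        # A hard line break is restored as CRLF; a soft one disappears.
        if not soft and index < len(lines) - 1:
            out += b"\r\n"
    return bytes(out)
```

Decoding the three encoded lines of the "Now's the time" example this way joins them back into a single line, since both line breaks are soft.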
- - - - - - - - - - - - Borenstein & Freed [Page 16] - - - - - RFC 1341MIME: Multipurpose Internet Mail ExtensionsJune 1992 - - - 5.2 Base64 Content-Transfer-Encoding - - The Base64 Content-Transfer-Encoding is designed to - represent arbitrary sequences of octets in a form that is - not humanly readable. The encoding and decoding algorithms - are simple, but the encoded data are consistently only about - 33 percent larger than the unencoded data. This encoding is - based on the one used in Privacy Enhanced Mail applications, - as defined in RFC 1113. The base64 encoding is adapted - from RFC 1113, with one change: base64 eliminates the "*" - mechanism for embedded clear text. - - A 65-character subset of US-ASCII is used, enabling 6 bits - to be represented per printable character. (The extra 65th - character, "=", is used to signify a special processing - function.) - - NOTE: This subset has the important property that it is - represented identically in all versions of ISO 646, - including US ASCII, and all characters in the subset are - also represented identically in all versions of EBCDIC. - Other popular encodings, such as the encoding used by the - UUENCODE utility and the base85 encoding specified as part - of Level 2 PostScript, do not share these properties, and - thus do not fulfill the portability requirements a binary - transport encoding for mail must meet. - - The encoding process represents 24-bit groups of input bits - as output strings of 4 encoded characters. Proceeding from - left to right, a 24-bit input group is formed by - concatenating 3 8-bit input groups. These 24 bits are then - treated as 4 concatenated 6-bit groups, each of which is - translated into a single digit in the base64 alphabet. When - encoding a bit stream via the base64 encoding, the bit - stream must be presumed to be ordered with the most- - significant-bit first. 
That is, the first bit in the stream
- will be the high-order bit in the first byte, and the eighth
- bit will be the low-order bit in the first byte, and so on.
-
- Each 6-bit group is used as an index into an array of 64
- printable characters. The character referenced by the index
- is placed in the output string. These characters, identified
- in Table 1, below, are selected so as to be universally
- representable, and the set excludes characters with
- particular significance to SMTP (e.g., ".", "CR", "LF") and
- to the encapsulation boundaries defined in this document
- (e.g., "-").
-
- Table 1: The Base64 Alphabet
-
- Value Encoding  Value Encoding  Value Encoding  Value Encoding
-     0 A            17 R            34 i            51 z
-     1 B            18 S            35 j            52 0
-     2 C            19 T            36 k            53 1
-     3 D            20 U            37 l            54 2
-     4 E            21 V            38 m            55 3
-     5 F            22 W            39 n            56 4
-     6 G            23 X            40 o            57 5
-     7 H            24 Y            41 p            58 6
-     8 I            25 Z            42 q            59 7
-     9 J            26 a            43 r            60 8
-    10 K            27 b            44 s            61 9
-    11 L            28 c            45 t            62 +
-    12 M            29 d            46 u            63 /
-    13 N            30 e            47 v
-    14 O            31 f            48 w         (pad) =
-    15 P            32 g            49 x
-    16 Q            33 h            50 y
-
- The output stream (encoded bytes) must be represented in
- lines of no more than 76 characters each. All line breaks
- or other characters not found in Table 1 must be ignored by
- decoding software. In base64 data, characters other than
- those in Table 1, line breaks, and other white space
- probably indicate a transmission error, about which a
- warning message or even a message rejection might be
- appropriate under some circumstances.
-
- Special processing is performed if fewer than 24 bits are
- available at the end of the data being encoded. A full
- encoding quantum is always completed at the end of a body.
- When fewer than 24 input bits are available in an input
- group, zero bits are added (on the right) to form an
- integral number of 6-bit groups.
Output character positions - which are not required to represent actual input data are - set to the character "=". Since all base64 input is an - integral number of octets, only the following cases can - arise: (1) the final quantum of encoding input is an - integral multiple of 24 bits; here, the final unit of - encoded output will be an integral multiple of 4 characters - with no "=" padding, (2) the final quantum of encoding input - is exactly 8 bits; here, the final unit of encoded output - will be two characters followed by two "=" padding - characters, or (3) the final quantum of encoding input is - exactly 16 bits; here, the final unit of encoded output will - be three characters followed by one "=" padding character. - - Care must be taken to use the proper octets for line breaks - if base64 encoding is applied directly to text material that - has not been converted to canonical form. In particular, - text line breaks should be converted into CRLF sequences - - - - Borenstein & Freed [Page 18] - - - - - RFC 1341MIME: Multipurpose Internet Mail ExtensionsJune 1992 - - - prior to base64 encoding. The important thing to note is - that this may be done directly by the encoder rather than in - a prior canonicalization step in some implementations. - - NOTE: There is no need to worry about quoting apparent - encapsulation boundaries within base64-encoded parts of - multipart entities because no hyphen characters are used in - the base64 encoding. - - 6 Additional Optional Content- Header Fields - - 6.1 Optional Content-ID Header Field - - In constructing a high-level user agent, it may be desirable - to allow one body to make reference to another. - Accordingly, bodies may be labeled using the "Content-ID" - header field, which is syntactically identical to the - "Message-ID" header field: - - Content-ID := msg-id - - Like the Message-ID values, Content-ID values must be - generated to be as unique as possible. 
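Returning to the Base64 encoding of section 5.2, the procedure (24-bit groups split into four 6-bit indexes into Table 1, "=" padding for a short final group, lines of at most 76 characters) can be sketched as below. The function name `b64_encode` is an assumption for this example. Python's standard `base64` module already provides the same transformation, so the sketch is meant only to make the bit manipulation visible.

```python
# Table 1, values 0 through 63 in order.
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"


def b64_encode(data, line_len=76):
    """Encode bytes as base64 text, wrapped at line_len characters."""
    chars = []
    for start in range(0, len(data), 3):
        group = data[start:start + 3]
        # Zero bits are added on the right to complete the 24-bit quantum.
        quantum = int.from_bytes(group + b"\x00" * (3 - len(group)), "big")
        sextets = [(quantum >> shift) & 0x3F for shift in (18, 12, 6, 0)]
        keep = len(group) + 1  # 1 octet -> 2 chars, 2 -> 3, 3 -> 4
        chars.extend(ALPHABET[s] for s in sextets[:keep])
        chars.extend("=" * (4 - keep))  # positions not carrying input data
    text = "".join(chars)
    # 76 is a multiple of 4, so no quantum is split across lines.
    return "\r\n".join(text[i:i + line_len] for i in range(0, len(text), line_len))
```

The three padding cases listed in the text correspond to inputs of length 3n, 3n+1 and 3n+2: for example `b"Man"`, `b"M"` and `b"Ma"` produce zero, two and one "=" characters respectively.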
- - 6.2 Optional Content-Description Header Field - - The ability to associate some descriptive information with a - given body is often desirable. For example, it may be useful - to mark an "image" body as "a picture of the Space Shuttle - Endeavor." Such text may be placed in the Content- - Description header field. - - Content-Description := *text - - The description is presumed to be given in the US-ASCII - character set, although the mechanism specified in [RFC- - 1342] may be used for non-US-ASCII Content-Description - values. - - - - - - - - - - - - - - - - - - - - Borenstein & Freed [Page 19] - - - - - RFC 1341MIME: Multipurpose Internet Mail ExtensionsJune 1992 - - - 7 The Predefined Content-Type Values - - This document defines seven initial Content-Type values and - an extension mechanism for private or experimental types. - Further standard types must be defined by new published - specifications. It is expected that most innovation in new - types of mail will take place as subtypes of the seven types - defined here. The most essential characteristics of the - seven content-types are summarized in Appendix G. - - 7.1 The Text Content-Type - - The text Content-Type is intended for sending material which - is principally textual in form. It is the default Content- - Type. A "charset" parameter may be used to indicate the - character set of the body text. The primary subtype of text - is "plain". This indicates plain (unformatted) text. The - default Content-Type for Internet mail is "text/plain; - charset=us-ascii". - - Beyond plain text, there are many formats for representing - what might be known as "extended text" -- text with embedded - formatting and presentation information. An interesting - characteristic of many such representations is that they are - to some extent readable even without the software that - interprets them. 
It is useful, then, to distinguish them, - at the highest level, from such unreadable data as images, - audio, or text represented in an unreadable form. In the - absence of appropriate interpretation software, it is - reasonable to show subtypes of text to the user, while it is - not reasonable to do so with most nontextual data. - - Such formatted textual data should be represented using - subtypes of text. Plausible subtypes of text are typically - given by the common name of the representation format, e.g., - "text/richtext". - - 7.1.1 The charset parameter - - A critical parameter that may be specified in the Content- - Type field for text data is the character set. This is - specified with a "charset" parameter, as in: - - Content-type: text/plain; charset=us-ascii - - Unlike some other parameter values, the values of the - charset parameter are NOT case sensitive. The default - character set, which must be assumed in the absence of a - charset parameter, is US-ASCII. - - An initial list of predefined character set names can be - found at the end of this section. Additional character sets - may be registered with IANA as described in Appendix F, - although the standardization of their use requires the usual - - - - Borenstein & Freed [Page 20] - - - - - RFC 1341MIME: Multipurpose Internet Mail ExtensionsJune 1992 - - - IAB review and approval. Note that if the specified - character set includes 8-bit data, a Content-Transfer- - Encoding header field and a corresponding encoding on the - data are required in order to transmit the body via some - mail transfer protocols, such as SMTP. - - The default character set, US-ASCII, has been the subject of - some confusion and ambiguity in the past. Not only were - there some ambiguities in the definition, there have been - wide variations in practice. 
In order to eliminate such - ambiguity and variations in the future, it is strongly - recommended that new user agents explicitly specify a - character set via the Content-Type header field. "US-ASCII" - does not indicate an arbitrary seven-bit character code, but - specifies that the body uses character coding that uses the - exact correspondence of codes to characters specified in - ASCII. National use variations of ISO 646 [ISO-646] are NOT - ASCII and their use in Internet mail is explicitly - discouraged. The omission of the ISO 646 character set is - deliberate in this regard. The character set name of "US- - ASCII" explicitly refers to ANSI X3.4-1986 [US-ASCII] only. - The character set name "ASCII" is reserved and must not be - used for any purpose. - - NOTE: RFC 821 explicitly specifies "ASCII", and references - an earlier version of the American Standard. Insofar as one - of the purposes of specifying a Content-Type and character - set is to permit the receiver to unambiguously determine how - the sender intended the coded message to be interpreted, - assuming anything other than "strict ASCII" as the default - would risk unintentional and incompatible changes to the - semantics of messages now being transmitted. This also - implies that messages containing characters coded according - to national variations on ISO 646, or using code-switching - procedures (e.g., those of ISO 2022), as well as 8-bit or - multiple octet character encodings MUST use an appropriate - character set specification to be consistent with this - specification. - - The complete US-ASCII character set is listed in [US-ASCII]. - Note that the control characters including DEL (0-31, 127) - have no defined meaning apart from the combination CRLF - (ASCII values 13 and 10) indicating a new line. 
Two of the - characters have de facto meanings in wide use: FF (12) often - means "start subsequent text on the beginning of a new - page"; and TAB or HT (9) often (though not always) means - "move the cursor to the next available column after the - current position where the column number is a multiple of 8 - (counting the first column as column 0)." Apart from this, - any use of the control characters or DEL in a body must be - part of a private agreement between the sender and - recipient. Such private agreements are discouraged and - should be replaced by the other capabilities of this - document. - - - - Borenstein & Freed [Page 21] - - - - - RFC 1341MIME: Multipurpose Internet Mail ExtensionsJune 1992 - - - NOTE: Beyond US-ASCII, an enormous proliferation of - character sets is possible. It is the opinion of the IETF - working group that a large number of character sets is NOT a - good thing. We would prefer to specify a single character - set that can be used universally for representing all of the - world's languages in electronic mail. Unfortunately, - existing practice in several communities seems to point to - the continued use of multiple character sets in the near - future. For this reason, we define names for a small number - of character sets for which a strong constituent base - exists. It is our hope that ISO 10646 or some other - effort will eventually define a single world character set - which can then be specified for use in Internet mail, but in - the advance of that definition we cannot specify the use of - ISO 10646, Unicode, or any other character set whose - definition is, as of this writing, incomplete. - - The defined charset values are: - - US-ASCII -- as defined in [US-ASCII]. - - ISO-8859-X -- where "X" is to be replaced, as - necessary, for the parts of ISO-8859 [ISO- - 8859]. 
Note that the ISO 646 character sets - have deliberately been omitted in favor of - their 8859 replacements, which are the - designated character sets for Internet mail. - As of the publication of this document, the - legitimate values for "X" are the digits 1 - through 9. - - Note that the character set used, if anything other than - US-ASCII, must always be explicitly specified in the - Content-Type field. - - No other character set name may be used in Internet mail - without the publication of a formal specification and its - registration with IANA as described in Appendix F, or by - private agreement, in which case the character set name must - begin with "X-". - - Implementors are discouraged from defining new character - sets for mail use unless absolutely necessary. - - The "charset" parameter has been defined primarily for the - purpose of textual data, and is described in this section - for that reason. However, it is conceivable that non- - textual data might also wish to specify a charset value for - some purpose, in which case the same syntax and values - should be used. - - In general, mail-sending software should always use the - "lowest common denominator" character set possible. For - example, if a body contains only US-ASCII characters, it - - - - Borenstein & Freed [Page 22] - - - - - RFC 1341MIME: Multipurpose Internet Mail ExtensionsJune 1992 - - - should be marked as being in the US-ASCII character set, not - ISO-8859-1, which, like all the ISO-8859 family of character - sets, is a superset of US-ASCII. More generally, if a - widely-used character set is a subset of another character - set, and a body contains only characters in the widely-used - subset, it should be labeled as being in that subset. This - will increase the chances that the recipient will be able to - view the mail correctly. - - 7.1.2 The Text/plain subtype - - The primary subtype of text is "plain". This indicates - plain (unformatted) text. 
The default Content-Type for - Internet mail, "text/plain; charset=us-ascii", describes - existing Internet practice, that is, it is the type of body - defined by RFC 822. - - 7.1.3 The Text/richtext subtype - - In order to promote the wider interoperability of simple - formatted text, this document defines an extremely simple - subtype of "text", the "richtext" subtype. This subtype was - designed to meet the following criteria: - - 1. The syntax must be extremely simple to parse, - so that even teletype-oriented mail systems can - easily strip away the formatting information and - leave only the readable text. - - 2. The syntax must be extensible to allow for new - formatting commands that are deemed essential. - - 3. The capabilities must be extremely limited, to - ensure that it can represent no more than is - likely to be representable by the user's primary - word processor. While this limits what can be - sent, it increases the likelihood that what is - sent can be properly displayed. - - 4. The syntax must be compatible with SGML, so - that, with an appropriate DTD (Document Type - Definition, the standard mechanism for defining a - document type using SGML), a general SGML parser - could be made to parse richtext. However, despite - this compatibility, the syntax should be far - simpler than full SGML, so that no SGML knowledge - is required in order to implement it. - - The syntax of "richtext" is very simple. It is assumed, at - the top-level, to be in the US-ASCII character set, unless - of course a different charset parameter was specified in the - Content-type field. All characters represent themselves, - with the exception of the "<" character (ASCII 60), which is - used to mark the beginning of a formatting command. - - - - Borenstein & Freed [Page 23] - - - - - RFC 1341MIME: Multipurpose Internet Mail ExtensionsJune 1992 - - - Formatting instructions consist of formatting commands - surrounded by angle brackets ("<>", ASCII 60 and 62). 
Each - formatting command may be no more than 40 characters in - length, all in US-ASCII, restricted to the alphanumeric and - hyphen ("-") characters. Formatting commands may be preceded - by a forward slash or solidus ("/", ASCII 47), making them - negations, and such negations must always exist to balance - the initial opening commands, except as noted below. Thus, - if the formatting command "<bold>" appears at some point, - there must later be a "</bold>" to balance it. There are - only three exceptions to this "balancing" rule: First, the - command "<lt>" is used to represent a literal "<" character. - Second, the command "<nl>" is used to represent a required - line break. (Otherwise, CRLFs in the data are treated as - equivalent to a single SPACE character.) Finally, the - command "<np>" is used to represent a page break. (NOTE: - The 40 character limit on formatting commands does not - include the "<", ">", or "/" characters that might be - attached to such commands.) - - Initially defined formatting commands, not all of which will - be implemented by all richtext implementations, include: - - Bold -- causes the subsequent text to be in a bold - font. - Italic -- causes the subsequent text to be in an italic - font. - Fixed -- causes the subsequent text to be in a fixed - width font. - Smaller -- causes the subsequent text to be in a - smaller font. - Bigger -- causes the subsequent text to be in a bigger - font. - Underline -- causes the subsequent text to be - underlined. - Center -- causes the subsequent text to be centered. - FlushLeft -- causes the subsequent text to be left - justified. - FlushRight -- causes the subsequent text to be right - justified. - Indent -- causes the subsequent text to be indented at - the left margin. - IndentRight -- causes the subsequent text to be - indented at the right margin. - Outdent -- causes the subsequent text to be outdented - at the left margin. 
- OutdentRight -- causes the subsequent text to be - outdented at the right margin. - SamePage -- causes the subsequent text to be grouped, - if possible, on one page. - Subscript -- causes the subsequent text to be - interpreted as a subscript. - - - - - - Borenstein & Freed [Page 24] - - - - - RFC 1341MIME: Multipurpose Internet Mail ExtensionsJune 1992 - - - Superscript -- causes the subsequent text to be - interpreted as a superscript. - Heading -- causes the subsequent text to be interpreted - as a page heading. - Footing -- causes the subsequent text to be interpreted - as a page footing. - ISO-8859-X (for any value of X that is legal as a - "charset" parameter) -- causes the subsequent text - to be interpreted as text in the appropriate - character set. - US-ASCII -- causes the subsequent text to be - interpreted as text in the US-ASCII character set. - Excerpt -- causes the subsequent text to be interpreted - as a textual excerpt from another source. - Typically this will be displayed using indentation - and an alternate font, but such decisions are up - to the viewer. - Paragraph -- causes the subsequent text to be - interpreted as a single paragraph, with - appropriate paragraph breaks (typically blank - space) before and after. - Signature -- causes the subsequent text to be - interpreted as a "signature". Some systems may - wish to display signatures in a smaller font or - otherwise set them apart from the main text of the - message. - Comment -- causes the subsequent text to be interpreted - as a comment, and hence not shown to the reader. - No-op -- has no effect on the subsequent text. - lt -- <lt> is replaced by a literal "<" character. No - balancing </lt> is allowed. - nl -- <nl> causes a line break. No balancing </nl> is - allowed. - np -- <np> causes a page break. No balancing </np> is - allowed. - - Each positive formatting command affects all subsequent text - until the matching negative formatting command. 
Such pairs - of formatting commands must be properly balanced and nested. - Thus, a proper way to describe text in bold italics is: - - <bold><italic>the-text</italic></bold> - - or, alternately, - - <italic><bold>the-text</bold></italic> - - but, in particular, the following is illegal - richtext: - - <bold><italic>the-text</bold></italic> - - NOTE: The nesting requirement for formatting commands - imposes a slightly higher burden upon the composers of - - - - Borenstein & Freed [Page 25] - - - - - RFC 1341MIME: Multipurpose Internet Mail ExtensionsJune 1992 - - - richtext bodies, but potentially simplifies richtext - displayers by allowing them to be stack-based. The main - goal of richtext is to be simple enough to make multifont, - formatted email widely readable, so that those with the - capability of sending it will be able to do so with - confidence. Thus slightly increased complexity in the - composing software was deemed a reasonable tradeoff for - simplified reading software. Nonetheless, implementors of - richtext readers are encouraged to follow the general - Internet guidelines of being conservative in what you send - and liberal in what you accept. Those implementations that - can do so are encouraged to deal reasonably with improperly - nested richtext. - - Implementations must regard any unrecognized formatting - command as equivalent to "No-op", thus facilitating future - extensions to "richtext". Private extensions may be defined - using formatting commands that begin with "X-", by analogy - to Internet mail header field names. - - It is worth noting that no special behavior is required for - the TAB (HT) character. It is recommended, however, that, at - least when fixed-width fonts are in use, the common - semantics of the TAB (HT) character should be observed, - namely that it moves to the next column position that is a - multiple of 8. 
(In other words, if a TAB (HT) occurs in - column n, where the leftmost column is column 0, then that - TAB (HT) should be replaced by 8-(n mod 8) SPACE - characters.) - - Richtext also differentiates between "hard" and "soft" line - breaks. A line break (CRLF) in the richtext data stream is - interpreted as a "soft" line break, one that is included - only for purposes of mail transport, and is to be treated as - white space by richtext interpreters. To include a "hard" - line break (one that must be displayed as such), the "<nl>" - or "<paragraph>" formatting constructs should be used. In - general, a soft line break should be treated as white space, - but when soft line breaks immediately follow a <nl> or a - </paragraph> tag they should be ignored rather than treated - as white space. - - Putting all this together, the following "text/richtext" - body fragment: - - <bold>Now</bold> is the time for - <italic>all</italic> good men - <smaller>(and <lt>women>)</smaller> to - <ignoreme></ignoreme> come - - to the aid of their - <nl> - beloved <nl><nl>country. <comment> Stupid - quote! </comment> -- the end - - represents the following formatted text (which will, no - doubt, look cryptic in the text-only version of this - document): - - Now is the time for all good men (and <women>) to - come to the aid of their - beloved - - country. -- the end - - Richtext conformance: A minimal richtext implementation is - one that simply converts "<lt>" to "<", converts CRLFs to - SPACE, converts <nl> to a newline according to local newline - convention, removes everything between a <comment> command - and the next balancing </comment> command, and removes all - other formatting commands (all text enclosed in angle - brackets).
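The minimal conformance rules above can be sketched in a few lines of Python. This is an illustration only, not the Appendix D program: the function name `richtext_to_plain` and its `newline` parameter are hypothetical, bare LF is accepted alongside CRLF as an assumption about local input, and the refinement that soft breaks directly after `<nl>` should be ignored is deliberately left out because the minimal rules do not require it.

```python
import re

# One formatting command: "<", an optional "/", then 1 to 40 alphanumeric
# or hyphen characters, then ">".
_COMMAND = re.compile(r"<(/?)([A-Za-z0-9-]{1,40})>")


def richtext_to_plain(data, newline="\n"):
    """Reduce text/richtext to plain text using only the minimal rules."""
    # Soft line breaks become a single SPACE (bare LF handled as an assumption).
    data = data.replace("\r\n", " ").replace("\n", " ")
    out = []
    comment_depth = 0  # greater than zero while inside <comment>...</comment>
    pos = 0
    for match in _COMMAND.finditer(data):
        if comment_depth == 0:
            out.append(data[pos:match.start()])  # keep text outside comments
        pos = match.end()
        negated, name = match.group(1), match.group(2).lower()
        if name == "comment":
            comment_depth += -1 if negated else 1
            comment_depth = max(comment_depth, 0)  # tolerate a stray </comment>
        elif comment_depth == 0 and not negated:
            if name == "lt":
                out.append("<")
            elif name == "nl":
                out.append(newline)
        # Every other command is simply dropped, as if it were "No-op".
    if comment_depth == 0:
        out.append(data[pos:])
    return "".join(out)
```

Counting comment depth rather than searching for the first `</comment>` is one way to honour the "next balancing" wording; a reader that trusts its input could use a simple flag instead.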
- - NOTE ON THE RELATIONSHIP OF RICHTEXT TO SGML: Richtext is - decidedly not SGML, and must not be used to transport - arbitrary SGML documents. Those who wish to use SGML - document types as a mail transport format must define a new - text or application subtype, e.g., "text/sgml-dtd-whatever" - or "application/sgml-dtd-whatever", depending on the - perceived readability of the DTD in use. Richtext is - designed to be compatible with SGML, and specifically so - that it will be possible to define a richtext DTD if one is - needed. However, this does not imply that arbitrary SGML - can be called richtext, nor that richtext implementors have - any need to understand SGML; the description in this - document is a complete definition of richtext, which is far - simpler than complete SGML. - - NOTE ON THE INTENDED USE OF RICHTEXT: It is recognized that - implementors of future mail systems will want rich text - functionality far beyond that currently defined for - richtext. The intent of richtext is to provide a common - format for expressing that functionality in a form in which - much of it, at least, will be understood by interoperating - software. Thus, in particular, software with a richer - notion of formatted text than richtext can still use - richtext as its basic representation, but can extend it with - new formatting commands and by hiding information specific - to that software system in richtext comments. As such - systems evolve, it is expected that the definition of - richtext will be further refined by future published - specifications, but richtext as defined here provides a - platform on which evolutionary refinements can be based. - - IMPLEMENTATION NOTE: In some environments, it might be - impossible to combine certain richtext formatting commands, - - - - Borenstein & Freed [Page 27] - - - - - RFC 1341MIME: Multipurpose Internet Mail ExtensionsJune 1992 - - - whereas in others they might be combined easily. 
For - example, the combination of <bold> and <italic> might - produce bold italics on systems that support such fonts, but - there exist systems that can make text bold or italicized, - but not both. In such cases, the most recently issued - recognized formatting command should be preferred. - - One of the major goals in the design of richtext was to make - it so simple that even text-only mailers will implement - richtext-to-plain-text translators, thus increasing the - likelihood that multifont text will become "safe" to use - very widely. To demonstrate this simplicity, an extremely - simple 35-line C program that converts richtext input into - plain text output is included in Appendix D. - - 7.2 The Multipart Content-Type - - In the case of multiple part messages, in which one or more - different sets of data are combined in a single body, a - "multipart" Content-Type field must appear in the entity's - header. The body must then contain one or more "body parts," - each preceded by an encapsulation boundary, and the last one - followed by a closing boundary. Each part starts with an - encapsulation boundary, and then contains a body part - consisting of header area, a blank line, and a body area. - Thus a body part is similar to an RFC 822 message in syntax, - but different in meaning. - - A body part is NOT to be interpreted as actually being an - RFC 822 message. To begin with, NO header fields are - actually required in body parts. A body part that starts - with a blank line, therefore, is allowed and is a body part - for which all default values are to be assumed. In such a - case, the absence of a Content-Type header field implies - that the encapsulation is plain US-ASCII text.
The only - header fields that have defined meaning for body parts are - those the names of which begin with "Content-". All other - header fields are generally to be ignored in body parts. - Although they should generally be retained in mail - processing, they may be discarded by gateways if necessary. - Such other fields are permitted to appear in body parts but - should not be depended on. "X-" fields may be created for - experimental or private purposes, with the recognition that - the information they contain may be lost at some gateways. - - The distinction between an RFC 822 message and a body part - is subtle, but important. A gateway between Internet and - X.400 mail, for example, must be able to tell the difference - between a body part that contains an image and a body part - that contains an encapsulated message, the body of which is - an image. In order to represent the latter, the body part - must have "Content-Type: message", and its body (after the - blank line) must be the encapsulated message, with its own - "Content-Type: image" header field. The use of similar - syntax facilitates the conversion of messages to body parts, - and vice versa, but the distinction between the two must be - understood by implementors. (For the special case in which - all parts actually are messages, a "digest" subtype is also - defined.) - - As stated previously, each body part is preceded by an - encapsulation boundary. The encapsulation boundary MUST NOT - appear inside any of the encapsulated parts. Thus, it is - crucial that the composing agent be able to choose and - specify the unique boundary that will separate the parts. - - All present and future subtypes of the "multipart" type must - use an identical syntax. 
Subtypes may differ in their - semantics, and may impose additional restrictions on syntax, - - - - Borenstein & Freed [Page 29] - - - - - RFC 1341MIME: Multipurpose Internet Mail ExtensionsJune 1992 - - - but must conform to the required syntax for the multipart - type. This requirement ensures that all conformant user - agents will at least be able to recognize and separate the - parts of any multipart entity, even of an unrecognized - subtype. - - As stated in the definition of the Content-Transfer-Encoding - field, no encoding other than "7bit", "8bit", or "binary" is - permitted for entities of type "multipart". The multipart - delimiters and header fields are always 7-bit ASCII in any - case, and data within the body parts can be encoded on a - part-by-part basis, with Content-Transfer-Encoding fields - for each appropriate body part. - - Mail gateways, relays, and other mail handling agents are - commonly known to alter the top-level header of an RFC 822 - message. In particular, they frequently add, remove, or - reorder header fields. Such alterations are explicitly - forbidden for the body part headers embedded in the bodies - of messages of type "multipart." - - 7.2.1 Multipart: The common syntax - - All subtypes of "multipart" share a common syntax, defined - in this section. A simple example of a multipart message - also appears in this section. An example of a more complex - multipart message is given in Appendix C. - - The Content-Type field for multipart entities requires one - parameter, "boundary", which is used to specify the - encapsulation boundary. The encapsulation boundary is - defined as a line consisting entirely of two hyphen - characters ("-", decimal code 45) followed by the boundary - parameter value from the Content-Type header field. - - NOTE: The hyphens are for rough compatibility with the - earlier RFC 934 method of message encapsulation, and for - ease of searching for the boundaries in some - implementations. 
However, it should be noted that multipart - messages are NOT completely compatible with RFC 934 - encapsulations; in particular, they do not obey RFC 934 - quoting conventions for embedded lines that begin with - hyphens. This mechanism was chosen over the RFC 934 - mechanism because the latter causes lines to grow with each - level of quoting. The combination of this growth with the - fact that SMTP implementations sometimes wrap long lines - made the RFC 934 mechanism unsuitable for use in the event - that deeply-nested multipart structuring is ever desired. - - Thus, a typical multipart Content-Type header field might - look like this: - - Content-Type: multipart/mixed; - - - - - Borenstein & Freed [Page 30] - - - - - RFC 1341MIME: Multipurpose Internet Mail ExtensionsJune 1992 - - - boundary=gc0p4Jq0M2Yt08jU534c0p - - This indicates that the entity consists of several parts, - each itself with a structure that is syntactically identical - to an RFC 822 message, except that the header area might be - completely empty, and that the parts are each preceded by - the line - - --gc0p4Jq0M2Yt08jU534c0p - - Note that the encapsulation boundary must occur at the - beginning of a line, i.e., following a CRLF, and that that - initial CRLF is considered to be part of the encapsulation - boundary rather than part of the preceding part. The - boundary must be followed immediately either by another CRLF - and the header fields for the next part, or by two CRLFs, in - which case there are no header fields for the next part (and - it is therefore assumed to be of Content-Type text/plain). - - NOTE: The CRLF preceding the encapsulation line is - considered part of the boundary so that it is possible to - have a part that does not end with a CRLF (line break). 
- Body parts that must be considered to end with line breaks, - therefore, should have two CRLFs preceding the encapsulation - line, the first of which is part of the preceding body part, - and the second of which is part of the encapsulation - boundary. - - The requirement that the encapsulation boundary begins with - a CRLF implies that the body of a multipart entity must - itself begin with a CRLF before the first encapsulation line - -- that is, if the "preamble" area is not used, the entity - headers must be followed by TWO CRLFs. This is indeed how - such entities should be composed. A tolerant mail reading - program, however, may interpret a body of type multipart - that begins with an encapsulation line NOT initiated by a - CRLF as also being an encapsulation boundary, but a - compliant mail sending program must not generate such - entities. - - Encapsulation boundaries must not appear within the - encapsulations, and must be no longer than 70 characters, - not counting the two leading hyphens. - - The encapsulation boundary following the last body part is a - distinguished delimiter that indicates that no further body - parts will follow. Such a delimiter is identical to the - previous delimiters, with the addition of two more hyphens - at the end of the line: - - --gc0p4Jq0M2Yt08jU534c0p-- - - There appears to be room for additional information prior to - the first encapsulation boundary and following the final - - - - Borenstein & Freed [Page 31] - - - - - RFC 1341MIME: Multipurpose Internet Mail ExtensionsJune 1992 - - - boundary. These areas should generally be left blank, and - implementations should ignore anything that appears before - the first boundary or after the last one. - - NOTE: These "preamble" and "epilogue" areas are not used - because of the lack of proper typing of these parts and the - lack of clear semantics for handling these areas at - gateways, particularly X.400 gateways. 
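The delimiter rules described so far (the leading CRLF belongs to the boundary, the closing delimiter carries two extra hyphens, and preamble and epilogue are ignored) can be illustrated with a short Python sketch. The function name `split_multipart` is hypothetical, the body is assumed to be already available as a string with CRLF line endings, and no header parsing or decoding is attempted.

```python
def split_multipart(body, boundary):
    """Return the raw body parts (header area, blank line, body area)."""
    delimiter = "\r\n--" + boundary
    # Tolerant reading: accept a body that begins directly with "--boundary",
    # i.e. without the CRLF that formally starts the first delimiter.
    if body.startswith("--" + boundary):
        body = "\r\n" + body
    close_at = body.find(delimiter + "--")
    if close_at == -1:
        raise ValueError("no closing delimiter found")
    # Everything after the closing delimiter is the epilogue and is dropped.
    pieces = body[:close_at].split(delimiter)
    parts = []
    for piece in pieces[1:]:  # pieces[0] is the preamble, which is ignored
        # Discard the remainder of the delimiter line, normally just CRLF.
        _rest_of_line, crlf, part = piece.partition("\r\n")
        if not crlf:
            raise ValueError("delimiter line is not terminated by CRLF")
        parts.append(part)
    return parts
```

Because the CRLF in front of each delimiter is consumed as part of the delimiter, a part keeps a trailing line break only when the sender wrote two CRLFs before the next encapsulation line.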
- - NOTE: Because encapsulation boundaries must not appear in - the body parts being encapsulated, a user agent must - exercise care to choose a unique boundary. The boundary in - the example above could have been the result of an algorithm - designed to produce boundaries with a very low probability - of already existing in the data to be encapsulated without - having to prescan the data. Alternate algorithms might - result in more 'readable' boundaries for a recipient with an - old user agent, but would require more attention to the - possibility that the boundary might appear in the - encapsulated part. The simplest boundary possible is - something like "---", with a closing boundary of "-----". - - As a very simple example, the following multipart message - has two parts, both of them plain text, one of them - explicitly typed and one of them implicitly typed: - - From: Nathaniel Borenstein <nsb@bellcore.com> - To: Ned Freed <ned@innosoft.com> - Subject: Sample message - MIME-Version: 1.0 - Content-type: multipart/mixed; boundary="simple boundary" - - This is the preamble. It is to be ignored, though it - is a handy place for mail composers to include an - explanatory note to non-MIME compliant readers. - --simple boundary - - This is implicitly typed plain ASCII text. - It does NOT end with a linebreak. - --simple boundary - Content-type: text/plain; charset=us-ascii - - This is explicitly typed plain ASCII text. - It DOES end with a linebreak. - - --simple boundary-- - This is the epilogue. It is also to be ignored. - - The use of a Content-Type of multipart in a body part within - another multipart entity is explicitly allowed. In such - cases, for obvious reasons, care must be taken to ensure - that each nested multipart entity uses a different - boundary delimiter. See Appendix C for an example of nested - multipart entities.
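One way a composing agent might meet both requirements (a boundary that does not occur in the encapsulated data, and a different boundary for each nested entity) is sketched below. The function name `choose_boundary` and its parameters are hypothetical. Unlike the no-prescan algorithm mentioned in the NOTE, this sketch does scan the parts, trading speed for certainty; the random value consists of hexadecimal digits only, which stay within the 70-character limit.

```python
import secrets


def choose_boundary(parts, in_use=(), attempts=10):
    """Pick a random boundary that occurs in none of the given parts.

    parts  -- the already-encoded body parts, as strings
    in_use -- boundaries of enclosing multipart entities, to be avoided
    """
    for _ in range(attempts):
        candidate = secrets.token_hex(16)  # 32 hexadecimal digits
        if candidate in in_use:
            continue
        if any(candidate in part for part in parts):
            continue
        return candidate
    raise RuntimeError("could not find an unused boundary")
```

With 128 random bits a collision is very unlikely, so the loop normally returns on its first pass; the scan only guards against the pathological case.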
- - The use of the multipart Content-Type with only a single - body part may be useful in certain contexts, and is - explicitly permitted. - - The only mandatory parameter for the multipart Content-Type - is the boundary parameter, which consists of 1 to 70 - characters from a set of characters known to be very robust - through email gateways, and NOT ending with white space. - (If a boundary appears to end with white space, the white - space must be presumed to have been added by a gateway, and - should be deleted.) It is formally specified by the - following BNF: - - boundary := 0*69<bchars> bcharsnospace - - bchars := bcharsnospace / " " - - bcharsnospace := DIGIT / ALPHA / "'" / "(" / ")" / "+" / "_" / "," / "-" / "." / "/" / ":" / "=" / "?" - - Overall, the body of a multipart entity may be specified as - follows: - - multipart-body := preamble 1*encapsulation close-delimiter epilogue - - encapsulation := delimiter CRLF body-part - - delimiter := CRLF "--" boundary ; boundary taken from the Content-Type field when content-type is multipart. There must be no space between "--" and boundary. - - close-delimiter := delimiter "--" ; Again, no space before "--" - - preamble := *text ; to be ignored upon receipt. - - epilogue := *text ; to be ignored upon receipt. - - body-part := <"message" as defined in RFC 822, - with all header fields optional, and with the - specified delimiter not occurring anywhere in - the message body, either on a line by itself - or as a substring anywhere. Note that the - semantics of a part differ from the semantics - of a message, as described in the text.> - - NOTE: Conspicuously missing from the multipart type is a - notion of structured, related body parts. In general, it - seems premature to try to standardize interpart structure - yet.
It is recommended that those wishing to provide a more - structured or integrated multipart messaging facility should - define a subtype of multipart that is syntactically - identical, but that always expects the inclusion of a - distinguished part that can be used to specify the structure - and integration of the other parts, probably referring to - them by their Content-ID field. If this approach is used, - other implementations will not recognize the new subtype, - but will treat it as the primary subtype (multipart/mixed) - and will thus be able to show the user the parts that are - recognized. - - 7.2.2 The Multipart/mixed (primary) subtype - - The primary subtype for multipart, "mixed", is intended for - use when the body parts are independent and intended to be - displayed serially. Any multipart subtypes that an - implementation does not recognize should be treated as being - of subtype "mixed". - - 7.2.3 The Multipart/alternative subtype - - The multipart/alternative type is syntactically identical to - multipart/mixed, but the semantics are different. In - particular, each of the parts is an "alternative" version of - the same information. User agents should recognize that the - contents of the various parts are interchangeable. The user - agent should either choose the "best" type based on the - user's environment and preferences, or offer the user the - available alternatives. In general, choosing the best type - means displaying only the LAST part that can be displayed. - This may be used, for example, to send mail in a fancy text - format in such a way that it can easily be displayed - anywhere: - - From: Nathaniel Borenstein <nsb@bellcore.com> - To: Ned Freed <ned@innosoft.com> - Subject: Formatted text mail - MIME-Version: 1.0 - Content-Type: multipart/alternative; boundary=boundary42 - - - --boundary42 - Content-Type: text/plain; charset=us-ascii - - ...plain text version of message goes here....
- --boundary42 - Content-Type: text/richtext - - .... richtext version of same message goes here ... - --boundary42 - Content-Type: text/x-whatever - - .... fanciest formatted version of same message goes here ... - --boundary42-- - - In this example, users whose mail system understood the - "text/x-whatever" format would see only the fancy version, - while other users would see only the richtext or plain text - version, depending on the capabilities of their system. - - In general, user agents that compose multipart/alternative - entities should place the body parts in increasing order of - preference, that is, with the preferred format last. For - fancy text, the sending user agent should put the plainest - format first and the richest format last. Receiving user - agents should pick and display the last format they are - capable of displaying. In the case where one of the - alternatives is itself of type "multipart" and contains - unrecognized sub-parts, the user agent may choose either to - show that alternative, an earlier alternative, or both. - - NOTE: From an implementor's perspective, it might seem more - sensible to reverse this ordering, and have the plainest - alternative last. However, placing the plainest alternative - first is the friendliest possible option when - multipart/alternative entities are viewed using a - non-MIME-compliant mail reader. While this approach does impose some - burden on compliant mail readers, interoperability with - older mail readers was deemed to be more important in this - case. - - It may be the case that some user agents, if they can - recognize more than one of the formats, will prefer to offer - the user the choice of which format to view. This makes - sense, for example, if mail includes both a nicely-formatted - image version and an easily-edited text version.
What is - most critical, however, is that the user not automatically - be shown multiple versions of the same data. Either the - user should be shown the last recognized version or should - explicitly be given the choice. - - - - - - - - - - - - Borenstein & Freed [Page 35] - - - - - RFC 1341MIME: Multipurpose Internet Mail ExtensionsJune 1992 - - - 7.2.4 The Multipart/digest subtype - - This document defines a "digest" subtype of the multipart - Content-Type. This type is syntactically identical to - multipart/mixed, but the semantics are different. In - particular, in a digest, the default Content-Type value for - a body part is changed from "text/plain" to - "message/rfc822". This is done to allow a more readable - digest format that is largely compatible (except for the - quoting convention) with RFC 934. - - A digest in this format might, then, look something like - this: - - From: Moderator-Address - MIME-Version: 1.0 - Subject: Internet Digest, volume 42 - Content-Type: multipart/digest; - boundary="---- next message ----" - - - ------ next message ---- - - From: someone-else - Subject: my opinion - - ...body goes here ... - - ------ next message ---- - - From: someone-else-again - Subject: my different opinion - - ... another body goes here... - - ------ next message ------ - - 7.2.5 The Multipart/parallel subtype - - This document defines a "parallel" subtype of the multipart - Content-Type. This type is syntactically identical to - multipart/mixed, but the semantics are different. In - particular, in a parallel entity, all of the parts are - intended to be presented in parallel, i.e., simultaneously, - on hardware and software that are capable of doing so. - Composing agents should be aware that many mail readers will - lack this capability and will show the parts serially in any - event. 
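The subtype rules of sections 7.2.2 through 7.2.5 share one syntax and differ only in how a reader treats the parts, which the following Python sketch tries to make concrete. All three function names are hypothetical, and `can_display` stands for whatever capability check a particular mail reader provides.

```python
KNOWN_SUBTYPES = ("mixed", "alternative", "digest", "parallel")


def effective_subtype(subtype):
    """Unrecognized multipart subtypes are treated as "mixed"."""
    subtype = subtype.lower()
    return subtype if subtype in KNOWN_SUBTYPES else "mixed"


def default_part_type(subtype):
    """Content-Type assumed for a body part that has no Content-Type field."""
    if effective_subtype(subtype) == "digest":
        return "message/rfc822"
    return "text/plain"


def pick_alternative(part_types, can_display):
    """Index of the LAST part the reader can display, or None if there is none.

    part_types  -- Content-Type values of the parts, in message order
    can_display -- callable taking a type string and returning True or False
    """
    for index in range(len(part_types) - 1, -1, -1):
        if can_display(part_types[index]):
            return index
    return None
```

For the multipart/alternative example in 7.2.3, a reader that understands plain text and richtext but not "text/x-whatever" would get index 1 from `pick_alternative` and show only the richtext part.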
- - - - - - - - - - Borenstein & Freed [Page 36] - - - - - RFC 1341MIME: Multipurpose Internet Mail ExtensionsJune 1992 - - - 7.3 The Message Content-Type - - It is frequently desirable, in sending mail, to encapsulate - another mail message. For this common operation, a special - Content-Type, "message", is defined. The primary subtype, - message/rfc822, has no required parameters in the Content- - Type field. Additional subtypes, "partial" and "External- - body", do have required parameters. These subtypes are - explained below. - - NOTE: It has been suggested that subtypes of message might - be defined for forwarded or rejected messages. However, - forwarded and rejected messages can be handled as multipart - messages in which the first part contains any control or - descriptive information, and a second part, of type - message/rfc822, is the forwarded or rejected message. - Composing rejection and forwarding messages in this manner - will preserve the type information on the original message - and allow it to be correctly presented to the recipient, and - hence is strongly encouraged. - - As stated in the definition of the Content-Transfer-Encoding - field, no encoding other than "7bit", "8bit", or "binary" is - permitted for messages or parts of type "message". The - message header fields are always US-ASCII in any case, and - data within the body can still be encoded, in which case the - Content-Transfer-Encoding header field in the encapsulated - message will reflect this. Non-ASCII text in the headers of - an encapsulated message can be specified using the - mechanisms described in [RFC-1342]. - - Mail gateways, relays, and other mail handling agents are - commonly known to alter the top-level header of an RFC 822 - message. In particular, they frequently add, remove, or - reorder header fields. Such alterations are explicitly - forbidden for the encapsulated headers embedded in the - bodies of messages of type "message." 
- - 7.3.1 The Message/rfc822 (primary) subtype - - A Content-Type of "message/rfc822" indicates that the body - contains an encapsulated message, with the syntax of an RFC - 822 message. - - 7.3.2 The Message/Partial subtype - - A subtype of message, "partial", is defined in order to - allow large objects to be delivered as several separate - pieces of mail and automatically reassembled by the - receiving user agent. (The concept is similar to IP - fragmentation/reassembly in the basic Internet Protocols.) - This mechanism can be used when intermediate transport - agents limit the size of individual messages that can be - sent. Content-Type "message/partial" thus indicates that - - - - Borenstein & Freed [Page 37] - - - - - RFC 1341MIME: Multipurpose Internet Mail ExtensionsJune 1992 - - - the body contains a fragment of a larger message. - - Three parameters must be specified in the Content-Type field - of type message/partial: The first, "id", is a unique - identifier, as close to a world-unique identifier as - possible, to be used to match the parts together. (In - general, the identifier is essentially a message-id; if - placed in double quotes, it can be any message-id, in - accordance with the BNF for "parameter" given earlier in - this specification.) The second, "number", an integer, is - the part number, which indicates where this part fits into - the sequence of fragments. The third, "total", another - integer, is the total number of parts. This third subfield - is required on the final part, and is optional on the - earlier parts. Note also that these parameters may be given - in any order. 
- - Thus, part 2 of a 3-part message may have either of the - following header fields: - - Content-Type: Message/Partial; - number=2; total=3; - id="oc=jpbe0M2Yt4s@thumper.bellcore.com"; - - Content-Type: Message/Partial; - id="oc=jpbe0M2Yt4s@thumper.bellcore.com"; - number=2 - - But part 3 MUST specify the total number of parts: - - Content-Type: Message/Partial; - number=3; total=3; - id="oc=jpbe0M2Yt4s@thumper.bellcore.com"; - - Note that part numbering begins with 1, not 0. - - When the parts of a message broken up in this manner are put - together, the result is a complete RFC 822 format message, - which may have its own Content-Type header field, and thus - may contain any other data type. - - Message fragmentation and reassembly: The semantics of a - reassembled partial message must be those of the "inner" - message, rather than of a message containing the inner - message. This makes it possible, for example, to send a - large audio message as several partial messages, and still - have it appear to the recipient as a simple audio message - rather than as an encapsulated message containing an audio - message. That is, the encapsulation of the message is - considered to be "transparent". - - When generating and reassembling the parts of a - message/partial message, the headers of the encapsulated - message must be merged with the headers of the enclosing - - - - Borenstein & Freed [Page 38] - - - - - RFC 1341MIME: Multipurpose Internet Mail ExtensionsJune 1992 - - - entities. In this process the following rules must be - observed: - - (1) All of the headers from the initial enclosing - entity (part one), except those that start with - "Content-" and "Message-ID", must be copied, in - order, to the new message. - - (2) Only those headers in the enclosed message - which start with "Content-" and "Message-ID" must - be appended, in order, to the headers of the new - message. 
Any headers in the enclosed message - which do not start with "Content-" (except for - "Message-ID") will be ignored. - - (3) All of the headers from the second and any - subsequent messages will be ignored. - - For example, if an audio message is broken into two parts, - the first part might look something like this: - - X-Weird-Header-1: Foo - From: Bill@host.com - To: joe@otherhost.com - Subject: Audio mail - Message-ID: id1@host.com - MIME-Version: 1.0 - Content-type: message/partial; - id="ABC@host.com"; - number=1; total=2 - - X-Weird-Header-1: Bar - X-Weird-Header-2: Hello - Message-ID: anotherid@foo.com - Content-type: audio/basic - Content-transfer-encoding: base64 - - ... first half of encoded audio data goes here... - - and the second half might look something like this: - - From: Bill@host.com - To: joe@otherhost.com - Subject: Audio mail - MIME-Version: 1.0 - Message-ID: id2@host.com - Content-type: message/partial; - id="ABC@host.com"; number=2; total=2 - - ... second half of encoded audio data goes here... - - Then, when the fragmented message is reassembled, the - resulting message to be displayed to the user should look - something like this: - - - - Borenstein & Freed [Page 39] - - - - - RFC 1341MIME: Multipurpose Internet Mail ExtensionsJune 1992 - - - X-Weird-Header-1: Foo - From: Bill@host.com - To: joe@otherhost.com - Subject: Audio mail - Message-ID: anotherid@foo.com - MIME-Version: 1.0 - Content-type: audio/basic - Content-transfer-encoding: base64 - - ... first half of encoded audio data goes here... - ... second half of encoded audio data goes here... - - It should be noted that, because some message transfer - agents may choose to automatically fragment large messages, - and because such agents may use different fragmentation - thresholds, it is possible that the pieces of a partial - message, upon reassembly, may prove themselves to comprise a - partial message. This is explicitly permitted. 
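The three merging rules above can be sketched in a few lines of Python. This is a minimal illustration, not a full mail parser: the function names (`split_message`, `is_inner_header`, `reassemble_partial`) are hypothetical, fragments are assumed to be plain strings with "\n" line endings, and the merged headers follow the literal order of rules (1) and (2), which places Message-ID slightly later than in the RFC's own example.

```python
import re

# Matches name=value and name="value" pairs inside a Content-Type field.
PARAM_RE = re.compile(r'(\w+)\s*=\s*"?([^";]+)"?')


def split_message(raw):
    """Split raw text into ([(name, value), ...], body) at the first blank line."""
    head, _, body = raw.partition("\n\n")
    headers = []
    for line in head.split("\n"):
        if line[:1] in (" ", "\t") and headers:
            # Continuation line: unfold it onto the previous header.
            name, value = headers[-1]
            headers[-1] = (name, value + " " + line.strip())
        else:
            name, _, value = line.partition(":")
            headers.append((name.strip(), value.strip()))
    return headers, body


def is_inner_header(name):
    """True for the headers that rule (2) takes from the enclosed message."""
    lowered = name.lower()
    return lowered.startswith("content-") or lowered.startswith("message-id")


def reassemble_partial(fragments):
    """Merge message/partial fragments (raw strings, any order) into one message."""
    parts = []
    for raw in fragments:
        headers, body = split_message(raw)
        content_type = next(v for n, v in headers if n.lower() == "content-type")
        params = {k.lower(): v for k, v in PARAM_RE.findall(content_type)}
        parts.append((int(params["number"]), params, headers, body))
    parts.sort(key=lambda part: part[0])  # part numbering begins with 1

    # "total" is required on the final part, so a complete set always has it.
    totals = {int(p[1]["total"]) for p in parts if "total" in p[1]}
    if totals != {len(parts)}:
        raise ValueError("missing fragments or inconsistent 'total' parameter")

    outer_headers = parts[0][2]
    inner_headers, first_body = split_message(parts[0][3])
    merged = [h for h in outer_headers if not is_inner_header(h[0])]  # rule (1)
    merged += [h for h in inner_headers if is_inner_header(h[0])]     # rule (2)
    body = first_body + "".join(p[3] for p in parts[1:])              # rule (3)
    return "\n".join(f"{n}: {v}" for n, v in merged) + "\n\n" + body
```

Because the text permits a reassembled message to be itself of type message/partial, a caller would presumably apply the same function again to the result in that case.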
-
-   It should also be noted that the inclusion of a "References"
-   field in the headers of the second and subsequent pieces of
-   a fragmented message that references the Message-Id on the
-   previous piece may be of benefit to mail readers that
-   understand and track references. However, the generation of
-   such "References" fields is entirely optional.
-
-   7.3.3  The Message/External-Body subtype
-
-   The external-body subtype indicates that the actual body
-   data are not included, but merely referenced. In this case,
-   the parameters describe a mechanism for accessing the
-   external data.
-
-   When a message body or body part is of type
-   "message/external-body", it consists of a header, two
-   consecutive CRLFs, and the message header for the
-   encapsulated message. If another pair of consecutive CRLFs
-   appears, this of course ends the message header for the
-   encapsulated message. However, since the encapsulated
-   message's body is itself external, it does NOT appear in the
-   area that follows. For example, consider the following
-   message:
-
-        Content-type: message/external-body;
-             access-type=local-file;
-             name="/u/nsb/Me.gif"
-
-        Content-type: image/gif
-
-        THIS IS NOT REALLY THE BODY!
-
-   The area at the end, which might be called the "phantom
-   body", is ignored for most external-body messages. However,
-   it may be used to contain auxiliary information for some
-   such messages, as indeed it is when the access-type is
-   "mail-server". Of the access-types defined by this
-   document, the phantom body is used only when the access-type
-   is "mail-server". In all other cases, the phantom body is
-   ignored.
-
-   The only always-mandatory parameter for message/external-
-   body is "access-type"; all of the other parameters may be
-   mandatory or optional depending on the value of access-type.
-
-        ACCESS-TYPE -- One or more case-insensitive words,
-        comma-separated, indicating supported access
-        mechanisms by which the file or data may be
-        obtained. Values include, but are not limited to,
-        "FTP", "ANON-FTP", "TFTP", "AFS", "LOCAL-FILE",
-        and "MAIL-SERVER". Future values, except for
-        experimental values beginning with "X-", must be
-        registered with IANA, as described in Appendix F.
-
-   In addition, the following three parameters are optional for
-   ALL access-types:
-
-        EXPIRATION -- The date (in the RFC 822 "date-time"
-        syntax, as extended by RFC 1123 to permit 4 digits
-        in the date field) after which the existence of
-        the external data is not guaranteed.
-
-        SIZE -- The size (in octets) of the data. The
-        intent of this parameter is to help the recipient
-        decide whether or not to expend the necessary
-        resources to retrieve the external data.
-
-        PERMISSION -- A field that indicates whether or
-        not it is expected that clients might also attempt
-        to overwrite the data. By default, or if
-        permission is "read", the assumption is that they
-        are not, and that if the data is retrieved once,
-        it is never needed again. If PERMISSION is "read-
-        write", this assumption is invalid, and any local
-        copy must be considered no more than a cache.
-        "Read" and "Read-write" are the only defined
-        values of permission.
-
-   The precise semantics of the access-types defined here are
-   described in the sections that follow.
-
-   7.3.3.1  The "ftp" and "tftp" access-types
-
-   An access-type of FTP or TFTP indicates that the message
-   body is accessible as a file using the FTP [RFC-959] or TFTP
-   [RFC-783] protocols, respectively. For these access-types,
-   the following additional parameters are mandatory:
-
-        NAME -- The name of the file that contains the
-        actual body data.
- - SITE -- A machine from which the file may be - obtained, using the given protocol - - Before the data is retrieved, using these protocols, the - user will generally need to be asked to provide a login id - and a password for the machine named by the site parameter. - - In addition, the following optional parameters may also - appear when the access-type is FTP or ANON-FTP: - - DIRECTORY -- A directory from which the data named - by NAME should be retrieved. - - MODE -- A transfer mode for retrieving the - information, e.g. "image". - - 7.3.3.2 The "anon-ftp" access-type - - The "anon-ftp" access-type is identical to the "ftp" access - type, except that the user need not be asked to provide a - name and password for the specified site. Instead, the ftp - protocol will be used with login "anonymous" and a password - that corresponds to the user's email address. - - 7.3.3.3 The "local-file" and "afs" access-types - - An access-type of "local-file" indicates that the actual - body is accessible as a file on the local machine. An - access-type of "afs" indicates that the file is accessible - via the global AFS file system. In both cases, only a - single parameter is required: - - NAME -- The name of the file that contains the - actual body data. - - The following optional parameter may be used to describe the - locality of reference for the data, that is, the site or - sites at which the file is expected to be visible: - - SITE -- A domain specifier for a machine or set of - machines that are known to have access to the data - file. Asterisks may be used for wildcard matching - to a part of a domain name, such as - "*.bellcore.com", to indicate a set of machines on - which the data should be directly visible, while a - single asterisk may be used to indicate a file - that is expected to be universally available, - e.g., via a global file system. 
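The mandatory parameters and the SITE wildcard described so far can be sketched as a small check. This is an illustration only: the table `REQUIRED` and the functions `missing_parameters` and `site_matches` are hypothetical names, only the access-types described above are covered, and a single access-type word is assumed rather than a comma-separated list. Using `fnmatchcase` for the asterisk is also an assumption; it accepts `?` and `[...]` patterns that this text does not define.

```python
from fnmatch import fnmatchcase

# Mandatory parameters per access-type, as listed in the sections above.
# "anon-ftp" is treated like "ftp" because the text calls them identical
# apart from the login procedure.
REQUIRED = {
    "ftp": {"name", "site"},
    "tftp": {"name", "site"},
    "anon-ftp": {"name", "site"},
    "local-file": {"name"},
    "afs": {"name"},
}


def missing_parameters(params):
    """Return the mandatory parameter names absent from a parameter dict."""
    access_type = params.get("access-type", "").lower()
    if access_type not in REQUIRED:
        raise ValueError(f"access-type not covered by this sketch: {access_type!r}")
    present = {key.lower() for key in params}
    return REQUIRED[access_type] - present


def site_matches(pattern, host):
    """Check a SITE value such as "*.bellcore.com" or "*" against a host name."""
    return fnmatchcase(host.lower(), pattern.lower())
```

A receiving agent could call `site_matches` with its own host name to decide whether trying the file name directly is worthwhile before falling back to another access mechanism.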
- - 7.3.3.4 The "mail-server" access-type - - - - - Borenstein & Freed [Page 42] - - - - - RFC 1341MIME: Multipurpose Internet Mail ExtensionsJune 1992 - - - The "mail-server" access-type indicates that the actual body - is available from a mail server. The mandatory parameter - for this access-type is: - - SERVER -- The email address of the mail server - from which the actual body data can be obtained. - - Because mail servers accept a variety of syntax, some of - which is multiline, the full command to be sent to a mail - server is not included as a parameter on the content-type - line. Instead, it may be provided as the "phantom body" - when the content-type is message/external-body and the - access-type is mail-server. - - Note that MIME does not define a mail server syntax. - Rather, it allows the inclusion of arbitrary mail server - commands in the phantom body. Implementations should - include the phantom body in the body of the message it sends - to the mail server address to retrieve the relevant data. - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - Borenstein & Freed [Page 43] - - - - - RFC 1341MIME: Multipurpose Internet Mail ExtensionsJune 1992 - - - 7.3.3.5 Examples and Further Explanations - - With the emerging possibility of very wide-area file - systems, it becomes very hard to know in advance the set of - machines where a file will and will not be accessible - directly from the file system. Therefore it may make sense - to provide both a file name, to be tried directly, and the - name of one or more sites from which the file is known to be - accessible. An implementation can try to retrieve remote - files using FTP or any other protocol, using anonymous file - retrieval or prompting the user for the necessary name and - password. If an external body is accessible via multiple - mechanisms, the sender may include multiple parts of type - message/external-body within an entity of type - multipart/alternative. 
-
-   However, the external-body mechanism is not intended to be
-   limited to file retrieval, as shown by the mail-server
-   access-type. Beyond this, one can imagine, for example,
-   using a video server for external references to video clips.
-
-   If an entity is of type "message/external-body", then the
-   body of the entity will contain the header fields of the
-   encapsulated message. The body itself is to be found in the
-   external location. This means that if the body of the
-   "message/external-body" message contains two consecutive
-   CRLFs, everything after those pairs is NOT part of the
-   message itself. For most message/external-body messages,
-   this trailing area must simply be ignored. However, it is a
-   convenient place for additional data that cannot be included
-   in the content-type header field. In particular, if the
-   "access-type" value is "mail-server", then the trailing area
-   must contain commands to be sent to the mail server at the
-   address given by the value of the SERVER parameter.
-
-   The embedded message header fields which appear in the body
-   of the message/external-body data can be used to declare the
-   Content-type of the external body.
-   Thus a complete
-   message/external-body message, referring to a document in
-   PostScript format, might look like this:
-
-        From: Whomever
-        Subject: whatever
-        MIME-Version: 1.0
-        Message-ID: id1@host.com
-        Content-Type: multipart/alternative; boundary=42
-
-
-        --42
-        Content-Type: message/external-body;
-             name="BodyFormats.ps";
-             site="thumper.bellcore.com";
-             access-type=ANON-FTP;
-             directory="pub";
-             mode="image";
-             expiration="Fri, 14 Jun 1991 19:13:14 -0400 (EDT)"
-
-        Content-type: application/postscript
-
-        --42
-        Content-Type: message/external-body;
-             name="/u/nsb/writing/rfcs/RFC-XXXX.ps";
-             site="thumper.bellcore.com";
-             access-type=AFS;
-             expiration="Fri, 14 Jun 1991 19:13:14 -0400 (EDT)"
-
-        Content-type: application/postscript
-
-        --42
-        Content-Type: message/external-body;
-             access-type=mail-server;
-             server="listserv@bogus.bitnet";
-             expiration="Fri, 14 Jun 1991 19:13:14 -0400 (EDT)"
-
-        Content-type: application/postscript
-
-        get rfc-xxxx doc
-
-        --42--
-
-   Like the message/partial type, the message/external-body
-   type is intended to be transparent, that is, to convey the
-   data type in the external body rather than to convey a
-   message with a body of that type. Thus the headers on the
-   outer and inner parts must be merged using the same rules as
-   for message/partial. In particular, this means that the
-   Content-type header is overridden, but the From and Subject
-   headers are preserved.
-
-   Note that since the external bodies are not transported as
-   mail, they need not conform to the 7-bit and line length
-   requirements, but might in fact be binary files. Thus a
-   Content-Transfer-Encoding is not generally necessary, though
-   it is permitted.
-
-   Note that the body of a message of type "message/external-
-   body" is governed by the basic syntax for an RFC 822
-   message.
In particular, anything before the first - consecutive pair of CRLFs is header information, while - anything after it is body information, which is ignored for - most access-types. - - - - - - - - Borenstein & Freed [Page 45] - - - - - RFC 1341MIME: Multipurpose Internet Mail ExtensionsJune 1992 - - - 7.4 The Application Content-Type - - The "application" Content-Type is to be used for data which - do not fit in any of the other categories, and particularly - for data to be processed by mail-based uses of application - programs. This is information which must be processed by an - application before it is viewable or usable to a user. - Expected uses for Content-Type application include mail- - based file transfer, spreadsheets, data for mail-based - scheduling systems, and languages for "active" - (computational) email. (The latter, in particular, can pose - security problems which should be understood by - implementors, and are considered in detail in the discussion - of the application/PostScript content-type.) - - For example, a meeting scheduler might define a standard - representation for information about proposed meeting dates. - An intelligent user agent would use this information to - conduct a dialog with the user, and might then send further - mail based on that dialog. More generally, there have been - several "active" messaging languages developed in which - programs in a suitably specialized language are sent through - the mail and automatically run in the recipient's - environment. - - Such applications may be defined as subtypes of the - "application" Content-Type. This document defines three - subtypes: octet-stream, ODA, and PostScript. - - In general, the subtype of application will often be the - name of the application for which the data are intended. - This does not mean, however, that any application program - name may be used freely as a subtype of application. Such - usages must be registered with IANA, as described in - Appendix F. 
-
-   7.4.1  The Application/Octet-Stream (primary) subtype
-
-   The primary subtype of application, "octet-stream", may be
-   used to indicate that a body contains binary data. The set
-   of possible parameters includes, but is not limited to:
-
-        NAME -- a suggested name for the binary data if
-        stored as a file.
-
-        TYPE -- the general type or category of binary
-        data. This is intended as information for the
-        human recipient rather than for any automatic
-        processing.
-
-        CONVERSIONS -- the set of operations that have
-        been performed on the data before putting it in
-        the mail (and before any Content-Transfer-Encoding
-        that might have been applied). If multiple
-        conversions have occurred, they must be separated
-        by commas and specified in the order they were
-        applied -- that is, the leftmost conversion must
-        have occurred first, and conversions are undone
-        from right to left. Note that NO conversion
-        values are defined by this document. Any
-        conversion values that do not begin with "X-"
-        must be preceded by a published specification and
-        by registration with IANA, as described in
-        Appendix F.
-
-        PADDING -- the number of bits of padding that were
-        appended to the bitstream comprising the actual
-        contents to produce the enclosed byte-oriented
-        data. This is useful for enclosing a bitstream in
-        a body when the total number of bits is not a
-        multiple of the byte size.
-
-   The values for these attributes are left undefined at
-   present, but may require specification in the future. An
-   example of a common (though UNIX-specific) usage might be:
-
-        Content-Type: application/octet-stream;
-             name=foo.tar.Z; type=tar;
-             conversions="x-encrypt,x-compress"
-
-   However, it should be noted that the use of such conversions
-   is explicitly discouraged due to a lack of portability and
-   standardization.
The use of uuencode is particularly - discouraged, in favor of the Content-Transfer-Encoding - mechanism, which is both more standardized and more portable - across mail boundaries. - - The recommended action for an implementation that receives - application/octet-stream mail is to simply offer to put the - data in a file, with any Content-Transfer-Encoding undone, - or perhaps to use it as input to a user-specified process. - - To reduce the danger of transmitting rogue programs through - the mail, it is strongly recommended that implementations - NOT implement a path-search mechanism whereby an arbitrary - program named in the Content-Type parameter (e.g., an - "interpreter=" parameter) is found and executed using the - mail body as input. - - 7.4.2 The Application/PostScript subtype - - A Content-Type of "application/postscript" indicates a - PostScript program. The language is defined in - [POSTSCRIPT]. It is recommended that Postscript as sent - through email should use Postscript document structuring - conventions if at all possible, and correctly. - - - - - - Borenstein & Freed [Page 47] - - - - - RFC 1341MIME: Multipurpose Internet Mail ExtensionsJune 1992 - - - The execution of general-purpose PostScript interpreters - entails serious security risks, and implementors are - discouraged from simply sending PostScript email bodies to - "off-the-shelf" interpreters. While it is usually safe to - send PostScript to a printer, where the potential for harm - is greatly constrained, implementors should consider all of - the following before they add interactive display of - PostScript bodies to their mail readers. - - The remainder of this section outlines some, though probably - not all, of the possible problems with sending PostScript - through the mail. - - Dangerous operations in the PostScript language include, but - may not be limited to, the PostScript operators deletefile, - renamefile, filenameforall, and file. 
File is only - dangerous when applied to something other than standard - input or output. Implementations may also define additional - nonstandard file operators; these may also pose a threat to - security. Filenameforall, the wildcard file search - operator, may appear at first glance to be harmless. Note, - however, that this operator has the potential to reveal - information about what files the recipient has access to, - and this information may itself be sensitive. Message - senders should avoid the use of potentially dangerous file - operators, since these operators are quite likely to be - unavailable in secure PostScript implementations. Message- - receiving and -displaying software should either completely - disable all potentially dangerous file operators or take - special care not to delegate any special authority to their - operation. These operators should be viewed as being done by - an outside agency when interpreting PostScript documents. - Such disabling and/or checking should be done completely - outside of the reach of the PostScript language itself; care - should be taken to insure that no method exists for - reenabling full-function versions of these operators. - - The PostScript language provides facilities for exiting the - normal interpreter, or server, loop. Changes made in this - "outer" environment are customarily retained across - documents, and may in some cases be retained semipermanently - in nonvolatile memory. The operators associated with exiting - the interpreter loop have the potential to interfere with - subsequent document processing. As such, their unrestrained - use constitutes a threat of service denial. PostScript - operators that exit the interpreter loop include, but may - not be limited to, the exitserver and startjob operators. - Message-sending software should not generate PostScript that - depends on exiting the interpreter loop to operate. 
The - ability to exit will probably be unavailable in secure - PostScript implementations. Message-receiving and - -displaying software should, if possible, disable the - ability to make retained changes to the PostScript - environment. Eliminate the startjob and exitserver commands. - - - - Borenstein & Freed [Page 48] - - - - - RFC 1341MIME: Multipurpose Internet Mail ExtensionsJune 1992 - - - If these commands cannot be eliminated, at least set the - password associated with them to a hard-to-guess value. - - PostScript provides operators for setting system-wide and - device-specific parameters. These parameter settings may be - retained across jobs and may potentially pose a threat to - the correct operation of the interpreter. The PostScript - operators that set system and device parameters include, but - may not be limited to, the setsystemparams and setdevparams - operators. Message-sending software should not generate - PostScript that depends on the setting of system or device - parameters to operate correctly. The ability to set these - parameters will probably be unavailable in secure PostScript - implementations. Message-receiving and -displaying software - should, if possible, disable the ability to change system - and device parameters. If these operators cannot be - disabled, at least set the password associated with them to - a hard-to-guess value. - - Some PostScript implementations provide nonstandard - facilities for the direct loading and execution of machine - code. Such facilities are quite obviously open to - substantial abuse. Message-sending software should not - make use of such features. Besides being totally hardware- - specific, they are also likely to be unavailable in secure - implementations of PostScript. Message-receiving and - -displaying software should not allow such operators to be - used if they exist. 
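As a rough illustration of the operator names called out above, the following Python sketch flags their appearance in a PostScript source. It is deliberately naive and is not a security mechanism: the text requires that disabling happen inside the interpreter, out of reach of the PostScript program, and a program can build operator names dynamically. The names `FLAGGED` and `flagged_operators` are hypothetical, comment stripping ignores "%" characters inside strings, and names that merely occur inside strings or as literal names are reported too.

```python
import re

# Operator names mentioned in the security discussion above.
FLAGGED = {
    "deletefile", "renamefile", "filenameforall", "file",
    "exitserver", "startjob",
    "setsystemparams", "setdevparams",
}

# A token is a run of characters that are neither whitespace nor
# PostScript delimiters.
TOKEN_RE = re.compile(r"[^\s()<>\[\]{}/%]+")


def flagged_operators(source):
    """Return the flagged operator names that appear as tokens in the source."""
    found = set()
    for line in source.splitlines():
        code = line.split("%", 1)[0]  # drop the comment part of the line
        found.update(tok for tok in TOKEN_RE.findall(code) if tok in FLAGGED)
    return found
```

A mail reader might use such a scan only to warn the user or to route the body to a printer rather than an interactive viewer; it cannot replace a restricted interpreter.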
- - PostScript is an extensible language, and many, if not most, - implementations of it provide a number of their own - extensions. This document does not deal with such extensions - explicitly since they constitute an unknown factor. - Message-sending software should not make use of nonstandard - extensions; they are likely to be missing from some - implementations. Message-receiving and -displaying software - should make sure that any nonstandard PostScript operators - are secure and don't present any kind of threat. - - It is possible to write PostScript that consumes huge - amounts of various system resources. It is also possible to - write PostScript programs that loop infinitely. Both types - of programs have the potential to cause damage if sent to - unsuspecting recipients. Message-sending software should - avoid the construction and dissemination of such programs, - which is antisocial. Message-receiving and -displaying - software should provide appropriate mechanisms to abort - processing of a document after a reasonable amount of time - has elapsed. In addition, PostScript interpreters should be - limited to the consumption of only a reasonable amount of - any given system resource. - - Finally, bugs may exist in some PostScript interpreters - which could possibly be exploited to gain unauthorized - - - - Borenstein & Freed [Page 49] - - - - - RFC 1341MIME: Multipurpose Internet Mail ExtensionsJune 1992 - - - access to a recipient's system. Apart from noting this - possibility, there is no specific action to take to prevent - this, apart from the timely correction of such bugs if any - are found. - - 7.4.3 The Application/ODA subtype - - The "ODA" subtype of application is used to indicate that a - body contains information encoded according to the Office - Document Architecture [ODA] standards, using the ODIF - representation format. 
-   For application/oda, the Content-
-   Type line should also specify an attribute/value pair that
-   indicates the document application profile (DAP), using the
-   key word "profile". Thus an appropriate header field might
-   look like this:
-
-        Content-Type: application/oda; profile=Q112
-
-   Consult the ODA standard [ODA] for further information.
-
-   7.5  The Image Content-Type
-
-   A Content-Type of "image" indicates that the body contains an
-   image. The subtype names the specific image format. These
-   names are case insensitive. Two initial subtypes are "jpeg"
-   for the JPEG format, JFIF encoding, and "gif" for GIF format
-   [GIF].
-
-   The list of image subtypes given here is neither exclusive
-   nor exhaustive, and is expected to grow as more types are
-   registered with IANA, as described in Appendix F.
-
-   7.6  The Audio Content-Type
-
-   A Content-Type of "audio" indicates that the body contains
-   audio data. Although there is not yet a consensus on an
-   "ideal" audio format for use with computers, there is a
-   pressing need for a format capable of providing
-   interoperable behavior.
-
-   The initial subtype of "basic" is specified to meet this
-   requirement by providing an absolutely minimal lowest common
-   denominator audio format. It is expected that richer
-   formats for higher quality and/or lower bandwidth audio will
-   be defined by a later document.
-
-   The content of the "audio/basic" subtype is audio encoded
-   using 8-bit ISDN u-law [PCM]. When this subtype is present,
-   a sample rate of 8000 Hz and a single channel is assumed.
-
-   7.7  The Video Content-Type
-
-   A Content-Type of "video" indicates that the body contains a
-   time-varying-picture image, possibly with color and
-   coordinated sound.
The term "video" is used extremely - generically, rather than with reference to any particular - technology or format, and is not meant to preclude subtypes - such as animated drawings encoded compactly. The subtype - "mpeg" refers to video coded according to the MPEG standard - [MPEG]. - - Note that although in general this document strongly - discourages the mixing of multiple media in a single body, - it is recognized that many so-called "video" formats include - a representation for synchronized audio, and this is - explicitly permitted for subtypes of "video". - - 7.8 Experimental Content-Type Values - - A Content-Type value beginning with the characters "X-" is a - private value, to be used by consenting mail systems by - mutual agreement. Any format without a rigorous and public - definition must be named with an "X-" prefix, and publicly - specified values shall never begin with "X-". (Older - - - - Borenstein & Freed [Page 51] - - - - - RFC 1341MIME: Multipurpose Internet Mail ExtensionsJune 1992 - - - versions of the widely-used Andrew system use the "X-BE2" - name, so new systems should probably choose a different - name.) - - In general, the use of "X-" top-level types is strongly - discouraged. Implementors should invent subtypes of the - existing types whenever possible. The invention of new - types is intended to be restricted primarily to the - development of new media types for email, such as digital - odors or holography, and not for new data formats in - general. In many cases, a subtype of application will be - more appropriate than a new top-level type. 
-
-   Summary
-
-   Using the MIME-Version, Content-Type, and Content-Transfer-
-   Encoding header fields, it is possible to include, in a
-   standardized way, arbitrary types of data objects with RFC
-   822 conformant mail messages. No restrictions imposed by
-   either RFC 821 or RFC 822 are violated, and care has been
-   taken to avoid problems caused by additional restrictions
-   imposed by the characteristics of some Internet mail
-   transport mechanisms (see Appendix B). The "multipart" and
-   "message" Content-Types allow mixing and hierarchical
-   structuring of objects of different types in a single
-   message. Further Content-Types provide a standardized
-   mechanism for tagging messages or body parts as audio,
-   image, or several other kinds of data. A distinguished
-   parameter syntax allows further specification of data format
-   details, particularly the specification of alternate
-   character sets. Additional optional header fields provide
-   mechanisms for certain extensions deemed desirable by many
-   implementors. Finally, a number of useful Content-Types are
-   defined for general use by consenting user agents, notably
-   text/richtext, message/partial, and message/external-body.
-
-   Acknowledgements
-
-   This document is the result of the collective effort of a
-   large number of people, at several IETF meetings, on the
-   IETF-SMTP and IETF-822 mailing lists, and elsewhere.
-   Although any enumeration seems doomed to suffer from
-   egregious omissions, the following are among the many
-   contributors to this effort:
-
-   Harald Tveit Alvestrand       Timo Lehtinen
-   Randall Atkinson              John R. MacMillan
-   Philippe Brandon              Rick McGowan
-   Kevin Carosso                 Leo Mclaughlin
-   Uhhyung Choi                  Goli Montaser-Kohsari
-   Cristian Constantinof         Keith Moore
-   Mark Crispin                  Tom Moore
-   Dave Crocker                  Erik Naggum
-   Terry Crowley                 Mark Needleman
-   Walt Daniels                  John Noerenberg
-   Frank Dawson                  Mats Ohrman
-   Hitoshi Doi                   Julian Onions
-   Kevin Donnelly                Michael Patton
-   Keith Edwards                 David J. Pepper
-   Chris Eich                    Blake C. Ramsdell
-   Johnny Eriksson               Luc Rooijakkers
-   Craig Everhart                Marshall T. Rose
-   Patrik Faeltstroem            Jonathan Rosenberg
-   Erik E. Fair                  Jan Rynning
-   Roger Fajman                  Harri Salminen
-   Alain Fontaine                Michael Sanderson
-   James M. Galvin               Masahiro Sekiguchi
-   Philip Gladstone              Mark Sherman
-   Thomas Gordon                 Keld Simonsen
-   Phill Gross                   Bob Smart
-   James Hamilton                Peter Speck
-   Steve Hardcastle-Kille        Henry Spencer
-   David Herron                  Einar Stefferud
-   Bruce Howard                  Michael Stein
-   Bill Janssen                  Klaus Steinberger
-   Olle Jaernefors               Peter Svanberg
-   Risto Kankkunen               James Thompson
-   Phil Karn                     Steve Uhler
-   Alan Katz                     Stuart Vance
-   Tim Kehres                    Erik van der Poel
-   Neil Katin                    Guido van Rossum
-   Kyuho Kim                     Peter Vanderbilt
-   Anders Klemets                Greg Vaudreuil
-   John Klensin                  Ed Vielmetti
-   Valdis Kletniek               Ryan Waldron
-   Jim Knowles                   Wally Wedel
-   Stev Knowles                  Sven-Ove Westberg
-   Bob Kummerfeld                Brian Wideen
-   Pekka Kytolaakso              John Wobus
-   Stellan Lagerstrom            Glenn Wright
-   Vincent Lau                   Rayan Zachariassen
-   Donald Lindsay                David Zimmerman
-
-   The authors apologize for any omissions from this list,
-   which are certainly unintentional.
-
-   Appendix A -- Minimal MIME-Conformance
-
-   The mechanisms described in this document are open-ended.
- It is definitely not expected that all implementations will - support all of the Content-Types described, nor that they - will all share the same extensions. In order to promote - interoperability, however, it is useful to define the - concept of "MIME-conformance" to define a certain level of - implementation that allows the useful interworking of - messages with content that differs from US ASCII text. In - this section, we specify the requirements for such - conformance. - - A mail user agent that is MIME-conformant MUST: - - 1. Always generate a "MIME-Version: 1.0" header - field. - - 2. Recognize the Content-Transfer-Encoding header - field, and decode all received data encoded with - either the quoted-printable or base64 - implementations. Encode any data sent that is - not in seven-bit mail-ready representation using - one of these transformations and include the - appropriate Content-Transfer-Encoding header - field, unless the underlying transport mechanism - supports non-seven-bit data, as SMTP does not. - - 3. Recognize and interpret the Content-Type - header field, and avoid showing users raw data - with a Content-Type field other than text. Be - able to send at least text/plain messages, with - the character set specified as a parameter if it - is not US-ASCII. - - 4. Explicitly handle the following Content-Type - values, to at least the following extents: - - Text: - -- Recognize and display "text" mail - with the character set "US-ASCII." - -- Recognize other character sets at - least to the extent of being able - to inform the user about what - character set the message uses. - -- Recognize the "ISO-8859-*" character - sets to the extent of being able to - display those characters that are - common to ISO-8859-* and US-ASCII, - namely all characters represented - by octet values 0-127. - -- For unrecognized subtypes, show or - offer to show the user the "raw" - version of the data. 
An ability at - - - - Borenstein & Freed [Page 56] - - - - - RFC 1341MIME: Multipurpose Internet Mail ExtensionsJune 1992 - - - least to convert "text/richtext" to - plain text, as shown in Appendix D, - is encouraged, but not required for - conformance. - Message: - --Recognize and display at least the - primary (822) encapsulation. - Multipart: - -- Recognize the primary (mixed) - subtype. Display all relevant - information on the message level - and the body part header level and - then display or offer to display - each of the body parts - individually. - -- Recognize the "alternative" subtype, - and avoid showing the user - redundant parts of - multipart/alternative mail. - -- Treat any unrecognized subtypes as if - they were "mixed". - Application: - -- Offer the ability to remove either of - the two types of Content-Transfer- - Encoding defined in this document - and put the resulting information - in a user file. - - 5. Upon encountering any unrecognized Content- - Type, an implementation must treat it as if it had - a Content-Type of "application/octet-stream" with - no parameter sub-arguments. How such data are - handled is up to an implementation, but likely - options for handling such unrecognized data - include offering the user to write it into a file - (decoded from its mail transport format) or - offering the user to name a program to which the - decoded data should be passed as input. - Unrecognized predefined types, which in a MIME- - conformant mailer might still include audio, - image, or video, should also be treated in this - way. - - A user agent that meets the above conditions is said to be - MIME-conformant. The meaning of this phrase is that it is - assumed to be "safe" to send virtually any kind of - properly-marked data to users of such mail systems, because - such systems will at least be able to treat the data as - undifferentiated binary, and will not simply splash it onto - the screen of unsuspecting users. 
There is another sense - in which it is always "safe" to send data in a format that - is MIME-conformant, which is that such data will not break - or be broken by any known systems that are conformant with - RFC 821 and RFC 822. User agents that are MIME-conformant - - - - Borenstein & Freed [Page 57] - - - - - RFC 1341MIME: Multipurpose Internet Mail ExtensionsJune 1992 - - - have the additional guarantee that the user will not be - shown data that were never intended to be viewed as text. - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - Borenstein & Freed [Page 58] - - - - - RFC 1341MIME: Multipurpose Internet Mail ExtensionsJune 1992 - - - Appendix B -- General Guidelines For Sending Email Data - - Internet email is not a perfect, homogeneous system. Mail - may become corrupted at several stages in its travel to a - final destination. Specifically, email sent throughout the - Internet may travel across many networking technologies. - Many networking and mail technologies do not support the - full functionality possible in the SMTP transport - environment. Mail traversing these systems is likely to be - modified in such a way that it can be transported. - - There exist many widely-deployed non-conformant MTAs in the - Internet. These MTAs, speaking the SMTP protocol, alter - messages on the fly to take advantage of the internal data - structure of the hosts they are implemented on, or are just - plain broken. - - The following guidelines may be useful to anyone devising a - data format (Content-Type) that will survive the widest - range of networking technologies and known broken MTAs - unscathed. Note that anything encoded in the base64 - encoding will satisfy these rules, but that some well-known - mechanisms, notably the UNIX uuencode facility, will not. 
- Note also that anything encoded in the Quoted-Printable - encoding will survive most gateways intact, but possibly not - some gateways to systems that use the EBCDIC character set. - - (1) Under some circumstances the encoding used for - data may change as part of normal gateway or user - agent operation. In particular, conversion from - base64 to quoted-printable and vice versa may be - necessary. This may result in the confusion of - CRLF sequences with line breaks in text body - parts. As such, the persistence of CRLF as - something other than a line break should not be - relied on. - - (2) Many systems may elect to represent and store - text data using local newline conventions. Local - newline conventions may not match the RFC822 CRLF - convention -- systems are known that use plain CR, - plain LF, CRLF, or counted records. The result is - that isolated CR and LF characters are not well - tolerated in general; they may be lost or - converted to delimiters on some systems, and hence - should not be relied on. - - (3) TAB (HT) characters may be misinterpreted or - may be automatically converted to variable numbers - of spaces. This is unavoidable in some - environments, notably those not based on the ASCII - character set. Such conversion is STRONGLY - DISCOURAGED, but it may occur, and mail formats - should not rely on the persistence of TAB (HT) - - - - Borenstein & Freed [Page 59] - - - - - RFC 1341MIME: Multipurpose Internet Mail ExtensionsJune 1992 - - - characters. - - (4) Lines longer than 76 characters may be wrapped - or truncated in some environments. Line wrapping - and line truncation are STRONGLY DISCOURAGED, but - unavoidable in some cases. Applications which - require long lines should somehow differentiate - between soft and hard line breaks. (A simple way - to do this is to use the quoted-printable - encoding.) 
- - (5) Trailing "white space" characters (SPACE, TAB - (HT)) on a line may be discarded by some transport - agents, while other transport agents may pad lines - with these characters so that all lines in a mail - file are of equal length. The persistence of - trailing white space, therefore, should not be - relied on. - - (6) Many mail domains use variations on the ASCII - character set, or use character sets such as - EBCDIC which contain most but not all of the US- - ASCII characters. The correct translation of - characters not in the "invariant" set cannot be - depended on across character converting gateways. - For example, this situation is a problem when - sending uuencoded information across BITNET, an - EBCDIC system. Similar problems can occur without - crossing a gateway, since many Internet hosts use - character sets other than ASCII internally. The - definition of Printable Strings in X.400 adds - further restrictions in certain special cases. In - particular, the only characters that are known to - be consistent across all gateways are the 73 - characters that correspond to the upper and lower - case letters A-Z and a-z, the 10 digits 0-9, and - the following eleven special characters: - - "'" (ASCII code 39) - "(" (ASCII code 40) - ")" (ASCII code 41) - "+" (ASCII code 43) - "," (ASCII code 44) - "-" (ASCII code 45) - "." (ASCII code 46) - "/" (ASCII code 47) - ":" (ASCII code 58) - "=" (ASCII code 61) - "?" (ASCII code 63) - - A maximally portable mail representation, such as - the base64 encoding, will confine itself to - relatively short lines of text in which the only - meaningful characters are taken from this set of - - - - Borenstein & Freed [Page 60] - - - - - RFC 1341MIME: Multipurpose Internet Mail ExtensionsJune 1992 - - - 73 characters. - - Please note that the above list is NOT a list of recommended - practices for MTAs. RFC 821 MTAs are prohibited from - altering the character of white space or wrapping long - lines. 
These BAD and illegal practices are known to occur
-   on established networks, and implementions should be robust
-   in dealing with the bad effects they can cause.
-
-
-   Appendix C -- A Complex Multipart Example
-
-   What follows is the outline of a complex multipart message.
-   This message has five parts to be displayed serially: two
-   introductory plain text parts, an embedded multipart
-   message, a richtext part, and a closing encapsulated text
-   message in a non-ASCII character set. The embedded
-   multipart message has two parts to be displayed in parallel,
-   a picture and an audio fragment.
-
-      MIME-Version: 1.0
-      From: Nathaniel Borenstein <nsb@bellcore.com>
-      Subject: A multipart example
-      Content-Type: multipart/mixed;
-           boundary=unique-boundary-1
-
-      This is the preamble area of a multipart message.
-      Mail readers that understand multipart format
-      should ignore this preamble.
-      If you are reading this text, you might want to
-      consider changing to a mail reader that understands
-      how to properly display multipart messages.
-      --unique-boundary-1
-
-      ...Some text appears here...
-      [Note that the preceding blank line means
-      no header fields were given and this is text,
-      with charset US ASCII. It could have been
-      done with explicit typing as in the next part.]
-
-      --unique-boundary-1
-      Content-type: text/plain; charset=US-ASCII
-
-      This could have been part of the previous part,
-      but illustrates explicit versus implicit
-      typing of body parts.
-
-      --unique-boundary-1
-      Content-Type: multipart/parallel;
-           boundary=unique-boundary-2
-
-
-      --unique-boundary-2
-      Content-Type: audio/basic
-      Content-Transfer-Encoding: base64
-
-      ... base64-encoded 8000 Hz single-channel
-      u-law-format audio data goes here....
-
-      --unique-boundary-2
-      Content-Type: image/gif
-      Content-Transfer-Encoding: Base64
-
-      ... base64-encoded image data goes here....
-
-      --unique-boundary-2--
-
-      --unique-boundary-1
-      Content-type: text/richtext
-
-      This is <bold><italic>richtext.</italic></bold>
-      <nl><nl>Isn't it
-      <bigger><bigger>cool?</bigger></bigger>
-
-      --unique-boundary-1
-      Content-Type: message/rfc822
-
-      From: (name in US-ASCII)
-      Subject: (subject in US-ASCII)
-      Content-Type: Text/plain; charset=ISO-8859-1
-      Content-Transfer-Encoding: Quoted-printable
-
-      ... Additional text in ISO-8859-1 goes here ...
-
-      --unique-boundary-1--
-
-
-   Appendix D -- A Simple Richtext-to-Text Translator in C
-
-   One of the major goals in the design of the richtext subtype
-   of the text Content-Type is to make formatted text so simple
-   that even text-only mailers will implement richtext-to-
-   plain-text translators, thus increasing the likelihood that
-   multifont text will become "safe" to use very widely. To
-   demonstrate this simplicity, what follows is an extremely
-   simple 44-line C program that converts richtext input into
-   plain text output:
-
-      #include <stdio.h>
-      #include <ctype.h>
-      main() {
-          int c, i;
-          char token[50];
-
-          while((c = getc(stdin)) != EOF) {
-              if (c == '<') {
-                  for (i=0; (i<49 && (c = getc(stdin)) != '>'
-                                  && c != EOF); ++i) {
-                      token[i] = isupper(c) ? tolower(c) : c;
-                  }
-                  if (c == EOF) break;
-                  if (c != '>') while ((c = getc(stdin)) !=
-      '>'
-                                  && c != EOF) {;}
-                  if (c == EOF) break;
-                  token[i] = '\0';
-                  if (!strcmp(token, "lt")) {
-                      putc('<', stdout);
-                  } else if (!strcmp(token, "nl")) {
-                      putc('\n', stdout);
-                  } else if (!strcmp(token, "/paragraph")) {
-                      fputs("\n\n", stdout);
-                  } else if (!strcmp(token, "comment")) {
-                      int commct=1;
-                      while (commct > 0) {
-                          while ((c = getc(stdin)) != '<'
-                                  && c != EOF) ;
-                          if (c == EOF) break;
-                          for (i=0; (c = getc(stdin)) != '>'
-                              && c != EOF; ++i) {
-                              token[i] = isupper(c) ?
-                                  tolower(c) : c;
-                          }
-                          if (c== EOF) break;
-                          token[i] = NULL;
-                          if (!strcmp(token, "/comment")) --
-      commct;
-                          if (!strcmp(token, "comment"))
-      ++commct;
-                      }
-                  } /* Ignore all other tokens */
-              } else if (c != '\n') putc(c, stdout);
-          }
-          putc('\n', stdout); /* for good measure */
-      }
-   It should be noted that one can do considerably better than
-   this in displaying richtext data on a dumb terminal. In
-   particular, one can replace font information such as "bold"
-   with textual emphasis (like *this* or _T_H_I_S_). One can
-   also properly handle the richtext formatting commands
-   regarding indentation, justification, and others. However,
-   the above program is all that is necessary in order to
-   present richtext on a dumb terminal.
-
-
-   Appendix E -- Collected Grammar
-
-   This appendix contains the complete BNF grammar for all the
-   syntax specified by this document.
-
-   By itself, however, this grammar is incomplete. It refers
-   to several entities that are defined by RFC 822.
Rather
-   than reproduce those definitions here, and risk
-   unintentional differences between the two, this document
-   simply refers the reader to RFC 822 for the remaining
-   definitions. Wherever a term is undefined, it refers to the
-   RFC 822 definition.
-
-   attribute := token
-
-   body-part = <"message" as defined in RFC 822,
-            with all header fields optional, and with the
-            specified delimiter not occurring anywhere in
-            the message body, either on a line by itself
-            or as a substring anywhere.>
-
-   boundary := 0*69<bchars> bcharsnospace
-
-   bchars := bcharsnospace / " "
-
-   bcharsnospace := DIGIT / ALPHA / "'" / "(" / ")" / "+" /
-   "_"
-                  / "," / "-" / "." / "/" / ":" / "=" / "?"
-
-   close-delimiter := delimiter "--"
-
-   Content-Description := *text
-
-   Content-ID := msg-id
-
-   Content-Transfer-Encoding := "BASE64" / "QUOTED-
-   PRINTABLE" /
-                                "8BIT" / "7BIT" /
-                                "BINARY" / x-token
-
-   Content-Type := type "/" subtype *[";" parameter]
-
-   delimiter := CRLF "--" boundary   ; taken from Content-Type
-   field.
-                                     ; when content-type is
-   multipart
-                                     ; There should be no space
-                                     ; between "--" and boundary.
-
-   encapsulation := delimiter CRLF body-part
-
-   epilogue := *text                 ; to be ignored upon
-   receipt.
-
-   MIME-Version := 1*text
-
-   multipart-body := preamble 1*encapsulation close-delimiter
-   epilogue
-
-   parameter := attribute "=" value
-
-   preamble := *text                 ; to be ignored upon
-   receipt.
-
-   subtype := token
-
-   token := 1*<any CHAR except SPACE, CTLs, or tspecials>
-
-   tspecials := "(" / ")" / "<" / ">" / "@"   ; Must be in
-              / "," / ";" / ":" / "\" / <">   ; quoted-string,
-              / "/" / "[" / "]" / "?" / "."   ; to use within
-              / "="                           ; parameter values
-
-
-   type := "application" / "audio"   ; case-
-   insensitive
-         / "image" / "message"
-         / "multipart" / "text"
-         / "video" / x-token
-
-   value := token / quoted-string
-
-   x-token := <The two characters "X-" followed, with no
-              intervening white space, by any token>
-
-
-   Appendix F -- IANA Registration Procedures
-
-   MIME has been carefully designed to have extensible
-   mechanisms, and it is expected that the set of content-
-   type/subtype pairs and their associated parameters will grow
-   significantly with time. Several other MIME fields, notably
-   character set names, access-type parameters for the
-   message/external-body type, conversions parameters for the
-   application type, and possibly even Content-Transfer-
-   Encoding values, are likely to have new values defined over
-   time. In order to ensure that the set of such values is
-   developed in an orderly, well-specified, and public manner,
-   MIME defines a registration process which uses the Internet
-   Assigned Numbers Authority (IANA) as a central registry for
-   such values.
-
-   In general, parameters in the content-type header field are
-   used to convey supplemental information for various content
-   types, and their use is defined when the content-type and
-   subtype are defined. New parameters should not be defined
-   as a way to introduce new functionality.
-
-   In order to simplify and standardize the registration
-   process, this appendix gives templates for the registration
-   of new values with IANA. Each of these is given in the form
-   of an email message template, to be filled in by the
-   registering party.
-
-   F.1 Registration of New Content-type/subtype Values
-
-   Note that MIME is generally expected to be extended by
-   subtypes.
If a new fundamental top-level type is needed, - its specification should be published as an RFC or - submitted in a form suitable to become an RFC, and be - subject to the Internet standards process. - - To: IANA@isi.edu - Subject: Registration of new MIME content-type/subtype - - MIME type name: - - (If the above is not an existing top-level MIME type, - please explain why an existing type cannot be used.) - - MIME subtype name: - - Required parameters: - - Optional parameters: - - Encoding considerations: - - Security considerations: - - - - - Borenstein & Freed [Page 68] - - - - - RFC 1341MIME: Multipurpose Internet Mail ExtensionsJune 1992 - - - Published specification: - - (The published specification must be an Internet RFC or - RFC-to-be if a new top-level type is being defined, and - must be a publicly available specification in any - case.) - - Person & email address to contact for further - information: - F.2 Registration of New Character Set Values - - To: IANA@isi.edu - Subject: Registration of new MIME character set value - - MIME character set name: - - Published specification: - - (The published specification must be an Internet RFC or - RFC-to-be or an international standard.) - - Person & email address to contact for further - information: - - F.3 Registration of New Access-type Values for - Message/external-body - - To: IANA@isi.edu - Subject: Registration of new MIME Access-type for - Message/external-body content-type - - MIME access-type name: - - Required parameters: - - Optional parameters: - - Published specification: - - (The published specification must be an Internet RFC or - RFC-to-be.) 
- - Person & email address to contact for further - information: - - - F.4 Registration of New Conversions Values for Application - - To: IANA@isi.edu - Subject: Registration of new MIME Conversions value - for Application content-type - - MIME Conversions name: - - - - - Borenstein & Freed [Page 69] - - - - - RFC 1341MIME: Multipurpose Internet Mail ExtensionsJune 1992 - - - Published specification: - - (The published specification must be an Internet RFC or - RFC-to-be.) - - Person & email address to contact for further - information: - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - Borenstein & Freed [Page 70] - - - - - RFC 1341MIME: Multipurpose Internet Mail ExtensionsJune 1992 - - - Appendix G -- Summary of the Seven Content-types - - Content-type: text - - Subtypes defined by this document: plain, richtext - - Important Parameters: charset - - Encoding notes: quoted-printable generally preferred if an - encoding is needed and the character set is mostly an - ASCII superset. - - Security considerations: Rich text formats such as TeX and - Troff often contain mechanisms for executing arbitrary - commands or file system operations, and should not be - used automatically unless these security problems have - been addressed. Even plain text may contain control - characters that can be used to exploit the capabilities - of "intelligent" terminals and cause security - violations. User interfaces designed to run on such - terminals should be aware of and try to prevent such - problems. - ________________________________________________________________ - - Content-type: multipart - - Subtypes defined by this document: mixed, alternative, - digest, parallel. - - Important Parameters: boundary - - Encoding notes: No content-transfer-encoding is permitted. 
- - ________________________________________________________________ - - Content-type: message - - Subtypes defined by this document: rfc822, partial, - external-body - - Important Parameters: id, number, total - - Encoding notes: No content-transfer-encoding is permitted. - - ________________________________________________________________ - - Content-type: application - - Subtypes defined by this document: octet-stream, - postscript, oda - - Important Parameters: profile - - - - - - Borenstein & Freed [Page 71] - - - - - RFC 1341MIME: Multipurpose Internet Mail ExtensionsJune 1992 - - - Encoding notes: base64 generally preferred for octet-stream - or other unreadable subtypes. - - Security considerations: This type is intended for the - transmission of data to be interpreted by locally-installed - programs. If used, for example, to transmit executable - binary programs or programs in general-purpose interpreted - languages, such as LISP programs or shell scripts, severe - security problems could result. In general, authors of - mail-reading agents are cautioned against giving their - systems the power to execute mail-based application data - without carefully considering the security implications. - While it is certainly possible to define safe application - formats and even safe interpreters for unsafe formats, each - interpreter should be evaluated separately for possible - security problems. 
- ________________________________________________________________ - - Content-type: image - - Subtypes defined by this document: jpeg, gif - - Important Parameters: none - - Encoding notes: base64 generally preferred - - ________________________________________________________________ - - Content-type: audio - - Subtypes defined by this document: basic - - Important Parameters: none - - Encoding notes: base64 generally preferred - - ________________________________________________________________ - - Content-type: video - - Subtypes defined by this document: mpeg - - Important Parameters: none - - Encoding notes: base64 generally preferred - - - - - - - - - - - - - Borenstein & Freed [Page 72] - - - - - RFC 1341MIME: Multipurpose Internet Mail ExtensionsJune 1992 - - - Appendix H -- Canonical Encoding Model - - - - There was some confusion, in earlier drafts of this memo, - regarding the model for when email data was to be converted - to canonical form and encoded, and in particular how this - process would affect the treatment of CRLFs, given that the - representation of newlines varies greatly from system to - system. For this reason, a canonical model for encoding is - presented below. - - The process of composing a MIME message part can be modelled - as being done in a number of steps. Note that these steps - are roughly similar to those steps used in RFC1113: - - Step 1. Creation of local form. - - The body part to be transmitted is created in the system's - native format. The native character set is used, and where - appropriate local end of line conventions are used as well. - The may be a UNIX-style text file, or a Sun raster image, or - a VMS indexed file, or audio data in a system-dependent - format stored only in memory, or anything else that - corresponds to the local model for the representation of - some form of information. - - Step 2. Conversion to canonical form. 
- - The entire body part, including "out-of-band" information - such as record lengths and possibly file attribute - information, is converted to a universal canonical form. - The specific content type of the body part as well as its - associated attributes dictate the nature of the canonical - form that is used. Conversion to the proper canonical form - may involve character set conversion, transformation of - audio data, compression, or various other operations - specific to the various content types. - - For example, in the case of text/plain data, the text must - be converted to a supported character set and lines must be - delimited with CRLF delimiters in accordance with RFC822. - Note that the restriction on line lengths implied by RFC822 - is eliminated if the next step employs either quoted- - printable or base64 encoding. - - Step 3. Apply transfer encoding. - - A Content-Transfer-Encoding appropriate for this body part - is applied. Note that there is no fixed relationship - between the content type and the transfer encoding. In - particular, it may be appropriate to base the choice of - base64 or quoted-printable on character frequency counts - which are specific to a given instance of body part. - - - - Borenstein & Freed [Page 73] - - - - - RFC 1341MIME: Multipurpose Internet Mail ExtensionsJune 1992 - - - Step 4. Insertion into message. - - The encoded object is inserted into a MIME message with - appropriate body part headers and boundary markers. - - It is vital to note that these steps are only a model; they - are specifically NOT a blueprint for how an actual system - would be built. In particular, the model fails to account - for two common designs: - - 1. In many cases the conversion to a canonical - form prior to encoding will be subsumed into the - encoder itself, which understands local formats - directly. 
For example, the local newline - convention for text bodyparts might be carried - through to the encoder itself along with knowledge - of what that format is. - - 2. The output of the encoders may have to pass - through one or more additional steps prior to - being transmitted as a message. As such, the - output of the encoder may not be compliant with - the formats specified by RFC822. In particular, - once again it may be appropriate for the - converter's output to be expressed using local - newline conventions rather than using the standard - RFC822 CRLF delimiters. - - Other implementation variations are conceivable as well. - The only important aspect of this discussion is that the - resulting messages are consistent with those produced by the - model described here. - - - - - - - - - - - - - - - - - - - - - - - - - - Borenstein & Freed [Page 74] - - - - - RFC 1341MIME: Multipurpose Internet Mail ExtensionsJune 1992 - - - References - - [US-ASCII] Coded Character Set--7-Bit American Standard Code - for Information Interchange, ANSI X3.4-1986. - - [ATK] Borenstein, Nathaniel S., Multimedia Applications - Development with the Andrew Toolkit, Prentice-Hall, 1990. - - [GIF] Graphics Interchange Format (Version 89a), Compuserve, - Inc., Columbus, Ohio, 1990. - - [ISO-2022] International Standard--Information Processing-- - ISO 7-bit and 8-bit coded character sets--Code extension - techniques, ISO 2022:1986. - - [ISO-8859] Information Processing -- 8-bit Single-Byte Coded - Graphic Character Sets -- Part 1: Latin Alphabet No. 1, ISO - 8859-1:1987. Part 2: Latin alphabet No. 2, ISO 8859-2, - 1987. Part 3: Latin alphabet No. 3, ISO 8859-3, 1988. Part - 4: Latin alphabet No. 4, ISO 8859-4, 1988. Part 5: - Latin/Cyrillic alphabet, ISO 8859-5, 1988. Part 6: - Latin/Arabic alphabet, ISO 8859-6, 1987. Part 7: - Latin/Greek alphabet, ISO 8859-7, 1987. Part 8: - Latin/Hebrew alphabet, ISO 8859-8, 1988. Part 9: Latin - alphabet No. 5, ISO 8859-9, 1990. 
- - [ISO-646] International Standard--Information Processing-- - ISO 7-bit coded character set for information interchange, - ISO 646:1983. - - [MPEG] Video Coding Draft Standard ISO 11172 CD, ISO - IEC/TJC1/SC2/WG11 (Motion Picture Experts Group), May, 1991. - - [ODA] ISO 8613; Information Processing: Text and Office - System; Office Document Architecture (ODA) and Interchange - Format (ODIF), Part 1-8, 1989. - - [PCM] CCITT, Fascicle III.4 - Recommendation G.711, Geneva, - 1972, "Pulse Code Modulation (PCM) of Voice Frequencies". - - [POSTSCRIPT] Adobe Systems, Inc., PostScript Language - Reference Manual, Addison-Wesley, 1985. - - [X400] Schicker, Pietro, "Message Handling Systems, X.400", - Message Handling Systems and Distributed Applications, E. - Stefferud, O-j. Jacobsen, and P. Schicker, eds., North- - Holland, 1989, pp. 3-41. - - [RFC-783] Sollins, K.R. TFTP Protocol (revision 2). June, - 1981, MIT, RFC-783. - - [RFC-821] Postel, J.B. Simple Mail Transfer Protocol. - August, 1982, USC/Information Sciences Institute, RFC-821. - - - - - Borenstein & Freed [Page 75] - - - - - RFC 1341MIME: Multipurpose Internet Mail ExtensionsJune 1992 - - - [RFC-822] Crocker, D. Standard for the format of ARPA - Internet text messages. August, 1982, UDEL, RFC-822. - - [RFC-934] Rose, M.T.; Stefferud, E.A. Proposed standard - for message encapsulation. January, 1985, Delaware - and NMA, RFC-934. - - [RFC-959] Postel, J.B.; Reynolds, J.K. File Transfer - Protocol. October, 1985, USC/Information Sciences - Institute, RFC-959. - - [RFC-1049] Sirbu, M.A. Content-Type header field for - Internet messages. March, 1988, CMU, RFC-1049. - - [RFC-1113] Linn, J. Privacy enhancement for Internet - electronic mail: Part I - message encipherment and - authentication procedures. August, 1989, IAB Privacy Task - Force, RFC-1113. - - [RFC-1154] Robinson, D.; Ullmann, R. Encoding header field - for Internet messages. April, 1990, Prime Computer, - Inc., RFC-1154. 
-
-   [RFC-1342] Moore, Keith, Representation of Non-Ascii Text in
-   Internet Message Headers. June, 1992, University of
-   Tennessee, RFC-1342.
-
-   Security Considerations
-
-   Security issues are discussed in Section 7.4.2 and in
-   Appendix G. Implementors should pay special attention to
-   the security implications of any mail content-types that can
-   cause the remote execution of any actions in the recipient's
-   environment. In such cases, the discussion of the
-   applicaton/postscript content-type in Section 7.4.2 may
-   serve as a model for considering other content-types with
-   remote execution capabilities.
-
-
-   Authors' Addresses
-
-   For more information, the authors of this document may be
-   contacted via Internet mail:
-
-      Nathaniel S. Borenstein
-      MRE 2D-296, Bellcore
-      445 South St.
-      Morristown, NJ 07962-1910
-
-      Phone: +1 201 829 4270
-      Fax: +1 201 829 7019
-      Email: nsb@bellcore.com
-
-
-      Ned Freed
-      Innosoft International, Inc.
-      250 West First Street
-      Suite 240
-      Claremont, CA 91711
-
-      Phone: +1 714 624 7907
-      Fax: +1 714 621 5319
-      Email: ned@innosoft.com
-
-
-   THIS PAGE INTENTIONALLY LEFT BLANK.
-
-   Please discard this page and place the following table of
-   contents after the title page.
-
-
-   Table of Contents
-
-
-   1      Introduction....................................... 1
-   2      Notations, Conventions, and Generic BNF Grammar.... 3
-   3      The MIME-Version Header Field...................... 5
-   4      The Content-Type Header Field...................... 6
-   5      The Content-Transfer-Encoding Header Field......... 10
-   5.1    Quoted-Printable Content-Transfer-Encoding......... 14
-   5.2    Base64 Content-Transfer-Encoding................... 17
-   6      Additional Optional Content- Header Fields......... 19
-   6.1    Optional Content-ID Header Field................... 19
-   6.2    Optional Content-Description Header Field.......... 19
-   7      The Predefined Content-Type Values................. 20
-   7.1    The Text Content-Type.............................. 20
-   7.1.1  The charset parameter.............................. 20
-   7.1.2  The Text/plain subtype............................. 23
-   7.1.3  The Text/richtext subtype.......................... 23
-   7.2    The Multipart Content-Type......................... 29
-   7.2.1  Multipart: The common syntax....................... 30
-   7.2.2  The Multipart/mixed (primary) subtype.............. 34
-   7.2.3  The Multipart/alternative subtype.................. 34
-   7.2.4  The Multipart/digest subtype....................... 36
-   7.2.5  The Multipart/parallel subtype..................... 36
-   7.3    The Message Content-Type........................... 37
-   7.3.1  The Message/rfc822 (primary) subtype............... 37
-   7.3.2  The Message/Partial subtype........................ 37
-   7.3.3  The Message/External-Body subtype.................. 40
-   7.4    The Application Content-Type....................... 46
-   7.4.1  The Application/Octet-Stream (primary) subtype..... 46
-   7.4.2  The Application/PostScript subtype................. 47
-   7.4.3  The Application/ODA subtype........................ 50
-   7.5    The Image Content-Type............................. 51
-   7.6    The Audio Content-Type............................. 51
-   7.7    The Video Content-Type............................. 51
-   7.8    Experimental Content-Type Values................... 51
-          Summary............................................ 53
-          Acknowledgements................................... 54
-          Appendix A -- Minimal MIME-Conformance............. 56
-          Appendix B -- General Guidelines For Sending Email Data 59
-          Appendix C -- A Complex Multipart Example.......... 62
-          Appendix D -- A Simple Richtext-to-Text Translator in C 64
-          Appendix E -- Collected Grammar.................... 66
-          Appendix F -- IANA Registration Procedures......... 68
-          F.1 Registration of New Content-type/subtype Values.. 68
-          F.2 Registration of New Character Set Values...... 69
-          F.3 Registration of New Access-type Values for Message/external-body 69
-          F.4 Registration of New Conversions Values for Application 69
-          Appendix G -- Summary of the Seven Content-types... 71
-          Appendix H -- Canonical Encoding Model............. 73
-          References......................................... 75
-          Security Considerations............................ 76
-          Authors' Addresses................................. 77
-
diff --git a/proto/rfc2045.txt b/proto/rfc2045.txt
@@ -1,1739 +0,0 @@
-
-
-
-Network Working Group                                          N. Freed
-Request for Comments: 2045                                     Innosoft
-Obsoletes: 1521, 1522, 1590                               N. Borenstein
-Category: Standards Track                                 First Virtual
-                                                          November 1996
-
-
-                 Multipurpose Internet Mail Extensions
-                            (MIME) Part One:
-                   Format of Internet Message Bodies
-
-Status of this Memo
-
-   This document specifies an Internet standards track protocol for the
-   Internet community, and requests discussion and suggestions for
-   improvements. Please refer to the current edition of the "Internet
-   Official Protocol Standards" (STD 1) for the standardization state
-   and status of this protocol. Distribution of this memo is unlimited.
-
-Abstract
-
-   STD 11, RFC 822, defines a message representation protocol specifying
-   considerable detail about US-ASCII message headers, and leaves the
-   message content, or message body, as flat US-ASCII text.
This set of - documents, collectively called the Multipurpose Internet Mail - Extensions, or MIME, redefines the format of messages to allow for - - (1) textual message bodies in character sets other than - US-ASCII, - - (2) an extensible set of different formats for non-textual - message bodies, - - (3) multi-part message bodies, and - - (4) textual header information in character sets other than - US-ASCII. - - These documents are based on earlier work documented in RFC 934, STD - 11, and RFC 1049, but extends and revises them. Because RFC 822 said - so little about message bodies, these documents are largely - orthogonal to (rather than a revision of) RFC 822. - - This initial document specifies the various headers used to describe - the structure of MIME messages. The second document, RFC 2046, - defines the general structure of the MIME media typing system and - defines an initial set of media types. The third document, RFC 2047, - describes extensions to RFC 822 to allow non-US-ASCII text data in - - - -Freed & Borenstein Standards Track [Page 1] - -RFC 2045 Internet Message Bodies November 1996 - - - Internet mail header fields. The fourth document, RFC 2048, specifies - various IANA registration procedures for MIME-related facilities. The - fifth and final document, RFC 2049, describes MIME conformance - criteria as well as providing some illustrative examples of MIME - message formats, acknowledgements, and the bibliography. - - These documents are revisions of RFCs 1521, 1522, and 1590, which - themselves were revisions of RFCs 1341 and 1342. An appendix in RFC - 2049 describes differences and changes from previous versions. - -Table of Contents - - 1. Introduction ......................................... 3 - 2. Definitions, Conventions, and Generic BNF Grammar .... 5 - 2.1 CRLF ................................................ 5 - 2.2 Character Set ....................................... 6 - 2.3 Message ............................................. 
6 - 2.4 Entity .............................................. 6 - 2.5 Body Part ........................................... 7 - 2.6 Body ................................................ 7 - 2.7 7bit Data ........................................... 7 - 2.8 8bit Data ........................................... 7 - 2.9 Binary Data ......................................... 7 - 2.10 Lines .............................................. 7 - 3. MIME Header Fields ................................... 8 - 4. MIME-Version Header Field ............................ 8 - 5. Content-Type Header Field ............................ 10 - 5.1 Syntax of the Content-Type Header Field ............. 12 - 5.2 Content-Type Defaults ............................... 14 - 6. Content-Transfer-Encoding Header Field ............... 14 - 6.1 Content-Transfer-Encoding Syntax .................... 14 - 6.2 Content-Transfer-Encodings Semantics ................ 15 - 6.3 New Content-Transfer-Encodings ...................... 16 - 6.4 Interpretation and Use .............................. 16 - 6.5 Translating Encodings ............................... 18 - 6.6 Canonical Encoding Model ............................ 19 - 6.7 Quoted-Printable Content-Transfer-Encoding .......... 19 - 6.8 Base64 Content-Transfer-Encoding .................... 24 - 7. Content-ID Header Field .............................. 26 - 8. Content-Description Header Field ..................... 27 - 9. Additional MIME Header Fields ........................ 27 - 10. Summary ............................................. 27 - 11. Security Considerations ............................. 27 - 12. Authors' Addresses .................................. 28 - A. Collected Grammar .................................... 29 - - - - - - -Freed & Borenstein Standards Track [Page 2] - -RFC 2045 Internet Message Bodies November 1996 - - -1. 
Introduction - - Since its publication in 1982, RFC 822 has defined the standard - format of textual mail messages on the Internet. Its success has - been such that the RFC 822 format has been adopted, wholly or - partially, well beyond the confines of the Internet and the Internet - SMTP transport defined by RFC 821. As the format has seen wider use, - a number of limitations have proven increasingly restrictive for the - user community. - - RFC 822 was intended to specify a format for text messages. As such, - non-text messages, such as multimedia messages that might include - audio or images, are simply not mentioned. Even in the case of text, - however, RFC 822 is inadequate for the needs of mail users whose - languages require the use of character sets richer than US-ASCII. - Since RFC 822 does not specify mechanisms for mail containing audio, - video, Asian language text, or even text in most European languages, - additional specifications are needed. - - One of the notable limitations of RFC 821/822 based mail systems is - the fact that they limit the contents of electronic mail messages to - relatively short lines (e.g. 1000 characters or less [RFC-821]) of - 7bit US-ASCII. This forces users to convert any non-textual data - that they may wish to send into seven-bit bytes representable as - printable US-ASCII characters before invoking a local mail UA (User - Agent, a program with which human users send and receive mail). - Examples of such encodings currently used in the Internet include - pure hexadecimal, uuencode, the 3-in-4 base 64 scheme specified in - RFC 1421, the Andrew Toolkit Representation [ATK], and many others. - - The limitations of RFC 822 mail become even more apparent as gateways - are designed to allow for the exchange of mail messages between RFC - 822 hosts and X.400 hosts. X.400 [X400] specifies mechanisms for the - inclusion of non-textual material within electronic mail messages. 
- The current standards for the mapping of X.400 messages to RFC 822 - messages specify either that X.400 non-textual material must be - converted to (not encoded in) IA5Text format, or that they must be - discarded, notifying the RFC 822 user that discarding has occurred. - This is clearly undesirable, as information that a user may wish to - receive is lost. Even though a user agent may not have the - capability of dealing with the non-textual material, the user might - have some mechanism external to the UA that can extract useful - information from the material. Moreover, it does not allow for the - fact that the message may eventually be gatewayed back into an X.400 - message handling system (i.e., the X.400 message is "tunneled" - through Internet mail), where the non-textual information would - definitely become useful again. - - - - -Freed & Borenstein Standards Track [Page 3] - -RFC 2045 Internet Message Bodies November 1996 - - - This document describes several mechanisms that combine to solve most - of these problems without introducing any serious incompatibilities - with the existing world of RFC 822 mail. In particular, it - describes: - - (1) A MIME-Version header field, which uses a version - number to declare a message to be conformant with MIME - and allows mail processing agents to distinguish - between such messages and those generated by older or - non-conformant software, which are presumed to lack - such a field. - - (2) A Content-Type header field, generalized from RFC 1049, - which can be used to specify the media type and subtype - of data in the body of a message and to fully specify - the native representation (canonical form) of such - data. - - (3) A Content-Transfer-Encoding header field, which can be - used to specify both the encoding transformation that - was applied to the body and the domain of the result. 
- Encoding transformations other than the identity - transformation are usually applied to data in order to - allow it to pass through mail transport mechanisms - which may have data or character set limitations. - - (4) Two additional header fields that can be used to - further describe the data in a body, the Content-ID and - Content-Description header fields. - - All of the header fields defined in this document are subject to the - general syntactic rules for header fields specified in RFC 822. In - particular, all of these header fields except for Content-Disposition - can include RFC 822 comments, which have no semantic content and - should be ignored during MIME processing. - - Finally, to specify and promote interoperability, RFC 2049 provides a - basic applicability statement for a subset of the above mechanisms - that defines a minimal level of "conformance" with this document. - - HISTORICAL NOTE: Several of the mechanisms described in this set of - documents may seem somewhat strange or even baroque at first reading. - It is important to note that compatibility with existing standards - AND robustness across existing practice were two of the highest - priorities of the working group that developed this set of documents. - In particular, compatibility was always favored over elegance. - - - - - -Freed & Borenstein Standards Track [Page 4] - -RFC 2045 Internet Message Bodies November 1996 - - - Please refer to the current edition of the "Internet Official - Protocol Standards" for the standardization state and status of this - protocol. RFC 822 and STD 3, RFC 1123 also provide essential - background for MIME since no conforming implementation of MIME can - violate them. In addition, several other informational RFC documents - will be of interest to the MIME implementor, in particular RFC 1344, - RFC 1345, and RFC 1524. - -2. 
Definitions, Conventions, and Generic BNF Grammar - - Although the mechanisms specified in this set of documents are all - described in prose, most are also described formally in the augmented - BNF notation of RFC 822. Implementors will need to be familiar with - this notation in order to understand this set of documents, and are - referred to RFC 822 for a complete explanation of the augmented BNF - notation. - - Some of the augmented BNF in this set of documents makes named - references to syntax rules defined in RFC 822. A complete formal - grammar, then, is obtained by combining the collected grammar - appendices in each document in this set with the BNF of RFC 822 plus - the modifications to RFC 822 defined in RFC 1123 (which specifically - changes the syntax for `return', `date' and `mailbox'). - - All numeric and octet values are given in decimal notation in this - set of documents. All media type values, subtype values, and - parameter names as defined are case-insensitive. However, parameter - values are case-sensitive unless otherwise specified for the specific - parameter. - - FORMATTING NOTE: Notes, such as this one, provide additional - nonessential information which may be skipped by the reader without - missing anything essential. The primary purpose of these non- - essential notes is to convey information about the rationale of this - set of documents, or to place these documents in the proper - historical or evolutionary context. Such information may in - particular be skipped by those who are focused entirely on building a - conformant implementation, but may be of use to those who wish to - understand why certain design choices were made. - -2.1. CRLF - - The term CRLF, in this set of documents, refers to the sequence of - octets corresponding to the two US-ASCII characters CR (decimal value - 13) and LF (decimal value 10) which, taken together, in this order, - denote a line break in RFC 822 mail. 
- - - - -Freed & Borenstein Standards Track [Page 5] - -RFC 2045 Internet Message Bodies November 1996 - - -2.2. Character Set - - The term "character set" is used in MIME to refer to a method of - converting a sequence of octets into a sequence of characters. Note - that unconditional and unambiguous conversion in the other direction - is not required, in that not all characters may be representable by a - given character set and a character set may provide more than one - sequence of octets to represent a particular sequence of characters. - - This definition is intended to allow various kinds of character - encodings, from simple single-table mappings such as US-ASCII to - complex table switching methods such as those that use ISO 2022's - techniques, to be used as character sets. However, the definition - associated with a MIME character set name must fully specify the - mapping to be performed. In particular, use of external profiling - information to determine the exact mapping is not permitted. - - NOTE: The term "character set" was originally used to describe such - straightforward schemes as US-ASCII and ISO-8859-1 which have a - simple one-to-one mapping from single octets to single characters. - Multi-octet coded character sets and switching techniques make the - situation more complex. For example, some communities use the term - "character encoding" for what MIME calls a "character set", while - using the phrase "coded character set" to denote an abstract mapping - from integers (not octets) to characters. - -2.3. Message - - The term "message", when not further qualified, means either a - (complete or "top-level") RFC 822 message being transferred on a - network, or a message encapsulated in a body of type "message/rfc822" - or "message/partial". - -2.4. Entity - - The term "entity" refers specifically to the MIME-defined header - fields and contents of either a message or one of the parts in the - body of a multipart entity. 
The specification of such entities is - the essence of MIME. Since the contents of an entity are often - called the "body", it makes sense to speak about the body of an - entity. Any sort of field may be present in the header of an entity, - but only those fields whose names begin with "content-" actually have - any MIME-related meaning. Note that this does NOT imply that they - have no meaning at all -- an entity that is also a message has non- - MIME header fields whose meanings are defined by RFC 822. - - - - - - -Freed & Borenstein Standards Track [Page 6] - -RFC 2045 Internet Message Bodies November 1996 - - -2.5. Body Part - - The term "body part" refers to an entity inside of a multipart - entity. - -2.6. Body - - The term "body", when not further qualified, means the body of an - entity, that is, the body of either a message or of a body part. - - NOTE: The previous four definitions are clearly circular. This is - unavoidable, since the overall structure of a MIME message is indeed - recursive. - -2.7. 7bit Data - - "7bit data" refers to data that is all represented as relatively - short lines with 998 octets or less between CRLF line separation - sequences [RFC-821]. No octets with decimal values greater than 127 - are allowed and neither are NULs (octets with decimal value 0). CR - (decimal value 13) and LF (decimal value 10) octets only occur as - part of CRLF line separation sequences. - -2.8. 8bit Data - - "8bit data" refers to data that is all represented as relatively - short lines with 998 octets or less between CRLF line separation - sequences [RFC-821], but octets with decimal values greater than 127 - may be used. As with "7bit data" CR and LF octets only occur as part - of CRLF line separation sequences and no NULs are allowed. - -2.9. Binary Data - - "Binary data" refers to data where any sequence of octets whatsoever - is allowed. - -2.10. Lines - - "Lines" are defined as sequences of octets separated by CRLF - sequences. 
This is consistent with both RFC 821 and RFC 822. - "Lines" only refers to a unit of data in a message, which may or may - not correspond to something that is actually displayed by a user - agent. - - - - - - - - -Freed & Borenstein Standards Track [Page 7] - -RFC 2045 Internet Message Bodies November 1996 - - -3. MIME Header Fields - - MIME defines a number of new RFC 822 header fields that are used to - describe the content of a MIME entity. These header fields occur in - at least two contexts: - - (1) As part of a regular RFC 822 message header. - - (2) In a MIME body part header within a multipart - construct. - - The formal definition of these header fields is as follows: - - entity-headers := [ content CRLF ] - [ encoding CRLF ] - [ id CRLF ] - [ description CRLF ] - *( MIME-extension-field CRLF ) - - MIME-message-headers := entity-headers - fields - version CRLF - ; The ordering of the header - ; fields implied by this BNF - ; definition should be ignored. - - MIME-part-headers := entity-headers - [ fields ] - ; Any field not beginning with - ; "content-" can have no defined - ; meaning and may be ignored. - ; The ordering of the header - ; fields implied by this BNF - ; definition should be ignored. - - The syntax of the various specific MIME header fields will be - described in the following sections. - -4. MIME-Version Header Field - - Since RFC 822 was published in 1982, there has really been only one - format standard for Internet messages, and there has been little - perceived need to declare the format standard in use. This document - is an independent specification that complements RFC 822. Although - the extensions in this document have been defined in such a way as to - be compatible with RFC 822, there are still circumstances in which it - might be desirable for a mail-processing agent to know whether a - message was composed with the new standard in mind. 
- - - -Freed & Borenstein Standards Track [Page 8] - -RFC 2045 Internet Message Bodies November 1996 - - - Therefore, this document defines a new header field, "MIME-Version", - which is to be used to declare the version of the Internet message - body format standard in use. - - Messages composed in accordance with this document MUST include such - a header field, with the following verbatim text: - - MIME-Version: 1.0 - - The presence of this header field is an assertion that the message - has been composed in compliance with this document. - - Since it is possible that a future document might extend the message - format standard again, a formal BNF is given for the content of the - MIME-Version field: - - version := "MIME-Version" ":" 1*DIGIT "." 1*DIGIT - - Thus, future format specifiers, which might replace or extend "1.0", - are constrained to be two integer fields, separated by a period. If - a message is received with a MIME-version value other than "1.0", it - cannot be assumed to conform with this document. - - Note that the MIME-Version header field is required at the top level - of a message. It is not required for each body part of a multipart - entity. It is required for the embedded headers of a body of type - "message/rfc822" or "message/partial" if and only if the embedded - message is itself claimed to be MIME-conformant. - - It is not possible to fully specify how a mail reader that conforms - with MIME as defined in this document should treat a message that - might arrive in the future with some value of MIME-Version other than - "1.0". - - It is also worth noting that version control for specific media types - is not accomplished using the MIME-Version mechanism. In particular, - some formats (such as application/postscript) have version numbering - conventions that are internal to the media format. Where such - conventions exist, MIME does nothing to supersede them. 
Where no - such conventions exist, a MIME media type might use a "version" - parameter in the content-type field if necessary. - - - - - - - - - - -Freed & Borenstein Standards Track [Page 9] - -RFC 2045 Internet Message Bodies November 1996 - - - NOTE TO IMPLEMENTORS: When checking MIME-Version values any RFC 822 - comment strings that are present must be ignored. In particular, the - following four MIME-Version fields are equivalent: - - MIME-Version: 1.0 - - MIME-Version: 1.0 (produced by MetaSend Vx.x) - - MIME-Version: (produced by MetaSend Vx.x) 1.0 - - MIME-Version: 1.(produced by MetaSend Vx.x)0 - - In the absence of a MIME-Version field, a receiving mail user agent - (whether conforming to MIME requirements or not) may optionally - choose to interpret the body of the message according to local - conventions. Many such conventions are currently in use and it - should be noted that in practice non-MIME messages can contain just - about anything. - - It is impossible to be certain that a non-MIME mail message is - actually plain text in the US-ASCII character set since it might well - be a message that, using some set of nonstandard local conventions - that predate MIME, includes text in another character set or non- - textual data presented in a manner that cannot be automatically - recognized (e.g., a uuencoded compressed UNIX tar file). - -5. Content-Type Header Field - - The purpose of the Content-Type field is to describe the data - contained in the body fully enough that the receiving user agent can - pick an appropriate agent or mechanism to present the data to the - user, or otherwise deal with the data in an appropriate manner. The - value in this field is called a media type. - - HISTORICAL NOTE: The Content-Type header field was first defined in - RFC 1049. RFC 1049 used a simpler and less powerful syntax, but one - that is largely compatible with the mechanism given here. 
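The NOTE TO IMPLEMENTORS in Section 4 above says that RFC 822 comments must be ignored when checking MIME-Version values. A minimal Python sketch of that check follows. The helper names `strip_comments` and `mime_version` are hypothetical, and the handling of whitespace between tokens is an assumption made for illustration, not something this text prescribes.

```python
import re


def strip_comments(value: str) -> str:
    """Remove RFC 822 comments (which may nest) from a header value."""
    out = []
    depth = 0
    i = 0
    while i < len(value):
        ch = value[i]
        if ch == "\\" and i + 1 < len(value):
            # quoted-pair: the following character is taken literally
            if depth == 0:
                out.append(value[i:i + 2])
            i += 2
            continue
        if ch == "(":
            depth += 1
        elif ch == ")" and depth > 0:
            depth -= 1
        elif depth == 0:
            out.append(ch)
        i += 1
    return "".join(out)


def mime_version(value: str):
    """Return (major, minor) for a MIME-Version value, or None if malformed.

    Assumption: whitespace between tokens is simply dropped.
    """
    text = re.sub(r"\s+", "", strip_comments(value))
    match = re.fullmatch(r"(\d+)\.(\d+)", text)
    return (int(match.group(1)), int(match.group(2))) if match else None


# The four field bodies that the note lists as equivalent.
for body in (
    "1.0",
    "1.0 (produced by MetaSend Vx.x)",
    "(produced by MetaSend Vx.x) 1.0",
    "1.(produced by MetaSend Vx.x)0",
):
    print(repr(body), "->", mime_version(body))
```

Returning a tuple rather than a string lets a caller compare against (1, 0) directly and treat any other value as "not known to conform".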
- - The Content-Type header field specifies the nature of the data in the - body of an entity by giving media type and subtype identifiers, and - by providing auxiliary information that may be required for certain - media types. After the media type and subtype names, the remainder - of the header field is simply a set of parameters, specified in an - attribute=value notation. The ordering of parameters is not - significant. - - - - - - -Freed & Borenstein Standards Track [Page 10] - -RFC 2045 Internet Message Bodies November 1996 - - - In general, the top-level media type is used to declare the general - type of data, while the subtype specifies a specific format for that - type of data. Thus, a media type of "image/xyz" is enough to tell a - user agent that the data is an image, even if the user agent has no - knowledge of the specific image format "xyz". Such information can - be used, for example, to decide whether or not to show a user the raw - data from an unrecognized subtype -- such an action might be - reasonable for unrecognized subtypes of text, but not for - unrecognized subtypes of image or audio. For this reason, registered - subtypes of text, image, audio, and video should not contain embedded - information that is really of a different type. Such compound - formats should be represented using the "multipart" or "application" - types. - - Parameters are modifiers of the media subtype, and as such do not - fundamentally affect the nature of the content. The set of - meaningful parameters depends on the media type and subtype. Most - parameters are associated with a single specific subtype. However, a - given top-level media type may define parameters which are applicable - to any subtype of that type. Parameters may be required by their - defining content type or subtype or they may be optional. MIME - implementations must ignore any parameters whose names they do not - recognize. 
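As an illustration of the type/subtype/parameter structure described above, the following sketch reads a Content-Type field with Python's standard `email` package. The sample message and the `describe` helper are invented for this example. The standard library is used here only as a convenient existing parser; it is not the implementation this repository uses.

```python
from email import message_from_string

# A hypothetical message; type, subtype and parameter names are written in
# mixed case to show that they are matched case-insensitively.
RAW = (
    "MIME-Version: 1.0\r\n"
    'Content-Type: TEXT/Plain; CharSet="us-ascii"; x-unknown=ignored\r\n'
    "\r\n"
    "hello\r\n"
)


def describe(raw: str):
    """Return (top-level type, subtype, charset parameter or None)."""
    msg = message_from_string(raw)
    return (
        msg.get_content_maintype(),
        msg.get_content_subtype(),
        msg.get_param("charset"),  # quotes are not part of the value
    )


print(describe(RAW))
# A message without any Content-Type field falls back to text/plain.
print(describe("Subject: hi\r\n\r\nbody\r\n"))
```

The unrecognized `x-unknown` parameter is never looked up, which matches the rule that implementations ignore parameters whose names they do not recognize.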
- - For example, the "charset" parameter is applicable to any subtype of - "text", while the "boundary" parameter is required for any subtype of - the "multipart" media type. - - There are NO globally-meaningful parameters that apply to all media - types. Truly global mechanisms are best addressed, in the MIME - model, by the definition of additional Content-* header fields. - - An initial set of seven top-level media types is defined in RFC 2046. - Five of these are discrete types whose content is essentially opaque - as far as MIME processing is concerned. The remaining two are - composite types whose contents require additional handling by MIME - processors. - - This set of top-level media types is intended to be substantially - complete. It is expected that additions to the larger set of - supported types can generally be accomplished by the creation of new - subtypes of these initial types. In the future, more top-level types - may be defined only by a standards-track extension to this standard. - If another top-level type is to be used for any reason, it must be - given a name starting with "X-" to indicate its non-standard status - and to avoid a potential conflict with a future official name. - - - - - -Freed & Borenstein Standards Track [Page 11] - -RFC 2045 Internet Message Bodies November 1996 - - -5.1. Syntax of the Content-Type Header Field - - In the Augmented BNF notation of RFC 822, a Content-Type header field - value is defined as follows: - - content := "Content-Type" ":" type "/" subtype - *(";" parameter) - ; Matching of media type and subtype - ; is ALWAYS case-insensitive. 
- - type := discrete-type / composite-type - - discrete-type := "text" / "image" / "audio" / "video" / - "application" / extension-token - - composite-type := "message" / "multipart" / extension-token - - extension-token := ietf-token / x-token - - ietf-token := <An extension token defined by a - standards-track RFC and registered - with IANA.> - - x-token := <The two characters "X-" or "x-" followed, with - no intervening white space, by any token> - - subtype := extension-token / iana-token - - iana-token := <A publicly-defined extension token. Tokens - of this form must be registered with IANA - as specified in RFC 2048.> - - parameter := attribute "=" value - - attribute := token - ; Matching of attributes - ; is ALWAYS case-insensitive. - - value := token / quoted-string - - token := 1*<any (US-ASCII) CHAR except SPACE, CTLs, - or tspecials> - - tspecials := "(" / ")" / "<" / ">" / "@" / - "," / ";" / ":" / "\" / <"> - "/" / "[" / "]" / "?" / "=" - ; Must be in quoted-string, - ; to use within parameter values - - - -Freed & Borenstein Standards Track [Page 12] - -RFC 2045 Internet Message Bodies November 1996 - - - Note that the definition of "tspecials" is the same as the RFC 822 - definition of "specials" with the addition of the three characters - "/", "?", and "=", and the removal of ".". - - Note also that a subtype specification is MANDATORY -- it may not be - omitted from a Content-Type header field. As such, there are no - default subtypes. - - The type, subtype, and parameter names are not case sensitive. For - example, TEXT, Text, and TeXt are all equivalent top-level media - types. Parameter values are normally case sensitive, but sometimes - are interpreted in a case-insensitive fashion, depending on the - intended use. (For example, multipart boundaries are case-sensitive, - but the "access-type" parameter for message/External-body is not - case-sensitive.) - - Note that the value of a quoted string parameter does not include the - quotes. 
That is, the quotation marks in a quoted-string are not a - part of the value of the parameter, but are merely used to delimit - that parameter value. In addition, comments are allowed in - accordance with RFC 822 rules for structured header fields. Thus the - following two forms - - Content-type: text/plain; charset=us-ascii (Plain text) - - Content-type: text/plain; charset="us-ascii" - - are completely equivalent. - - Beyond this syntax, the only syntactic constraint on the definition - of subtype names is the desire that their uses must not conflict. - That is, it would be undesirable to have two different communities - using "Content-Type: application/foobar" to mean two different - things. The process of defining new media subtypes, then, is not - intended to be a mechanism for imposing restrictions, but simply a - mechanism for publicizing their definition and usage. There are, - therefore, two acceptable mechanisms for defining new media subtypes: - - (1) Private values (starting with "X-") may be defined - bilaterally between two cooperating agents without - outside registration or standardization. Such values - cannot be registered or standardized. - - (2) New standard values should be registered with IANA as - described in RFC 2048. - - The second document in this set, RFC 2046, defines the initial set of - media types for MIME. - - - -Freed & Borenstein Standards Track [Page 13] - -RFC 2045 Internet Message Bodies November 1996 - - -5.2. Content-Type Defaults - - Default RFC 822 messages without a MIME Content-Type header are taken - by this protocol to be plain text in the US-ASCII character set, - which can be explicitly specified as: - - Content-type: text/plain; charset=us-ascii - - This default is assumed if no Content-Type header field is specified. - It is also recommended that this default be assumed when a - syntactically invalid Content-Type header field is encountered. 
In - the presence of a MIME-Version header field and the absence of any - Content-Type header field, a receiving User Agent can also assume - that plain US-ASCII text was the sender's intent. Plain US-ASCII - text may still be assumed in the absence of a MIME-Version or the - presence of an syntactically invalid Content-Type header field, but - the sender's intent might have been otherwise. - -6. Content-Transfer-Encoding Header Field - - Many media types which could be usefully transported via email are - represented, in their "natural" format, as 8bit character or binary - data. Such data cannot be transmitted over some transfer protocols. - For example, RFC 821 (SMTP) restricts mail messages to 7bit US-ASCII - data with lines no longer than 1000 characters including any trailing - CRLF line separator. - - It is necessary, therefore, to define a standard mechanism for - encoding such data into a 7bit short line format. Proper labelling - of unencoded material in less restrictive formats for direct use over - less restrictive transports is also desireable. This document - specifies that such encodings will be indicated by a new "Content- - Transfer-Encoding" header field. This field has not been defined by - any previous standard. - -6.1. Content-Transfer-Encoding Syntax - - The Content-Transfer-Encoding field's value is a single token - specifying the type of encoding, as enumerated below. Formally: - - encoding := "Content-Transfer-Encoding" ":" mechanism - - mechanism := "7bit" / "8bit" / "binary" / - "quoted-printable" / "base64" / - ietf-token / x-token - - These values are not case sensitive -- Base64 and BASE64 and bAsE64 - are all equivalent. An encoding type of 7BIT requires that the body - - - -Freed & Borenstein Standards Track [Page 14] - -RFC 2045 Internet Message Bodies November 1996 - - - is already in a 7bit mail-ready representation. 
This is the default - value -- that is, "Content-Transfer-Encoding: 7BIT" is assumed if the - Content-Transfer-Encoding header field is not present. - -6.2. Content-Transfer-Encodings Semantics - - This single Content-Transfer-Encoding token actually provides two - pieces of information. It specifies what sort of encoding - transformation the body was subjected to and hence what decoding - operation must be used to restore it to its original form, and it - specifies what the domain of the result is. - - The transformation part of any Content-Transfer-Encodings specifies, - either explicitly or implicitly, a single, well-defined decoding - algorithm, which for any sequence of encoded octets either transforms - it to the original sequence of octets which was encoded, or shows - that it is illegal as an encoded sequence. Content-Transfer- - Encodings transformations never depend on any additional external - profile information for proper operation. Note that while decoders - must produce a single, well-defined output for a valid encoding no - such restrictions exist for encoders: Encoding a given sequence of - octets to different, equivalent encoded sequences is perfectly legal. - - Three transformations are currently defined: identity, the "quoted- - printable" encoding, and the "base64" encoding. The domains are - "binary", "8bit" and "7bit". - - The Content-Transfer-Encoding values "7bit", "8bit", and "binary" all - mean that the identity (i.e. NO) encoding transformation has been - performed. As such, they serve simply as indicators of the domain of - the body data, and provide useful information about the sort of - encoding that might be needed for transmission in a given transport - system. The terms "7bit data", "8bit data", and "binary data" are - all defined in Section 2. - - The quoted-printable and base64 encodings transform their input from - an arbitrary domain into material in the "7bit" range, thus making it - safe to carry over restricted transports. 
The specific definition of - the transformations are given below. - - The proper Content-Transfer-Encoding label must always be used. - Labelling unencoded data containing 8bit characters as "7bit" is not - allowed, nor is labelling unencoded non-line-oriented data as - anything other than "binary" allowed. - - Unlike media subtypes, a proliferation of Content-Transfer-Encoding - values is both undesirable and unnecessary. However, establishing - only a single transformation into the "7bit" domain does not seem - - - -Freed & Borenstein Standards Track [Page 15] - -RFC 2045 Internet Message Bodies November 1996 - - - possible. There is a tradeoff between the desire for a compact and - efficient encoding of largely- binary data and the desire for a - somewhat readable encoding of data that is mostly, but not entirely, - 7bit. For this reason, at least two encoding mechanisms are - necessary: a more or less readable encoding (quoted-printable) and a - "dense" or "uniform" encoding (base64). - - Mail transport for unencoded 8bit data is defined in RFC 1652. As of - the initial publication of this document, there are no standardized - Internet mail transports for which it is legitimate to include - unencoded binary data in mail bodies. Thus there are no - circumstances in which the "binary" Content-Transfer-Encoding is - actually valid in Internet mail. However, in the event that binary - mail transport becomes a reality in Internet mail, or when MIME is - used in conjunction with any other binary-capable mail transport - mechanism, binary bodies must be labelled as such using this - mechanism. - - NOTE: The five values defined for the Content-Transfer-Encoding field - imply nothing about the media type other than the algorithm by which - it was encoded or the transport system requirements if unencoded. - -6.3. 
New Content-Transfer-Encodings - - Implementors may, if necessary, define private Content-Transfer- - Encoding values, but must use an x-token, which is a name prefixed by - "X-", to indicate its non-standard status, e.g., "Content-Transfer- - Encoding: x-my-new-encoding". Additional standardized Content- - Transfer-Encoding values must be specified by a standards-track RFC. - The requirements such specifications must meet are given in RFC 2048. - As such, all content-transfer-encoding namespace except that - beginning with "X-" is explicitly reserved to the IETF for future - use. - - Unlike media types and subtypes, the creation of new Content- - Transfer-Encoding values is STRONGLY discouraged, as it seems likely - to hinder interoperability with little potential benefit - -6.4. Interpretation and Use - - If a Content-Transfer-Encoding header field appears as part of a - message header, it applies to the entire body of that message. If a - Content-Transfer-Encoding header field appears as part of an entity's - headers, it applies only to the body of that entity. If an entity is - of type "multipart" the Content-Transfer-Encoding is not permitted to - have any value other than "7bit", "8bit" or "binary". Even more - severe restrictions apply to some subtypes of the "message" type. - - - - -Freed & Borenstein Standards Track [Page 16] - -RFC 2045 Internet Message Bodies November 1996 - - - It should be noted that most media types are defined in terms of - octets rather than bits, so that the mechanisms described here are - mechanisms for encoding arbitrary octet streams, not bit streams. If - a bit stream is to be encoded via one of these mechanisms, it must - first be converted to an 8bit byte stream using the network standard - bit order ("big-endian"), in which the earlier bits in a stream - become the higher-order bits in a 8bit byte. A bit stream not ending - at an 8bit boundary must be padded with zeroes. 
RFC 2046 provides a - mechanism for noting the addition of such padding in the case of the - application/octet-stream media type, which has a "padding" parameter. - - The encoding mechanisms defined here explicitly encode all data in - US-ASCII. Thus, for example, suppose an entity has header fields - such as: - - Content-Type: text/plain; charset=ISO-8859-1 - Content-transfer-encoding: base64 - - This must be interpreted to mean that the body is a base64 US-ASCII - encoding of data that was originally in ISO-8859-1, and will be in - that character set again after decoding. - - Certain Content-Transfer-Encoding values may only be used on certain - media types. In particular, it is EXPRESSLY FORBIDDEN to use any - encodings other than "7bit", "8bit", or "binary" with any composite - media type, i.e. one that recursively includes other Content-Type - fields. Currently the only composite media types are "multipart" and - "message". All encodings that are desired for bodies of type - multipart or message must be done at the innermost level, by encoding - the actual body that needs to be encoded. - - It should also be noted that, by definition, if a composite entity - has a transfer-encoding value such as "7bit", but one of the enclosed - entities has a less restrictive value such as "8bit", then either the - outer "7bit" labelling is in error, because 8bit data are included, - or the inner "8bit" labelling placed an unnecessarily high demand on - the transport system because the actual included data were actually - 7bit-safe. - - NOTE ON ENCODING RESTRICTIONS: Though the prohibition against using - content-transfer-encodings on composite body data may seem overly - restrictive, it is necessary to prevent nested encodings, in which - data are passed through an encoding algorithm multiple times, and - must be decoded multiple times in order to be properly viewed. 
- Nested encodings add considerable complexity to user agents: Aside - from the obvious efficiency problems with such multiple encodings, - they can obscure the basic structure of a message. In particular, - they can imply that several decoding operations are necessary simply - - - -Freed & Borenstein Standards Track [Page 17] - -RFC 2045 Internet Message Bodies November 1996 - - - to find out what types of bodies a message contains. Banning nested - encodings may complicate the job of certain mail gateways, but this - seems less of a problem than the effect of nested encodings on user - agents. - - Any entity with an unrecognized Content-Transfer-Encoding must be - treated as if it has a Content-Type of "application/octet-stream", - regardless of what the Content-Type header field actually says. - - NOTE ON THE RELATIONSHIP BETWEEN CONTENT-TYPE AND CONTENT-TRANSFER- - ENCODING: It may seem that the Content-Transfer-Encoding could be - inferred from the characteristics of the media that is to be encoded, - or, at the very least, that certain Content-Transfer-Encodings could - be mandated for use with specific media types. There are several - reasons why this is not the case. First, given the varying types of - transports used for mail, some encodings may be appropriate for some - combinations of media types and transports but not for others. (For - example, in an 8bit transport, no encoding would be required for text - in certain character sets, while such encodings are clearly required - for 7bit SMTP.) - - Second, certain media types may require different types of transfer - encoding under different circumstances. For example, many PostScript - bodies might consist entirely of short lines of 7bit data and hence - require no encoding at all. Other PostScript bodies (especially - those using Level 2 PostScript's binary encoding mechanism) may only - be reasonably represented using a binary transport encoding. 
- Finally, since the Content-Type field is intended to be an open-ended - specification mechanism, strict specification of an association - between media types and encodings effectively couples the - specification of an application protocol with a specific lower-level - transport. This is not desirable since the developers of a media - type should not have to be aware of all the transports in use and - what their limitations are. - -6.5. Translating Encodings - - The quoted-printable and base64 encodings are designed so that - conversion between them is possible. The only issue that arises in - such a conversion is the handling of hard line breaks in quoted- - printable encoding output. When converting from quoted-printable to - base64 a hard line break in the quoted-printable form represents a - CRLF sequence in the canonical form of the data. It must therefore be - converted to a corresponding encoded CRLF in the base64 form of the - data. Similarly, a CRLF sequence in the canonical form of the data - obtained after base64 decoding must be converted to a quoted- - printable hard line break, but ONLY when converting text data. - - - - -Freed & Borenstein Standards Track [Page 18] - -RFC 2045 Internet Message Bodies November 1996 - - -6.6. Canonical Encoding Model - - There was some confusion, in the previous versions of this RFC, - regarding the model for when email data was to be converted to - canonical form and encoded, and in particular how this process would - affect the treatment of CRLFs, given that the representation of - newlines varies greatly from system to system, and the relationship - between content-transfer-encodings and character sets. A canonical - model for encoding is presented in RFC 2049 for this reason. - -6.7. Quoted-Printable Content-Transfer-Encoding - - The Quoted-Printable encoding is intended to represent data that - largely consists of octets that correspond to printable characters in - the US-ASCII character set. 
It encodes the data in such a way that - the resulting octets are unlikely to be modified by mail transport. - If the data being encoded are mostly US-ASCII text, the encoded form - of the data remains largely recognizable by humans. A body which is - entirely US-ASCII may also be encoded in Quoted-Printable to ensure - the integrity of the data should the message pass through a - character-translating, and/or line-wrapping gateway. - - In this encoding, octets are to be represented as determined by the - following rules: - - (1) (General 8bit representation) Any octet, except a CR or - LF that is part of a CRLF line break of the canonical - (standard) form of the data being encoded, may be - represented by an "=" followed by a two digit - hexadecimal representation of the octet's value. The - digits of the hexadecimal alphabet, for this purpose, - are "0123456789ABCDEF". Uppercase letters must be - used; lowercase letters are not allowed. Thus, for - example, the decimal value 12 (US-ASCII form feed) can - be represented by "=0C", and the decimal value 61 (US- - ASCII EQUAL SIGN) can be represented by "=3D". This - rule must be followed except when the following rules - allow an alternative encoding. - - (2) (Literal representation) Octets with decimal values of - 33 through 60 inclusive, and 62 through 126, inclusive, - MAY be represented as the US-ASCII characters which - correspond to those octets (EXCLAMATION POINT through - LESS THAN, and GREATER THAN through TILDE, - respectively). - - (3) (White Space) Octets with values of 9 and 32 MAY be - represented as US-ASCII TAB (HT) and SPACE characters, - - - -Freed & Borenstein Standards Track [Page 19] - -RFC 2045 Internet Message Bodies November 1996 - - - respectively, but MUST NOT be so represented at the end - of an encoded line. Any TAB (HT) or SPACE characters - on an encoded line MUST thus be followed on that line - by a printable character. 
In particular, an "=" at the - end of an encoded line, indicating a soft line break - (see rule #5) may follow one or more TAB (HT) or SPACE - characters. It follows that an octet with decimal - value 9 or 32 appearing at the end of an encoded line - must be represented according to Rule #1. This rule is - necessary because some MTAs (Message Transport Agents, - programs which transport messages from one user to - another, or perform a portion of such transfers) are - known to pad lines of text with SPACEs, and others are - known to remove "white space" characters from the end - of a line. Therefore, when decoding a Quoted-Printable - body, any trailing white space on a line must be - deleted, as it will necessarily have been added by - intermediate transport agents. - - (4) (Line Breaks) A line break in a text body, represented - as a CRLF sequence in the text canonical form, must be - represented by a (RFC 822) line break, which is also a - CRLF sequence, in the Quoted-Printable encoding. Since - the canonical representation of media types other than - text do not generally include the representation of - line breaks as CRLF sequences, no hard line breaks - (i.e. line breaks that are intended to be meaningful - and to be displayed to the user) can occur in the - quoted-printable encoding of such types. Sequences - like "=0D", "=0A", "=0A=0D" and "=0D=0A" will routinely - appear in non-text data represented in quoted- - printable, of course. - - Note that many implementations may elect to encode the - local representation of various content types directly - rather than converting to canonical form first, - encoding, and then converting back to local - representation. In particular, this may apply to plain - text material on systems that use newline conventions - other than a CRLF terminator sequence. 
Such an - implementation optimization is permissible, but only - when the combined canonicalization-encoding step is - equivalent to performing the three steps separately. - - (5) (Soft Line Breaks) The Quoted-Printable encoding - REQUIRES that encoded lines be no more than 76 - characters long. If longer lines are to be encoded - with the Quoted-Printable encoding, "soft" line breaks - - - -Freed & Borenstein Standards Track [Page 20] - -RFC 2045 Internet Message Bodies November 1996 - - - must be used. An equal sign as the last character on a - encoded line indicates such a non-significant ("soft") - line break in the encoded text. - - Thus if the "raw" form of the line is a single unencoded line that - says: - - Now's the time for all folk to come to the aid of their country. - - This can be represented, in the Quoted-Printable encoding, as: - - Now's the time = - for all folk to come= - to the aid of their country. - - This provides a mechanism with which long lines are encoded in such a - way as to be restored by the user agent. The 76 character limit does - not count the trailing CRLF, but counts all other characters, - including any equal signs. - - Since the hyphen character ("-") may be represented as itself in the - Quoted-Printable encoding, care must be taken, when encapsulating a - quoted-printable encoded body inside one or more multipart entities, - to ensure that the boundary delimiter does not appear anywhere in the - encoded body. (A good strategy is to choose a boundary that includes - a character sequence such as "=_" which can never appear in a - quoted-printable body. See the definition of multipart messages in - RFC 2046.) - - NOTE: The quoted-printable encoding represents something of a - compromise between readability and reliability in transport. Bodies - encoded with the quoted-printable encoding will work reliably over - most mail gateways, but may not work perfectly over a few gateways, - notably those involving translation into EBCDIC. 
A higher level of - confidence is offered by the base64 Content-Transfer-Encoding. A way - to get reasonably reliable transport through EBCDIC gateways is to - also quote the US-ASCII characters - - !"#$@[\]^`{|}~ - - according to rule #1. - - Because quoted-printable data is generally assumed to be line- - oriented, it is to be expected that the representation of the breaks - between the lines of quoted-printable data may be altered in - transport, in the same manner that plain text mail has always been - altered in Internet mail when passing between systems with differing - newline conventions. If such alterations are likely to constitute a - - - -Freed & Borenstein Standards Track [Page 21] - -RFC 2045 Internet Message Bodies November 1996 - - - corruption of the data, it is probably more sensible to use the - base64 encoding rather than the quoted-printable encoding. - - NOTE: Several kinds of substrings cannot be generated according to - the encoding rules for the quoted-printable content-transfer- - encoding, and hence are formally illegal if they appear in the output - of a quoted-printable encoder. This note enumerates these cases and - suggests ways to handle such illegal substrings if any are - encountered in quoted-printable data that is to be decoded. - - (1) An "=" followed by two hexadecimal digits, one or both - of which are lowercase letters in "abcdef", is formally - illegal. A robust implementation might choose to - recognize them as the corresponding uppercase letters. - - (2) An "=" followed by a character that is neither a - hexadecimal digit (including "abcdef") nor the CR - character of a CRLF pair is illegal. This case can be - the result of US-ASCII text having been included in a - quoted-printable part of a message without itself - having been subjected to quoted-printable encoding. 
A - reasonable approach by a robust implementation might be - to include the "=" character and the following - character in the decoded data without any - transformation and, if possible, indicate to the user - that proper decoding was not possible at this point in - the data. - - (3) An "=" cannot be the ultimate or penultimate character - in an encoded object. This could be handled as in case - (2) above. - - (4) Control characters other than TAB, or CR and LF as - parts of CRLF pairs, must not appear. The same is true - for octets with decimal values greater than 126. If - found in incoming quoted-printable data by a decoder, a - robust implementation might exclude them from the - decoded data and warn the user that illegal characters - were discovered. - - (5) Encoded lines must not be longer than 76 characters, - not counting the trailing CRLF. If longer lines are - found in incoming, encoded data, a robust - implementation might nevertheless decode the lines, and - might report the erroneous encoding to the user. - - - - - - -Freed & Borenstein Standards Track [Page 22] - -RFC 2045 Internet Message Bodies November 1996 - - - WARNING TO IMPLEMENTORS: If binary data is encoded in quoted- - printable, care must be taken to encode CR and LF characters as "=0D" - and "=0A", respectively. In particular, a CRLF sequence in binary - data should be encoded as "=0D=0A". Otherwise, if CRLF were - represented as a hard line break, it might be incorrectly decoded on - platforms with different line break conventions. 
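The quoted-printable rules and decoder notes above can be sketched in code. The following is a minimal illustration in Python, not part of RFC 2045 and not taken from rohrpost; the names `qp_encode` and `qp_decode` are hypothetical. The encoder assumes its input is text already in canonical CRLF form (it is not suitable for binary data, which per the warning above needs CR and LF encoded as "=0D" and "=0A"). The decoder takes the lenient options the notes suggest: it accepts lowercase hex digits, passes a malformed "=" through unchanged, and does not reject over-long lines.

```python
HEX_DIGITS = b"0123456789ABCDEFabcdef"


def qp_encode(text: bytes) -> bytes:
    """Encode canonical (CRLF-delimited) text as quoted-printable."""
    encoded_lines = []
    for line in text.split(b"\r\n"):
        tokens = []
        for pos, octet in enumerate(line):
            last = pos == len(line) - 1
            literal = 33 <= octet <= 60 or 62 <= octet <= 126   # rule 2
            blank = octet in (9, 32) and not last                # rule 3
            if literal or blank:
                tokens.append(bytes([octet]))
            else:
                tokens.append(b"=%02X" % octet)                  # rule 1
        # Rule 5: insert soft line breaks so that no encoded line,
        # including its trailing "=", exceeds 76 characters.
        pieces, current = [], b""
        for token in tokens:
            if len(current) + len(token) > 75:
                pieces.append(current + b"=")
                current = b""
            current += token
        pieces.append(current)
        encoded_lines.append(b"\r\n".join(pieces))
    return b"\r\n".join(encoded_lines)                           # rule 4


def qp_decode(body: bytes) -> bytes:
    """Decode quoted-printable data, tolerating some illegal input."""
    lines = body.split(b"\r\n")
    out = bytearray()
    for index, line in enumerate(lines):
        line = line.rstrip(b" \t")        # trailing white space is padding
        soft = line.endswith(b"=")        # soft line break
        if soft:
            line = line[:-1]
        pos = 0
        while pos < len(line):
            pair = line[pos + 1:pos + 3]
            if (line[pos] == 0x3D and len(pair) == 2
                    and all(digit in HEX_DIGITS for digit in pair)):
                out.append(int(pair, 16))
                pos += 3
            else:
                # Literal octet, or a malformed "=" kept as-is (note 2).
                out.append(line[pos])
                pos += 1
        if not soft and index < len(lines) - 1:
            out += b"\r\n"                # hard line break
    return bytes(out)
```

For example, `qp_encode(b"a=b")` yields `b"a=3Db"`, and decoding the three-line "Now's the time" example above restores the single unencoded line. A stricter decoder could instead report the illegal cases to the user, as the notes also allow.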
-
-   For formalists, the syntax of quoted-printable data is described by
-   the following grammar:
-
-     quoted-printable := qp-line *(CRLF qp-line)
-
-     qp-line := *(qp-segment transport-padding CRLF)
-                qp-part transport-padding
-
-     qp-part := qp-section
-                ; Maximum length of 76 characters
-
-     qp-segment := qp-section *(SPACE / TAB) "="
-                   ; Maximum length of 76 characters
-
-     qp-section := [*(ptext / SPACE / TAB) ptext]
-
-     ptext := hex-octet / safe-char
-
-     safe-char := <any octet with decimal value of 33 through
-                  60 inclusive, and 62 through 126>
-                  ; Characters not listed as "mail-safe" in
-                  ; RFC 2049 are also not recommended.
-
-     hex-octet := "=" 2(DIGIT / "A" / "B" / "C" / "D" / "E" / "F")
-                  ; Octet must be used for characters > 127, =,
-                  ; SPACEs or TABs at the ends of lines, and is
-                  ; recommended for any character not listed in
-                  ; RFC 2049 as "mail-safe".
-
-     transport-padding := *LWSP-char
-                          ; Composers MUST NOT generate
-                          ; non-zero length transport
-                          ; padding, but receivers MUST
-                          ; be able to handle padding
-                          ; added by message transports.
-
-   IMPORTANT: The addition of LWSP between the elements shown in this
-   BNF is NOT allowed since this BNF does not specify a structured
-   header field.
-
-6.8. Base64 Content-Transfer-Encoding
-
-   The Base64 Content-Transfer-Encoding is designed to represent
-   arbitrary sequences of octets in a form that need not be humanly
-   readable. The encoding and decoding algorithms are simple, but the
-   encoded data are consistently only about 33 percent larger than the
-   unencoded data. This encoding is virtually identical to the one used
-   in Privacy Enhanced Mail (PEM) applications, as defined in RFC 1421.
-
-   A 65-character subset of US-ASCII is used, enabling 6 bits to be
-   represented per printable character. (The extra 65th character, "=",
-   is used to signify a special processing function.)
- - NOTE: This subset has the important property that it is represented - identically in all versions of ISO 646, including US-ASCII, and all - characters in the subset are also represented identically in all - versions of EBCDIC. Other popular encodings, such as the encoding - used by the uuencode utility, Macintosh binhex 4.0 [RFC-1741], and - the base85 encoding specified as part of Level 2 PostScript, do not - share these properties, and thus do not fulfill the portability - requirements a binary transport encoding for mail must meet. - - The encoding process represents 24-bit groups of input bits as output - strings of 4 encoded characters. Proceeding from left to right, a - 24-bit input group is formed by concatenating 3 8bit input groups. - These 24 bits are then treated as 4 concatenated 6-bit groups, each - of which is translated into a single digit in the base64 alphabet. - When encoding a bit stream via the base64 encoding, the bit stream - must be presumed to be ordered with the most-significant-bit first. - That is, the first bit in the stream will be the high-order bit in - the first 8bit byte, and the eighth bit will be the low-order bit in - the first 8bit byte, and so on. - - Each 6-bit group is used as an index into an array of 64 printable - characters. The character referenced by the index is placed in the - output string. These characters, identified in Table 1, below, are - selected so as to be universally representable, and the set excludes - characters with particular significance to SMTP (e.g., ".", CR, LF) - and to the multipart boundary delimiters defined in RFC 2046 (e.g., - "-"). 
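The 24-bit grouping described above can be sketched as follows. This is a minimal illustration in Python, not code from RFC 2045 or from rohrpost; the names `ALPHABET`, `b64_encode` and `b64_decode` are hypothetical. It also applies the "=" padding and 76-character line limit that this section of the RFC specifies, and the decoder skips any octet outside the alphabet instead of rejecting the input.

```python
ALPHABET = (b"ABCDEFGHIJKLMNOPQRSTUVWXYZ"
            b"abcdefghijklmnopqrstuvwxyz"
            b"0123456789+/")


def b64_encode(data: bytes, width: int = 76) -> bytes:
    """Encode octets as base64 in CRLF-separated lines of at most `width`."""
    out = bytearray()
    for start in range(0, len(data), 3):
        group = data[start:start + 3]
        # Form a 24-bit group, most significant bit first; a short final
        # group is filled with zero bits on the right.
        quantum = int.from_bytes(group.ljust(3, b"\0"), "big")
        chars = [ALPHABET[(quantum >> shift) & 0x3F]
                 for shift in (18, 12, 6, 0)]
        keep = len(group) + 1      # 1 octet -> 2 chars, 2 -> 3, 3 -> 4
        out += bytes(chars[:keep]) + b"=" * (4 - keep)
    lines = [bytes(out[i:i + width]) for i in range(0, len(out), width)]
    return b"\r\n".join(lines)


def b64_decode(text: bytes) -> bytes:
    """Decode base64, ignoring line breaks and other foreign octets."""
    out = bytearray()
    bits = nbits = 0
    for octet in text:
        if octet == 0x3D:          # "=" padding marks the end of the data
            break
        value = ALPHABET.find(bytes([octet]))
        if value < 0:
            continue               # not in the base64 alphabet: skip it
        bits = (bits << 6) | value
        nbits += 6
        if nbits >= 8:
            nbits -= 8
            out.append((bits >> nbits) & 0xFF)
            bits &= (1 << nbits) - 1
    return bytes(out)
```

For example, `b64_encode(b"Man")` gives `b"TWFu"`, `b64_encode(b"Ma")` gives `b"TWE="`, and `b64_encode(b"M")` gives `b"TQ=="`, covering the three possible lengths of the final group. Indexing a table by 6-bit value keeps the encoder close to the wording of the text; a production implementation would more likely use an existing library routine.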
-
-                    Table 1: The Base64 Alphabet
-
-     Value Encoding  Value Encoding  Value Encoding  Value Encoding
-         0 A            17 R            34 i            51 z
-         1 B            18 S            35 j            52 0
-         2 C            19 T            36 k            53 1
-         3 D            20 U            37 l            54 2
-         4 E            21 V            38 m            55 3
-         5 F            22 W            39 n            56 4
-         6 G            23 X            40 o            57 5
-         7 H            24 Y            41 p            58 6
-         8 I            25 Z            42 q            59 7
-         9 J            26 a            43 r            60 8
-        10 K            27 b            44 s            61 9
-        11 L            28 c            45 t            62 +
-        12 M            29 d            46 u            63 /
-        13 N            30 e            47 v
-        14 O            31 f            48 w         (pad) =
-        15 P            32 g            49 x
-        16 Q            33 h            50 y
-
-   The encoded output stream must be represented in lines of no more
-   than 76 characters each. All line breaks or other characters not
-   found in Table 1 must be ignored by decoding software. In base64
-   data, characters other than those in Table 1, line breaks, and other
-   white space probably indicate a transmission error, about which a
-   warning message or even a message rejection might be appropriate
-   under some circumstances.
-
-   Special processing is performed if fewer than 24 bits are available
-   at the end of the data being encoded. A full encoding quantum is
-   always completed at the end of a body. When fewer than 24 input bits
-   are available in an input group, zero bits are added (on the right)
-   to form an integral number of 6-bit groups. Padding at the end of
-   the data is performed using the "=" character.
Since all base64 - input is an integral number of octets, only the following cases can - arise: (1) the final quantum of encoding input is an integral - multiple of 24 bits; here, the final unit of encoded output will be - an integral multiple of 4 characters with no "=" padding, (2) the - final quantum of encoding input is exactly 8 bits; here, the final - unit of encoded output will be two characters followed by two "=" - padding characters, or (3) the final quantum of encoding input is - exactly 16 bits; here, the final unit of encoded output will be three - characters followed by one "=" padding character. - - Because it is used only for padding at the end of the data, the - occurrence of any "=" characters may be taken as evidence that the - end of the data has been reached (without truncation in transit). No - - - -Freed & Borenstein Standards Track [Page 25] - -RFC 2045 Internet Message Bodies November 1996 - - - such assurance is possible, however, when the number of octets - transmitted was a multiple of three and no "=" characters are - present. - - Any characters outside of the base64 alphabet are to be ignored in - base64-encoded data. - - Care must be taken to use the proper octets for line breaks if base64 - encoding is applied directly to text material that has not been - converted to canonical form. In particular, text line breaks must be - converted into CRLF sequences prior to base64 encoding. The - important thing to note is that this may be done directly by the - encoder rather than in a prior canonicalization step in some - implementations. - - NOTE: There is no need to worry about quoting potential boundary - delimiters within base64-encoded bodies within multipart entities - because no hyphen characters are used in the base64 encoding. - -7. Content-ID Header Field - - In constructing a high-level user agent, it may be desirable to allow - one body to make reference to another. 
Accordingly, bodies may be - labelled using the "Content-ID" header field, which is syntactically - identical to the "Message-ID" header field: - - id := "Content-ID" ":" msg-id - - Like the Message-ID values, Content-ID values must be generated to be - world-unique. - - The Content-ID value may be used for uniquely identifying MIME - entities in several contexts, particularly for caching data - referenced by the message/external-body mechanism. Although the - Content-ID header is generally optional, its use is MANDATORY in - implementations which generate data of the optional MIME media type - "message/external-body". That is, each message/external-body entity - must have a Content-ID field to permit caching of such data. - - It is also worth noting that the Content-ID value has special - semantics in the case of the multipart/alternative media type. This - is explained in the section of RFC 2046 dealing with - multipart/alternative. - - - - - - - - -Freed & Borenstein Standards Track [Page 26] - -RFC 2045 Internet Message Bodies November 1996 - - -8. Content-Description Header Field - - The ability to associate some descriptive information with a given - body is often desirable. For example, it may be useful to mark an - "image" body as "a picture of the Space Shuttle Endeavor." Such text - may be placed in the Content-Description header field. This header - field is always optional. - - description := "Content-Description" ":" *text - - The description is presumed to be given in the US-ASCII character - set, although the mechanism specified in RFC 2047 may be used for - non-US-ASCII Content-Description values. - -9. Additional MIME Header Fields - - Future documents may elect to define additional MIME header fields - for various purposes. 
Any new header field that further describes - the content of a message should begin with the string "Content-" to - allow such fields which appear in a message header to be - distinguished from ordinary RFC 822 message header fields. - - MIME-extension-field := <Any RFC 822 header field which - begins with the string - "Content-"> - -10. Summary - - Using the MIME-Version, Content-Type, and Content-Transfer-Encoding - header fields, it is possible to include, in a standardized way, - arbitrary types of data with RFC 822 conformant mail messages. No - restrictions imposed by either RFC 821 or RFC 822 are violated, and - care has been taken to avoid problems caused by additional - restrictions imposed by the characteristics of some Internet mail - transport mechanisms (see RFC 2049). - - The next document in this set, RFC 2046, specifies the initial set of - media types that can be labelled and transported using these headers. - -11. Security Considerations - - Security issues are discussed in the second document in this set, RFC - 2046. - - - - - - - - -Freed & Borenstein Standards Track [Page 27] - -RFC 2045 Internet Message Bodies November 1996 - - -12. Authors' Addresses - - For more information, the authors of this document are best contacted - via Internet mail: - - Ned Freed - Innosoft International, Inc. - 1050 East Garvey Avenue South - West Covina, CA 91790 - USA - - Phone: +1 818 919 3600 - Fax: +1 818 919 3614 - EMail: ned@innosoft.com - - - Nathaniel S. Borenstein - First Virtual Holdings - 25 Washington Avenue - Morristown, NJ 07960 - USA - - Phone: +1 201 540 8967 - Fax: +1 201 993 3032 - EMail: nsb@nsb.fv.com - - - MIME is a result of the work of the Internet Engineering Task Force - Working Group on RFC 822 Extensions. The chairman of that group, - Greg Vaudreuil, may be reached at: - - Gregory M. 
Vaudreuil - Octel Network Services - 17080 Dallas Parkway - Dallas, TX 75248-1905 - USA - - EMail: Greg.Vaudreuil@Octel.Com - - - - - - - - - - - - - -Freed & Borenstein Standards Track [Page 28] - -RFC 2045 Internet Message Bodies November 1996 - - -Appendix A -- Collected Grammar - - This appendix contains the complete BNF grammar for all the syntax - specified by this document. - - By itself, however, this grammar is incomplete. It refers by name to - several syntax rules that are defined by RFC 822. Rather than - reproduce those definitions here, and risk unintentional differences - between the two, this document simply refers the reader to RFC 822 - for the remaining definitions. Wherever a term is undefined, it - refers to the RFC 822 definition. - - attribute := token - ; Matching of attributes - ; is ALWAYS case-insensitive. - - composite-type := "message" / "multipart" / extension-token - - content := "Content-Type" ":" type "/" subtype - *(";" parameter) - ; Matching of media type and subtype - ; is ALWAYS case-insensitive. - - description := "Content-Description" ":" *text - - discrete-type := "text" / "image" / "audio" / "video" / - "application" / extension-token - - encoding := "Content-Transfer-Encoding" ":" mechanism - - entity-headers := [ content CRLF ] - [ encoding CRLF ] - [ id CRLF ] - [ description CRLF ] - *( MIME-extension-field CRLF ) - - extension-token := ietf-token / x-token - - hex-octet := "=" 2(DIGIT / "A" / "B" / "C" / "D" / "E" / "F") - ; Octet must be used for characters > 127, =, - ; SPACEs or TABs at the ends of lines, and is - ; recommended for any character not listed in - ; RFC 2049 as "mail-safe". - - iana-token := <A publicly-defined extension token. 
Tokens - of this form must be registered with IANA - as specified in RFC 2048.> - - - - -Freed & Borenstein Standards Track [Page 29] - -RFC 2045 Internet Message Bodies November 1996 - - - ietf-token := <An extension token defined by a - standards-track RFC and registered - with IANA.> - - id := "Content-ID" ":" msg-id - - mechanism := "7bit" / "8bit" / "binary" / - "quoted-printable" / "base64" / - ietf-token / x-token - - MIME-extension-field := <Any RFC 822 header field which - begins with the string - "Content-"> - - MIME-message-headers := entity-headers - fields - version CRLF - ; The ordering of the header - ; fields implied by this BNF - ; definition should be ignored. - - MIME-part-headers := entity-headers - [fields] - ; Any field not beginning with - ; "content-" can have no defined - ; meaning and may be ignored. - ; The ordering of the header - ; fields implied by this BNF - ; definition should be ignored. - - parameter := attribute "=" value - - ptext := hex-octet / safe-char - - qp-line := *(qp-segment transport-padding CRLF) - qp-part transport-padding - - qp-part := qp-section - ; Maximum length of 76 characters - - qp-section := [*(ptext / SPACE / TAB) ptext] - - qp-segment := qp-section *(SPACE / TAB) "=" - ; Maximum length of 76 characters - - quoted-printable := qp-line *(CRLF qp-line) - - - - - -Freed & Borenstein Standards Track [Page 30] - -RFC 2045 Internet Message Bodies November 1996 - - - safe-char := <any octet with decimal value of 33 through - 60 inclusive, and 62 through 126> - ; Characters not listed as "mail-safe" in - ; RFC 2049 are also not recommended. - - subtype := extension-token / iana-token - - token := 1*<any (US-ASCII) CHAR except SPACE, CTLs, - or tspecials> - - transport-padding := *LWSP-char - ; Composers MUST NOT generate - ; non-zero length transport - ; padding, but receivers MUST - ; be able to handle padding - ; added by message transports. 
- - tspecials := "(" / ")" / "<" / ">" / "@" / - "," / ";" / ":" / "\" / <"> - "/" / "[" / "]" / "?" / "=" - ; Must be in quoted-string, - ; to use within parameter values - - type := discrete-type / composite-type - - value := token / quoted-string - - version := "MIME-Version" ":" 1*DIGIT "." 1*DIGIT - - x-token := <The two characters "X-" or "x-" followed, with - no intervening white space, by any token> - - - - - - - - - - - - - - - - - - - - -Freed & Borenstein Standards Track [Page 31] - diff --git a/proto/rfc2046.txt b/proto/rfc2046.txt @@ -1,2467 +0,0 @@ - - - - - - -Network Working Group N. Freed -Request for Comments: 2046 Innosoft -Obsoletes: 1521, 1522, 1590 N. Borenstein -Category: Standards Track First Virtual - November 1996 - - - Multipurpose Internet Mail Extensions - (MIME) Part Two: - Media Types - -Status of this Memo - - This document specifies an Internet standards track protocol for the - Internet community, and requests discussion and suggestions for - improvements. Please refer to the current edition of the "Internet - Official Protocol Standards" (STD 1) for the standardization state - and status of this protocol. Distribution of this memo is unlimited. - -Abstract - - STD 11, RFC 822 defines a message representation protocol specifying - considerable detail about US-ASCII message headers, but which leaves - the message content, or message body, as flat US-ASCII text. This - set of documents, collectively called the Multipurpose Internet Mail - Extensions, or MIME, redefines the format of messages to allow for - - (1) textual message bodies in character sets other than - US-ASCII, - - (2) an extensible set of different formats for non-textual - message bodies, - - (3) multi-part message bodies, and - - (4) textual header information in character sets other than - US-ASCII. - - These documents are based on earlier work documented in RFC 934, STD - 11, and RFC 1049, but extends and revises them. 
Because RFC 822 said - so little about message bodies, these documents are largely - orthogonal to (rather than a revision of) RFC 822. - - The initial document in this set, RFC 2045, specifies the various - headers used to describe the structure of MIME messages. This second - document defines the general structure of the MIME media typing - system and defines an initial set of media types. The third document, - RFC 2047, describes extensions to RFC 822 to allow non-US-ASCII text - - - -Freed & Borenstein Standards Track [Page 1] - -RFC 2046 Media Types November 1996 - - - data in Internet mail header fields. The fourth document, RFC 2048, - specifies various IANA registration procedures for MIME-related - facilities. The fifth and final document, RFC 2049, describes MIME - conformance criteria as well as providing some illustrative examples - of MIME message formats, acknowledgements, and the bibliography. - - These documents are revisions of RFCs 1521 and 1522, which themselves - were revisions of RFCs 1341 and 1342. An appendix in RFC 2049 - describes differences and changes from previous versions. - -Table of Contents - - 1. Introduction ......................................... 3 - 2. Definition of a Top-Level Media Type ................. 4 - 3. Overview Of The Initial Top-Level Media Types ........ 4 - 4. Discrete Media Type Values ........................... 6 - 4.1 Text Media Type ..................................... 6 - 4.1.1 Representation of Line Breaks ..................... 7 - 4.1.2 Charset Parameter ................................. 7 - 4.1.3 Plain Subtype ..................................... 11 - 4.1.4 Unrecognized Subtypes ............................. 11 - 4.2 Image Media Type .................................... 11 - 4.3 Audio Media Type .................................... 11 - 4.4 Video Media Type .................................... 12 - 4.5 Application Media Type .............................. 
12 - 4.5.1 Octet-Stream Subtype .............................. 13 - 4.5.2 PostScript Subtype ................................ 14 - 4.5.3 Other Application Subtypes ........................ 17 - 5. Composite Media Type Values .......................... 17 - 5.1 Multipart Media Type ................................ 17 - 5.1.1 Common Syntax ..................................... 19 - 5.1.2 Handling Nested Messages and Multiparts ........... 24 - 5.1.3 Mixed Subtype ..................................... 24 - 5.1.4 Alternative Subtype ............................... 24 - 5.1.5 Digest Subtype .................................... 26 - 5.1.6 Parallel Subtype .................................. 27 - 5.1.7 Other Multipart Subtypes .......................... 28 - 5.2 Message Media Type .................................. 28 - 5.2.1 RFC822 Subtype .................................... 28 - 5.2.2 Partial Subtype ................................... 29 - 5.2.2.1 Message Fragmentation and Reassembly ............ 30 - 5.2.2.2 Fragmentation and Reassembly Example ............ 31 - 5.2.3 External-Body Subtype ............................. 33 - 5.2.4 Other Message Subtypes ............................ 40 - 6. Experimental Media Type Values ....................... 40 - 7. Summary .............................................. 41 - 8. Security Considerations .............................. 41 - 9. Authors' Addresses ................................... 42 - - - -Freed & Borenstein Standards Track [Page 2] - -RFC 2046 Media Types November 1996 - - - A. Collected Grammar .................................... 43 - -1. Introduction - - The first document in this set, RFC 2045, defines a number of header - fields, including Content-Type. The Content-Type field is used to - specify the nature of the data in the body of a MIME entity, by - giving media type and subtype identifiers, and by providing auxiliary - information that may be required for certain media types. 
After the - type and subtype names, the remainder of the header field is simply a - set of parameters, specified in an attribute/value notation. The - ordering of parameters is not significant. - - In general, the top-level media type is used to declare the general - type of data, while the subtype specifies a specific format for that - type of data. Thus, a media type of "image/xyz" is enough to tell a - user agent that the data is an image, even if the user agent has no - knowledge of the specific image format "xyz". Such information can - be used, for example, to decide whether or not to show a user the raw - data from an unrecognized subtype -- such an action might be - reasonable for unrecognized subtypes of "text", but not for - unrecognized subtypes of "image" or "audio". For this reason, - registered subtypes of "text", "image", "audio", and "video" should - not contain embedded information that is really of a different type. - Such compound formats should be represented using the "multipart" or - "application" types. - - Parameters are modifiers of the media subtype, and as such do not - fundamentally affect the nature of the content. The set of - meaningful parameters depends on the media type and subtype. Most - parameters are associated with a single specific subtype. However, a - given top-level media type may define parameters which are applicable - to any subtype of that type. Parameters may be required by their - defining media type or subtype or they may be optional. MIME - implementations must also ignore any parameters whose names they do - not recognize. - - MIME's Content-Type header field and media type mechanism has been - carefully designed to be extensible, and it is expected that the set - of media type/subtype pairs and their associated parameters will grow - significantly over time. Several other MIME facilities, such as - transfer encodings and "message/external-body" access types, are - likely to have new values defined over time. 
In order to ensure that - the set of such values is developed in an orderly, well-specified, - and public manner, MIME sets up a registration process which uses the - Internet Assigned Numbers Authority (IANA) as a central registry for - MIME's various areas of extensibility. The registration process for - these areas is described in a companion document, RFC 2048. - - - -Freed & Borenstein Standards Track [Page 3] - -RFC 2046 Media Types November 1996 - - - The initial seven standard top-level media type are defined and - described in the remainder of this document. - -2. Definition of a Top-Level Media Type - - The definition of a top-level media type consists of: - - (1) a name and a description of the type, including - criteria for whether a particular type would qualify - under that type, - - (2) the names and definitions of parameters, if any, which - are defined for all subtypes of that type (including - whether such parameters are required or optional), - - (3) how a user agent and/or gateway should handle unknown - subtypes of this type, - - (4) general considerations on gatewaying entities of this - top-level type, if any, and - - (5) any restrictions on content-transfer-encodings for - entities of this top-level type. - -3. Overview Of The Initial Top-Level Media Types - - The five discrete top-level media types are: - - (1) text -- textual information. The subtype "plain" in - particular indicates plain text containing no - formatting commands or directives of any sort. Plain - text is intended to be displayed "as-is". No special - software is required to get the full meaning of the - text, aside from support for the indicated character - set. Other subtypes are to be used for enriched text in - forms where application software may enhance the - appearance of the text, but such software must not be - required in order to get the general idea of the - content. 
Possible subtypes of "text" thus include any - word processor format that can be read without - resorting to software that understands the format. In - particular, formats that employ embeddded binary - formatting information are not considered directly - readable. A very simple and portable subtype, - "richtext", was defined in RFC 1341, with a further - revision in RFC 1896 under the name "enriched". - - - - - -Freed & Borenstein Standards Track [Page 4] - -RFC 2046 Media Types November 1996 - - - (2) image -- image data. "Image" requires a display device - (such as a graphical display, a graphics printer, or a - FAX machine) to view the information. An initial - subtype is defined for the widely-used image format - JPEG. . subtypes are defined for two widely-used image - formats, jpeg and gif. - - (3) audio -- audio data. "Audio" requires an audio output - device (such as a speaker or a telephone) to "display" - the contents. An initial subtype "basic" is defined in - this document. - - (4) video -- video data. "Video" requires the capability - to display moving images, typically including - specialized hardware and software. An initial subtype - "mpeg" is defined in this document. - - (5) application -- some other kind of data, typically - either uninterpreted binary data or information to be - processed by an application. The subtype "octet- - stream" is to be used in the case of uninterpreted - binary data, in which case the simplest recommended - action is to offer to write the information into a file - for the user. The "PostScript" subtype is also defined - for the transport of PostScript material. Other - expected uses for "application" include spreadsheets, - data for mail-based scheduling systems, and languages - for "active" (computational) messaging, and word - processing formats that are not directly readable. 
- Note that security considerations may exist for some - types of application data, most notably - "application/PostScript" and any form of active - messaging. These issues are discussed later in this - document. - - The two composite top-level media types are: - - (1) multipart -- data consisting of multiple entities of - independent data types. Four subtypes are initially - defined, including the basic "mixed" subtype specifying - a generic mixed set of parts, "alternative" for - representing the same data in multiple formats, - "parallel" for parts intended to be viewed - simultaneously, and "digest" for multipart entities in - which each part has a default type of "message/rfc822". - - - - - - -Freed & Borenstein Standards Track [Page 5] - -RFC 2046 Media Types November 1996 - - - (2) message -- an encapsulated message. A body of media - type "message" is itself all or a portion of some kind - of message object. Such objects may or may not in turn - contain other entities. The "rfc822" subtype is used - when the encapsulated content is itself an RFC 822 - message. The "partial" subtype is defined for partial - RFC 822 messages, to permit the fragmented transmission - of bodies that are thought to be too large to be passed - through transport facilities in one piece. Another - subtype, "external-body", is defined for specifying - large bodies by reference to an external data source. - - It should be noted that the list of media type values given here may - be augmented in time, via the mechanisms described above, and that - the set of subtypes is expected to grow substantially. - -4. Discrete Media Type Values - - Five of the seven initial media type values refer to discrete bodies. - The content of these types must be handled by non-MIME mechanisms; - they are opaque to MIME processors. - -4.1. Text Media Type - - The "text" media type is intended for sending material which is - principally textual in form. 
A "charset" parameter may be used to - indicate the character set of the body text for "text" subtypes, - notably including the subtype "text/plain", which is a generic - subtype for plain text. Plain text does not provide for or allow - formatting commands, font attribute specifications, processing - instructions, interpretation directives, or content markup. Plain - text is seen simply as a linear sequence of characters, possibly - interrupted by line breaks or page breaks. Plain text may allow the - stacking of several characters in the same position in the text. - Plain text in scripts like Arabic and Hebrew may also include - facilitites that allow the arbitrary mixing of text segments with - opposite writing directions. - - Beyond plain text, there are many formats for representing what might - be known as "rich text". An interesting characteristic of many such - representations is that they are to some extent readable even without - the software that interprets them. It is useful, then, to - distinguish them, at the highest level, from such unreadable data as - images, audio, or text represented in an unreadable form. In the - absence of appropriate interpretation software, it is reasonable to - show subtypes of "text" to the user, while it is not reasonable to do - so with most nontextual data. Such formatted textual data should be - represented using subtypes of "text". - - - -Freed & Borenstein Standards Track [Page 6] - -RFC 2046 Media Types November 1996 - - -4.1.1. Representation of Line Breaks - - The canonical form of any MIME "text" subtype MUST always represent a - line break as a CRLF sequence. Similarly, any occurrence of CRLF in - MIME "text" MUST represent a line break. Use of CR and LF outside of - line break sequences is also forbidden. - - This rule applies regardless of format or character set or sets - involved. - - NOTE: The proper interpretation of line breaks when a body is - displayed depends on the media type. 
In particular, while it is - appropriate to treat a line break as a transition to a new line when - displaying a "text/plain" body, this treatment is actually incorrect - for other subtypes of "text" like "text/enriched" [RFC-1896]. - Similarly, whether or not line breaks should be added during display - operations is also a function of the media type. It should not be - necessary to add any line breaks to display "text/plain" correctly, - whereas proper display of "text/enriched" requires the appropriate - addition of line breaks. - - NOTE: Some protocols defines a maximum line length. E.g. SMTP [RFC- - 821] allows a maximum of 998 octets before the next CRLF sequence. - To be transported by such protocols, data which includes too long - segments without CRLF sequences must be encoded with a suitable - content-transfer-encoding. - -4.1.2. Charset Parameter - - A critical parameter that may be specified in the Content-Type field - for "text/plain" data is the character set. This is specified with a - "charset" parameter, as in: - - Content-type: text/plain; charset=iso-8859-1 - - Unlike some other parameter values, the values of the charset - parameter are NOT case sensitive. The default character set, which - must be assumed in the absence of a charset parameter, is US-ASCII. - - The specification for any future subtypes of "text" must specify - whether or not they will also utilize a "charset" parameter, and may - possibly restrict its values as well. For other subtypes of "text" - than "text/plain", the semantics of the "charset" parameter should be - defined to be identical to those specified here for "text/plain", - i.e., the body consists entirely of characters in the given charset. - In particular, definers of future "text" subtypes should pay close - attention to the implications of multioctet character sets for their - subtype definitions. 
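The charset rules in section 4.1.2 (case-insensitive values, US-ASCII as the default when the parameter is absent) can be illustrated with a minimal sketch. It assumes Python's standard `email` package is an acceptable stand-in for a real MIME parser; the helper name `charset_of` is hypothetical and not part of any RFC or of rohrpost.

```python
from email.message import Message


def charset_of(content_type_value: str) -> str:
    """Return the lowercased charset of a Content-Type value.

    Falls back to "us-ascii" when no charset parameter is present,
    which is the default the text above requires.
    """
    msg = Message()
    msg["Content-Type"] = content_type_value
    # get_content_charset() lowercases the value, so comparisons
    # against it are effectively case-insensitive.
    return msg.get_content_charset("us-ascii")


if __name__ == "__main__":
    print(charset_of("text/plain; charset=ISO-8859-1"))
    print(charset_of("text/plain"))
```

Normalising to lowercase once, at parse time, is one way to honour the "NOT case sensitive" rule without repeating case-folding at every comparison.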
- - - -Freed & Borenstein Standards Track [Page 7] - -RFC 2046 Media Types November 1996 - - - The charset parameter for subtypes of "text" gives a name of a - character set, as "character set" is defined in RFC 2045. The rules - regarding line breaks detailed in the previous section must also be - observed -- a character set whose definition does not conform to - these rules cannot be used in a MIME "text" subtype. - - An initial list of predefined character set names can be found at the - end of this section. Additional character sets may be registered - with IANA. - - Other media types than subtypes of "text" might choose to employ the - charset parameter as defined here, but with the CRLF/line break - restriction removed. Therefore, all character sets that conform to - the general definition of "character set" in RFC 2045 can be - registered for MIME use. - - Note that if the specified character set includes 8-bit characters - and such characters are used in the body, a Content-Transfer-Encoding - header field and a corresponding encoding on the data are required in - order to transmit the body via some mail transfer protocols, such as - SMTP [RFC-821]. - - The default character set, US-ASCII, has been the subject of some - confusion and ambiguity in the past. Not only were there some - ambiguities in the definition, there have been wide variations in - practice. In order to eliminate such ambiguity and variations in the - future, it is strongly recommended that new user agents explicitly - specify a character set as a media type parameter in the Content-Type - header field. "US-ASCII" does not indicate an arbitrary 7-bit - character set, but specifies that all octets in the body must be - interpreted as characters according to the US-ASCII character set. - National and application-oriented versions of ISO 646 [ISO-646] are - usually NOT identical to US-ASCII, and in that case their use in - Internet mail is explicitly discouraged. 
The omission of the ISO 646 - character set from this document is deliberate in this regard. The - character set name of "US-ASCII" explicitly refers to the character - set defined in ANSI X3.4-1986 [US- ASCII]. The new international - reference version (IRV) of the 1991 edition of ISO 646 is identical - to US-ASCII. The character set name "ASCII" is reserved and must not - be used for any purpose. - - NOTE: RFC 821 explicitly specifies "ASCII", and references an earlier - version of the American Standard. Insofar as one of the purposes of - specifying a media type and character set is to permit the receiver - to unambiguously determine how the sender intended the coded message - to be interpreted, assuming anything other than "strict ASCII" as the - default would risk unintentional and incompatible changes to the - semantics of messages now being transmitted. This also implies that - - - -Freed & Borenstein Standards Track [Page 8] - -RFC 2046 Media Types November 1996 - - - messages containing characters coded according to other versions of - ISO 646 than US-ASCII and the 1991 IRV, or using code-switching - procedures (e.g., those of ISO 2022), as well as 8bit or multiple - octet character encodings MUST use an appropriate character set - specification to be consistent with MIME. - - The complete US-ASCII character set is listed in ANSI X3.4- 1986. - Note that the control characters including DEL (0-31, 127) have no - defined meaning in apart from the combination CRLF (US-ASCII values - 13 and 10) indicating a new line. Two of the characters have de - facto meanings in wide use: FF (12) often means "start subsequent - text on the beginning of a new page"; and TAB or HT (9) often (though - not always) means "move the cursor to the next available column after - the current position where the column number is a multiple of 8 - (counting the first column as column 0)." 
Aside from these - conventions, any use of the control characters or DEL in a body must - either occur - - (1) because a subtype of text other than "plain" - specifically assigns some additional meaning, or - - (2) within the context of a private agreement between the - sender and recipient. Such private agreements are - discouraged and should be replaced by the other - capabilities of this document. - - NOTE: An enormous proliferation of character sets exist beyond US- - ASCII. A large number of partially or totally overlapping character - sets is NOT a good thing. A SINGLE character set that can be used - universally for representing all of the world's languages in Internet - mail would be preferrable. Unfortunately, existing practice in - several communities seems to point to the continued use of multiple - character sets in the near future. A small number of standard - character sets are, therefore, defined for Internet use in this - document. - - The defined charset values are: - - (1) US-ASCII -- as defined in ANSI X3.4-1986 [US-ASCII]. - - (2) ISO-8859-X -- where "X" is to be replaced, as - necessary, for the parts of ISO-8859 [ISO-8859]. Note - that the ISO 646 character sets have deliberately been - omitted in favor of their 8859 replacements, which are - the designated character sets for Internet mail. As of - the publication of this document, the legitimate values - for "X" are the digits 1 through 10. - - - - -Freed & Borenstein Standards Track [Page 9] - -RFC 2046 Media Types November 1996 - - - Characters in the range 128-159 has no assigned meaning in ISO-8859- - X. Characters with values below 128 in ISO-8859-X have the same - assigned meaning as they do in US-ASCII. 
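The list of defined charset values above (US-ASCII, and ISO-8859-X with X from 1 through 10) and the remark that octets 128-159 have no assigned meaning in ISO-8859-X can be checked mechanically. The following is a rough sketch only; the function names `is_defined_charset` and `unassigned_octets` are hypothetical, and later registrations with IANA are deliberately not modelled.

```python
import re

# ISO-8859-1 through ISO-8859-10, matched case-insensitively.
_ISO_8859 = re.compile(r"iso-8859-([1-9]|10)\Z", re.IGNORECASE)


def is_defined_charset(name: str) -> bool:
    """True if name is one of the charset values listed in this section."""
    return name.lower() == "us-ascii" or _ISO_8859.match(name) is not None


def unassigned_octets(body: bytes) -> list:
    """Return the octets in body that fall in the range 128-159,
    which has no assigned meaning in ISO-8859-X."""
    return [octet for octet in body if 128 <= octet <= 159]


if __name__ == "__main__":
    print(is_defined_charset("ISO-8859-1"))
    print(is_defined_charset("iso-8859-11"))
    print(unassigned_octets(b"a\x85\xe9"))
```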
- - Part 6 of ISO 8859 (Latin/Arabic alphabet) and part 8 (Latin/Hebrew - alphabet) includes both characters for which the normal writing - direction is right to left and characters for which it is left to - right, but do not define a canonical ordering method for representing - bi-directional text. The charset values "ISO-8859-6" and "ISO-8859- - 8", however, specify that the visual method is used [RFC-1556]. - - All of these character sets are used as pure 7bit or 8bit sets - without any shift or escape functions. The meaning of shift and - escape sequences in these character sets is not defined. - - The character sets specified above are the ones that were relatively - uncontroversial during the drafting of MIME. This document does not - endorse the use of any particular character set other than US-ASCII, - and recognizes that the future evolution of world character sets - remains unclear. - - Note that the character set used, if anything other than US- ASCII, - must always be explicitly specified in the Content-Type field. - - No character set name other than those defined above may be used in - Internet mail without the publication of a formal specification and - its registration with IANA, or by private agreement, in which case - the character set name must begin with "X-". - - Implementors are discouraged from defining new character sets unless - absolutely necessary. - - The "charset" parameter has been defined primarily for the purpose of - textual data, and is described in this section for that reason. - However, it is conceivable that non-textual data might also wish to - specify a charset value for some purpose, in which case the same - syntax and values should be used. - - In general, composition software should always use the "lowest common - denominator" character set possible. 
For example, if a body contains - only US-ASCII characters, it SHOULD be marked as being in the US- - ASCII character set, not ISO-8859-1, which, like all the ISO-8859 - family of character sets, is a superset of US-ASCII. More generally, - if a widely-used character set is a subset of another character set, - and a body contains only characters in the widely-used subset, it - should be labelled as being in that subset. This will increase the - chances that the recipient will be able to view the resulting entity - correctly. - - - -Freed & Borenstein Standards Track [Page 10] - -RFC 2046 Media Types November 1996 - - -4.1.3. Plain Subtype - - The simplest and most important subtype of "text" is "plain". This - indicates plain text that does not contain any formatting commands or - directives. Plain text is intended to be displayed "as-is", that is, - no interpretation of embedded formatting commands, font attribute - specifications, processing instructions, interpretation directives, - or content markup should be necessary for proper display. The - default media type of "text/plain; charset=us-ascii" for Internet - mail describes existing Internet practice. That is, it is the type - of body defined by RFC 822. - - No other "text" subtype is defined by this document. - -4.1.4. Unrecognized Subtypes - - Unrecognized subtypes of "text" should be treated as subtype "plain" - as long as the MIME implementation knows how to handle the charset. - Unrecognized subtypes which also specify an unrecognized charset - should be treated as "application/octet- stream". - -4.2. Image Media Type - - A media type of "image" indicates that the body contains an image. - The subtype names the specific image format. These names are not - case sensitive. An initial subtype is "jpeg" for the JPEG format - using JFIF encoding [JPEG]. 
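The fallback rule for unrecognized "text" subtypes in section 4.1.4 can be sketched as a small decision function. This is an illustration under stated assumptions: the sets `KNOWN_TEXT_SUBTYPES` and `KNOWN_CHARSETS` are hypothetical placeholders for whatever a given implementation actually supports, and the function name `effective_text_type` is invented for this example.

```python
# Hypothetical capabilities of an example implementation.
KNOWN_TEXT_SUBTYPES = {"plain"}
KNOWN_CHARSETS = {"us-ascii", "iso-8859-1"}


def effective_text_type(subtype: str, charset: str = "us-ascii") -> str:
    """Decide how to treat a "text" entity, following section 4.1.4."""
    subtype = subtype.lower()
    charset = charset.lower()
    if subtype in KNOWN_TEXT_SUBTYPES:
        # Recognized subtype: the fallback rule does not apply.
        return "text/" + subtype
    if charset in KNOWN_CHARSETS:
        # Unrecognized subtype, but the charset can be handled.
        return "text/plain"
    # Unrecognized subtype and unrecognized charset.
    return "application/octet-stream"


if __name__ == "__main__":
    print(effective_text_type("x-foo", "ISO-8859-1"))
    print(effective_text_type("x-foo", "x-unknown"))
```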
- - The list of "image" subtypes given here is neither exclusive nor - exhaustive, and is expected to grow as more types are registered with - IANA, as described in RFC 2048. - - Unrecognized subtypes of "image" should at a miniumum be treated as - "application/octet-stream". Implementations may optionally elect to - pass subtypes of "image" that they do not specifically recognize to a - secure and robust general-purpose image viewing application, if such - an application is available. - - NOTE: Using of a generic-purpose image viewing application this way - inherits the security problems of the most dangerous type supported - by the application. - -4.3. Audio Media Type - - A media type of "audio" indicates that the body contains audio data. - Although there is not yet a consensus on an "ideal" audio format for - use with computers, there is a pressing need for a format capable of - providing interoperable behavior. - - - -Freed & Borenstein Standards Track [Page 11] - -RFC 2046 Media Types November 1996 - - - The initial subtype of "basic" is specified to meet this requirement - by providing an absolutely minimal lowest common denominator audio - format. It is expected that richer formats for higher quality and/or - lower bandwidth audio will be defined by a later document. - - The content of the "audio/basic" subtype is single channel audio - encoded using 8bit ISDN mu-law [PCM] at a sample rate of 8000 Hz. - - Unrecognized subtypes of "audio" should at a miniumum be treated as - "application/octet-stream". Implementations may optionally elect to - pass subtypes of "audio" that they do not specifically recognize to a - robust general-purpose audio playing application, if such an - application is available. - -4.4. Video Media Type - - A media type of "video" indicates that the body contains a time- - varying-picture image, possibly with color and coordinated sound. 
- The term 'video' is used in its most generic sense, rather than with - reference to any particular technology or format, and is not meant to - preclude subtypes such as animated drawings encoded compactly. The - subtype "mpeg" refers to video coded according to the MPEG standard - [MPEG]. - - Note that although in general this document strongly discourages the - mixing of multiple media in a single body, it is recognized that many - so-called video formats include a representation for synchronized - audio, and this is explicitly permitted for subtypes of "video". - - Unrecognized subtypes of "video" should at a minimum be treated as - "application/octet-stream". Implementations may optionally elect to - pass subtypes of "video" that they do not specifically recognize to a - robust general-purpose video display application, if such an - application is available. - -4.5. Application Media Type - - The "application" media type is to be used for discrete data which do - not fit in any of the other categories, and particularly for data to - be processed by some type of application program. This is - information which must be processed by an application before it is - viewable or usable by a user. Expected uses for the "application" - media type include file transfer, spreadsheets, data for mail-based - scheduling systems, and languages for "active" (computational) - material. (The latter, in particular, can pose security problems - which must be understood by implementors, and are considered in - detail in the discussion of the "application/PostScript" media type.) - - - - -Freed & Borenstein Standards Track [Page 12] - -RFC 2046 Media Types November 1996 - - - For example, a meeting scheduler might define a standard - representation for information about proposed meeting dates. An - intelligent user agent would use this information to conduct a dialog - with the user, and might then send additional material based on that - dialog. 
More generally, there have been several "active" messaging - languages developed in which programs in a suitably specialized - language are transported to a remote location and automatically run - in the recipient's environment. - - Such applications may be defined as subtypes of the "application" - media type. This document defines two subtypes: - - octet-stream, and PostScript. - - The subtype of "application" will often be either the name or include - part of the name of the application for which the data are intended. - This does not mean, however, that any application program name may be - used freely as a subtype of "application". - -4.5.1. Octet-Stream Subtype - - The "octet-stream" subtype is used to indicate that a body contains - arbitrary binary data. The set of currently defined parameters is: - - (1) TYPE -- the general type or category of binary data. - This is intended as information for the human recipient - rather than for any automatic processing. - - (2) PADDING -- the number of bits of padding that were - appended to the bit-stream comprising the actual - contents to produce the enclosed 8bit byte-oriented - data. This is useful for enclosing a bit-stream in a - body when the total number of bits is not a multiple of - 8. - - Both of these parameters are optional. - - An additional parameter, "CONVERSIONS", was defined in RFC 1341 but - has since been removed. RFC 1341 also defined the use of a "NAME" - parameter which gave a suggested file name to be used if the data - were to be written to a file. This has been deprecated in - anticipation of a separate Content-Disposition header field, to be - defined in a subsequent RFC. - - The recommended action for an implementation that receives an - "application/octet-stream" entity is to simply offer to put the data - in a file, with any Content-Transfer-Encoding undone, or perhaps to - use it as input to a user-specified process. 
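The recommended handling of "application/octet-stream" (undo the Content-Transfer-Encoding, then offer to write the data to a file) might look roughly like the sketch below. The function names are hypothetical, and the destination path is assumed to come from the user rather than from the message, in keeping with the deprecation of the "NAME" parameter.

```python
import base64
import quopri


def undo_transfer_encoding(body: bytes, encoding: str = "7bit") -> bytes:
    """Reverse a Content-Transfer-Encoding; identity encodings pass through."""
    enc = encoding.strip().lower()
    if enc == "base64":
        return base64.b64decode(body)
    if enc == "quoted-printable":
        return quopri.decodestring(body)
    if enc in ("7bit", "8bit", "binary"):
        return body
    raise ValueError("unsupported Content-Transfer-Encoding: " + encoding)


def save_octet_stream(body: bytes, encoding: str, path: str) -> int:
    """Write the decoded body to a user-chosen path; return the byte count."""
    data = undo_transfer_encoding(body, encoding)
    with open(path, "wb") as out:
        out.write(data)
    return len(data)
```

Nothing in this sketch executes the data or derives a program name from Content-Type parameters; it only stores the decoded bytes.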
- - - -Freed & Borenstein Standards Track [Page 13] - -RFC 2046 Media Types November 1996 - - - To reduce the danger of transmitting rogue programs, it is strongly - recommended that implementations NOT implement a path-search - mechanism whereby an arbitrary program named in the Content-Type - parameter (e.g., an "interpreter=" parameter) is found and executed - using the message body as input. - -4.5.2. PostScript Subtype - - A media type of "application/postscript" indicates a PostScript - program. Currently two variants of the PostScript language are - allowed; the original level 1 variant is described in [POSTSCRIPT] - and the more recent level 2 variant is described in [POSTSCRIPT2]. - - PostScript is a registered trademark of Adobe Systems, Inc. Use of - the MIME media type "application/postscript" implies recognition of - that trademark and all the rights it entails. - - The PostScript language definition provides facilities for internal - labelling of the specific language features a given program uses. - This labelling, called the PostScript document structuring - conventions, or DSC, is very general and provides substantially more - information than just the language level. The use of document - structuring conventions, while not required, is strongly recommended - as an aid to interoperability. Documents which lack proper - structuring conventions cannot be tested to see whether or not they - will work in a given environment. As such, some systems may assume - the worst and refuse to process unstructured documents. - - The execution of general-purpose PostScript interpreters entails - serious security risks, and implementors are discouraged from simply - sending PostScript bodies to "off-the-shelf" interpreters. 
While it - is usually safe to send PostScript to a printer, where the potential - for harm is greatly constrained by typical printer environments, - implementors should consider all of the following before they add - interactive display of PostScript bodies to their MIME readers. - - The remainder of this section outlines some, though probably not all, - of the possible problems with the transport of PostScript entities. - - (1) Dangerous operations in the PostScript language - include, but may not be limited to, the PostScript - operators "deletefile", "renamefile", "filenameforall", - and "file". "File" is only dangerous when applied to - something other than standard input or output. - Implementations may also define additional nonstandard - file operators; these may also pose a threat to - security. "Filenameforall", the wildcard file search - operator, may appear at first glance to be harmless. - - - -Freed & Borenstein Standards Track [Page 14] - -RFC 2046 Media Types November 1996 - - - Note, however, that this operator has the potential to - reveal information about what files the recipient has - access to, and this information may itself be - sensitive. Message senders should avoid the use of - potentially dangerous file operators, since these - operators are quite likely to be unavailable in secure - PostScript implementations. Message receiving and - displaying software should either completely disable - all potentially dangerous file operators or take - special care not to delegate any special authority to - their operation. These operators should be viewed as - being done by an outside agency when interpreting - PostScript documents. Such disabling and/or checking - should be done completely outside of the reach of the - PostScript language itself; care should be taken to - insure that no method exists for re-enabling full- - function versions of these operators. 
- - (2) The PostScript language provides facilities for exiting - the normal interpreter, or server, loop. Changes made - in this "outer" environment are customarily retained - across documents, and may in some cases be retained - semipermanently in nonvolatile memory. The operators - associated with exiting the interpreter loop have the - potential to interfere with subsequent document - processing. As such, their unrestrained use - constitutes a threat of service denial. PostScript - operators that exit the interpreter loop include, but - may not be limited to, the exitserver and startjob - operators. Message sending software should not - generate PostScript that depends on exiting the - interpreter loop to operate, since the ability to exit - will probably be unavailable in secure PostScript - implementations. Message receiving and displaying - software should completely disable the ability to make - retained changes to the PostScript environment by - eliminating or disabling the "startjob" and - "exitserver" operations. If these operations cannot be - eliminated or completely disabled the password - associated with them should at least be set to a hard- - to-guess value. - - (3) PostScript provides operators for setting system-wide - and device-specific parameters. These parameter - settings may be retained across jobs and may - potentially pose a threat to the correct operation of - the interpreter. The PostScript operators that set - system and device parameters include, but may not be - - - -Freed & Borenstein Standards Track [Page 15] - -RFC 2046 Media Types November 1996 - - - limited to, the "setsystemparams" and "setdevparams" - operators. Message sending software should not - generate PostScript that depends on the setting of - system or device parameters to operate correctly. The - ability to set these parameters will probably be - unavailable in secure PostScript implementations. 
- Message receiving and displaying software should - disable the ability to change system and device - parameters. If these operators cannot be completely - disabled the password associated with them should at - least be set to a hard-to-guess value. - - (4) Some PostScript implementations provide nonstandard - facilities for the direct loading and execution of - machine code. Such facilities are quite obviously open - to substantial abuse. Message sending software should - not make use of such features. Besides being totally - hardware-specific, they are also likely to be - unavailable in secure implementations of PostScript. - Message receiving and displaying software should not - allow such operators to be used if they exist. - - (5) PostScript is an extensible language, and many, if not - most, implementations of it provide a number of their - own extensions. This document does not deal with such - extensions explicitly since they constitute an unknown - factor. Message sending software should not make use - of nonstandard extensions; they are likely to be - missing from some implementations. Message receiving - and displaying software should make sure that any - nonstandard PostScript operators are secure and don't - present any kind of threat. - - (6) It is possible to write PostScript that consumes huge - amounts of various system resources. It is also - possible to write PostScript programs that loop - indefinitely. Both types of programs have the - potential to cause damage if sent to unsuspecting - recipients. Message-sending software should avoid the - construction and dissemination of such programs, which - is antisocial. Message receiving and displaying - software should provide appropriate mechanisms to abort - processing after a reasonable amount of time has - elapsed. In addition, PostScript interpreters should be - limited to the consumption of only a reasonable amount - of any given system resource. 
- - - - - -Freed & Borenstein Standards Track [Page 16] - -RFC 2046 Media Types November 1996 - - - (7) It is possible to include raw binary information inside - PostScript in various forms. This is not recommended - for use in Internet mail, both because it is not - supported by all PostScript interpreters and because it - significantly complicates the use of a MIME Content- - Transfer-Encoding. (Without such binary, PostScript - may typically be viewed as line-oriented data. The - treatment of CRLF sequences becomes extremely - problematic if binary and line-oriented data are mixed - in a single Postscript data stream.) - - (8) Finally, bugs may exist in some PostScript interpreters - which could possibly be exploited to gain unauthorized - access to a recipient's system. Apart from noting this - possibility, there is no specific action to take to - prevent this, apart from the timely correction of such - bugs if any are found. - -4.5.3. Other Application Subtypes - - It is expected that many other subtypes of "application" will be - defined in the future. MIME implementations must at a minimum treat - any unrecognized subtypes as being equivalent to "application/octet- - stream". - -5. Composite Media Type Values - - The remaining two of the seven initial Content-Type values refer to - composite entities. Composite entities are handled using MIME - mechanisms -- a MIME processor typically handles the body directly. - -5.1. Multipart Media Type - - In the case of multipart entities, in which one or more different - sets of data are combined in a single body, a "multipart" media type - field must appear in the entity's header. The body must then contain - one or more body parts, each preceded by a boundary delimiter line, - and the last one followed by a closing boundary delimiter line. - After its boundary delimiter line, each body part then consists of a - header area, a blank line, and a body area. 
Thus a body part is - similar to an RFC 822 message in syntax, but different in meaning. - - A body part is an entity and hence is NOT to be interpreted as - actually being an RFC 822 message. To begin with, NO header fields - are actually required in body parts. A body part that starts with a - blank line, therefore, is allowed and is a body part for which all - default values are to be assumed. In such a case, the absence of a - Content-Type header usually indicates that the corresponding body has - - - -Freed & Borenstein Standards Track [Page 17] - -RFC 2046 Media Types November 1996 - - - a content-type of "text/plain; charset=US-ASCII". - - The only header fields that have defined meaning for body parts are - those the names of which begin with "Content-". All other header - fields may be ignored in body parts. Although they should generally - be retained if at all possible, they may be discarded by gateways if - necessary. Such other fields are permitted to appear in body parts - but must not be depended on. "X-" fields may be created for - experimental or private purposes, with the recognition that the - information they contain may be lost at some gateways. - - NOTE: The distinction between an RFC 822 message and a body part is - subtle, but important. A gateway between Internet and X.400 mail, - for example, must be able to tell the difference between a body part - that contains an image and a body part that contains an encapsulated - message, the body of which is a JPEG image. In order to represent - the latter, the body part must have "Content-Type: message/rfc822", - and its body (after the blank line) must be the encapsulated message, - with its own "Content-Type: image/jpeg" header field. The use of - similar syntax facilitates the conversion of messages to body parts, - and vice versa, but the distinction between the two must be - understood by implementors. 
(For the special case in which parts - actually are messages, a "digest" subtype is also defined.) - - As stated previously, each body part is preceded by a boundary - delimiter line that contains the boundary delimiter. The boundary - delimiter MUST NOT appear inside any of the encapsulated parts, on a - line by itself or as the prefix of any line. This implies that it is - crucial that the composing agent be able to choose and specify a - unique boundary parameter value that does not contain the boundary - parameter value of an enclosing multipart as a prefix. - - All present and future subtypes of the "multipart" type must use an - identical syntax. Subtypes may differ in their semantics, and may - impose additional restrictions on syntax, but must conform to the - required syntax for the "multipart" type. This requirement ensures - that all conformant user agents will at least be able to recognize - and separate the parts of any multipart entity, even those of an - unrecognized subtype. - - As stated in the definition of the Content-Transfer-Encoding field - [RFC 2045], no encoding other than "7bit", "8bit", or "binary" is - permitted for entities of type "multipart". The "multipart" boundary - delimiters and header fields are always represented as 7bit US-ASCII - in any case (though the header fields may encode non-US-ASCII header - text as per RFC 2047) and data within the body parts can be encoded - on a part-by-part basis, with Content-Transfer-Encoding fields for - each appropriate body part. - - - -Freed & Borenstein Standards Track [Page 18] - -RFC 2046 Media Types November 1996 - - -5.1.1. Common Syntax - - This section defines a common syntax for subtypes of "multipart". - All subtypes of "multipart" must use this syntax. A simple example - of a multipart message also appears in this section. An example of a - more complex multipart message is given in RFC 2049. - - The Content-Type field for multipart entities requires one parameter, - "boundary". 
The boundary delimiter line is then defined as a line - consisting entirely of two hyphen characters ("-", decimal value 45) - followed by the boundary parameter value from the Content-Type header - field, optional linear whitespace, and a terminating CRLF. - - NOTE: The hyphens are for rough compatibility with the earlier RFC - 934 method of message encapsulation, and for ease of searching for - the boundaries in some implementations. However, it should be noted - that multipart messages are NOT completely compatible with RFC 934 - encapsulations; in particular, they do not obey RFC 934 quoting - conventions for embedded lines that begin with hyphens. This - mechanism was chosen over the RFC 934 mechanism because the latter - causes lines to grow with each level of quoting. The combination of - this growth with the fact that SMTP implementations sometimes wrap - long lines made the RFC 934 mechanism unsuitable for use in the event - that deeply-nested multipart structuring is ever desired. - - WARNING TO IMPLEMENTORS: The grammar for parameters on the Content- - type field is such that it is often necessary to enclose the boundary - parameter values in quotes on the Content-type line. This is not - always necessary, but never hurts. Implementors should be sure to - study the grammar carefully in order to avoid producing invalid - Content-type fields. 
Thus, a typical "multipart" Content-Type header - field might look like this: - - Content-Type: multipart/mixed; boundary=gc0p4Jq0M2Yt08j34c0p - - But the following is not valid: - - Content-Type: multipart/mixed; boundary=gc0pJq0M:08jU534c0p - - (because of the colon) and must instead be represented as - - Content-Type: multipart/mixed; boundary="gc0pJq0M:08jU534c0p" - - This Content-Type value indicates that the content consists of one or - more parts, each with a structure that is syntactically identical to - an RFC 822 message, except that the header area is allowed to be - completely empty, and that the parts are each preceded by the line - - - - -Freed & Borenstein Standards Track [Page 19] - -RFC 2046 Media Types November 1996 - - - --gc0pJq0M:08jU534c0p - - The boundary delimiter MUST occur at the beginning of a line, i.e., - following a CRLF, and the initial CRLF is considered to be attached - to the boundary delimiter line rather than part of the preceding - part. The boundary may be followed by zero or more characters of - linear whitespace. It is then terminated by either another CRLF and - the header fields for the next part, or by two CRLFs, in which case - there are no header fields for the next part. If no Content-Type - field is present it is assumed to be "message/rfc822" in a - "multipart/digest" and "text/plain" otherwise. - - NOTE: The CRLF preceding the boundary delimiter line is conceptually - attached to the boundary so that it is possible to have a part that - does not end with a CRLF (line break). Body parts that must be - considered to end with line breaks, therefore, must have two CRLFs - preceding the boundary delimiter line, the first of which is part of - the preceding body part, and the second of which is part of the - encapsulation boundary. - - Boundary delimiters must not appear within the encapsulated material, - and must be no longer than 70 characters, not counting the two - leading hyphens. 
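The delimiter rules above (two hyphens, the boundary value, optional trailing whitespace, and two further hyphens on the closing line) can be sketched as a small line classifier. The function name and return values are hypothetical. Note that this sketch requires the remainder of the line to be whitespace only, which is stricter than a pure prefix comparison.

```python
MAX_BOUNDARY_LEN = 70  # limit stated above, not counting the two leading hyphens


def classify_line(line: bytes, boundary: bytes):
    """Return "delimiter", "close", or None for one candidate line."""
    if len(boundary) > MAX_BOUNDARY_LEN:
        raise ValueError("boundary longer than 70 characters")
    marker = b"--" + boundary
    if not line.startswith(marker):
        return None
    # Trailing linear whitespace and the line terminator are not significant.
    rest = line[len(marker):].rstrip(b" \t\r\n")
    if rest == b"":
        return "delimiter"
    if rest == b"--":
        return "close"
    return None
```

For the boundary used in the example above, the line `--gc0pJq0M:08jU534c0p` followed by CRLF classifies as a delimiter, and the same line with `--` appended classifies as the closing delimiter.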
- - The boundary delimiter line following the last body part is a - distinguished delimiter that indicates that no further body parts - will follow. Such a delimiter line is identical to the previous - delimiter lines, with the addition of two more hyphens after the - boundary parameter value. - - --gc0pJq0M:08jU534c0p-- - - NOTE TO IMPLEMENTORS: Boundary string comparisons must compare the - boundary value with the beginning of each candidate line. An exact - match of the entire candidate line is not required; it is sufficient - that the boundary appear in its entirety following the CRLF. - - There appears to be room for additional information prior to the - first boundary delimiter line and following the final boundary - delimiter line. These areas should generally be left blank, and - implementations must ignore anything that appears before the first - boundary delimiter line or after the last one. - - NOTE: These "preamble" and "epilogue" areas are generally not used - because of the lack of proper typing of these parts and the lack of - clear semantics for handling these areas at gateways, particularly - X.400 gateways. However, rather than leaving the preamble area - blank, many MIME implementations have found this to be a convenient - - - -Freed & Borenstein Standards Track [Page 20] - -RFC 2046 Media Types November 1996 - - - place to insert an explanatory note for recipients who read the - message with pre-MIME software, since such notes will be ignored by - MIME-compliant software. - - NOTE: Because boundary delimiters must not appear in the body parts - being encapsulated, a user agent must exercise care to choose a - unique boundary parameter value. The boundary parameter value in the - example above could have been the result of an algorithm designed to - produce boundary delimiters with a very low probability of already - existing in the data to be encapsulated without having to prescan the - data. 
Alternate algorithms might result in more "readable" boundary - delimiters for a recipient with an old user agent, but would require - more attention to the possibility that the boundary delimiter might - appear at the beginning of some line in the encapsulated part. The - simplest boundary delimiter line possible is something like "---", - with a closing boundary delimiter line of "-----". - - As a very simple example, the following multipart message has two - parts, both of them plain text, one of them explicitly typed and one - of them implicitly typed: - - From: Nathaniel Borenstein <nsb@bellcore.com> - To: Ned Freed <ned@innosoft.com> - Date: Sun, 21 Mar 1993 23:56:48 -0800 (PST) - Subject: Sample message - MIME-Version: 1.0 - Content-type: multipart/mixed; boundary="simple boundary" - - This is the preamble. It is to be ignored, though it - is a handy place for composition agents to include an - explanatory note to non-MIME conformant readers. - - --simple boundary - - This is implicitly typed plain US-ASCII text. - It does NOT end with a linebreak. - --simple boundary - Content-type: text/plain; charset=us-ascii - - This is explicitly typed plain US-ASCII text. - It DOES end with a linebreak. - - --simple boundary-- - - This is the epilogue. It is also to be ignored. - - - - - - -Freed & Borenstein Standards Track [Page 21] - -RFC 2046 Media Types November 1996 - - - The use of a media type of "multipart" in a body part within another - "multipart" entity is explicitly allowed. In such cases, for obvious - reasons, care must be taken to ensure that each nested "multipart" - entity uses a different boundary delimiter. See RFC 2049 for an - example of nested "multipart" entities. - - The use of the "multipart" media type with only a single body part - may be useful in certain contexts, and is explicitly permitted. - - NOTE: Experience has shown that a "multipart" media type with a - single body part is useful for sending non-text media types. 
It has - the advantage of providing the preamble as a place to include - decoding instructions. In addition, a number of SMTP gateways move - or remove the MIME headers, and a clever MIME decoder can take a good - guess at multipart boundaries even in the absence of the Content-Type - header and thereby successfully decode the message. - - The only mandatory global parameter for the "multipart" media type is - the boundary parameter, which consists of 1 to 70 characters from a - set of characters known to be very robust through mail gateways, and - NOT ending with white space. (If a boundary delimiter line appears to - end with white space, the white space must be presumed to have been - added by a gateway, and must be deleted.) It is formally specified - by the following BNF: - - boundary := 0*69<bchars> bcharsnospace - - bchars := bcharsnospace / " " - - bcharsnospace := DIGIT / ALPHA / "'" / "(" / ")" / - "+" / "_" / "," / "-" / "." / - "/" / ":" / "=" / "?" - - Overall, the body of a "multipart" entity may be specified as - follows: - - dash-boundary := "--" boundary - ; boundary taken from the value of - ; boundary parameter of the - ; Content-Type field. - - multipart-body := [preamble CRLF] - dash-boundary transport-padding CRLF - body-part *encapsulation - close-delimiter transport-padding - [CRLF epilogue] - - - - - -Freed & Borenstein Standards Track [Page 22] - -RFC 2046 Media Types November 1996 - - - transport-padding := *LWSP-char - ; Composers MUST NOT generate - ; non-zero length transport - ; padding, but receivers MUST - ; be able to handle padding - ; added by message transports. - - encapsulation := delimiter transport-padding - CRLF body-part - - delimiter := CRLF dash-boundary - - close-delimiter := delimiter "--" - - preamble := discard-text - - epilogue := discard-text - - discard-text := *(*text CRLF) *text - ; May be ignored or discarded. 
- - body-part := MIME-part-headers [CRLF *OCTET] - ; Lines in a body-part must not start - ; with the specified dash-boundary and - ; the delimiter must not appear anywhere - ; in the body part. Note that the - ; semantics of a body-part differ from - ; the semantics of a message, as - ; described in the text. - - OCTET := <any 0-255 octet value> - - IMPORTANT: The free insertion of linear-white-space and RFC 822 - comments between the elements shown in this BNF is NOT allowed since - this BNF does not specify a structured header field. - - NOTE: In certain transport enclaves, RFC 822 restrictions such as - the one that limits bodies to printable US-ASCII characters may not - be in force. (That is, the transport domains may exist that resemble - standard Internet mail transport as specified in RFC 821 and assumed - by RFC 822, but without certain restrictions.) The relaxation of - these restrictions should be construed as locally extending the - definition of bodies, for example to include octets outside of the - US-ASCII range, as long as these extensions are supported by the - transport and adequately documented in the Content-Transfer-Encoding - header field. However, in no event are headers (either message - headers or body part headers) allowed to contain anything other than - US-ASCII characters. - - - -Freed & Borenstein Standards Track [Page 23] - -RFC 2046 Media Types November 1996 - - - NOTE: Conspicuously missing from the "multipart" type is a notion of - structured, related body parts. It is recommended that those wishing - to provide more structured or integrated multipart messaging - facilities should define subtypes of multipart that are syntactically - identical but define relationships between the various parts. For - example, subtypes of multipart could be defined that include a - distinguished part which in turn is used to specify the relationships - between the other parts, probably referring to them by their - Content-ID field. 
Old implementations will not recognize the new - subtype if this approach is used, but will treat it as - multipart/mixed and will thus be able to show the user the parts that - are recognized. - -5.1.2. Handling Nested Messages and Multiparts - - The "message/rfc822" subtype defined in a subsequent section of this - document has no terminating condition other than running out of data. - Similarly, an improperly truncated "multipart" entity may not have - any terminating boundary marker, and can turn up operationally due to - mail system malfunctions. - - It is essential that such entities be handled correctly when they are - themselves imbedded inside of another "multipart" structure. MIME - implementations are therefore required to recognize outer level - boundary markers at ANY level of inner nesting. It is not sufficient - to only check for the next expected marker or other terminating - condition. - -5.1.3. Mixed Subtype - - The "mixed" subtype of "multipart" is intended for use when the body - parts are independent and need to be bundled in a particular order. - Any "multipart" subtypes that an implementation does not recognize - must be treated as being of subtype "mixed". - -5.1.4. Alternative Subtype - - The "multipart/alternative" type is syntactically identical to - "multipart/mixed", but the semantics are different. In particular, - each of the body parts is an "alternative" version of the same - information. - - Systems should recognize that the content of the various parts are - interchangeable. Systems should choose the "best" type based on the - local environment and references, in some cases even through user - interaction. As with "multipart/mixed", the order of body parts is - significant. In this case, the alternatives appear in an order of - increasing faithfulness to the original content. 
In general, the - - - -Freed & Borenstein Standards Track [Page 24] - -RFC 2046 Media Types November 1996 - - - best choice is the LAST part of a type supported by the recipient - system's local environment. - - "Multipart/alternative" may be used, for example, to send a message - in a fancy text format in such a way that it can easily be displayed - anywhere: - - From: Nathaniel Borenstein <nsb@bellcore.com> - To: Ned Freed <ned@innosoft.com> - Date: Mon, 22 Mar 1993 09:41:09 -0800 (PST) - Subject: Formatted text mail - MIME-Version: 1.0 - Content-Type: multipart/alternative; boundary=boundary42 - - --boundary42 - Content-Type: text/plain; charset=us-ascii - - ... plain text version of message goes here ... - - --boundary42 - Content-Type: text/enriched - - ... RFC 1896 text/enriched version of same message - goes here ... - - --boundary42 - Content-Type: application/x-whatever - - ... fanciest version of same message goes here ... - - --boundary42-- - - In this example, users whose mail systems understood the - "application/x-whatever" format would see only the fancy version, - while other users would see only the enriched or plain text version, - depending on the capabilities of their system. - - In general, user agents that compose "multipart/alternative" entities - must place the body parts in increasing order of preference, that is, - with the preferred format last. For fancy text, the sending user - agent should put the plainest format first and the richest format - last. Receiving user agents should pick and display the last format - they are capable of displaying. In the case where one of the - alternatives is itself of type "multipart" and contains unrecognized - sub-parts, the user agent may choose either to show that alternative, - an earlier alternative, or both. 
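The selection rule for receiving user agents (display the last alternative the system is capable of displaying) can be sketched as below. The function name and the `can_display` callback are hypothetical; a real reader would also have to decide what to do with nested "multipart" alternatives, which this sketch does not attempt.

```python
def choose_alternative(part_types, can_display):
    """Return the index of the last displayable part, or None if there is none.

    part_types  -- media types of the body parts, in message order
    can_display -- callable taking a lowercase media type, returning bool
    """
    chosen = None
    for index, media_type in enumerate(part_types):
        if can_display(media_type.lower()):
            # Later parts are more faithful to the original, so keep overwriting.
            chosen = index
    return chosen
```

With the three parts of the example message above and a reader that handles only "text/plain" and "text/enriched", the function selects the "text/enriched" part (index 1).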
- - - - - -Freed & Borenstein Standards Track [Page 25] - -RFC 2046 Media Types November 1996 - - - NOTE: From an implementor's perspective, it might seem more sensible - to reverse this ordering, and have the plainest alternative last. - However, placing the plainest alternative first is the friendliest - possible option when "multipart/alternative" entities are viewed - using a non-MIME-conformant viewer. While this approach does impose - some burden on conformant MIME viewers, interoperability with older - mail readers was deemed to be more important in this case. - - It may be the case that some user agents, if they can recognize more - than one of the formats, will prefer to offer the user the choice of - which format to view. This makes sense, for example, if a message - includes both a nicely- formatted image version and an easily-edited - text version. What is most critical, however, is that the user not - automatically be shown multiple versions of the same data. Either - the user should be shown the last recognized version or should be - given the choice. - - THE SEMANTICS OF CONTENT-ID IN MULTIPART/ALTERNATIVE: Each part of a - "multipart/alternative" entity represents the same data, but the - mappings between the two are not necessarily without information - loss. For example, information is lost when translating ODA to - PostScript or plain text. It is recommended that each part should - have a different Content-ID value in the case where the information - content of the two parts is not identical. And when the information - content is identical -- for example, where several parts of type - "message/external-body" specify alternate ways to access the - identical data -- the same Content-ID field value should be used, to - optimize any caching mechanisms that might be present on the - recipient's end. 
However, the Content-ID values used by the parts - should NOT be the same Content-ID value that describes the - "multipart/alternative" as a whole, if there is any such Content-ID - field. That is, one Content-ID value will refer to the - "multipart/alternative" entity, while one or more other Content-ID - values will refer to the parts inside it. - -5.1.5. Digest Subtype - - This document defines a "digest" subtype of the "multipart" Content- - Type. This type is syntactically identical to "multipart/mixed", but - the semantics are different. In particular, in a digest, the default - Content-Type value for a body part is changed from "text/plain" to - "message/rfc822". This is done to allow a more readable digest - format that is largely compatible (except for the quoting convention) - with RFC 934. - - Note: Though it is possible to specify a Content-Type value for a - body part in a digest which is other than "message/rfc822", such as a - "text/plain" part containing a description of the material in the - - - -Freed & Borenstein Standards Track [Page 26] - -RFC 2046 Media Types November 1996 - - - digest, actually doing so is undesirable. The "multipart/digest" - Content-Type is intended to be used to send collections of messages. - If a "text/plain" part is needed, it should be included as a separate - part of a "multipart/mixed" message. - - A digest in this format might, then, look something like this: - - From: Moderator-Address - To: Recipient-List - Date: Mon, 22 Mar 1994 13:34:51 +0000 - Subject: Internet Digest, volume 42 - MIME-Version: 1.0 - Content-Type: multipart/mixed; - boundary="---- main boundary ----" - - ------ main boundary ---- - - ...Introductory text or table of contents... - - ------ main boundary ---- - Content-Type: multipart/digest; - boundary="---- next message ----" - - ------ next message ---- - - From: someone-else - Date: Fri, 26 Mar 1993 11:13:32 +0200 - Subject: my opinion - - ...body goes here ... 
- - ------ next message ---- - - From: someone-else-again - Date: Fri, 26 Mar 1993 10:07:13 -0500 - Subject: my different opinion - - ... another body goes here ... - - ------ next message ------ - - ------ main boundary ------ - -5.1.6. Parallel Subtype - - This document defines a "parallel" subtype of the "multipart" - Content-Type. This type is syntactically identical to - "multipart/mixed", but the semantics are different. In particular, - - - -Freed & Borenstein Standards Track [Page 27] - -RFC 2046 Media Types November 1996 - - - in a parallel entity, the order of body parts is not significant. - - A common presentation of this type is to display all of the parts - simultaneously on hardware and software that are capable of doing so. - However, composing agents should be aware that many mail readers will - lack this capability and will show the parts serially in any event. - -5.1.7. Other Multipart Subtypes - - Other "multipart" subtypes are expected in the future. MIME - implementations must in general treat unrecognized subtypes of - "multipart" as being equivalent to "multipart/mixed". - -5.2. Message Media Type - - It is frequently desirable, in sending mail, to encapsulate another - mail message. A special media type, "message", is defined to - facilitate this. In particular, the "rfc822" subtype of "message" is - used to encapsulate RFC 822 messages. - - NOTE: It has been suggested that subtypes of "message" might be - defined for forwarded or rejected messages. However, forwarded and - rejected messages can be handled as multipart messages in which the - first part contains any control or descriptive information, and a - second part, of type "message/rfc822", is the forwarded or rejected - message. Composing rejection and forwarding messages in this manner - will preserve the type information on the original message and allow - it to be correctly presented to the recipient, and hence is strongly - encouraged. 
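The forwarding layout recommended in the note above (a descriptive first part followed by a "message/rfc822" part) might be built like this with Python's standard `email` package. The helper name `wrap_forward` and the sample addresses are assumptions for illustration only.

```python
from email import message_from_string
from email.mime.message import MIMEMessage
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText


def wrap_forward(note, original_raw):
    """Wrap an existing message as the second part of a multipart/mixed.

    note:         descriptive text for the first part.
    original_raw: the forwarded message as a string, headers included.
    """
    original = message_from_string(original_raw)
    outer = MIMEMultipart("mixed")
    outer.attach(MIMEText(note, "plain", "us-ascii"))
    # MIMEMessage defaults to subtype "rfc822" and keeps the original
    # message's own header (including its Content-Type) untouched.
    outer.attach(MIMEMessage(original))
    return outer


raw = "From: a@example.com\nSubject: hi\nContent-Type: text/html\n\n<p>hi</p>\n"
outer = wrap_forward("Forwarding this for your information.", raw)
print([part.get_content_type() for part in outer.get_payload()])
```

Because the original header is carried inside the encapsulated part, the recipient's viewer can still see that the forwarded message was "text/html".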
- - Subtypes of "message" often impose restrictions on what encodings are - allowed. These restrictions are described in conjunction with each - specific subtype. - - Mail gateways, relays, and other mail handling agents are commonly - known to alter the top-level header of an RFC 822 message. In - particular, they frequently add, remove, or reorder header fields. - These operations are explicitly forbidden for the encapsulated - headers embedded in the bodies of messages of type "message." - -5.2.1. RFC822 Subtype - - A media type of "message/rfc822" indicates that the body contains an - encapsulated message, with the syntax of an RFC 822 message. - However, unlike top-level RFC 822 messages, the restriction that each - "message/rfc822" body must include a "From", "Date", and at least one - destination header is removed and replaced with the requirement that - at least one of "From", "Subject", or "Date" must be present. - - - -Freed & Borenstein Standards Track [Page 28] - -RFC 2046 Media Types November 1996 - - - It should be noted that, despite the use of the numbers "822", a - "message/rfc822" entity isn't restricted to material in strict - conformance to RFC822, nor are the semantics of "message/rfc822" - objects restricted to the semantics defined in RFC822. More - specifically, a "message/rfc822" message could well be a News article - or a MIME message. - - No encoding other than "7bit", "8bit", or "binary" is permitted for - the body of a "message/rfc822" entity. The message header fields are - always US-ASCII in any case, and data within the body can still be - encoded, in which case the Content-Transfer-Encoding header field in - the encapsulated message will reflect this. Non-US-ASCII text in the - headers of an encapsulated message can be specified using the - mechanisms described in RFC 2047. - -5.2.2. 
Partial Subtype - - The "partial" subtype is defined to allow large entities to be - delivered as several separate pieces of mail and automatically - reassembled by a receiving user agent. (The concept is similar to IP - fragmentation and reassembly in the basic Internet Protocols.) This - mechanism can be used when intermediate transport agents limit the - size of individual messages that can be sent. The media type - "message/partial" thus indicates that the body contains a fragment of - a larger entity. - - Because data of type "message" may never be encoded in base64 or - quoted-printable, a problem might arise if "message/partial" entities - are constructed in an environment that supports binary or 8bit - transport. The problem is that the binary data would be split into - multiple "message/partial" messages, each of them requiring binary - transport. If such messages were encountered at a gateway into a - 7bit transport environment, there would be no way to properly encode - them for the 7bit world, aside from waiting for all of the fragments, - reassembling the inner message, and then encoding the reassembled - data in base64 or quoted-printable. Since it is possible that - different fragments might go through different gateways, even this is - not an acceptable solution. For this reason, it is specified that - entities of type "message/partial" must always have a content- - transfer-encoding of 7bit (the default). In particular, even in - environments that support binary or 8bit transport, the use of a - content- transfer-encoding of "8bit" or "binary" is explicitly - prohibited for MIME entities of type "message/partial". This in turn - implies that the inner message must not use "8bit" or "binary" - encoding. 
- - - - - - -Freed & Borenstein Standards Track [Page 29] - -RFC 2046 Media Types November 1996 - - - Because some message transfer agents may choose to automatically - fragment large messages, and because such agents may use very - different fragmentation thresholds, it is possible that the pieces of - a partial message, upon reassembly, may prove themselves to comprise - a partial message. This is explicitly permitted. - - Three parameters must be specified in the Content-Type field of type - "message/partial": The first, "id", is a unique identifier, as close - to a world-unique identifier as possible, to be used to match the - fragments together. (In general, the identifier is essentially a - message-id; if placed in double quotes, it can be ANY message-id, in - accordance with the BNF for "parameter" given in RFC 2045.) The - second, "number", an integer, is the fragment number, which indicates - where this fragment fits into the sequence of fragments. The third, - "total", another integer, is the total number of fragments. This - third subfield is required on the final fragment, and is optional - (though encouraged) on the earlier fragments. Note also that these - parameters may be given in any order. - - Thus, the second piece of a 3-piece message may have either of the - following header fields: - - Content-Type: Message/Partial; number=2; total=3; - id="oc=jpbe0M2Yt4s@thumper.bellcore.com" - - Content-Type: Message/Partial; - id="oc=jpbe0M2Yt4s@thumper.bellcore.com"; - number=2 - - But the third piece MUST specify the total number of fragments: - - Content-Type: Message/Partial; number=3; total=3; - id="oc=jpbe0M2Yt4s@thumper.bellcore.com" - - Note that fragment numbering begins with 1, not 0. - - When the fragments of an entity broken up in this manner are put - together, the result is always a complete MIME entity, which may have - its own Content-Type header field, and thus may contain any other - data type. - -5.2.2.1. 
Message Fragmentation and Reassembly - - The semantics of a reassembled partial message must be those of the - "inner" message, rather than of a message containing the inner - message. This makes it possible, for example, to send a large audio - message as several partial messages, and still have it appear to the - recipient as a simple audio message rather than as an encapsulated - - - -Freed & Borenstein Standards Track [Page 30] - -RFC 2046 Media Types November 1996 - - - message containing an audio message. That is, the encapsulation of - the message is considered to be "transparent". - - When generating and reassembling the pieces of a "message/partial" - message, the headers of the encapsulated message must be merged with - the headers of the enclosing entities. In this process the following - rules must be observed: - - (1) Fragmentation agents must split messages at line - boundaries only. This restriction is imposed because - splits at points other than the ends of lines in turn - depends on message transports being able to preserve - the semantics of messages that don't end with a CRLF - sequence. Many transports are incapable of preserving - such semantics. - - (2) All of the header fields from the initial enclosing - message, except those that start with "Content-" and - the specific header fields "Subject", "Message-ID", - "Encrypted", and "MIME-Version", must be copied, in - order, to the new message. - - (3) The header fields in the enclosed message which start - with "Content-", plus the "Subject", "Message-ID", - "Encrypted", and "MIME-Version" fields, must be - appended, in order, to the header fields of the new - message. Any header fields in the enclosed message - which do not start with "Content-" (except for the - "Subject", "Message-ID", "Encrypted", and "MIME- - Version" fields) will be ignored and dropped. - - (4) All of the header fields from the second and any - subsequent enclosing messages are discarded by the - reassembly process. 
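Rules (2) through (4) above can be sketched as a small header merge. This is a simplified illustration under stated assumptions: fragments are already sorted by their "number" parameter, lines end with "\n" rather than CRLF, header fields are not folded, and the names `reassemble`, `parse_headers`, and `is_inner_field` are hypothetical.

```python
SPECIAL = {"subject", "message-id", "encrypted", "mime-version"}


def is_inner_field(name):
    """True for fields that must come from the enclosed message."""
    lower = name.lower()
    return lower.startswith("content-") or lower in SPECIAL


def parse_headers(text):
    """Split unfolded 'Name: value' lines from the body at the first blank line."""
    head, _, body = text.partition("\n\n")
    headers = []
    for line in head.split("\n"):
        name, _, value = line.partition(":")
        headers.append((name.strip(), value.strip()))
    return headers, body


def reassemble(fragments):
    """fragments: list of (outer_headers, body), ordered by fragment number."""
    first_headers, first_body = fragments[0]
    inner_headers, inner_body = parse_headers(first_body)
    # Rule (2): keep the first enclosing header minus Content-* and specials.
    merged = [h for h in first_headers if not is_inner_field(h[0])]
    # Rule (3): append Content-* and specials from the enclosed header;
    # every other enclosed field is dropped.
    merged += [h for h in inner_headers if is_inner_field(h[0])]
    # Rule (4): headers of later fragments are discarded, bodies are joined.
    body = inner_body + "".join(b for _, b in fragments[1:])
    return merged, body
```

Usage: pass the fragments in order; the result is the header list and body of the "inner" message, so the encapsulation stays transparent to the recipient.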
- -5.2.2.2. Fragmentation and Reassembly Example - - If an audio message is broken into two pieces, the first piece might - look something like this: - - X-Weird-Header-1: Foo - From: Bill@host.com - To: joe@otherhost.com - Date: Fri, 26 Mar 1993 12:59:38 -0500 (EST) - Subject: Audio mail (part 1 of 2) - Message-ID: <id1@host.com> - MIME-Version: 1.0 - Content-type: message/partial; id="ABC@host.com"; - - - -Freed & Borenstein Standards Track [Page 31] - -RFC 2046 Media Types November 1996 - - - number=1; total=2 - - X-Weird-Header-1: Bar - X-Weird-Header-2: Hello - Message-ID: <anotherid@foo.com> - Subject: Audio mail - MIME-Version: 1.0 - Content-type: audio/basic - Content-transfer-encoding: base64 - - ... first half of encoded audio data goes here ... - - and the second half might look something like this: - - From: Bill@host.com - To: joe@otherhost.com - Date: Fri, 26 Mar 1993 12:59:38 -0500 (EST) - Subject: Audio mail (part 2 of 2) - MIME-Version: 1.0 - Message-ID: <id2@host.com> - Content-type: message/partial; - id="ABC@host.com"; number=2; total=2 - - ... second half of encoded audio data goes here ... - - Then, when the fragmented message is reassembled, the resulting - message to be displayed to the user should look something like this: - - X-Weird-Header-1: Foo - From: Bill@host.com - To: joe@otherhost.com - Date: Fri, 26 Mar 1993 12:59:38 -0500 (EST) - Subject: Audio mail - Message-ID: <anotherid@foo.com> - MIME-Version: 1.0 - Content-type: audio/basic - Content-transfer-encoding: base64 - - ... first half of encoded audio data goes here ... - ... second half of encoded audio data goes here ... - - The inclusion of a "References" field in the headers of the second - and subsequent pieces of a fragmented message that references the - Message-Id on the previous piece may be of benefit to mail readers - that understand and track references. However, the generation of - such "References" fields is entirely optional. 
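Before fragments like the two shown above can be put back together, a receiver has to read the "id", "number", and "total" parameters from each Content-type field. A minimal sketch using Python's standard `email` package follows; the function name `partial_info` is an assumption, and error handling for a missing "number" parameter is left out.

```python
from email import message_from_string


def partial_info(raw):
    """Return the message/partial parameters of a raw message, or None.

    'total' may be absent on every fragment except the last one,
    so it is reported as None when missing.
    """
    msg = message_from_string(raw)
    if msg.get_content_type() != "message/partial":
        return None
    total = msg.get_param("total")
    return {
        "id": msg.get_param("id"),  # surrounding quotes are removed
        "number": int(msg.get_param("number")),  # numbering starts at 1
        "total": int(total) if total is not None else None,
    }


raw = ('Content-type: message/partial;\n'
       ' id="ABC@host.com"; number=2; total=2\n'
       '\n'
       '... second half of encoded audio data goes here ...\n')
print(partial_info(raw))
```

Grouping the results by "id" and sorting by "number" gives the order in which the bodies are concatenated.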
- - - - - -Freed & Borenstein Standards Track [Page 32] - -RFC 2046 Media Types November 1996 - - - Finally, it should be noted that the "Encrypted" header field has - been made obsolete by Privacy Enhanced Messaging (PEM) [RFC-1421, - RFC-1422, RFC-1423, RFC-1424], but the rules above are nevertheless - believed to describe the correct way to treat it if it is encountered - in the context of conversion to and from "message/partial" fragments. - -5.2.3. External-Body Subtype - - The external-body subtype indicates that the actual body data are not - included, but merely referenced. In this case, the parameters - describe a mechanism for accessing the external data. - - When a MIME entity is of type "message/external-body", it consists of - a header, two consecutive CRLFs, and the message header for the - encapsulated message. If another pair of consecutive CRLFs appears, - this of course ends the message header for the encapsulated message. - However, since the encapsulated message's body is itself external, it - does NOT appear in the area that follows. For example, consider the - following message: - - Content-type: message/external-body; - access-type=local-file; - name="/u/nsb/Me.jpeg" - - Content-type: image/jpeg - Content-ID: <id42@guppylake.bellcore.com> - Content-Transfer-Encoding: binary - - THIS IS NOT REALLY THE BODY! - - The area at the end, which might be called the "phantom body", is - ignored for most external-body messages. However, it may be used to - contain auxiliary information for some such messages, as indeed it is - when the access-type is "mail- server". The only access-type defined - in this document that uses the phantom body is "mail-server", but - other access-types may be defined in the future in other - specifications that use this area. - - The encapsulated headers in ALL "message/external-body" entities MUST - include a Content-ID header field to give a unique identifier by - which to reference the data. 
This identifier may be used for caching - mechanisms, and for recognizing the receipt of the data when the - access-type is "mail-server". - - Note that, as specified here, the tokens that describe external-body - data, such as file names and mail server commands, are required to be - in the US-ASCII character set. - - - - -Freed & Borenstein Standards Track [Page 33] - -RFC 2046 Media Types November 1996 - - - If this proves problematic in practice, a new mechanism may be - required as a future extension to MIME, either as newly defined - access-types for "message/external-body" or by some other mechanism. - - As with "message/partial", MIME entities of type "message/external- - body" MUST have a content-transfer-encoding of 7bit (the default). - In particular, even in environments that support binary or 8bit - transport, the use of a content- transfer-encoding of "8bit" or - "binary" is explicitly prohibited for entities of type - "message/external-body". - -5.2.3.1. General External-Body Parameters - - The parameters that may be used with any "message/external- body" - are: - - (1) ACCESS-TYPE -- A word indicating the supported access - mechanism by which the file or data may be obtained. - This word is not case sensitive. Values include, but - are not limited to, "FTP", "ANON-FTP", "TFTP", "LOCAL- - FILE", and "MAIL-SERVER". Future values, except for - experimental values beginning with "X-", must be - registered with IANA, as described in RFC 2048. - This parameter is unconditionally mandatory and MUST be - present on EVERY "message/external-body". - - (2) EXPIRATION -- The date (in the RFC 822 "date-time" - syntax, as extended by RFC 1123 to permit 4 digits in - the year field) after which the existence of the - external data is not guaranteed. This parameter may be - used with ANY access-type and is ALWAYS optional. - - (3) SIZE -- The size (in octets) of the data. 
The intent - of this parameter is to help the recipient decide - whether or not to expend the necessary resources to - retrieve the external data. Note that this describes - the size of the data in its canonical form, that is, - before any Content-Transfer-Encoding has been applied - or after the data have been decoded. This parameter - may be used with ANY access-type and is ALWAYS - optional. - - (4) PERMISSION -- A case-insensitive field that indicates - whether or not it is expected that clients might also - attempt to overwrite the data. By default, or if - permission is "read", the assumption is that they are - not, and that if the data is retrieved once, it is - never needed again. If PERMISSION is "read-write", - - - -Freed & Borenstein Standards Track [Page 34] - -RFC 2046 Media Types November 1996 - - - this assumption is invalid, and any local copy must be - considered no more than a cache. "Read" and "Read- - write" are the only defined values of permission. This - parameter may be used with ANY access-type and is - ALWAYS optional. - - The precise semantics of the access-types defined here are described - in the sections that follow. - -5.2.3.2. The 'ftp' and 'tftp' Access-Types - - An access-type of FTP or TFTP indicates that the message body is - accessible as a file using the FTP [RFC-959] or TFTP [RFC- 783] - protocols, respectively. For these access-types, the following - additional parameters are mandatory: - - (1) NAME -- The name of the file that contains the actual - body data. - - (2) SITE -- A machine from which the file may be obtained, - using the given protocol. This must be a fully - qualified domain name, not a nickname. - - (3) Before any data are retrieved, using FTP, the user will - generally need to be asked to provide a login id and a - password for the machine named by the site parameter. - For security reasons, such an id and password are not - specified as content-type parameters, but must be - obtained from the user. 
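The mandatory parameters described so far (ACCESS-TYPE on every "message/external-body", plus NAME and SITE for the FTP and TFTP access-types) could be checked roughly as below. The function name `missing_parameters` and the dictionary representation of parameters are assumptions for this sketch; requirements of other access-types are deliberately not covered.

```python
# Mandatory parameters per access-type, limited to the FTP and TFTP
# access-types described above.
REQUIRED = {
    "ftp": {"name", "site"},
    "tftp": {"name", "site"},
}


def missing_parameters(params):
    """Return a sorted list of mandatory parameter names that are absent.

    params: mapping of Content-Type parameter names to values.
    Parameter names and the access-type value are case-insensitive.
    """
    lowered = {key.lower(): value for key, value in params.items()}
    access_type = lowered.get("access-type")
    if access_type is None:
        # ACCESS-TYPE is unconditionally mandatory.
        return ["access-type"]
    required = REQUIRED.get(access_type.lower(), set())
    return sorted(required - set(lowered))


print(missing_parameters({"access-type": "FTP", "name": "BodyFormats.ps"}))
```

Note that login id and password are intentionally not parameters; per the text they must be obtained from the user.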
- - In addition, the following parameters are optional: - - (1) DIRECTORY -- A directory from which the data named by - NAME should be retrieved. - - (2) MODE -- A case-insensitive string indicating the mode - to be used when retrieving the information. The valid - values for access-type "TFTP" are "NETASCII", "OCTET", - and "MAIL", as specified by the TFTP protocol [RFC- - 783]. The valid values for access-type "FTP" are - "ASCII", "EBCDIC", "IMAGE", and "LOCALn" where "n" is a - decimal integer, typically 8. These correspond to the - representation types "A" "E" "I" and "L n" as specified - by the FTP protocol [RFC-959]. Note that "BINARY" and - "TENEX" are not valid values for MODE and that "OCTET" - or "IMAGE" or "LOCAL8" should be used instead. IF MODE - is not specified, the default value is "NETASCII" for - TFTP and "ASCII" otherwise. - - - -Freed & Borenstein Standards Track [Page 35] - -RFC 2046 Media Types November 1996 - - -5.2.3.3. The 'anon-ftp' Access-Type - - The "anon-ftp" access-type is identical to the "ftp" access type, - except that the user need not be asked to provide a name and password - for the specified site. Instead, the ftp protocol will be used with - login "anonymous" and a password that corresponds to the user's mail - address. - -5.2.3.4. The 'local-file' Access-Type - - An access-type of "local-file" indicates that the actual body is - accessible as a file on the local machine. Two additional parameters - are defined for this access type: - - (1) NAME -- The name of the file that contains the actual - body data. This parameter is mandatory for the - "local-file" access-type. - - (2) SITE -- A domain specifier for a machine or set of - machines that are known to have access to the data - file. This optional parameter is used to describe the - locality of reference for the data, that is, the site - or sites at which the file is expected to be visible. 
- Asterisks may be used for wildcard matching to a part - of a domain name, such as "*.bellcore.com", to indicate - a set of machines on which the data should be directly - visible, while a single asterisk may be used to - indicate a file that is expected to be universally - available, e.g., via a global file system. - -5.2.3.5. The 'mail-server' Access-Type - - The "mail-server" access-type indicates that the actual body is - available from a mail server. Two additional parameters are defined - for this access-type: - - (1) SERVER -- The addr-spec of the mail server from which - the actual body data can be obtained. This parameter - is mandatory for the "mail-server" access-type. - - (2) SUBJECT -- The subject that is to be used in the mail - that is sent to obtain the data. Note that keying mail - servers on Subject lines is NOT recommended, but such - mail servers are known to exist. This is an optional - parameter. - - - - - - -Freed & Borenstein Standards Track [Page 36] - -RFC 2046 Media Types November 1996 - - - Because mail servers accept a variety of syntaxes, some of which is - multiline, the full command to be sent to a mail server is not - included as a parameter in the content-type header field. Instead, - it is provided as the "phantom body" when the media type is - "message/external-body" and the access-type is mail-server. - - Note that MIME does not define a mail server syntax. Rather, it - allows the inclusion of arbitrary mail server commands in the phantom - body. Implementations must include the phantom body in the body of - the message it sends to the mail server address to retrieve the - relevant data. - - Unlike other access-types, mail-server access is asynchronous and - will happen at an unpredictable time in the future. For this reason, - it is important that there be a mechanism by which the returned data - can be matched up with the original "message/external-body" entity. 
MIME mail servers must use the same Content-ID field on the returned - message that was used in the original "message/external-body" - entities, to facilitate such matching. - -5.2.3.6. External-Body Security Issues - - "Message/external-body" entities give rise to two important security - issues: - - (1) Accessing data via a "message/external-body" reference - effectively results in the message recipient performing - an operation that was specified by the message - originator. It is therefore possible for the message - originator to trick a recipient into doing something - they would not have done otherwise. For example, an - originator could specify an action that attempts - retrieval of material that the recipient is not - authorized to obtain, causing the recipient to - unwittingly violate some security policy. For this - reason, user agents capable of resolving external - references must always take steps to describe the - action they are to take to the recipient and ask for - explicit permission prior to performing it. - - The 'mail-server' access-type is particularly - vulnerable, in that it causes the recipient to send a - new message whose contents are specified by the - original message's originator. Given the potential for - abuse, any such request messages that are constructed - should contain a clear indication that they were - generated automatically (e.g. in a Comments: header - field) in an attempt to resolve a MIME - - - -Freed & Borenstein Standards Track [Page 37] - -RFC 2046 Media Types November 1996 - - - "message/external-body" reference. - - (2) MIME will sometimes be used in environments that - provide some guarantee of message integrity and - authenticity. If present, such guarantees may apply - only to the actual direct content of messages -- they - may or may not apply to data accessed through MIME's - "message/external-body" mechanism. 
In particular, it - may be possible to subvert certain access mechanisms - even when the messaging system itself is secure. - - It should be noted that this problem exists either with - or without the availability of MIME mechanisms. A - casual reference to an FTP site containing a document - in the text of a secure message brings up similar - issues -- the only difference is that MIME provides for - automatic retrieval of such material, and users may - place unwarranted trust in such automatic retrieval - mechanisms. - -5.2.3.7. Examples and Further Explanations - - When the external-body mechanism is used in conjunction with the - "multipart/alternative" media type it extends the functionality of - "multipart/alternative" to include the case where the same entity is - provided in the same format but via different access mechanisms. - When this is done the originator of the message must order the parts - first in terms of preferred formats and then by preferred access - mechanisms. The recipient's viewer should then evaluate the list - both in terms of format and access mechanisms. - - With the emerging possibility of very wide-area file systems, it - becomes very hard to know in advance the set of machines where a file - will and will not be accessible directly from the file system. - Therefore it may make sense to provide both a file name, to be tried - directly, and the name of one or more sites from which the file is - known to be accessible. An implementation can try to retrieve remote - files using FTP or any other protocol, using anonymous file retrieval - or prompting the user for the necessary name and password. If an - external body is accessible via multiple mechanisms, the sender may - include multiple entities of type "message/external-body" within the - body parts of an enclosing "multipart/alternative" entity. - - However, the external-body mechanism is not intended to be limited to - file retrieval, as shown by the mail-server access-type. 
Beyond - this, one can imagine, for example, using a video server for external - references to video clips. - - - - -Freed & Borenstein Standards Track [Page 38] - -RFC 2046 Media Types November 1996 - - - The embedded message header fields which appear in the body of the - "message/external-body" data must be used to declare the media type - of the external body if it is anything other than plain US-ASCII - text, since the external body does not have a header section to - declare its type. Similarly, any Content-transfer-encoding other - than "7bit" must also be declared here. Thus a complete - "message/external-body" message, referring to an object in PostScript - format, might look like this: - - From: Whomever - To: Someone - Date: Whenever - Subject: whatever - MIME-Version: 1.0 - Message-ID: <id1@host.com> - Content-Type: multipart/alternative; boundary=42 - Content-ID: <id001@guppylake.bellcore.com> - - --42 - Content-Type: message/external-body; name="BodyFormats.ps"; - site="thumper.bellcore.com"; mode="image"; - access-type=ANON-FTP; directory="pub"; - expiration="Fri, 14 Jun 1991 19:13:14 -0400 (EDT)" - - Content-type: application/postscript - Content-ID: <id42@guppylake.bellcore.com> - - --42 - Content-Type: message/external-body; access-type=local-file; - name="/u/nsb/writing/rfcs/RFC-MIME.ps"; - site="thumper.bellcore.com"; - expiration="Fri, 14 Jun 1991 19:13:14 -0400 (EDT)" - - Content-type: application/postscript - Content-ID: <id42@guppylake.bellcore.com> - - --42 - Content-Type: message/external-body; - access-type=mail-server - server="listserv@bogus.bitnet"; - expiration="Fri, 14 Jun 1991 19:13:14 -0400 (EDT)" - - Content-type: application/postscript - Content-ID: <id42@guppylake.bellcore.com> - - get RFC-MIME.DOC - - --42-- - - - -Freed & Borenstein Standards Track [Page 39] - -RFC 2046 Media Types November 1996 - - - Note that in the above examples, the default Content-transfer- - encoding of "7bit" is assumed for the external postscript 
data. - - Like the "message/partial" type, the "message/external-body" media - type is intended to be transparent, that is, to convey the data type - in the external body rather than to convey a message with a body of - that type. Thus the headers on the outer and inner parts must be - merged using the same rules as for "message/partial". In particular, - this means that the Content-type and Subject fields are overridden, - but the From field is preserved. - - Note that since the external bodies are not transported along with - the external body reference, they need not conform to transport - limitations that apply to the reference itself. In particular, - Internet mail transports may impose 7bit and line length limits, but - these do not automatically apply to binary external body references. - Thus a Content-Transfer-Encoding is not generally necessary, though - it is permitted. - - Note that the body of a message of type "message/external-body" is - governed by the basic syntax for an RFC 822 message. In particular, - anything before the first consecutive pair of CRLFs is header - information, while anything after it is body information, which is - ignored for most access-types. - -5.2.4. Other Message Subtypes - - MIME implementations must in general treat unrecognized subtypes of - "message" as being equivalent to "application/octet-stream". - - Future subtypes of "message" intended for use with email should be - restricted to "7bit" encoding. A type other than "message" should be - used if restriction to "7bit" is not possible. - -6. Experimental Media Type Values - - A media type value beginning with the characters "X-" is a private - value, to be used by consenting systems by mutual agreement. Any - format without a rigorous and public definition must be named with an - "X-" prefix, and publicly specified values shall never begin with - "X-". 
(Older versions of the widely used Andrew system use the "X- - BE2" name, so new systems should probably choose a different name.) - - In general, the use of "X-" top-level types is strongly discouraged. - Implementors should invent subtypes of the existing types whenever - possible. In many cases, a subtype of "application" will be more - appropriate than a new top-level type. - - - - -Freed & Borenstein Standards Track [Page 40] - -RFC 2046 Media Types November 1996 - - -7. Summary - - The five discrete media types provide a standardized - mechanism for tagging entities as "audio", "image", or several other - kinds of data. The composite "multipart" and "message" media types - allow mixing and hierarchical structuring of entities of different - types in a single message. A distinguished parameter syntax allows - further specification of data format details, particularly the - specification of alternate character sets. Additional optional - header fields provide mechanisms for certain extensions deemed - desirable by many implementors. Finally, a number of useful media - types are defined for general use by consenting user agents, notably - "message/partial" and "message/external-body". - -8. Security Considerations - - Security issues are discussed in the context of the - "application/postscript" type, the "message/external-body" type, and - in RFC 2048. Implementors should pay special attention to the - security implications of any media types that can cause the remote - execution of any actions in the recipient's environment. In such - cases, the discussion of the "application/postscript" type may serve - as a model for considering other media types with remote execution - capabilities. - - - - - - - - - - - - - - - - - - - - - - - - - - - - -Freed & Borenstein Standards Track [Page 41] - -RFC 2046 Media Types November 1996 - - -9. 
Authors' Addresses - - For more information, the authors of this document are best contacted - via Internet mail: - - Ned Freed - Innosoft International, Inc. - 1050 East Garvey Avenue South - West Covina, CA 91790 - USA - - Phone: +1 818 919 3600 - Fax: +1 818 919 3614 - EMail: ned@innosoft.com - - - Nathaniel S. Borenstein - First Virtual Holdings - 25 Washington Avenue - Morristown, NJ 07960 - USA - - Phone: +1 201 540 8967 - Fax: +1 201 993 3032 - EMail: nsb@nsb.fv.com - - - MIME is a result of the work of the Internet Engineering Task Force - Working Group on RFC 822 Extensions. The chairman of that group, - Greg Vaudreuil, may be reached at: - - Gregory M. Vaudreuil - Octel Network Services - 17080 Dallas Parkway - Dallas, TX 75248-1905 - USA - - EMail: Greg.Vaudreuil@Octel.Com - - - - - - - - - - - - - -Freed & Borenstein Standards Track [Page 42] - -RFC 2046 Media Types November 1996 - - -Appendix A -- Collected Grammar - - This appendix contains the complete BNF grammar for all the syntax - specified by this document. - - By itself, however, this grammar is incomplete. It refers by name to - several syntax rules that are defined by RFC 822. Rather than - reproduce those definitions here, and risk unintentional differences - between the two, this document simply refers the reader to RFC 822 - for the remaining definitions. Wherever a term is undefined, it - refers to the RFC 822 definition. - - boundary := 0*69<bchars> bcharsnospace - - bchars := bcharsnospace / " " - - bcharsnospace := DIGIT / ALPHA / "'" / "(" / ")" / - "+" / "_" / "," / "-" / "." / - "/" / ":" / "=" / "?" - - body-part := <"message" as defined in RFC 822, with all - header fields optional, not starting with the - specified dash-boundary, and with the - delimiter not occurring anywhere in the - body part. 
Note that the semantics of a - part differ from the semantics of a message, - as described in the text.> - - close-delimiter := delimiter "--" - - dash-boundary := "--" boundary - ; boundary taken from the value of - ; boundary parameter of the - ; Content-Type field. - - delimiter := CRLF dash-boundary - - discard-text := *(*text CRLF) - ; May be ignored or discarded. - - encapsulation := delimiter transport-padding - CRLF body-part - - epilogue := discard-text - - multipart-body := [preamble CRLF] - dash-boundary transport-padding CRLF - body-part *encapsulation - - - -Freed & Borenstein Standards Track [Page 43] - -RFC 2046 Media Types November 1996 - - - close-delimiter transport-padding - [CRLF epilogue] - - preamble := discard-text - - transport-padding := *LWSP-char - ; Composers MUST NOT generate - ; non-zero length transport - ; padding, but receivers MUST - ; be able to handle padding - ; added by message transports. - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -Freed & Borenstein Standards Track [Page 44] - diff --git a/proto/rfc2047.txt b/proto/rfc2047.txt @@ -1,843 +0,0 @@ - - - - - - -Network Working Group K. Moore -Request for Comments: 2047 University of Tennessee -Obsoletes: 1521, 1522, 1590 November 1996 -Category: Standards Track - - - MIME (Multipurpose Internet Mail Extensions) Part Three: - Message Header Extensions for Non-ASCII Text - -Status of this Memo - - This document specifies an Internet standards track protocol for the - Internet community, and requests discussion and suggestions for - improvements. Please refer to the current edition of the "Internet - Official Protocol Standards" (STD 1) for the standardization state - and status of this protocol. Distribution of this memo is unlimited. - -Abstract - - STD 11, RFC 822, defines a message representation protocol specifying - considerable detail about US-ASCII message headers, and leaves the - message content, or message body, as flat US-ASCII text. 
This set of - documents, collectively called the Multipurpose Internet Mail - Extensions, or MIME, redefines the format of messages to allow for - - (1) textual message bodies in character sets other than US-ASCII, - - (2) an extensible set of different formats for non-textual message - bodies, - - (3) multi-part message bodies, and - - (4) textual header information in character sets other than US-ASCII. - - These documents are based on earlier work documented in RFC 934, STD - 11, and RFC 1049, but extends and revises them. Because RFC 822 said - so little about message bodies, these documents are largely - orthogonal to (rather than a revision of) RFC 822. - - This particular document is the third document in the series. It - describes extensions to RFC 822 to allow non-US-ASCII text data in - Internet mail header fields. - - - - - - - - - -Moore Standards Track [Page 1] - -RFC 2047 Message Header Extensions November 1996 - - - Other documents in this series include: - - + RFC 2045, which specifies the various headers used to describe - the structure of MIME messages. - - + RFC 2046, which defines the general structure of the MIME media - typing system and defines an initial set of media types, - - + RFC 2048, which specifies various IANA registration procedures - for MIME-related facilities, and - - + RFC 2049, which describes MIME conformance criteria and - provides some illustrative examples of MIME message formats, - acknowledgements, and the bibliography. - - These documents are revisions of RFCs 1521, 1522, and 1590, which - themselves were revisions of RFCs 1341 and 1342. An appendix in RFC - 2049 describes differences and changes from previous versions. - -1. Introduction - - RFC 2045 describes a mechanism for denoting textual body parts which - are coded in various character sets, as well as methods for encoding - such body parts as sequences of printable US-ASCII characters. 
This - memo describes similar techniques to allow the encoding of non-ASCII - text in various portions of a RFC 822 [2] message header, in a manner - which is unlikely to confuse existing message handling software. - - Like the encoding techniques described in RFC 2045, the techniques - outlined here were designed to allow the use of non-ASCII characters - in message headers in a way which is unlikely to be disturbed by the - quirks of existing Internet mail handling programs. In particular, - some mail relaying programs are known to (a) delete some message - header fields while retaining others, (b) rearrange the order of - addresses in To or Cc fields, (c) rearrange the (vertical) order of - header fields, and/or (d) "wrap" message headers at different places - than those in the original message. In addition, some mail reading - programs are known to have difficulty correctly parsing message - headers which, while legal according to RFC 822, make use of - backslash-quoting to "hide" special characters such as "<", ",", or - ":", or which exploit other infrequently-used features of that - specification. - - While it is unfortunate that these programs do not correctly - interpret RFC 822 headers, to "break" these programs would cause - severe operational problems for the Internet mail system. The - extensions described in this memo therefore do not rely on little- - used features of RFC 822. - - - -Moore Standards Track [Page 2] - -RFC 2047 Message Header Extensions November 1996 - - - Instead, certain sequences of "ordinary" printable ASCII characters - (known as "encoded-words") are reserved for use as encoded data. The - syntax of encoded-words is such that they are unlikely to - "accidentally" appear as normal text in message headers. - Furthermore, the characters used in encoded-words are restricted to - those which do not have special meanings in the context in which the - encoded-word appears. 
- - Generally, an "encoded-word" is a sequence of printable ASCII - characters that begins with "=?", ends with "?=", and has two "?"s in - between. It specifies a character set and an encoding method, and - also includes the original text encoded as graphic ASCII characters, - according to the rules for that encoding method. - - A mail composer that implements this specification will provide a - means of inputting non-ASCII text in header fields, but will - translate these fields (or appropriate portions of these fields) into - encoded-words before inserting them into the message header. - - A mail reader that implements this specification will recognize - encoded-words when they appear in certain portions of the message - header. Instead of displaying the encoded-word "as is", it will - reverse the encoding and display the original text in the designated - character set. - -NOTES - - This memo relies heavily on notation and terms defined in RFC 822 and - RFC 2045. In particular, the syntax for the ABNF used in this memo - is defined in RFC 822, and many of the terminal and nonterminal - symbols from RFC 822 are used in the grammar for the header - extensions defined here. Among the symbols defined in RFC 822 and - referenced in this memo are: 'addr-spec', 'atom', 'CHAR', 'comment', - 'CTLs', 'ctext', 'linear-white-space', 'phrase', 'quoted-pair', - 'quoted-string', 'SPACE', and 'word'. Successful implementation of - this protocol extension requires careful attention to the RFC 822 - definitions of these terms. - - When the term "ASCII" appears in this memo, it refers to the "7-Bit - American Standard Code for Information Interchange", ANSI X3.4-1986. - The MIME charset name for this character set is "US-ASCII". When not - specifically referring to the MIME charset name, this document uses - the term "ASCII", both for brevity and for consistency with RFC 822. 
- However, implementors are warned that the character set name must be - spelled "US-ASCII" in MIME message and body part headers. - - - - - - -Moore Standards Track [Page 3] - -RFC 2047 Message Header Extensions November 1996 - - - This memo specifies a protocol for the representation of non-ASCII - text in message headers. It specifically DOES NOT define any - translation between "8-bit headers" and pure ASCII headers, nor is - any such translation assumed to be possible. - -2. Syntax of encoded-words - - An 'encoded-word' is defined by the following ABNF grammar. The - notation of RFC 822 is used, with the exception that white space - characters MUST NOT appear between components of an 'encoded-word'. - - encoded-word = "=?" charset "?" encoding "?" encoded-text "?=" - - charset = token ; see section 3 - - encoding = token ; see section 4 - - token = 1*<Any CHAR except SPACE, CTLs, and especials> - - especials = "(" / ")" / "<" / ">" / "@" / "," / ";" / ":" / " - <"> / "/" / "[" / "]" / "?" / "." / "=" - - encoded-text = 1*<Any printable ASCII character other than "?" - or SPACE> - ; (but see "Use of encoded-words in message - ; headers", section 5) - - Both 'encoding' and 'charset' names are case-independent. Thus the - charset name "ISO-8859-1" is equivalent to "iso-8859-1", and the - encoding named "Q" may be spelled either "Q" or "q". - - An 'encoded-word' may not be more than 75 characters long, including - 'charset', 'encoding', 'encoded-text', and delimiters. If it is - desirable to encode more text than will fit in an 'encoded-word' of - 75 characters, multiple 'encoded-word's (separated by CRLF SPACE) may - be used. - - While there is no limit to the length of a multiple-line header - field, each line of a header field that contains one or more - 'encoded-word's is limited to 76 characters. 
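The grammar and length limits above can be checked mechanically. The following Python sketch is illustrative only and is not part of the memo. The names `parse_encoded_word` and `header_line_ok` are hypothetical, and the sketch assumes that the 'especials' rule, which appears damaged as printed, is meant to include the double quote and the backslash.

```python
import re

# Characters excluded from 'token' in addition to SPACE and CTLs.
# Assumption: double quote and backslash belong to the especials.
ESPECIALS = '()<>@,;:"/[]?.=' + "\\"

# 'token' = one or more printable ASCII characters (33..126) that are
# not especials.
TOKEN_CHARS = "".join(chr(c) for c in range(33, 127) if chr(c) not in ESPECIALS)
TOKEN = "[" + re.escape(TOKEN_CHARS) + "]+"

# 'encoded-text' = printable ASCII other than "?" or SPACE:
# "!" through ">" (0x21-0x3E) and "@" through "~" (0x40-0x7E).
ENCODED_WORD = re.compile(
    r"=\?(?P<charset>" + TOKEN + r")"
    r"\?(?P<encoding>" + TOKEN + r")"
    r"\?(?P<text>[!->@-~]+)\?="
)

MAX_ENCODED_WORD = 75   # limit on one encoded-word, delimiters included
MAX_HEADER_LINE = 76    # limit on a line that contains an encoded-word


def parse_encoded_word(candidate):
    """Return (charset, encoding, encoded_text) or None.

    The encoding name is upper-cased because it is case-independent;
    the charset is returned exactly as written.
    """
    if len(candidate) > MAX_ENCODED_WORD:
        return None
    match = ENCODED_WORD.fullmatch(candidate)
    if match is None:
        return None
    return match.group("charset"), match.group("encoding").upper(), match.group("text")


def header_line_ok(line):
    """Check the 76-character limit for a single header line.

    Lines without any encoded-word are not subject to this limit.
    """
    has_encoded_word = any(parse_encoded_word(p) for p in line.split())
    return not has_encoded_word or len(line) <= MAX_HEADER_LINE
```

A candidate containing an unencoded SPACE never matches, because white space is not allowed between or inside the components of an 'encoded-word'.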
- - The length restrictions are included both to ease interoperability - through internetwork mail gateways, and to impose a limit on the - amount of lookahead a header parser must employ (while looking for a - final ?= delimiter) before it can decide whether a token is an - "encoded-word" or something else. - - - - - -Moore Standards Track [Page 4] - -RFC 2047 Message Header Extensions November 1996 - - - IMPORTANT: 'encoded-word's are designed to be recognized as 'atom's - by an RFC 822 parser. As a consequence, unencoded white space - characters (such as SPACE and HTAB) are FORBIDDEN within an - 'encoded-word'. For example, the character sequence - - =?iso-8859-1?q?this is some text?= - - would be parsed as four 'atom's, rather than as a single 'atom' (by - an RFC 822 parser) or 'encoded-word' (by a parser which understands - 'encoded-words'). The correct way to encode the string "this is some - text" is to encode the SPACE characters as well, e.g. - - =?iso-8859-1?q?this=20is=20some=20text?= - - The characters which may appear in 'encoded-text' are further - restricted by the rules in section 5. - -3. Character sets - - The 'charset' portion of an 'encoded-word' specifies the character - set associated with the unencoded text. A 'charset' can be any of - the character set names allowed in an MIME "charset" parameter of a - "text/plain" body part, or any character set name registered with - IANA for use with the MIME text/plain content-type. - - Some character sets use code-switching techniques to switch between - "ASCII mode" and other modes. If unencoded text in an 'encoded-word' - contains a sequence which causes the charset interpreter to switch - out of ASCII mode, it MUST contain additional control codes such that - ASCII mode is again selected at the end of the 'encoded-word'. (This - rule applies separately to each 'encoded-word', including adjacent - 'encoded-word's within a single header field.) 
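The rule above that unencoded SPACE and HTAB are forbidden inside an 'encoded-word' can be illustrated with a small encoder. This is a minimal Python sketch under stated assumptions, not the memo's own algorithm: the function name `q_encode_word` is hypothetical, it conservatively hex-encodes every octet that is not an ASCII letter or digit, and it refuses text that would not fit in a single 75-character 'encoded-word' instead of splitting it.

```python
def q_encode_word(text, charset="iso-8859-1"):
    """Encode text as one "Q" encoded-word with no raw white space.

    Only ASCII letters and digits are passed through unchanged; every
    other octet, including SPACE, becomes "=" plus two upper-case hex
    digits.
    """
    pieces = []
    for octet in text.encode(charset):
        ch = chr(octet)
        if octet < 128 and ch.isalnum():
            pieces.append(ch)
        else:
            pieces.append("=%02X" % octet)
    word = "=?%s?q?%s?=" % (charset, "".join(pieces))
    if len(word) > 75:
        # A fuller implementation would split the text across several
        # encoded-words here.
        raise ValueError("text does not fit in one encoded-word")
    return word


print(q_encode_word("this is some text"))
```

For the string "this is some text" this produces the same correct form shown in the example above, with each SPACE written as "=20".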
- - When there is a possibility of using more than one character set to - represent the text in an 'encoded-word', and in the absence of - private agreements between sender and recipients of a message, it is - recommended that members of the ISO-8859-* series be used in - preference to other character sets. - -4. Encodings - - Initially, the legal values for "encoding" are "Q" and "B". These - encodings are described below. The "Q" encoding is recommended for - use when most of the characters to be encoded are in the ASCII - character set; otherwise, the "B" encoding should be used. - Nevertheless, a mail reader which claims to recognize 'encoded-word's - MUST be able to accept either encoding for any character set which it - supports. - - - -Moore Standards Track [Page 5] - -RFC 2047 Message Header Extensions November 1996 - - - Only a subset of the printable ASCII characters may be used in - 'encoded-text'. Space and tab characters are not allowed, so that - the beginning and end of an 'encoded-word' are obvious. The "?" - character is used within an 'encoded-word' to separate the various - portions of the 'encoded-word' from one another, and thus cannot - appear in the 'encoded-text' portion. Other characters are also - illegal in certain contexts. For example, an 'encoded-word' in a - 'phrase' preceding an address in a From header field may not contain - any of the "specials" defined in RFC 822. Finally, certain other - characters are disallowed in some contexts, to ensure reliability for - messages that pass through internetwork mail gateways. - - The "B" encoding automatically meets these requirements. The "Q" - encoding allows a wide range of printable characters to be used in - non-critical locations in the message header (e.g., Subject), with - fewer characters available for use in other locations. - -4.1. The "B" encoding - - The "B" encoding is identical to the "BASE64" encoding defined by RFC - 2045. - -4.2. 
The "Q" encoding - - The "Q" encoding is similar to the "Quoted-Printable" content- - transfer-encoding defined in RFC 2045. It is designed to allow text - containing mostly ASCII characters to be decipherable on an ASCII - terminal without decoding. - - (1) Any 8-bit value may be represented by a "=" followed by two - hexadecimal digits. For example, if the character set in use - were ISO-8859-1, the "=" character would thus be encoded as - "=3D", and a SPACE by "=20". (Upper case should be used for - hexadecimal digits "A" through "F".) - - (2) The 8-bit hexadecimal value 20 (e.g., ISO-8859-1 SPACE) may be - represented as "_" (underscore, ASCII 95.). (This character may - not pass through some internetwork mail gateways, but its use - will greatly enhance readability of "Q" encoded data with mail - readers that do not support this encoding.) Note that the "_" - always represents hexadecimal 20, even if the SPACE character - occupies a different code position in the character set in use. - - (3) 8-bit values which correspond to printable ASCII characters other - than "=", "?", and "_" (underscore), MAY be represented as those - characters. (But see section 5 for restrictions.) In - particular, SPACE and TAB MUST NOT be represented as themselves - within encoded words. - - - -Moore Standards Track [Page 6] - -RFC 2047 Message Header Extensions November 1996 - - -5. Use of encoded-words in message headers - - An 'encoded-word' may appear in a message header or body part header - according to the following rules: - -(1) An 'encoded-word' may replace a 'text' token (as defined by RFC 822) - in any Subject or Comments header field, any extension message - header field, or any MIME body part field for which the field body - is defined as '*text'. An 'encoded-word' may also appear in any - user-defined ("X-") message or body part header field. - - Ordinary ASCII text and 'encoded-word's may appear together in the - same header field. 
However, an 'encoded-word' that appears in a - header field defined as '*text' MUST be separated from any adjacent - 'encoded-word' or 'text' by 'linear-white-space'. - -(2) An 'encoded-word' may appear within a 'comment' delimited by "(" and - ")", i.e., wherever a 'ctext' is allowed. More precisely, the RFC - 822 ABNF definition for 'comment' is amended as follows: - - comment = "(" *(ctext / quoted-pair / comment / encoded-word) ")" - - A "Q"-encoded 'encoded-word' which appears in a 'comment' MUST NOT - contain the characters "(", ")" or <">. An - 'encoded-word' that appears in a 'comment' MUST be separated from - any adjacent 'encoded-word' or 'ctext' by 'linear-white-space'. - - It is important to note that 'comment's are only recognized inside - "structured" field bodies. In fields whose bodies are defined as - '*text', "(" and ")" are treated as ordinary characters rather than - comment delimiters, and rule (1) of this section applies. (See RFC - 822, sections 3.1.2 and 3.1.3) - -(3) An 'encoded-word' may appear as a replacement for a 'word' entity - within a 'phrase', for example, - one that precedes an address in a From, To, or Cc header. The ABNF - definition for 'phrase' from RFC 822 thus becomes: - - phrase = 1*( encoded-word / word ) - - In this case the set of characters that may be used in a "Q"-encoded - 'encoded-word' is restricted to: <upper and lower case ASCII - letters, decimal digits, "!", "*", "+", "-", "/", "=", and "_" - (underscore, ASCII 95.)>. An 'encoded-word' that appears within a - 'phrase' MUST be separated from any adjacent 'word', 'text' or - 'special' by 'linear-white-space'. - - - - - - -Moore Standards Track [Page 7] - -RFC 2047 Message Header Extensions November 1996 - - - These are the ONLY locations where an 'encoded-word' may appear. In - particular: - - + An 'encoded-word' MUST NOT appear in any portion of an 'addr-spec'. - - + An 'encoded-word' MUST NOT appear within a 'quoted-string'. - - + An 'encoded-word' MUST NOT be used in a Received header field. 
- - + An 'encoded-word' MUST NOT be used in parameter of a MIME - Content-Type or Content-Disposition field, or in any structured - field body except within a 'comment' or 'phrase'. - - The 'encoded-text' in an 'encoded-word' must be self-contained; - 'encoded-text' MUST NOT be continued from one 'encoded-word' to - another. This implies that the 'encoded-text' portion of a "B" - 'encoded-word' will be a multiple of 4 characters long; for a "Q" - 'encoded-word', any "=" character that appears in the 'encoded-text' - portion will be followed by two hexadecimal characters. - - Each 'encoded-word' MUST encode an integral number of octets. The - 'encoded-text' in each 'encoded-word' must be well-formed according - to the encoding specified; the 'encoded-text' may not be continued in - the next 'encoded-word'. (For example, "=?charset?Q?=?= - =?charset?Q?AB?=" would be illegal, because the two hex digits "AB" - must follow the "=" in the same 'encoded-word'.) - - Each 'encoded-word' MUST represent an integral number of characters. - A multi-octet character may not be split across adjacent 'encoded- - word's. - - Only printable and white space character data should be encoded using - this scheme. However, since these encoding schemes allow the - encoding of arbitrary octet values, mail readers that implement this - decoding should also ensure that display of the decoded data on the - recipient's terminal will not cause unwanted side-effects. - - Use of these methods to encode non-textual data (e.g., pictures or - sounds) is not defined by this memo. Use of 'encoded-word's to - represent strings of purely ASCII characters is allowed, but - discouraged. In rare cases it may be necessary to encode ordinary - text that looks like an 'encoded-word'. - - - - - - - - - -Moore Standards Track [Page 8] - -RFC 2047 Message Header Extensions November 1996 - - -6. Support of 'encoded-word's by mail readers - -6.1. 
Recognition of 'encoded-word's in message headers - - A mail reader must parse the message and body part headers according - to the rules in RFC 822 to correctly recognize 'encoded-word's. - - 'encoded-word's are to be recognized as follows: - - (1) Any message or body part header field defined as '*text', or any - user-defined header field, should be parsed as follows: Beginning - at the start of the field-body and immediately following each - occurrence of 'linear-white-space', each sequence of up to 75 - printable characters (not containing any 'linear-white-space') - should be examined to see if it is an 'encoded-word' according to - the syntax rules in section 2. Any other sequence of printable - characters should be treated as ordinary ASCII text. - - (2) Any header field not defined as '*text' should be parsed - according to the syntax rules for that header field. However, - any 'word' that appears within a 'phrase' should be treated as an - 'encoded-word' if it meets the syntax rules in section 2. - Otherwise it should be treated as an ordinary 'word'. - - (3) Within a 'comment', any sequence of up to 75 printable characters - (not containing 'linear-white-space'), that meets the syntax - rules in section 2, should be treated as an 'encoded-word'. - Otherwise it should be treated as normal comment text. - - (4) A MIME-Version header field is NOT required to be present for - 'encoded-word's to be interpreted according to this - specification. One reason for this is that the mail reader is - not expected to parse the entire message header before displaying - lines that may contain 'encoded-word's. - -6.2. Display of 'encoded-word's - - Any 'encoded-word's so recognized are decoded, and if possible, the - resulting unencoded text is displayed in the original character set. - - NOTE: Decoding and display of encoded-words occurs *after* a - structured field body is parsed into tokens. 
It is therefore - possible to hide 'special' characters in encoded-words which, when - displayed, will be indistinguishable from 'special' characters in the - surrounding text. For this and other reasons, it is NOT generally - possible to translate a message header containing 'encoded-word's to - an unencoded form which can be parsed by an RFC 822 mail reader. - - - - -Moore Standards Track [Page 9] - -RFC 2047 Message Header Extensions November 1996 - - - When displaying a particular header field that contains multiple - 'encoded-word's, any 'linear-white-space' that separates a pair of - adjacent 'encoded-word's is ignored. (This is to allow the use of - multiple 'encoded-word's to represent long strings of unencoded text, - without having to separate 'encoded-word's where spaces occur in the - unencoded text.) - - In the event other encodings are defined in the future, and the mail - reader does not support the encoding used, it may either (a) display - the 'encoded-word' as ordinary text, or (b) substitute an appropriate - message indicating that the text could not be decoded. - - If the mail reader does not support the character set used, it may - (a) display the 'encoded-word' as ordinary text (i.e., as it appears - in the header), (b) make a "best effort" to display using such - characters as are available, or (c) substitute an appropriate message - indicating that the decoded text could not be displayed. - - If the character set being used employs code-switching techniques, - display of the encoded text implicitly begins in "ASCII mode". In - addition, the mail reader must ensure that the output device is once - again in "ASCII mode" after the 'encoded-word' is displayed. - -6.3. Mail reader handling of incorrectly formed 'encoded-word's - - It is possible that an 'encoded-word' that is legal according to the - syntax defined in section 2, is incorrectly formed according to the - rules for the encoding being used. 
For example: - - (1) An 'encoded-word' which contains characters which are not legal - for a particular encoding (for example, a "-" in the "B" - encoding, or a SPACE or HTAB in either the "B" or "Q" encoding), - is incorrectly formed. - - (2) Any 'encoded-word' which encodes a non-integral number of - characters or octets is incorrectly formed. - - A mail reader need not attempt to display the text associated with an - 'encoded-word' that is incorrectly formed. However, a mail reader - MUST NOT prevent the display or handling of a message because an - 'encoded-word' is incorrectly formed. - -7. Conformance - - A mail composing program claiming compliance with this specification - MUST ensure that any string of non-white-space printable ASCII - characters within a '*text' or '*ctext' that begins with "=?" and - ends with "?=" be a valid 'encoded-word'. ("begins" means: at the - - - -Moore Standards Track [Page 10] - -RFC 2047 Message Header Extensions November 1996 - - - start of the field-body, immediately following 'linear-white-space', - or immediately following a "(" for an 'encoded-word' within '*ctext'; - "ends" means: at the end of the field-body, immediately preceding - 'linear-white-space', or immediately preceding a ")" for an - 'encoded-word' within '*ctext'.) In addition, any 'word' within a - 'phrase' that begins with "=?" and ends with "?=" must be a valid - 'encoded-word'. - - A mail reading program claiming compliance with this specification - must be able to distinguish 'encoded-word's from 'text', 'ctext', or - 'word's, according to the rules in section 6, anytime they appear in - appropriate places in message headers. It must support both the "B" - and "Q" encodings for any character set which it supports. The - program must be able to display the unencoded text if the character - set is "US-ASCII". For the ISO-8859-* character sets, the mail - reading program must at least be able to display the characters which - are also in the ASCII set. 
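The reader-side requirements above (both "B" and "Q" supported, white space between adjacent 'encoded-word's dropped, malformed words shown as ordinary text without rejecting the message) can be sketched for a '*text' field as follows. This is a hedged Python illustration, not a conforming implementation: the function names are hypothetical, structured fields and 'comment's are not handled, and an unknown charset is treated like a malformed word.

```python
import base64
import binascii
import re

WORD = re.compile(r"=\?([^?\s]+)\?([^?\s]+)\?([!->@-~]+)\?=")
HEX_PAIR = re.compile(r"[0-9A-Fa-f]{2}")


def q_decode(text):
    """Decode "Q" encoded-text to octets; raise ValueError if malformed."""
    out = bytearray()
    i = 0
    while i < len(text):
        if text[i] == "_":              # "_" always means octet 0x20
            out.append(0x20)
            i += 1
        elif text[i] == "=":            # "=" must be followed by two hex digits
            pair = text[i + 1:i + 3]
            if HEX_PAIR.fullmatch(pair) is None:
                raise ValueError("incomplete hex escape")
            out.append(int(pair, 16))
            i += 3
        else:
            out.append(ord(text[i]))
            i += 1
    return bytes(out)


def decode_word(word):
    """Return the decoded text, or None if word is not a usable encoded-word."""
    match = WORD.fullmatch(word)
    if match is None or len(word) > 75:
        return None
    charset, encoding, text = match.group(1), match.group(2).upper(), match.group(3)
    try:
        if encoding == "B":
            raw = base64.b64decode(text, validate=True)
        elif encoding == "Q":
            raw = q_decode(text)
        else:
            return None
        return raw.decode(charset)
    except (binascii.Error, ValueError, LookupError):
        return None


def decode_unstructured(field_body):
    """Decode every encoded-word in a '*text' field body for display."""
    out = []
    pending_ws = ""
    prev_encoded = False
    for part in re.split(r"([ \t\r\n]+)", field_body):
        if part == "":
            continue
        if part.isspace():
            pending_ws = part.replace("\r\n", "")   # unfold continuation lines
            continue
        decoded = decode_word(part)
        if decoded is None:
            out.append(pending_ws + part)           # ordinary or malformed text
            prev_encoded = False
        else:
            if not prev_encoded:
                out.append(pending_ws)              # keep space after plain text
            out.append(decoded)
            prev_encoded = True
        pending_ws = ""
    out.append(pending_ws)
    return "".join(out)
```

Falling back to the undecoded text is only one of the options the memo permits; a reader could equally substitute a message saying the text could not be decoded.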
- -8. Examples - - The following are examples of message headers containing 'encoded- - word's: - - From: =?US-ASCII?Q?Keith_Moore?= <moore@cs.utk.edu> - To: =?ISO-8859-1?Q?Keld_J=F8rn_Simonsen?= <keld@dkuug.dk> - CC: =?ISO-8859-1?Q?Andr=E9?= Pirard <PIRARD@vm1.ulg.ac.be> - Subject: =?ISO-8859-1?B?SWYgeW91IGNhbiByZWFkIHRoaXMgeW8=?= - =?ISO-8859-2?B?dSB1bmRlcnN0YW5kIHRoZSBleGFtcGxlLg==?= - - Note: In the first 'encoded-word' of the Subject field above, the - last "=" at the end of the 'encoded-text' is necessary because each - 'encoded-word' must be self-contained (the "=" character completes a - group of 4 base64 characters representing 2 octets). An additional - octet could have been encoded in the first 'encoded-word' (so that - the encoded-word would contain an exact multiple of 3 encoded - octets), except that the second 'encoded-word' uses a different - 'charset' than the first one. - - From: =?ISO-8859-1?Q?Olle_J=E4rnefors?= <ojarnef@admin.kth.se> - To: ietf-822@dimacs.rutgers.edu, ojarnef@admin.kth.se - Subject: Time for ISO 10646? - - To: Dave Crocker <dcrocker@mordor.stanford.edu> - Cc: ietf-822@dimacs.rutgers.edu, paf@comsol.se - From: =?ISO-8859-1?Q?Patrik_F=E4ltstr=F6m?= <paf@nada.kth.se> - Subject: Re: RFC-HDR care and feeding - - - - - -Moore Standards Track [Page 11] - -RFC 2047 Message Header Extensions November 1996 - - - From: Nathaniel Borenstein <nsb@thumper.bellcore.com> - (=?iso-8859-8?b?7eXs+SDv4SDp7Oj08A==?=) - To: Greg Vaudreuil <gvaudre@NRI.Reston.VA.US>, Ned Freed - <ned@innosoft.com>, Keith Moore <moore@cs.utk.edu> - Subject: Test of new header generator - MIME-Version: 1.0 - Content-type: text/plain; charset=ISO-8859-1 - - The following examples illustrate how text containing 'encoded-word's - which appear in a structured field body. The rules are slightly - different for fields defined as '*text' because "(" and ")" are not - recognized as 'comment' delimiters. [Section 5, paragraph (1)]. 
- - In each of the following examples, if the same sequence were to occur - in a '*text' field, the "displayed as" form would NOT be treated as - encoded words, but be identical to the "encoded form". This is - because each of the encoded-words in the following examples is - adjacent to a "(" or ")" character. - - encoded form displayed as - --------------------------------------------------------------------- - (=?ISO-8859-1?Q?a?=) (a) - - (=?ISO-8859-1?Q?a?= b) (a b) - - Within a 'comment', white space MUST appear between an - 'encoded-word' and surrounding text. [Section 5, - paragraph (2)]. However, white space is not needed between - the initial "(" that begins the 'comment', and the - 'encoded-word'. - - - (=?ISO-8859-1?Q?a?= =?ISO-8859-1?Q?b?=) (ab) - - White space between adjacent 'encoded-word's is not - displayed. - - (=?ISO-8859-1?Q?a?=  =?ISO-8859-1?Q?b?=) (ab) - - Even multiple SPACEs between 'encoded-word's are ignored - for the purpose of display. - - (=?ISO-8859-1?Q?a?= (ab) - =?ISO-8859-1?Q?b?=) - - Any amount of linear-white-space between 'encoded-word's, - even if it includes a CRLF followed by one or more SPACEs, - is ignored for the purposes of display. - - - -Moore Standards Track [Page 12] - -RFC 2047 Message Header Extensions November 1996 - - - (=?ISO-8859-1?Q?a_b?=) (a b) - - In order to cause a SPACE to be displayed within a portion - of encoded text, the SPACE MUST be encoded as part of the - 'encoded-word'. - - (=?ISO-8859-1?Q?a?= =?ISO-8859-2?Q?_b?=) (a b) - - In order to cause a SPACE to be displayed between two strings - of encoded text, the SPACE MAY be encoded as part of one of - the 'encoded-word's. - -9. References - - [RFC 822] Crocker, D., "Standard for the Format of ARPA Internet Text - Messages", STD 11, RFC 822, UDEL, August 1982. - - [RFC 2049] Borenstein, N., and N. Freed, "Multipurpose Internet Mail - Extensions (MIME) Part Five: Conformance Criteria and Examples", - RFC 2049, November 1996. 
- - [RFC 2045] Borenstein, N., and N. Freed, "Multipurpose Internet Mail - Extensions (MIME) Part One: Format of Internet Message Bodies", - RFC 2045, November 1996. - - [RFC 2046] Borenstein N., and N. Freed, "Multipurpose Internet Mail - Extensions (MIME) Part Two: Media Types", RFC 2046, - November 1996. - - [RFC 2048] Freed, N., Klensin, J., and J. Postel, "Multipurpose - Internet Mail Extensions (MIME) Part Four: Registration - Procedures", RFC 2048, November 1996. - - - - - - - - - - - - - - - - - - - -Moore Standards Track [Page 13] - -RFC 2047 Message Header Extensions November 1996 - - -10. Security Considerations - - Security issues are not discussed in this memo. - -11. Acknowledgements - - The author wishes to thank Nathaniel Borenstein, Issac Chan, Lutz - Donnerhacke, Paul Eggert, Ned Freed, Andreas M. Kirchwitz, Olle - Jarnefors, Mike Rosin, Yutaka Sato, Bart Schaefer, and Kazuhiko - Yamamoto, for their helpful advice, insightful comments, and - illuminating questions in response to earlier versions of this - specification. - -12. Author's Address - - Keith Moore - University of Tennessee - 107 Ayres Hall - Knoxville TN 37996-1301 - - EMail: moore@cs.utk.edu - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -Moore Standards Track [Page 14] - -RFC 2047 Message Header Extensions November 1996 - - -Appendix - changes since RFC 1522 (in no particular order) - - + explicitly state that the MIME-Version is not requried to use - 'encoded-word's. - - + add explicit note that SPACEs and TABs are not allowed within - 'encoded-word's, explaining that an 'encoded-word' must look like an - 'atom' to an RFC822 parser.values, to be precise). - - + add examples from Olle Jarnefors (thanks!) which illustrate how - encoded-words with adjacent linear-white-space are displayed. 
- - + explicitly list terms defined in RFC822 and referenced in this memo - - + fix transcription typos that caused one or two lines and a couple of - characters to disappear in the resulting text, due to nroff quirks. - - + clarify that encoded-words are allowed in '*text' fields in both - RFC822 headers and MIME body part headers, but NOT as parameter - values. - - + clarify the requirement to switch back to ASCII within the encoded - portion of an 'encoded-word', for any charset that uses code switching - sequences. - - + add a note about 'encoded-word's being delimited by "(" and ")" - within a comment, but not in a *text (how bizarre!). - - + fix the Andre Pirard example to get rid of the trailing "_" after - the =E9. (no longer needed post-1342). - - + clarification: an 'encoded-word' may appear immediately following - the initial "(" or immediately before the final ")" that delimits a - comment, not just adjacent to "(" and ")" *within* *ctext. - - + add a note to explain that a "B" 'encoded-word' will always have a - multiple of 4 characters in the 'encoded-text' portion. - - + add note about the "=" in the examples - - + note that processing of 'encoded-word's occurs *after* parsing, and - some of the implications thereof. - - + explicitly state that you can't expect to translate between - 1522 and either vanilla 822 or so-called "8-bit headers". - - + explicitly state that 'encoded-word's are not valid within a - 'quoted-string'. - - - -Moore Standards Track [Page 15] - diff --git a/proto/rfc2048.txt b/proto/rfc2048.txt @@ -1,1180 +0,0 @@ - - - - - - -Network Working Group N. Freed -Request for Comments: 2048 Innosoft -BCP: 13 J. Klensin -Obsoletes: 1521, 1522, 1590 MCI -Category: Best Current Practice J. 
Postel - ISI - November 1996 - - - Multipurpose Internet Mail Extensions - (MIME) Part Four: - Registration Procedures - -Status of this Memo - - This document specifies an Internet Best Current Practices for the - Internet Community, and requests discussion and suggestions for - improvements. Distribution of this memo is unlimited. - -Abstract - - STD 11, RFC 822, defines a message representation protocol specifying - considerable detail about US-ASCII message headers, and leaves the - message content, or message body, as flat US-ASCII text. This set of - documents, collectively called the Multipurpose Internet Mail - Extensions, or MIME, redefines the format of messages to allow for - - (1) textual message bodies in character sets other than - US-ASCII, - - (2) an extensible set of different formats for non-textual - message bodies, - - (3) multi-part message bodies, and - - (4) textual header information in character sets other than - US-ASCII. - - These documents are based on earlier work documented in RFC 934, STD - 11, and RFC 1049, but extends and revises them. Because RFC 822 said - so little about message bodies, these documents are largely - orthogonal to (rather than a revision of) RFC 822. - - - - - - - - - -Freed, et. al. Best Current Practice [Page 1] - -RFC 2048 MIME Registration Procedures November 1996 - - - This fourth document, RFC 2048, specifies various IANA registration - procedures for the following MIME facilities: - - (1) media types, - - (2) external body access types, - - (3) content-transfer-encodings. - - Registration of character sets for use in MIME is covered elsewhere - and is no longer addressed by this document. - - These documents are revisions of RFCs 1521 and 1522, which themselves - were revisions of RFCs 1341 and 1342. An appendix in RFC 2049 - describes differences and changes from previous versions. - -Table of Contents - - 1. Introduction ......................................... 3 - 2. 
Media Type Registration .............................. 4 - 2.1 Registration Trees and Subtype Names ................ 4 - 2.1.1 IETF Tree ......................................... 4 - 2.1.2 Vendor Tree ....................................... 4 - 2.1.3 Personal or Vanity Tree ........................... 5 - 2.1.4 Special `x.' Tree ................................. 5 - 2.1.5 Additional Registration Trees ..................... 6 - 2.2 Registration Requirements ........................... 6 - 2.2.1 Functionality Requirement ......................... 6 - 2.2.2 Naming Requirements ............................... 6 - 2.2.3 Parameter Requirements ............................ 7 - 2.2.4 Canonicalization and Format Requirements .......... 7 - 2.2.5 Interchange Recommendations ....................... 8 - 2.2.6 Security Requirements ............................. 8 - 2.2.7 Usage and Implementation Non-requirements ......... 9 - 2.2.8 Publication Requirements .......................... 10 - 2.2.9 Additional Information ............................ 10 - 2.3 Registration Procedure .............................. 11 - 2.3.1 Present the Media Type to the Community for Review 11 - 2.3.2 IESG Approval ..................................... 12 - 2.3.3 IANA Registration ................................. 12 - 2.4 Comments on Media Type Registrations ................ 12 - 2.5 Location of Registered Media Type List .............. 12 - 2.6 IANA Procedures for Registering Media Types ......... 12 - 2.7 Change Control ...................................... 13 - 2.8 Registration Template ............................... 14 - 3. External Body Access Types ........................... 14 - 3.1 Registration Requirements ........................... 15 - 3.1.1 Naming Requirements ............................... 15 - - - -Freed, et. al. Best Current Practice [Page 2] - -RFC 2048 MIME Registration Procedures November 1996 - - - 3.1.2 Mechanism Specification Requirements .............. 
15 - 3.1.3 Publication Requirements .......................... 15 - 3.1.4 Security Requirements ............................. 15 - 3.2 Registration Procedure .............................. 15 - 3.2.1 Present the Access Type to the Community .......... 16 - 3.2.2 Access Type Reviewer .............................. 16 - 3.2.3 IANA Registration ................................. 16 - 3.3 Location of Registered Access Type List ............. 16 - 3.4 IANA Procedures for Registering Access Types ........ 16 - 4. Transfer Encodings ................................... 17 - 4.1 Transfer Encoding Requirements ...................... 17 - 4.1.1 Naming Requirements ............................... 17 - 4.1.2 Algorithm Specification Requirements .............. 18 - 4.1.3 Input Domain Requirements ......................... 18 - 4.1.4 Output Range Requirements ......................... 18 - 4.1.5 Data Integrity and Generality Requirements ........ 18 - 4.1.6 New Functionality Requirements .................... 18 - 4.2 Transfer Encoding Definition Procedure .............. 19 - 4.3 IANA Procedures for Transfer Encoding Registration... 19 - 4.4 Location of Registered Transfer Encodings List ...... 19 - 5. Authors' Addresses ................................... 20 - A. Grandfathered Media Types ............................ 21 - -1. Introduction - - Recent Internet protocols have been carefully designed to be easily - extensible in certain areas. In particular, MIME [RFC 2045] is an - open-ended framework and can accommodate additional object types, - character sets, and access methods without any changes to the basic - protocol. A registration process is needed, however, to ensure that - the set of such values is developed in an orderly, well-specified, - and public manner. - - This document defines registration procedures which use the Internet - Assigned Numbers Authority (IANA) as a central registry for such - values. 
- - Historical Note: The registration process for media types was - initially defined in the context of the asynchronous Internet mail - environment. In this mail environment there is a need to limit the - number of possible media types to increase the likelihood of - interoperability when the capabilities of the remote mail system are - not known. As media types are used in new environments, where the - proliferation of media types is not a hindrance to interoperability, - the original procedure was excessively restrictive and had to be - generalized. - - - - - -Freed, et. al. Best Current Practice [Page 3] - -RFC 2048 MIME Registration Procedures November 1996 - - -2. Media Type Registration - - Registration of a new media type or types starts with the - construction of a registration proposal. Registration may occur in - several different registration trees, which have different - requirements as discussed below. In general, the new registration - proposal is circulated and reviewed in a fashion appropriate to the - tree involved. The media type is then registered if the proposal is - acceptable. The following sections describe the requirements and - procedures used for each of the different registration trees. - -2.1. Registration Trees and Subtype Names - - In order to increase the efficiency and flexibility of the - registration process, different structures of subtype names may be - registered to accomodate the different natural requirements for, - e.g., a subtype that will be recommended for wide support and - implementation by the Internet Community or a subtype that is used to - move files associated with proprietary software. The following - subsections define registration "trees", distinguished by the use of - faceted names (e.g., names of the form "tree.subtree...type"). Note - that some media types defined prior to this document do not conform - to the naming conventions described below. See Appendix A for a - discussion of them. - -2.1.1. 
IETF Tree - - The IETF tree is intended for types of general interest to the - Internet Community. Registration in the IETF tree requires approval - by the IESG and publication of the media type registration as some - form of RFC. - - Media types in the IETF tree are normally denoted by names that are - not explicitly faceted, i.e., do not contain period (".", full stop) - characters. - - The "owner" of a media type registration in the IETF tree is assumed - to be the IETF itself. Modification or alteration of the - specification requires the same level of processing (e.g. standards - track) required for the initial registration. - -2.1.2. Vendor Tree - - The vendor tree is used for media types associated with commercially - available products. "Vendor" or "producer" are construed as - equivalent and very broadly in this context. - - - - - -Freed, et. al. Best Current Practice [Page 4] - -RFC 2048 MIME Registration Procedures November 1996 - - - A registration may be placed in the vendor tree by anyone who has - need to interchange files associated with the particular product. - However, the registration formally belongs to the vendor or - organization producing the software or file format. Changes to the - specification will be made at their request, as discussed in - subsequent sections. - - Registrations in the vendor tree will be distinguished by the leading - facet "vnd.". That may be followed, at the discretion of the - registration, by either a media type name from a well-known producer - (e.g., "vnd.mudpie") or by an IANA-approved designation of the - producer's name which is then followed by a media type or product - designation (e.g., vnd.bigcompany.funnypictures). - - While public exposure and review of media types to be registered in - the vendor tree is not required, using the ietf-types list for review - is strongly encouraged to improve the quality of those - specifications. Registrations in the vendor tree may be submitted - directly to the IANA. 
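The facet prefixes defined in sections 2.1.1 through 2.1.4 of RFC 2048 lend themselves to a simple lookup. The following is a minimal sketch under stated assumptions: the function name `registration_tree` and its return labels are hypothetical, and types registered before RFC 2048 (see its Appendix A) may not follow these conventions, so the result is only a hint.

```python
def registration_tree(subtype: str) -> str:
    """Guess the RFC 2048 registration tree from a MIME subtype name."""
    # MIME type and subtype names are matched case-insensitively.
    name = subtype.lower()
    if name.startswith("vnd."):
        return "vendor"
    if name.startswith("prs."):
        return "personal"
    # "x." and the older "x-" both mark unregistered, experimental types.
    if name.startswith(("x.", "x-")):
        return "unregistered"
    # IETF tree names are normally not faceted, i.e. contain no period.
    if "." not in name:
        return "ietf"
    # Any other faceted name would belong to a tree not described here.
    return "other"


if __name__ == "__main__":
    for example in ("plain", "vnd.mudpie", "vnd.bigcompany.funnypictures"):
        print(example, registration_tree(example))
```

The vendor checks come first because a faceted name is classified by its leading facet alone; anything after the first period is left to the registrant.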
- -2.1.3. Personal or Vanity Tree - - Registrations for media types created experimentally or as part of - products that are not distributed commercially may be registered in - the personal or vanity tree. The registrations are distinguished by - the leading facet "prs.". - - The owner of "personal" registrations and associated specifications - is the person or entity making the registration, or one to whom - responsibility has been transferred as described below. - - While public exposure and review of media types to be registered in - the personal tree is not required, using the ietf-types list for - review is strongly encouraged to improve the quality of those - specifications. Registrations in the personl tree may be submitted - directly to the IANA. - -2.1.4. Special `x.' Tree - - For convenience and symmetry with this registration scheme, media - type names with "x." as the first facet may be used for the same - purposes for which names starting in "x-" are normally used. These - types are unregistered, experimental, and should be used only with - the active agreement of the parties exchanging them. - - - - - - - -Freed, et. al. Best Current Practice [Page 5] - -RFC 2048 MIME Registration Procedures November 1996 - - - However, with the simplified registration procedures described above - for vendor and personal trees, it should rarely, if ever, be - necessary to use unregistered experimental types, and as such use of - both "x-" and "x." forms is discouraged. - -2.1.5. Additional Registration Trees - - From time to time and as required by the community, the IANA may, - with the advice and consent of the IESG, create new top-level - registration trees. It is explicitly assumed that these trees may be - created for external registration and management by well-known - permanent bodies, such as scientific societies for media types - specific to the sciences they cover. 
In general, the quality of - review of specifications for one of these additional registration - trees is expected to be equivalent to that which IETF would give to - registrations in its own tree. Establishment of these new trees will - be announced through RFC publication approved by the IESG. - -2.2. Registration Requirements - - Media type registration proposals are all expected to conform to - various requirements laid out in the following sections. Note that - requirement specifics sometimes vary depending on the registration - tree, again as detailed in the following sections. - -2.2.1. Functionality Requirement - - Media types must function as an actual media format: Registration of - things that are better thought of as a transfer encoding, as a - character set, or as a collection of separate entities of another - type, is not allowed. For example, although applications exist to - decode the base64 transfer encoding [RFC 2045], base64 cannot be - registered as a media type. - - This requirement applies regardless of the registration tree - involved. - -2.2.2. Naming Requirements - - All registered media types must be assigned MIME type and subtype - names. The combination of these names then serves to uniquely - identify the media type and the format of the subtype name identifies - the registration tree. - - The choice of top-level type name must take the nature of media type - involved into account. For example, media normally used for - representing still images should be a subtype of the image content - type, whereas media capable of representing audio information belongs - - - -Freed, et. al. Best Current Practice [Page 6] - -RFC 2048 MIME Registration Procedures November 1996 - - - under the audio content type. See RFC 2046 for additional information - on the basic set of top-level types and their characteristics. - - New subtypes of top-level types must conform to the restrictions of - the top-level type, if any. 
For example, all subtypes of the - multipart content type must use the same encapsulation syntax. - - In some cases a new media type may not "fit" under any currently - defined top-level content type. Such cases are expected to be quite - rare. However, if such a case arises a new top-level type can be - defined to accommodate it. Such a definition must be done via - standards-track RFC; no other mechanism can be used to define - additional top-level content types. - - These requirements apply regardless of the registration tree - involved. - -2.2.3. Parameter Requirements - - Media types may elect to use one or more MIME content type - parameters, or some parameters may be automatically made available to - the media type by virtue of being a subtype of a content type that - defines a set of parameters applicable to any of its subtypes. In - either case, the names, values, and meanings of any parameters must - be fully specified when a media type is registered in the IETF tree, - and should be specified as completely as possible when media types - are registered in the vendor or personal trees. - - New parameters must not be defined as a way to introduce new - functionality in types registered in the IETF tree, although new - parameters may be added to convey additional information that does - not otherwise change existing functionality. An example of this - would be a "revision" parameter to indicate a revision level of an - external specification such as JPEG. Similar behavior is encouraged - for media types registered in the vendor or personal trees but is not - required. - -2.2.4. Canonicalization and Format Requirements - - All registered media types must employ a single, canonical data - format, regardless of registration tree. 
- - A precise and openly available specification of the format of each - media type is required for all types registered in the IETF tree and - must at a minimum be referenced by, if it isn't actually included in, - the media type registration proposal itself. - - - - - -Freed, et. al. Best Current Practice [Page 7] - -RFC 2048 MIME Registration Procedures November 1996 - - - The specifications of format and processing particulars may or may - not be publically available for media types registered in the vendor - tree, and such registration proposals are explicitly permitted to - include only a specification of which software and version produce or - process such media types. References to or inclusion of format - specifications in registration proposals is encouraged but not - required. - - Format specifications are still required for registration in the - personal tree, but may be either published as RFCs or otherwise - deposited with IANA. The deposited specifications will meet the same - criteria as those required to register a well-known TCP port and, in - particular, need not be made public. - - Some media types involve the use of patented technology. The - registration of media types involving patented technology is - specifically permitted. However, the restrictions set forth in RFC - 1602 on the use of patented technology in standards-track protocols - must be respected when the specification of a media type is part of a - standards-track protocol. - -2.2.5. Interchange Recommendations - - Media types should, whenever possible, interoperate across as many - systems and applications as possible. However, some media types will - inevitably have problems interoperating across different platforms. - Problems with different versions, byte ordering, and specifics of - gateway handling can and will arise. - - Universal interoperability of media types is not required, but known - interoperability issues should be identified whenever possible. 
- Publication of a media type does not require an exhaustive review of - interoperability, and the interoperability considerations section is - subject to continuing evaluation. - - These recommendations apply regardless of the registration tree - involved. - -2.2.6. Security Requirements - - An analysis of security issues is required for for all types - registered in the IETF Tree. (This is in accordance with the basic - requirements for all IETF protocols.) A similar analysis for media - types registered in the vendor or personal trees is encouraged but - not required. However, regardless of what security analysis has or - has not been done, all descriptions of security issues must be as - accurate as possible regardless of registration tree. In particular, - a statement that there are "no security issues associated with this - - - -Freed, et. al. Best Current Practice [Page 8] - -RFC 2048 MIME Registration Procedures November 1996 - - - type" must not be confused with "the security issues associates with - this type have not been assessed". - - There is absolutely no requirement that media types registered in any - tree be secure or completely free from risks. Nevertheless, all - known security risks must be identified in the registration of a - media type, again regardless of registration tree. - - The security considerations section of all registrations is subject - to continuing evaluation and modification, and in particular may be - extended by use of the "comments on media types" mechanism described - in subsequent sections. - - Some of the issues that should be looked at in a security analysis of - a media type are: - - (1) Complex media types may include provisions for - directives that institute actions on a recipient's - files or other resources. In many cases provision is - made for originators to specify arbitrary actions in an - unrestricted fashion which may then have devastating - effects. 
See the registration of the - application/postscript media type in RFC 2046 for - an example of such directives and how to handle them. - - (2) Complex media types may include provisions for - directives that institute actions which, while not - directly harmful to the recipient, may result in - disclosure of information that either facilitates a - subsequent attack or else violates a recipient's - privacy in some way. Again, the registration of the - application/postscript media type illustrates how such - directives can be handled. - - (3) A media type might be targeted for applications that - require some sort of security assurance but not provide - the necessary security mechanisms themselves. For - example, a media type could be defined for storage of - confidential medical information which in turn requires - an external confidentiality service. - -2.2.7. Usage and Implementation Non-requirements - - In the asynchronous mail environment, where information on the - capabilities of the remote mail agent is frequently not available to - the sender, maximum interoperability is attained by restricting the - number of media types used to those "common" formats expected to be - widely implemented. This was asserted in the past as a reason to - - - -Freed, et. al. Best Current Practice [Page 9] - -RFC 2048 MIME Registration Procedures November 1996 - - - limit the number of possible media types and resulted in a - registration process with a significant hurdle and delay for those - registering media types. - - However, the need for "common" media types does not require limiting - the registration of new media types. If a limited set of media types - is recommended for a particular application, that should be asserted - by a separate applicability statement specific for the application - and/or environment. - - As such, universal support and implementation of a media type is NOT - a requirement for registration. 
If, however, a media type is - explicitly intended for limited use, this should be noted in its - registration. - -2.2.8. Publication Requirements - - Proposals for media types registered in the IETF tree must be - published as RFCs. RFC publication of vendor and personal media type - proposals is encouraged but not required. In all cases IANA will - retain copies of all media type proposals and "publish" them as part - of the media types registration tree itself. - - Other than in the IETF tree, the registration of a data type does not - imply endorsement, approval, or recommendation by IANA or IETF or - even certification that the specification is adequate. To become - Internet Standards, protocol, data objects, or whatever must go - through the IETF standards process. This is too difficult and too - lengthy a process for the convenient registration of media types. - - The IETF tree exists for media types that do require require a - substantive review and approval process with the vendor and personal - trees exist for those that do not. It is expected that applicability - statements for particular applications will be published from time to - time that recommend implementation of, and support for, media types - that have proven particularly useful in those contexts. - - As discussed above, registration of a top-level type requires - standards-track processing and, hence, RFC publication. - -2.2.9. Additional Information - - Various sorts of optional information may be included in the - specification of a media type if it is available: - - (1) Magic number(s) (length, octet values). Magic numbers - are byte sequences that are always present and thus can - be used to identify entities as being of a given media - - - -Freed, et. al. Best Current Practice [Page 10] - -RFC 2048 MIME Registration Procedures November 1996 - - - type. - - (2) File extension(s) commonly used on one or more - platforms to indicate that some file containing a given - type of media. 
- - (3) Macintosh File Type code(s) (4 octets) used to label - files containing a given type of media. - - Such information is often quite useful to implementors and if - available should be provided. - -2.3. Registration Procedure - - The following procedure has been implemented by the IANA for review - and approval of new media types. This is not a formal standards - process, but rather an administrative procedure intended to allow - community comment and sanity checking without excessive time delay. - For registration in the IETF tree, the normal IETF processes should - be followed, treating posting of an internet-draft and announcement - on the ietf-types list (as described in the next subsection) as a - first step. For registrations in the vendor or personal tree, the - initial review step described below may be omitted and the type - registered directly by submitting the template and an explanation - directly to IANA (at iana@iana.org). However, authors of vendor or - personal media type specifications are encouraged to seek community - review and comment whenever that is feasible. - -2.3.1. Present the Media Type to the Community for Review - - Send a proposed media type registration to the "ietf-types@iana.org" - mailing list for a two week review period. This mailing list has - been established for the purpose of reviewing proposed media and - access types. Proposed media types are not formally registered and - must not be used; the "x-" prefix specified in RFC 2045 can be used - until registration is complete. - - The intent of the public posting is to solicit comments and feedback - on the choice of type/subtype name, the unambiguity of the references - with respect to versions and external profiling information, and a - review of any interoperability or security considerations. The - submitter may submit a revised registration, or withdraw the - registration completely, at any time. - - - - - - - - -Freed, et. al. 
Best Current Practice [Page 11] - -RFC 2048 MIME Registration Procedures November 1996 - - -2.3.2. IESG Approval - - Media types registered in the IETF tree must be submitted to the IESG - for approval. - -2.3.3. IANA Registration - - Provided that the media type meets the requirements for media types - and has obtained approval that is necessary, the author may submit - the registration request to the IANA, which will register the media - type and make the media type registration available to the community. - -2.4. Comments on Media Type Registrations - - Comments on registered media types may be submitted by members of the - community to IANA. These comments will be passed on to the "owner" - of the media type if possible. Submitters of comments may request - that their comment be attached to the media type registration itself, - and if IANA approves of this the comment will be made accessible in - conjunction with the type registration itself. - -2.5. Location of Registered Media Type List - - Media type registrations will be posted in the anonymous FTP - directory "ftp://ftp.isi.edu/in-notes/iana/assignments/media-types/" - and all registered media types will be listed in the periodically - issued "Assigned Numbers" RFC [currently STD 2, RFC 1700]. The media - type description and other supporting material may also be published - as an Informational RFC by sending it to "rfc-editor@isi.edu" (please - follow the instructions to RFC authors [RFC-1543]). - -2.6. IANA Procedures for Registering Media Types - - The IANA will only register media types in the IETF tree in response - to a communication from the IESG stating that a given registration - has been approved. Vendor and personal types will be registered by - the IANA automatically and without any formal review as long as the - following minimal conditions are met: - - (1) Media types must function as an actual media format. 
- In particular, character sets and transfer encodings - may not be registered as media types. - - (2) All media types must have properly formed type and - subtype names. All type names must be defined by a - standards-track RFC. All subtype names must be unique, - must conform to the MIME grammar for such names, and - must contain the proper tree prefix. - - - -Freed, et. al. Best Current Practice [Page 12] - -RFC 2048 MIME Registration Procedures November 1996 - - - (3) Types registered in the personal tree must either - provide a format specification or a pointer to one. - - (4) Any security considerations given must not be obviously - bogus. (It is neither possible nor necessary for the - IANA to conduct a comprehensive security review of - media type registrations. Nevertheless, IANA has the - authority to identify obviously incompetent material - and exclude it.) - -2.7. Change Control - - Once a media type has been published by IANA, the author may request - a change to its definition. The descriptions of the different - registration trees above designate the "owners" of each type of - registration. The change request follows the same procedure as the - registration request: - - (1) Publish the revised template on the ietf-types list. - - (2) Leave at least two weeks for comments. - - (3) Publish using IANA after formal review if required. - - Changes should be requested only when there are serious omission or - errors in the published specification. When review is required, a - change request may be denied if it renders entities that were valid - under the previous definition invalid under the new definition. - - The owner of a content type may pass responsibility for the content - type to another person or agency by informing IANA and the ietf-types - list; this can be done without discussion or review. - - The IESG may reassign responsibility for a media type. 
The most
-   common case of this will be to enable changes to be made to types
-   where the author of the registration has died, moved out of contact
-   or is otherwise unable to make changes that are important to the
-   community.
-
-   Media type registrations may not be deleted; media types which are no
-   longer believed appropriate for use can be declared OBSOLETE by a
-   change to their "intended use" field; such media types will be
-   clearly marked in the lists published by IANA.
-
-
-
-
-
-
-
-
-Freed, et. al.           Best Current Practice                    [Page 13]
-
-RFC 2048              MIME Registration Procedures         November 1996
-
-
-2.8.  Registration Template
-
-     To: ietf-types@iana.org
-     Subject: Registration of MIME media type XXX/YYY
-
-     MIME media type name:
-
-     MIME subtype name:
-
-     Required parameters:
-
-     Optional parameters:
-
-     Encoding considerations:
-
-     Security considerations:
-
-     Interoperability considerations:
-
-     Published specification:
-
-     Applications which use this media type:
-
-     Additional information:
-
-       Magic number(s):
-       File extension(s):
-       Macintosh File Type Code(s):
-
-     Person & email address to contact for further information:
-
-     Intended usage:
-
-     (One of COMMON, LIMITED USE or OBSOLETE)
-
-     Author/Change controller:
-
-     (Any other information that the author deems interesting may be
-     added below this line.)
-
-3.  External Body Access Types
-
-   RFC 2046 defines the message/external-body media type, whereby a MIME
-   entity can act as pointer to the actual body data in lieu of
-   including the data directly in the entity body.  Each
-   message/external-body reference specifies an access type, which
-   determines the mechanism used to retrieve the actual body data.  RFC
-   2046 defines an initial set of access types, but allows for the
-
-
-
-Freed, et. al.           Best Current Practice                    [Page 14]
-
-RFC 2048              MIME Registration Procedures         November 1996
-
-
-   registration of additional access types to accommodate new retrieval
-   mechanisms.
-
-3.1.
Registration Requirements
-
-   New access type specifications must conform to a number of
-   requirements as described below.
-
-3.1.1.  Naming Requirements
-
-   Each access type must have a unique name.  This name appears in the
-   access-type parameter in the message/external-body content-type
-   header field, and must conform to MIME content type parameter syntax.
-
-3.1.2.  Mechanism Specification Requirements
-
-   All of the protocols, transports, and procedures used by a given
-   access type must be described, either in the specification of the
-   access type itself or in some other publicly available specification,
-   in sufficient detail for the access type to be implemented by any
-   competent implementor.  Use of secret and/or proprietary methods in
-   access types are expressly prohibited.  The restrictions imposed by
-   RFC 1602 on the standardization of patented algorithms must be
-   respected as well.
-
-3.1.3.  Publication Requirements
-
-   All access types must be described by an RFC.  The RFC may be
-   informational rather than standards-track, although standard-track
-   review and approval are encouraged for all access types.
-
-3.1.4.  Security Requirements
-
-   Any known security issues that arise from the use of the access type
-   must be completely and fully described.  It is not required that the
-   access type be secure or that it be free from risks, but that the
-   known risks be identified.  Publication of a new access type does not
-   require an exhaustive security review, and the security
-   considerations section is subject to continuing evaluation.
-   Additional security considerations should be addressed by publishing
-   revised versions of the access type specification.
-
-3.2.  Registration Procedure
-
-   Registration of a new access type starts with the construction of a
-   draft of an RFC.
-
-
-
-
-
-Freed, et. al.           Best Current Practice                    [Page 15]
-
-RFC 2048              MIME Registration Procedures         November 1996
-
-