Which RFC(s) for media type should we refer to?

Arms, Caroline caar at loc.gov
Mon Oct 20 23:04:53 CEST 2014


All,

I went back to the Content type vs. Media type issue and ran into the question of which RFC(s) we should refer to, since that seemed a good starting point for rewording things.  It's not so simple!

Part 2 currently refers to RFC 2616.  That may not have been the most appropriate RFC, but the point is now moot: 2616 is obsolete and has been replaced by a group of RFCs, including RFC 7231.  RFC 7231 refers to RFC 2046 in its Media Type subclause but does not elaborate on what media-type actually is.  RFC 7231 provides ABNF for media-type, but you need to refer to RFC 7230 for an explanation of "OWS", which is used in that ABNF.  RFC 2046 lists the top-level media types and common subtypes, and discusses parameters.  Its introduction refers to RFC 2045 for the Content-Type context and to RFC 822 for all relevant ABNF not found in its Appendix A: Collected Grammar.  Media-type is not mentioned in Appendix A.  RFC 2045 has the relevant ABNF, written in the Augmented BNF notation of RFC 822.

More detailed detective work, with URLs, is included below.

The question will be how best to refer to this in Part 2.   RFC 7231 is most convenient for getting the ABNF syntax, but you need RFC 2046 to understand the semantics.

   To be continued, no doubt ...

   Caroline

Caroline Arms
Library of Congress Contractor
Co-compiler of Sustainability of Digital Formats resource http://www.digitalpreservation.gov/formats/

** Views expressed are personal and not necessarily those of the institution **

==== DETAILED detective work ====

Part 2 currently refers to RFC 2616

https://www.mnot.net/blog/2014/06/07/rfc2616_is_dead

http://www.rfc-editor.org/info/rfc2616  is marked as obsolete

So I went to one of the replacement RFCs

http://tools.ietf.org/html/rfc7231

3.1.1.1. Media Type

   HTTP uses Internet media types [RFC2046] in the Content-Type
   (Section 3.1.1.5) and Accept (Section 5.3.2) header fields in order
   to provide open and extensible data typing and type negotiation.
   Media types define both a data format and various processing models:
   how to process that data in accordance with each context in which it
   is received.

     media-type = type "/" subtype *( OWS ";" OWS parameter )
     type       = token
     subtype    = token

   The type/subtype MAY be followed by parameters in the form of
   name=value pairs.

     parameter      = token "=" ( token / quoted-string )

   The type, subtype, and parameter name tokens are case-insensitive.
   Parameter values might or might not be case-sensitive, depending on
   the semantics of the parameter name.  The presence or absence of a
   parameter might be significant to the processing of a media-type,
   depending on its definition within the media type registry.

   A parameter value that matches the token production can be
   transmitted either as a token or within a quoted-string.  The quoted
   and unquoted values are equivalent.  For example, the following
   examples are all equivalent, but the first is preferred for
   consistency:

     text/html;charset=utf-8
     text/html;charset=UTF-8
     Text/HTML;Charset="utf-8"
     text/html; charset="utf-8"

   Internet media types ought to be registered with IANA according to
   the procedures defined in [BCP13].

      Note: Unlike some similar constructs in other header fields, media
      type parameters do not allow whitespace (even "bad" whitespace)
      around the "=" character.
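To make the RFC 7231 rules above concrete, here is a minimal Python sketch of a media-type parser.  It is not taken from any RFC or library.  Assumptions: "token" uses the tchar set from RFC 7230 Section 3.2.6 (RFC 7231 only references it); "quoted-string" is simplified; and treating charset values as case-insensitive is an assumption for this sketch, since RFC 7231 only says value case sensitivity depends on the parameter.  The names parse_media_type and normalize are hypothetical.

```python
import re

# token = 1*tchar, per RFC 7230 Section 3.2.6 (RFC 7231 imports it from there).
TOKEN = r"[!#$%&'*+\-.^_`|~0-9A-Za-z]+"
# Simplified quoted-string: backslash escapes are honoured, but the exact
# qdtext / obs-text octet ranges of RFC 7230 are not checked.
QUOTED = r'"(?:[^"\\]|\\.)*"'
OWS = r"[ \t]*"  # OWS = *( SP / HTAB )

# media-type = type "/" subtype *( OWS ";" OWS parameter )
# parameter  = token "=" ( token / quoted-string ), no whitespace around "="
PARAM = rf"{OWS};{OWS}({TOKEN})=({TOKEN}|{QUOTED})"
PARAM_RE = re.compile(PARAM)
MEDIA_TYPE = re.compile(rf"({TOKEN})/({TOKEN})((?:{PARAM})*)")

# Assumption for this sketch: charset values compare case-insensitively.
CASE_INSENSITIVE_VALUES = {"charset"}


def unquote(value):
    """Remove the quotes of a quoted-string and undo backslash escapes."""
    if value.startswith('"'):
        return re.sub(r"\\(.)", r"\1", value[1:-1])
    return value


def parse_media_type(text):
    """Return (type, subtype, params), or None if text does not match the ABNF."""
    m = MEDIA_TYPE.fullmatch(text)
    if m is None:
        return None
    # Type, subtype and parameter names are case-insensitive: lowercase them.
    params = {name.lower(): unquote(value)
              for name, value in PARAM_RE.findall(m.group(3))}
    return m.group(1).lower(), m.group(2).lower(), params


def normalize(text):
    """Canonical form used to compare media types for equivalence."""
    parsed = parse_media_type(text)
    if parsed is None:
        raise ValueError(f"not a valid media-type: {text!r}")
    type_, subtype, params = parsed
    params = {name: value.lower() if name in CASE_INSENSITIVE_VALUES else value
              for name, value in params.items()}
    return type_, subtype, params
```

With these definitions, the four RFC 7231 examples all normalize to ('text', 'html', {'charset': 'utf-8'}), while text/html;charset = utf-8 is rejected because of the spaces around "=".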

===  aside on OWS  -- optional whitespace ===

    OWS           = <OWS, see [RFC7230], Section 3.2.3>

http://tools.ietf.org/html/rfc7230#section-3.2.3

3.2.3. Whitespace

   This specification uses three rules to denote the use of linear
   whitespace: OWS (optional whitespace), RWS (required whitespace), and
   BWS ("bad" whitespace).

   The OWS rule is used where zero or more linear whitespace octets
   might appear.  For protocol elements where optional whitespace is
   preferred to improve readability, a sender SHOULD generate the
   optional whitespace as a single SP; otherwise, a sender SHOULD NOT
   generate optional whitespace except as needed to white out invalid or
   unwanted protocol elements during in-place message filtering.

   The RWS rule is used when at least one linear whitespace octet is
   required to separate field tokens.  A sender SHOULD generate RWS as a
   single SP.

   The BWS rule is used where the grammar allows optional whitespace
   only for historical reasons.  A sender MUST NOT generate BWS in
   messages.  A recipient MUST parse for such bad whitespace and remove
   it before interpreting the protocol element.

     OWS            = *( SP / HTAB )
                    ; optional whitespace
     RWS            = 1*( SP / HTAB )
                    ; required whitespace
     BWS            = OWS
                    ; "bad" whitespace

==== end of OWS digression ====
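OWS and BWS have the same syntax; they differ only in what a sender may generate.  A minimal Python sketch of the three rules, plus a hypothetical serializer that generates the OWS in a media-type either as a single SP or not at all.  Assumptions: the function names are invented for illustration, and the decision to quote any value that is not a token uses the tchar set from RFC 7230 Section 3.2.6.

```python
import re

# RFC 7230, Section 3.2.3
OWS = re.compile(r"[ \t]*")   # OWS = *( SP / HTAB )
RWS = re.compile(r"[ \t]+")   # RWS = 1*( SP / HTAB )
BWS = OWS                     # BWS = OWS; senders MUST NOT generate it

TOKEN = re.compile(r"[!#$%&'*+\-.^_`|~0-9A-Za-z]+")  # 1*tchar


def matches(rule, text):
    """True if text as a whole matches the given whitespace rule."""
    return rule.fullmatch(text) is not None


def serialize_media_type(type_, subtype, params, readable=True):
    """Build a media-type string; OWS is generated as a single SP or omitted."""
    sep = "; " if readable else ";"
    parts = [f"{type_}/{subtype}"]
    for name, value in params.items():
        if not TOKEN.fullmatch(value):
            # Not a token, so it must be sent as a quoted-string.
            value = '"' + value.replace("\\", "\\\\").replace('"', '\\"') + '"'
        parts.append(f"{name}={value}")  # never any whitespace around "="
    return sep.join(parts)
```

For example, serialize_media_type("text", "html", {"charset": "utf-8"}) gives text/html; charset=utf-8, and passing readable=False gives the compact form text/html;charset=utf-8.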


http://tools.ietf.org/html/rfc2046

Multipurpose Internet Mail Extensions (MIME) Part Two:  Media Types

Introduction

   The first document in this set, RFC 2045, defines a number of header
   fields, including Content-Type. The Content-Type field is used to
   specify the nature of the data in the body of a MIME entity, by
   giving media type and subtype identifiers, and by providing auxiliary
   information that may be required for certain media types.  After the
   type and subtype names, the remainder of the header field is simply a
   set of parameters, specified in an attribute/value notation.  The
   ordering of parameters is not significant.

   In general, the top-level media type is used to declare the general
   type of data, while the subtype specifies a specific format for that
   type of data.  Thus, a media type of "image/xyz" is enough to tell a
   user agent that the data is an image, even if the user agent has no
   knowledge of the specific image format "xyz".  Such information can
   be used, for example, to decide whether or not to show a user the raw
   data from an unrecognized subtype -- such an action might be
   reasonable for unrecognized subtypes of "text", but not for
   unrecognized subtypes of "image" or "audio".  For this reason,
   registered subtypes of "text", "image", "audio", and "video" should
   not contain embedded information that is really of a different type.
   Such compound formats should be represented using the "multipart" or
   "application" types.

   Parameters are modifiers of the media subtype, and as such do not
   fundamentally affect the nature of the content.  The set of
   meaningful parameters depends on the media type and subtype.  Most
   parameters are associated with a single specific subtype.  However, a
   given top-level media type may define parameters which are applicable
   to any subtype of that type.  Parameters may be required by their
   defining media type or subtype or they may be optional.  MIME
   implementations must also ignore any parameters whose names they do
   not recognize.
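The paragraphs above describe behaviour a MIME implementation is expected to follow.  The Python sketch below is one possible reading, not code from RFC 2046.  Assumptions: the action labels, the "known" and "recognized" sets, and the choice to treat only "text" as safe to show raw are all hypothetical (the RFC only says showing raw data "might be reasonable" for unrecognized text subtypes).

```python
def fallback_action(media_type, known):
    """Decide what to do with content of the given type/subtype.

    'known' is a hypothetical set of lowercase type/subtype strings
    that the user agent can render.
    """
    media_type = media_type.lower()
    if media_type in known:
        return "render"
    top_level, _, _ = media_type.partition("/")
    # The top-level type still says what kind of data this is, so unknown
    # text subtypes can reasonably be shown raw; images, audio etc. cannot.
    if top_level == "text":
        return "show-raw"
    return "do-not-show-raw"


def effective_parameters(params, recognized, required=()):
    """Keep the recognized parameters of a subtype, ignoring unknown ones.

    params is a list of (name, value) pairs.  Parameter order is not
    significant, so the result is a dict.  Names are matched
    case-insensitively; 'recognized' and 'required' hold lowercase names.
    """
    kept = {}
    for name, value in params:
        key = name.lower()
        if key in recognized:
            kept[key] = value
    missing = set(required) - set(kept)
    if missing:
        raise ValueError(f"missing required parameter(s): {sorted(missing)}")
    return kept
```

So an unrecognized "image/xyz" is still known to be an image and is not dumped as raw text, and two parameter lists that differ only in order give the same result.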

RFC 2046 lists the top-level media types and their subtypes.  As the excerpt above shows, its Introduction refers to RFC 2045 for the Content-Type header field.

ABNF for media type is not defined in RFC 2046, but it is defined in RFC 2045, using the Augmented BNF notation of RFC 822.  RFC 2046 has a Collected Grammar appendix, which refers to RFC 822.

http://tools.ietf.org/html/rfc2045#page-12

5.1. Syntax of the Content-Type Header Field

   In the Augmented BNF notation of RFC 822, a Content-Type header field
   value is defined as follows:

     content := "Content-Type" ":" type "/" subtype
                *(";" parameter)
                ; Matching of media type and subtype
                ; is ALWAYS case-insensitive.

     type := discrete-type / composite-type

     discrete-type := "text" / "image" / "audio" / "video" /
                      "application" / extension-token

     composite-type := "message" / "multipart" / extension-token

     extension-token := ietf-token / x-token

     ietf-token := <An extension token defined by a
                    standards-track RFC and registered
                    with IANA.>

     x-token := <The two characters "X-" or "x-" followed, with
                 no intervening white space, by any token>

     subtype := extension-token / iana-token

     iana-token := <A publicly-defined extension token. Tokens
                    of this form must be registered with IANA
                    as specified in RFC 2048.>

     parameter := attribute "=" value

     attribute := token
                  ; Matching of attributes
                  ; is ALWAYS case-insensitive.

     value := token / quoted-string

     token := 1*<any (US-ASCII) CHAR except SPACE, CTLs,
                 or tspecials>

     tspecials :=  "(" / ")" / "<" / ">" / "@" /
                   "," / ";" / ":" / "\" / <">
                   "/" / "[" / "]" / "?" / "="
                   ; Must be in quoted-string,
                   ; to use within parameter values
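Unlike RFC 7231, RFC 2045 defines token itself, by exclusion.  A minimal Python sketch of the token rule and of the type classification in the grammar above.  Assumptions: the classify_type name and its return labels are hypothetical, and since whether an extension token is a valid ietf-token (standards-track RFC plus IANA registration) cannot be checked from the string alone, that case is only labelled "extension".

```python
# tspecials, exactly as listed in RFC 2045 Section 5.1
TSPECIALS = set('()<>@,;:\\"/[]?=')


def is_token(s):
    """token := 1*<any (US-ASCII) CHAR except SPACE, CTLs, or tspecials>."""
    # SPACE is 32 and CTLs are 0..31 and 127, which leaves 33..126.
    return len(s) > 0 and all(32 < ord(c) < 127 and c not in TSPECIALS for c in s)


DISCRETE = {"text", "image", "audio", "video", "application"}
COMPOSITE = {"message", "multipart"}


def classify_type(type_):
    """Classify a top-level type per RFC 2045; matching is case-insensitive."""
    t = type_.lower()
    if t in DISCRETE:
        return "discrete"
    if t in COMPOSITE:
        return "composite"
    if t.startswith("x-") and is_token(t[2:]):
        return "x-token"
    if is_token(t):
        # Valid only as an ietf-token, i.e. if defined by a standards-track
        # RFC and registered with IANA; that cannot be checked here.
        return "extension"
    return "invalid"
```

For example, classify_type("Text") returns "discrete" and classify_type("X-Custom") returns "x-token", while is_token("a=b") is False because "=" is one of the tspecials.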


More information about the sc34wg4 mailing list