Revised text for OPC

Arms, Caroline caar at loc.gov
Mon Jun 8 23:06:57 CEST 2015


I promised that I would comment on Murata-san’s latest rewrite of the text related to media/content types.
I chose to ignore my earlier comments and approach it again as a new reader.  So there may be some repetition of points I made before that Murata-san argued against.  Most of the changes look fine.

All page numbers relate to OPC WD1 DSig-JH2 and C2MRev.docx as it appears in my copy.  Automated changes relating to renumbering in Clause 4 added pages there, so the numbers may be slightly off in other copies.

General issues:

1.  Inconsistent capitalization for Media Types, Media Types Stream, and Media Types Stream Markup.  This is a problem throughout.  The question is what capitalization choices SHOULD be made given that the name of the stream is [/[Content_Types].xml].

Some clause/subclause headings are not capitalized where they should be.  I find the use of "Media Types stream" in the narrative slightly confusing given that it does not match the stream name.  I might prefer "media types stream" except in headings. I suggest the group as a whole discuss this issue in London.

2.  How early should we indicate that the "media types stream" is called [/[Content_Types].xml]?
Currently it is not explicitly stated as text until a table in 9.3.1 and in the narrative in 9.3.7.  However, "ContentType"  shows up in diagrams 9.2.3.3.  I strongly believe that the stream's name should be mentioned informatively earlier than 9.3.  My preference would be to add a note to the definition of "media type stream" in Clause 4.  Alternatively (or additionally), add a mention in 9.2.3.2.

3.  I think we have to use different phrases for the relationship of media types to 2046 and 7231.  The current text uses "as defined in RFC xxxx" for both.  Careful readers will see that as an inconsistency.  My suggestion is to use "using the syntax provided in RFC 7231 3.1.1.1".  Note that this clause in RFC 7231 refers to RFC 2046, which is useful.  Or we might refer to Appendix D which also connects OWS (optional white space as included in the BNF in 3.1.1.1) to its definition in RFC 7230.
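
To make the point about syntax concrete, the grammar in RFC 7231 3.1.1.1 (with token and quoted-string from RFC 7230 3.2.6 and OWS from RFC 7230 3.2.3) can be sketched as a Python regular expression.  This is only an illustration, not proposed normative text: the names TOKEN, QUOTED_STRING, MEDIA_TYPE and is_media_type are hypothetical, obs-text is deliberately left out, and any further restrictions that OPC itself places on media types are not modelled.

```python
import re

# token = 1*tchar (RFC 7230 3.2.6)
TOKEN = r"[!#$%&'*+\-.^_`|~0-9A-Za-z]+"
# quoted-string = DQUOTE *( qdtext / quoted-pair ) DQUOTE
# Assumption: obs-text (%x80-FF) is omitted to keep the sketch ASCII-only.
QUOTED_STRING = r'"(?:[\t \x21\x23-\x5B\x5D-\x7E]|\\[\t \x21-\x7E])*"'
# OWS = *( SP / HTAB ) (RFC 7230 3.2.3)
OWS = r"[ \t]*"
# parameter = token "=" ( token / quoted-string )
PARAMETER = rf"{TOKEN}=(?:{TOKEN}|{QUOTED_STRING})"
# media-type = type "/" subtype *( OWS ";" OWS parameter )
MEDIA_TYPE = re.compile(rf"{TOKEN}/{TOKEN}(?:{OWS};{OWS}{PARAMETER})*")


def is_media_type(value: str) -> bool:
    """Return True if the whole string matches the media-type production."""
    return MEDIA_TYPE.fullmatch(value) is not None
```

Under these assumptions, is_media_type("text/plain; charset=utf-8") is accepted, while a bare "application" or a value with trailing white space is rejected.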

4.  I suspect that Murata-san's changes do not take into account what I believe to be an automated relationship between requirements in the text and those in Annex G.  I would like to emphasize that the separate list of requirements is likely to be very useful to folks doing validation of OPC packages, for example in the workflow for ingest into a preservation repository.

Detailed issues:

Clause 2.  Conformance (p.  2)
  I suggest we pick a different conformance example (used to show how requirements are indicated in the text), to avoid raising the issue of the stream that is called [/[Content_Types].xml] but is now described by the phrase "media types stream."  Perhaps use one of the XML Usage requirements in 8.2.5.  Several of these requirements seem self-explanatory.

Clause 3. Normative References (p. 3)
  Just noting that RFC 2616 is still referenced in 8.2.3.  However, I believe that is in a section that Murata-san hopes to delete.

Clause 4. Terms and Definitions (p. 5)
  I see no reason for upper case in the term here.  Change "specially" to "special"

My preference would be to add a note to indicate that the stream is called [/[Content_Types].xml] when using a ZIP-based package as described in clause 9.

Clause 8. Package Model

8.2.3 (p.17)

I suggest a slight rewording of Murata-san's added text.  This also needs to be assigned a requirement number.

"This specification uses MIME media types as defined in RFC 2046 to identify the type of content stored in parts.  Each part shall have a media type (e.g., application/xml) which may have a set of parameters."

Would it be more appropriate to use as an example a media type/subtype combination that appears in RFC 2046?

With all the changes, I am unable to figure out exactly which paragraphs Murata-san wishes to delete.  Looking at it in my copy of Word, it looks as though the proposal includes deleting what seem to be significant requirements, e.g., requirement M1.15.  I see no problem in deleting the syntax presentation if others agree that it is covered adequately in cited RFCs.

However, RFC 2046 does not provide full BNF for the media type, and I believe the syntax for a media type should be specified in Clause 8 and not only in Clause 9.  I would therefore like to see an RFC that does supply the BNF cited here too.  M1.13 (here, but possibly in the proposed deletion) is the syntax requirement for media types.  Murata-san did modify the text for M1.13 in Annex G to refer to RFC 7231.

I'll leave it to the BNF and regex experts to thrash out what really needs to be in this Clause 8 section.  RFC 7231 doesn't get mentioned until Clause 9.

8.2.5  (p. 18)
There might usefully be a pointer to Annex C for the schemas for the "XML Content" this subclause relates to, perhaps in the [Note] in the first para.

8.3.5 (p. 23)
[M7.3] doesn't seem to exist any more.  [not to do with media/content type -- just noticed in passing, since an apparent reference to a requirement rather than a subclause seemed odd.]

8.5.4 third para (p. 29)
may have missing space after mediatype

Clause 9. Physical package.

Note: If M1.13 is not in Clause 8, an equivalent requirement certainly needs to be added to the main text in clause 9 and refer to RFC 7231.

9.2.3.2 (p. 34)
  I suggest rewording
"For all other physical package formats, the package implementer should include an XML stream, known as the Media Types stream, in the package. [S2.2] The Media Types stream shall not be mapped to a part by the package implementer. [M2.1] This stream is therefore not URI-addressable. However, it can be interleaved in the physical package using the same mechanisms used for interleaving parts."
  to
"For all other physical package formats, the package implementer should include an XML stream, described in this specification as the Media Types stream, in the package. [S2.2] The Media Types stream shall not be mapped to a part by the package implementer. [M2.1] This stream is therefore not URI-addressable. However, it can be interleaved in the physical package using the same mechanisms used for interleaving parts."

If people are against adding a note to the definition of "media types stream", could we insert an example here that gives the /[Content_Types].xml name as an example for a stream in a ZIP Archive as described in Clause 9.3?  That would partly explain the ContentType attributes appearing in the tables below.  Otherwise perhaps we need a reference to Annex C for the schema for the Media Types Stream.
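
As an illustration of what such an example would point at, the following minimal Python sketch reads the Media Types stream from a ZIP-based package and collects its ContentType attributes.  Several things here are assumptions to be checked against the draft: that the ZIP item is stored as "[Content_Types].xml" without the leading slash, that the namespace URI below is the one used by the schema in Annex C, and the function name read_media_types, which is hypothetical.

```python
import zipfile
import xml.etree.ElementTree as ET

# Assumed namespace of the content-types schema; verify against Annex C.
CT_NS = "http://schemas.openxmlformats.org/package/2006/content-types"
# Assumed ZIP item name for the Media Types stream (no leading "/").
STREAM_NAME = "[Content_Types].xml"


def read_media_types(package):
    """Return (defaults, overrides) read from the Media Types stream.

    defaults maps a file extension to a media type (Default elements);
    overrides maps a part name to a media type (Override elements).
    Raises KeyError if the package has no Media Types stream.
    """
    with zipfile.ZipFile(package) as zf:
        root = ET.fromstring(zf.read(STREAM_NAME))
    defaults = {e.get("Extension"): e.get("ContentType")
                for e in root.findall(f"{{{CT_NS}}}Default")}
    overrides = {e.get("PartName"): e.get("ContentType")
                 for e in root.findall(f"{{{CT_NS}}}Override")}
    return defaults, overrides
```

A check of this kind is the sort of thing a validation step at ingest might run before looking at any individual part.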


9.2.3.3 (starts at p. 35)
Suggest changing each use of "as defined in RFC 7231" to "using the syntax provided in RFC 7231 3.1.1.1".  See general issue 3.


9.3.7 (p.45)
Murata-san is proposing deletion of two paragraphs, one of which is requirement M3.10.  Perhaps we need to apply the number M3.10 to the new paragraph above.

I’ve attached a text file with (almost) the same content.  I just fixed a character set problem for [ and ], presumably generated when moving a text file from the Mac.

I hope this will be useful at the meeting in London.  I am going to be sailing for a month starting this weekend and will not be checking my Library of Congress e-mail while away.

   Have a good time in London.

   Caroline




Caroline Arms
Library of Congress Contractor
Co-compiler of Sustainability of Digital Formats resource    http://www.digitalpreservation.gov/formats/

** Views expressed are personal and not necessarily those of the institution **

From: eb2mmrt at gmail.com [mailto:eb2mmrt at gmail.com] On Behalf Of MURATA Makoto
Sent: Sunday, May 10, 2015 5:23 AM
To: SC34
Subject: Revised text for OPC

Folks,

In the draft minutes, I was tasked to "make a further update to the
draft text to incorporate references to RFC 2045 and/or RFC 2046 for
the concepts of media type and subtype and to RFC 7231 (replacing the
reference to RFC 2616) for the syntactic definition (ABNF) in clause 8
Package Model."  I tried to do the task.

Later, Caroline sent some useful comments from a more careful look at
the latest draft.  I accepted most of her comments and even proposed
text (thank you, Caroline!).  My disposition is shown in an attached
document ("MediaTypeCheck...").

Please find my rewrite.  Although I already sent a rewrite of the
Content Type regular expression (see the thread "Rewriting the regexp
for @ContentType in opc-contentTypes.xsd"), I have not incorporated
my rewrite yet.

Regards,
Makoto


-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: Review20150608.txt
URL: <http://mailman.vse.cz/pipermail/sc34wg4/attachments/20150608/008de103/attachment-0001.txt>

