Notes on clause 8
MURATA Makoto
eb2m-mrt at asahi-net.or.jp
Fri Aug 15 01:35:54 CEST 2014
Caroline,
Thank you very much for your careful review and
comments. This is very helpful. I agree that a
high-level overview is missing.
[snip]
>
> ===
>
> I started to read clause 8 and found the section very hard to understand in a holistic way, either by reading in order or through the headings in the table of contents. The clause reads like chunks of un-related text. The forest is obscured by the trees (or weeds).
>
> I believe we need to:
> A. do a much better job of explaining things at a higher level (not only "what" but "why") and providing "sign-posts" in the narrative that guide the reader;
I hope that SC34-WG4_N0279 (esp. "My understanding of Design
Goals of the pack URI scheme" on page 7) can provide a basis
for such an overview. At least, it provides an answer to one
of your comments (see below).
> B. consider whether a different organization into clauses and sub-clauses is warranted, particularly related to where the Pack URI stuff goes.
>
> B really needs to be done before A. I don't have a particular recommendation to make on re-organization (at least not at this stage) but believe there should be a discussion on this before long. Here are some questions/comments I have:
>
> Clauses 8.2.4 (Growth Hint), and 8.2.5 (XML Usage) seem completely overwhelmed by the detailed content of 8.2.2 (Part Names) and 8.3 (Pack URI scheme) which appear to be (and indeed are) related. The flow between 8.2.5 and 8.3 is particularly jolting.
This is true. From the beginning, 8.2.4 and 8.2.5 have not been
related to the rest of 8.2.
> Is the Pack URI scheme really part of the Package Model (the top heading for clause 8)? I believe the objective of the pack: scheme is to permit references to parts (and part fragments) from OUTSIDE the package. To me, it is important to understand the key internal structures first, including Relationships. For one thing, I see relationships parts as soon as I look inside a package. I don't see Pack URIs, only path segments (or isegments).
SC34-WG4_N0279 (esp. "My understanding of Design
Goals of the pack URI scheme" on page 7) provides four goals.
One of them is:
C) The same mechanism should be used for referencing from inside
and outside.
I strongly believe that the pack URI scheme is for both INSIDE
and OUTSIDE references. This is the key reason why I put
pack URIs in 8.3.
To me, part names, pack URIs, base URIs, and relative references
are first-class citizens. They can possibly be extended to
other package formats. Relationships look like second-class citizens.
They are specific to OPC, and are intended to make the migration
from 8.3 file names easier.
INSIDE references to parts are relative references.
Resolution of relative references is not understandable unless
you understand what a base URI is. Pack URIs provide base
URIs (8.4 now says so).
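As a minimal sketch of this point (not text from the draft), the
following Python resolves INSIDE relative references against the part
name of the referring part, used as the base path. The part names are
hypothetical, and the generic resolution done by urljoin stands in for
whatever 8.5 finally specifies. It is applied to the path only, since
Python's standard library has no special knowledge of the pack scheme.

```python
from urllib.parse import urljoin


def resolve_in_package(base_part_name: str, relative_ref: str) -> str:
    """Resolve a relative reference against the referring part's name.

    Assumption: the part name plays the role of the base URI path, and
    generic relative-reference resolution is sufficient for the example.
    """
    return urljoin(base_part_name, relative_ref)


# Hypothetical part name of the part that contains the references.
base = "/word/document.xml"

# A sibling-directory reference and a parent-directory reference.
print(resolve_in_package(base, "media/image1.png"))
print(resolve_in_package(base, "../docProps/core.xml"))
```

Without the base, neither reference can be mapped to a part name,
which is why base URIs have to be understood first.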
> Treating Pack URI in an annex did have the advantage of being able to ignore it until you had grasped the essential elements of the package model. And the headings for clauses and subclauses in clause 8 of the previous edition did a somewhat better job of suggesting a logical structure that helped in understanding the model as a whole and knowing where to find what. However, the function of Pack URI was obscured because its function was inadequately introduced in the main text. I realize that I don't have a good sense of when it is used in practice.
I think that treating Pack URIs in an annex significantly hampers
understanding of the package model. Without understanding
pack URIs, we cannot understand the OPC package model.
> Did you have a particular reason for locating Pack URI as 8.3? Might it fit better after 8.6 Relationships or in a new top level clause between 8 (Package Model) and 9 (Physical Package).
See above.
>
> I'm not certain whether 8.4 Base IRIs and 8.5 Resolving Relative References (non-normative) apply only to the use of pack: or also in other contexts. That would determine where those clauses should go. I do find myself wondering whether we should revive Part Addressing as a clause or subclause heading and put Pack URI scheme, Base IRIs, and Resolving Relative References as subsidiary clauses.
Base URIs and resolution of relative references are introduced
in RFCs 3986 and 3987. 8.4 and 8.5 explain how generic rules
in the two RFCs are adjusted for OPC.
>
> ======= now for some notes on some particular subclauses===========
>
> 8.2.2 Part Names
>
> ***Related terms and definitions
> Text in 8.2.2.1 doesn't seem compatible with the trimmed definition in Terms
> and Definitions. 8.2.2.1 is better.
>
> old version in Terms and Defs:
> 4.27
> part name
> The path component of a pack URI. Part names are used to refer to
> a part in the context of a package, typically
> as part of a URI.
>
> Rex trimmed to path component of a pack URI
> Suggested replacement: Unicode string that identifies a part in the context of a package
Your suggestion looks nice to me.
> I think the fact that a path component of a pack URI is a part name belongs in the text not a definition.
>
> The trimmed definition of Pack URI (URI scheme that allows URIs to be used as a uniform mechanism for addressing parts within a package) also needs work. As we use the term, a "Pack URI" is not a scheme, but an instance compliant with the scheme syntax and rules. Do we need a clause 3 entry for Pack IRI too?
We should replace "pack URI" with "pack URI scheme" in Clause 2.
"Pack URI" is a casual term for "a URI of the pack URI scheme".
>
> *** References to RFC 3986 and RFC 3987 -- and URI/IRI confusion ***
> This is really several issues.
> 1. I think we need a sentence or two to introduce URI and IRI, particularly given that you have added a section defining a pack_IRI as distinct from a pack_URI. I need to fully understand the intent of the changes wrt URI and IRI in order to come up with suggested wording and look for inconsistencies. I believe that one important modification you are looking for to Part 3 is to let part names contain non-ASCII characters without percent-encoding. Is that the only change to the model?
Historically, OPC part names were restricted to ASCII.
This means that non-ASCII characters had to be percent-encoded.
However, Annex A appears to allow verbatim non-ASCII characters as
part of relative references.
Because of the Japanese comment, verbatim non-ASCII characters were
allowed as part of OPC part names. (EPUB 3 does so too.)
But what was published is only half-baked.
Ideally, we should always use IRIs and avoid URIs. But
the way RFCs 3986 and 3987 are written does not allow
such a simple approach. We have to define both
URIs and IRIs of the pack scheme.
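To make the two forms concrete, here is a small Python sketch of the
same part name written verbatim (IRI style) and percent-encoded (URI
style). The part name and the helper names are hypothetical. The
mapping shown is the generic one from RFC 3987 (encode as UTF-8, then
percent-encode), not any OPC-specific rule.

```python
from urllib.parse import quote, unquote


def iri_path_to_uri_path(iri_path: str) -> str:
    """Percent-encode the UTF-8 bytes of non-ASCII characters.

    "/" is kept as the segment separator.
    """
    return quote(iri_path, safe="/")


def uri_path_to_iri_path(uri_path: str) -> str:
    """Naive reverse mapping: decodes every percent-escape as UTF-8.

    A careful implementation would leave escapes of reserved ASCII
    characters alone; this is only an illustration.
    """
    return unquote(uri_path)


# Hypothetical part name containing verbatim non-ASCII characters.
part_name = "/文書/第1章.xml"
encoded = iri_path_to_uri_path(part_name)
print(encoded)
print(uri_path_to_iri_path(encoded) == part_name)
```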
>
> 2. I think we need to introduce RFC 3986 and RFC 3987 briefly in the text and relate them to URI and IRI.
Agreed.
>This could be combined with addressing point 1, possibly near beginning of 8.2.2. Also applies to 8.3 (Pack URI scheme), 8.4 (Base IRIs/URIs), and 8.5 (Resolving relative references). I believe another of your objectives was to avoid duplication of content from the RFCs but to rely on referring to them. Am I right?
Yes, you are.
>Would it work to use an Annex to introduce URIs, IRIs, RFC 3986, RFC 3987 and how OPC uses them (and varies from them)? Should such an Annex be normative or informative?
I do not think an Annex would work. These terms
are too crucial to the package model.
> 3. There are various references to using terms as defined in the two RFCs. I think we need a consistent approach to listing these terms. This could be in the text near each ABNF chunk, making sure that all relevant terms are covered. Alternatively, it could be by putting all the terms into Terms and Definitions, with associated notes that indicate which RFC they are defined in.
I think that putting all such terms into Terms and Defs
is a good idea.
>
> 4. In 8.3.2 we should probably say explicitly that the syntax of the Pack URI scheme can be extended to IRIs. I haven't chased down whether a pack_IRI as you have added it in 8.3.2 is still a valid pack URI according to the provisional scheme registered with IANA or not. But clarifying text might be different depending on the answer.
The two sets of BNF productions in 8.3.2 differ in two main points:
one uses iunreserved and ipchar where the other uses unreserved
and pchar. Adding a note about this would help.
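The difference can be sketched in Python as a character-class check.
The character sets follow my reading of "unreserved" in RFC 3986 and
of "iunreserved" and "ucschar" in RFC 3987. The function names are
hypothetical and nothing here is taken from the draft.

```python
import string

# unreserved = ALPHA / DIGIT / "-" / "." / "_" / "~"   (RFC 3986)
UNRESERVED = set(string.ascii_letters + string.digits + "-._~")

# ucschar ranges (RFC 3987): three BMP ranges, planes 1 to 13
# without their last two code points, and part of plane 14.
UCSCHAR_RANGES = [(0xA0, 0xD7FF), (0xF900, 0xFDCF), (0xFDF0, 0xFFEF)]
UCSCHAR_RANGES += [(p << 16, (p << 16) + 0xFFFD) for p in range(1, 14)]
UCSCHAR_RANGES.append((0xE1000, 0xEFFFD))


def is_unreserved(ch: str) -> bool:
    """True if ch may appear verbatim under the URI productions."""
    return ch in UNRESERVED


def is_iunreserved(ch: str) -> bool:
    """iunreserved = unreserved plus ucschar (the IRI productions)."""
    cp = ord(ch)
    return is_unreserved(ch) or any(lo <= cp <= hi for lo, hi in UCSCHAR_RANGES)


for ch in ("a", "~", "文", "/"):
    print(repr(ch), is_unreserved(ch), is_iunreserved(ch))
```

Only the third character separates the two checks, which is the
whole difference between the URI and IRI versions of the syntax.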
>
> 5. May need to change "URI" to "URI or IRI" or to "IRI" in more places. I'm very confused on this.
This is confusing, but unavoidable as long as we use
RFCs 3986 and 3987.
> 6. 8.4 is entitled Base IRIs but mostly discusses URIs and 3986.
True. This is because RFC 3987 simply relies on RFC 3986
for the resolution of relative references. See
http://tools.ietf.org/html/rfc3987#section-6.5
> *** Base URIs/IRIs
> 8.4
> For "This subclause defines" to be true, I think the first Note needs to be regular text, i.e., normative rather than informative.
The case statement is the core of 8.4, and is normative.
We have to make this clearer.
>
> Section 5.1 of RFC 3986 is NOT a four step model, but four ways (cases) of determining a Base URI in order of precedence. So I believe both notes to be incorrect, the first in detail, the second in substance.
I do not see any difference between
(1) steps and (2) cases with precedence.
Should we say "ways", since that is the term
used in RFC 3986?
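To illustrate why I see no difference, here is a rough Python sketch
of Section 5.1 of RFC 3986. The function name, parameter names, and
example URIs are hypothetical. "Cases in order of precedence" and
"steps tried in order" come out as the same loop.

```python
from typing import Optional


def establish_base_uri(
    embedded: Optional[str],       # 5.1.1 base URI embedded in content
    encapsulating: Optional[str],  # 5.1.2 base URI of encapsulating entity
    retrieval: Optional[str],      # 5.1.3 URI used to retrieve the entity
    default: str,                  # 5.1.4 application-dependent default
) -> str:
    """Return the first base URI that is available, in precedence order."""
    for candidate in (embedded, encapsulating, retrieval):
        if candidate is not None:
            return candidate
    return default


# Hypothetical values: nothing embedded, so the second way applies.
print(establish_base_uri(
    None,
    "pack://example/word/document.xml",
    "http://example.com/doc.docx",
    "about:blank",
))
```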
> First Note: Use "case" (or some equivalent word) rather than "step."
I do not buy this.
> Second Note: Case 3 and 4 will only come into play if case 2 (being described for OPC in this subclause) is not applicable. The relationship between what is "defined" here and the other cases in 3986 needs to be clarified.
This is what I tried in the two notes in 8.4. Change
proposals are very welcome.
>
> ****Resolving relative references ***
> 8.5
> One underlying issue is now clear to me: there may be references that are intended to function like "relative references" as defined in RFC 3986, but are not technically valid as defined there or in 3987. For example, they may have backward slashes that should be interpreted as forward slashes. I will be happy to try and produce some clearer wording when I understand more fully. Does the resolution process as described in 8.5 come into play only within a pack URI? All the examples are with pack URIs. However, in the published Part 2, annex A is not tied to Pack URIs, but relates to URIs in general.
This is what we discussed extensively in Prague. See "Relative
references (A.3)" from John on August 9th, 2014.
Regards,
Makoto
>
>
>
>
> Caroline Arms
> Library of Congress Contractor
> Co-compiler of Sustainability of Digital Formats resource
> http://www.digitalpreservation.gov/formats/
>
> ** Views expressed are personal and not necessarily those of the institution **
> ________________________________________
> From: Arms, Caroline [caar at loc.gov]
> Sent: Wednesday, August 13, 2014 10:38 AM
> To: MURATA Makoto; SC34
> Subject: Notes on latest draft of Part 2
>
> All (particularly Rex and Murata-san),
>
> As promised, I have been reading the June MM0619 draft by Murata-san that is on the Assembla site.
>
> Since most of the changes relate to moving most of Annex A into clause 8, I will send notes for clause 8 and Annex A separately -- not immediately. Below (and attached as a text file) are notes on the other clauses.
>
> 1. Scope
>
> General editorial comments:
> (a) This clause shows several significant deletions. Somewhat equivalent text is in 7. Overview. We should check how things will look in the ISO preview (online browsing platform). Scope will show but I'm not sure that clause 7 would. We should check whether any text deleted from Scope should be recovered to get relevant words into either Scope or Overview.
>
> (b) The ... bulleted list seems to be temporary. It definitely needs work for grammatical consistency.
>
> Typo: "dissiminated" should be "disseminated"
>
> 2. Conformance
>
> General editorial comment:
> (a) Could do with improved distinction between statement about conformance for OPC (this part) and conformance in the multi-part standard.
>
> 3. Normative References
> Presumably should update the Augmented BNF RFC 4234 to RFC 5234, since 4234 is obsolete. [Note: New text from Murata-san refers to RFC 2234 (an even older obsolete version) and uses EBNF rather than ABNF. Presumably all references should use RFC 5234 and "ABNF."]
>
> 4. Terms and Definitions
> I notice that some ISO standards have Notes associated with definitions. If this is acceptable practice we might consider whether any of the terms warrant a note as clarification.
>
> Clearly this is a clause to revisit when the rest of the document is done. The first term confused me but I assume we will make a holistic pass through Terms and Definitions later and haven't reviewed this clause in detail.
>
> access style
> style in which local access or networked access is conducted
>
> Proposed replacement: style in which local or networked access to a package is conducted
> [Aside: Personally, I might use "mode" rather than "style", but I assume we are only addressing DRs and significant shortcomings in the document.]
>
> Murata-san introduced the terms NFC and NFD for the two normalization approaches applied commonly for UNICODE. We should either spell out in full and prefix with "Unicode" (my preference) or add to Terms and Definitions.
>
> 5. Notational Conventions
> Trusting this to Rex.
>
> 6. General Description
> Will need to look back at this. Murata-san's changes have made Annex A informative, for example.
>
> 7. Overview
> See my notes on 1. Scope above.
>
> 8. Package Model
> Will deal with in separate set of notes apart from a few small details.
>
> There are still several internal references and example captions that need A to be changed to 8.
>
> I assume "EBNF" should be "ABNF". Latest RFC for ABNF is RFC 5234. 2234 and 4234 are obsolete.
>
> 9. Physical Package
> 10. Core Properties
> 11. Thumbnails
> No changes have been made. Will need to check Annex references.
>
> 12. Digital Signatures
> Ignoring for now.
>
> Annexes
> Ignoring for now. Will make notes on Annex A in conjunction with clause 8.
>
> Hoping this helps. Caroline
>
> Caroline Arms
> Library of Congress Contractor
> Co-compiler of Sustainability of Digital Formats resource
> http://www.digitalpreservation.gov/formats/
>
> ** Views expressed are personal and not necessarily those of the institution **
> ________________________________________
> From: eb2mmrt at gmail.com [eb2mmrt at gmail.com] On Behalf Of MURATA Makoto [eb2m-mrt at asahi-net.or.jp]
> Sent: Thursday, June 26, 2014 11:16 PM
> To: SC34
> Subject: My personal draft of OPC revision
>
> Dear colleagues,
>
> On the basis of discussions in Prague, I updated my personal
> rewrite. It is available at:
>
> https://www.assembla.com/spaces/IS29500/documents
>
> The first file (OfficeOpenXMLPart2-OpenPackagingConventionsMM0619)
> is the draft.
>
> Regards,
> Makoto
--
Praying for the victims of the Japan Tohoku earthquake
Makoto