Notes on clause 8

Arms, Caroline caar at loc.gov
Thu Aug 14 16:36:04 CEST 2014


All (not only Murata-san),

Understanding clause 8 has been a real struggle.  Below, and attached as a text file, are notes that I have accumulated over several readings, re-readings, referring to cited RFCs, and trying to see what has been lost from the 3rd edition and whether the deletion was deliberate.  What I am hoping is that my struggle (as reflected in the notes) can be a useful guide as to how to make the document more comprehensible.  It would be useful to get into the draft (perhaps as drafting notes) a list of the objectives for the revised standard, distinguishing between changes to the technical specification and changes to the presentation of the specification.  

The notes are not necessarily complete, but I have no more time to devote to this task today (and maybe not tomorrow either).  I wanted to get something out in plenty of time for people to read it before the call next Thursday and possibly provide answers to questions I raise here about things I don't understand in time for me to take another look before the call.

To be continued....   Caroline

===

I started to read clause 8 and found the section very hard to understand in a holistic way, either by reading in order or through the headings in the table of contents.  The clause reads like chunks of unrelated text.  The forest is obscured by the trees (or weeds).

I believe we need to:
A.  do a much better job of explaining things at a higher level (not only "what" but "why") and providing "sign-posts" in the narrative that guide the reader;
B. consider whether a different organization into clauses and sub-clauses is warranted, particularly related to where the Pack URI stuff goes.

B really needs to be done before A.  I don't have a particular recommendation to make on re-organization (at least not at this stage) but believe there should be a discussion on this before long.  Here are some questions/comments I have:

Clauses 8.2.4 (Growth Hint) and 8.2.5 (XML Usage) seem completely overwhelmed by the detailed content of 8.2.2 (Part Names) and 8.3 (Pack URI scheme), which appear to be (and indeed are) related.  The flow between 8.2.5 and 8.3 is particularly jolting.

Is the Pack URI scheme really part of the Package Model (the top heading for clause 8)?  I believe the objective of the pack: scheme is to permit references to parts (and part fragments) from OUTSIDE the package.  To me, it is important to understand the key internal structures first, including Relationships.  For one thing, I see relationships parts as soon as I look inside a package.  I don't see Pack URIs, only path segments (or isegments).

Treating Pack URI in an annex did have the advantage that you could ignore it until you had grasped the essential elements of the package model. And the headings for clauses and subclauses in clause 8 of the previous edition did a somewhat better job of suggesting a logical structure that helped in understanding the model as a whole and knowing where to find what.  However, the purpose of Pack URI was obscured because it was inadequately introduced in the main text.  I realize that I don't have a good sense of when it is used in practice.

Did you have a particular reason for locating Pack URI as 8.3?  Might it fit better after 8.6 Relationships or in a new top-level clause between 8 (Package Model) and 9 (Physical Package)?

I'm not certain whether 8.4 Base IRIs and 8.5 Resolving Relative References (non-normative) apply only to the use of pack: or also in other contexts.  That would determine where those clauses should go. I do find myself wondering whether we should revive Part Addressing as a clause or subclause heading and put Pack URI scheme, Base IRIs, and Resolving Relative References as subsidiary clauses.

======= now for some notes on some particular subclauses===========

8.2.2  Part Names

***Related terms and definitions
Text in 8.2.2.1 doesn't seem compatible with the trimmed definition in Terms and Definitions.  8.2.2.1 is better.

old version in Terms and Defs:
4.27
part name
The path component of a pack URI. Part names are used to refer to a part in the context of a package, typically
as part of a URI.

Rex trimmed to     path component of a pack URI 
Suggested replacement:   Unicode string that identifies a part in the context of a package

I think the fact that a path component of a pack URI is a part name belongs in the text not a definition.
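To make the relationship between a pack URI and a part name concrete, here is a minimal Python sketch. The pack URI below is hypothetical and only illustrates the shape of such a URI; the sketch simply extracts the path component, which is what the old definition calls the part name.

```python
from urllib.parse import urlsplit

# Hypothetical pack URI, for illustration only: the authority stands for the
# package as a whole and the path refers to a part inside that package.
pack_uri = "pack://http%3c,,www.example.com,my.container/a/b/foo.xml"

parts = urlsplit(pack_uri)

# The path component of the pack URI; per the old 4.27 wording, this is the part name.
part_name = parts.path

print(parts.scheme)  # the URI scheme
print(part_name)     # the path component
```

Seen this way, the part name exists independently of any pack URI; the pack URI merely carries it as its path, which is why the statement fits better in the text than in the definition.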

The trimmed definition of Pack URI (URI scheme that allows URIs to be used as a uniform mechanism for addressing parts within a package) also needs work.  As we use the term, a "Pack URI" is not a scheme, but an instance compliant with the scheme syntax and rules.  Do we need a clause 3 entry for Pack IRI too?

*** References to RFC 3986 and RFC 3987 -- and URI/IRI confusion ***
This is really several issues.  
1.  I think we need a sentence or two to introduce URI and IRI, particularly given that you have added a section defining a pack_IRI as distinct from a pack_URI.  I need to fully understand the intent of the changes wrt URI and IRI in order to come up with suggested wording and look for inconsistencies.  I believe that one important modification you are looking for to Part 3 is to let part names contain non-ASCII characters without percent-encoding.  Is that the only change to the model?  

2.  I think we need to introduce RFC 3986 and RFC 3987 briefly in the text and relate them to URI and IRI.  This could be combined with addressing point 1, possibly near beginning of 8.2.2.  Also applies to 8.3 (Pack URI scheme), 8.4 (Base IRIs/URIs), and 8.5 (Resolving relative references).  I believe another of your objectives was to avoid duplication of content from the RFCs but to rely on referring to them.  Am I right?  Would it work to use an Annex to introduce URIs, IRIs, RFC 3986, RFC 3987 and how OPC uses them (and varies from them)?  Should such an Annex be normative or informative?

3.  There are various references to using terms as defined in the two RFCs.  I think we need a consistent approach to listing these terms.  This could be in the text near each ABNF chunk, making sure that all relevant terms are covered.  Alternatively, it could be by putting all the terms into Terms and Definitions, with associated notes that indicate which RFC they are defined in.

4.  In 8.3.2 we should probably say explicitly that the syntax of the Pack URI scheme can be extended to IRIs.  I haven't chased down whether a pack_IRI as you have added it in 8.3.2 is still a valid pack URI according to the provisional scheme registered with IANA or not.  But clarifying text might be different depending on the answer.

5.  May need to change "URI" to "URI or IRI" or to "IRI" in more places.  I'm very confused on this.

6.  8.4 is entitled Base IRIs but mostly discusses URIs and 3986.
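As a rough illustration of the URI/IRI distinction raised in point 1, the following Python sketch percent-encodes a part name that contains a non-ASCII character. The part name and the helper name are hypothetical, and the mapping is a simplification: `quote` also escapes some ASCII characters (for example "%" and space), which is stricter than the IRI-to-URI mapping in RFC 3987.

```python
from urllib.parse import quote, unquote

def part_name_to_uri_path(part_name: str) -> str:
    """Percent-encode characters as UTF-8 octets, leaving '/' separators intact.

    Hypothetical helper for illustration; not taken from the draft.
    """
    return quote(part_name, safe="/")

# Hypothetical part name in IRI style (non-ASCII character written directly).
iri_style = "/docs/r\u00e9sum\u00e9.xml"

# The same part name in URI style (non-ASCII octets percent-encoded).
uri_style = part_name_to_uri_path(iri_style)

print(uri_style)
print(unquote(uri_style) == iri_style)
```

If the only change to the model is to allow the first form in part names, then the two strings above would identify the same part, and the text needs to say which form is canonical for comparison.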

*** Base URIs/IRIs  
8.4
For "This subclause defines" to be true, I think the first Note needs to be regular text, i.e., normative rather than informative.

Section 5.1 of RFC 3986 is NOT a four-step model, but four ways (cases) of determining a Base URI in order of precedence.  So I believe both notes to be incorrect, the first in detail, the second in substance.

First Note:  Use "case" (or some equivalent word) rather than "step."
Second Note:  Case 3 and 4 will only come into play if case 2 (being described for OPC in this subclause) is not applicable.  The relationship  between what is "defined" here and the other cases in 3986 needs to be clarified.
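The "cases in order of precedence" reading of RFC 3986 section 5.1 can be sketched in Python as below. The function and parameter names are assumptions made for this illustration; the point is only that the first applicable case wins and the later cases are consulted only when the earlier ones do not apply.

```python
from typing import Optional

def establish_base_uri(
    embedded_in_content: Optional[str] = None,    # case 1 (RFC 3986, 5.1.1)
    encapsulating_entity: Optional[str] = None,   # case 2 (RFC 3986, 5.1.2)
    retrieval_uri: Optional[str] = None,          # case 3 (RFC 3986, 5.1.3)
    application_default: Optional[str] = None,    # case 4 (RFC 3986, 5.1.4)
) -> Optional[str]:
    """Return the base URI from the first applicable case, in precedence order."""
    for candidate in (
        embedded_in_content,
        encapsulating_entity,
        retrieval_uri,
        application_default,
    ):
        if candidate is not None:
            return candidate
    return None  # no base URI could be established

# Hypothetical values: case 2 applies, so case 3 is never consulted.
base = establish_base_uri(
    encapsulating_entity="pack://example-authority/word/document.xml",
    retrieval_uri="http://www.example.com/my.container",
)
print(base)
```

These are alternatives rather than sequential steps, which is why "case" is the better word in the first Note.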

****Resolving relative references ***
8.5
One underlying issue is now clear to me: there may be references that are intended to function like "relative references" as defined in RFC 3986, but are not technically valid as defined there or in 3987.  For example, they may have backward slashes that should be interpreted as forward slashes.  I will be happy to try and produce some clearer wording when I understand more fully.  Does the resolution process as described in 8.5 come into play only within a pack URI?  All the examples are with pack URIs.  However, in the published Part 2, annex A is not tied to Pack URIs, but relates to URIs in general.  
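A minimal Python sketch of the situation described above, assuming that backslashes in a reference are meant as forward slashes and that resolution happens against the name of the part containing the reference. The part names and the function name are hypothetical, and this is not the full RFC 3986 section 5.2 algorithm (queries and fragments are ignored).

```python
import posixpath

def resolve_part_reference(base_part_name: str, reference: str) -> str:
    """Resolve a possibly malformed relative reference against a base part name."""
    # Assumption: a backslash was intended as a path separator.
    reference = reference.replace("\\", "/")

    if reference.startswith("/"):
        # Already absolute within the package.
        target = reference
    else:
        # Relative: resolve against the "directory" of the base part name.
        target = posixpath.join(posixpath.dirname(base_part_name), reference)

    # Collapse "." and ".." segments.
    return posixpath.normpath(target)

# Hypothetical part names, for illustration only.
print(resolve_part_reference("/word/document.xml", "..\\media\\image1.png"))
print(resolve_part_reference("/word/document.xml", "styles.xml"))
```

Nothing in this sketch depends on the pack: scheme, which is in line with the published Annex A treating resolution as a matter of URIs in general.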


 

Caroline Arms
Library of Congress Contractor
Co-compiler of Sustainability of Digital Formats resource
http://www.digitalpreservation.gov/formats/

** Views expressed are personal and not necessarily those of the institution **
________________________________________
From: Arms, Caroline [caar at loc.gov]
Sent: Wednesday, August 13, 2014 10:38 AM
To: MURATA Makoto; SC34
Subject: Notes on latest draft of Part 2

All (particularly Rex and Murata-san),

As promised, I have been reading the June MM0619 draft by Murata-san that is on the Assembla site.

Since most of the changes relate to moving most of Annex A into clause 8, I will send notes for clause 8 and Annex A separately -- not immediately.  Below (and attached as a text file) are notes on the other clauses.

1.  Scope

General editorial comments:
(a) This clause shows several significant deletions.  Somewhat equivalent text is in 7. Overview.  We should check how things will look in the ISO preview (online browsing platform).  Scope will show but I'm not sure that clause 7 would.  We should check whether any text deleted from Scope should be recovered to get relevant words into either Scope or Overview.

(b) The ... bulleted list seems to be temporary.  It definitely needs work for grammatical consistency.

Typo:  "dissiminated" should be "disseminated"

2.  Conformance

General editorial comment:
(a) Could do with improved distinction between statement about conformance for OPC (this part) and conformance in the multi-part standard.

3.  Normative References
Presumably should update the Augmented BNF RFC 4234 to RFC 5234, since 4234 is obsolete.  [Note: New text from Murata-san refers to RFC 2234 (an even older obsolete version) and uses EBNF rather than ABNF.  Presumably all references should use RFC 5234 and "ABNF."]

4.  Terms and Definitions
I notice that some ISO standards have Notes associated with definitions.  If this is acceptable practice we might consider whether any of the terms warrant a note as clarification.

Clearly this is a clause to revisit when the rest of the document is done.  The first term confused me but I assume we will make a holistic pass through Terms and Definitions later and haven't reviewed this clause in detail.

access style
style in which local access or networked access is conducted

Proposed replacement: style in which local or networked access to a package is conducted
[Aside: Personally, I might use "mode" rather than "style", but I assume we are only addressing DRs and significant shortcomings in the document.]

Murata-san introduced the terms NFC and NFD for the two normalization approaches commonly applied to Unicode.  We should either spell out in full and prefix with "Unicode" (my preference) or add to Terms and Definitions.
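For readers unfamiliar with the two forms, the following short Python sketch shows what Unicode Normalization Form C (NFC, composed) and Form D (NFD, decomposed) do to the same character. It uses only the standard `unicodedata` module; the sample strings are chosen for this illustration.

```python
import unicodedata

composed = "\u00e9"      # "é" as one code point (precomposed)
decomposed = "e\u0301"   # "e" followed by a combining acute accent

# The two strings render the same but are different code point sequences.
print(composed == decomposed)

# NFC composes, NFD decomposes; each maps both inputs to one canonical form.
print(unicodedata.normalize("NFC", decomposed) == composed)
print(unicodedata.normalize("NFD", composed) == decomposed)
```

Two part names that differ only in this way would compare as unequal unless one normalization form is applied first, which is why the terms deserve to be spelled out or defined.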

5.  Notational Conventions
Trusting this to Rex.

6.  General Description
Will need to look back at this.  Murata-san's changes have made Annex A informative, for example.

7.  Overview
See my notes on 1. Scope above.

8.  Package Model
Will deal with in separate set of notes apart from a few small details.

There are still several internal references and example captions that need A to be changed to 8.

I assume "EBNF" should be "ABNF".  Latest RFC for ABNF is RFC 5234.  2234 and 4234 are obsolete.

9. Physical Package
10. Core Properties
11.  Thumbnails
No changes have been made.  Will need to check Annex references.

12. Digital Signatures
Ignoring for now.

Annexes
Ignoring for now.  Will make notes on Annex A in conjunction with clause 8.

  Hoping this helps.      Caroline

________________________________________
From: eb2mmrt at gmail.com [eb2mmrt at gmail.com] On Behalf Of MURATA Makoto [eb2m-mrt at asahi-net.or.jp]
Sent: Thursday, June 26, 2014 11:16 PM
To: SC34
Subject: My personal draft of OPC revision

Dear colleagues,

On the basis of discussions in Prague, I updated my personal
rewrite.  It is available at:

https://www.assembla.com/spaces/IS29500/documents

The first file (OfficeOpenXMLPart2-OpenPackagingConventionsMM0619)
is the draft.

Regards,
Makoto
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: Part2Clause8notes_20140812.txt
URL: <http://mailman.vse.cz/pipermail/sc34wg4/attachments/20140814/c35083f3/attachment-0001.txt>

