OPC part names and references

Tue Feb 11 19:27:52 CET 2014

Hi all,
As I mentioned on the recent WG 4 call, Chris, Jim and I have been doing research with some of the original people behind OPC.  We have some information on the pack URI scheme for evening reading in the lead-up to the Berlin meeting.

The [Design Rationale] section is commentary directly from one of the authors.  Much like our discussion with Chuck J. last year, these are his own thoughts, so don’t read it as corporate dogma.  I also (anonymously) floated Murata-san’s comments on dropping the pack URI in favor of a “base OPC part name” for resolving relative references.  See the [Resolving Relative References] section for (again, personal) comments from the authors.  Note that the folks who initially worked on this are more web-technology people, so they have an interestingly different perspective from those of us who are more document-oriented.

John

[Design Rationale]
Several interesting requirements that drove the design of the pack URI scheme:

1)      If a resource embedded within an OPC package is of a pre-existing MIME type that itself embeds relative references, then a MIME-type handler associated with that MIME type could use ordinary text-based relative-reference resolution mechanisms to resolve those relative references into URIs addressing other resources embedded within the same OPC package.
2)      Allow deep references from outside a package to an individual embedded resource inside the package, while still supporting MIME-type-specific fragment identifiers to identify subobjects within the addressed resource.
3)      Pack-scheme-aware client code could address and efficiently retrieve (via HTTP 1.1 byte-range requests) embedded resources from a package residing on a web-server without having any pack-scheme-aware or OPC-aware code running on the server.  (I don’t recall which, if any, of the multiple OPC implementations across Microsoft might have actually realized this design goal.)

The second requirement implied that a fragment identifier couldn’t be used to target an embedded resource, because there is no MIME-type-independent standard for composing fragment identifiers (in this case, composing the fragment identifier that identifies an embedded resource with the fragment identifier that identifies some sub-element within that resource).

The first requirement implied that query parameters could not be used to address individual embedded resources, because any mechanism that used query parameters to specify “paths” to individual embedded resources wouldn’t relate in any way to ordinary relative references.

I sometimes think of an OPC package as a web-server-in-a-box.  The authority component of the URI identifies a self-contained domain.  And composing an authority-free relative reference with an absolute URI reference to an embedded resource can never escape the confines of the containing package, just as composing an authority-free relative reference with an absolute URI reference to a web-server-hosted resource can never escape the confines of that web-server.  And, contained within the package is sufficient information to report the MIME type of every embedded resource, just as is typical of an HTTP server.
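
As a rough illustration of the "can never escape" point, here is a minimal Python sketch.  Python's urllib.parse.urljoin does not resolve against schemes it does not know, such as pack, so the base below uses http with a made-up host (package.invalid) standing in for the pack URI authority.  The host name and the part names are assumptions for illustration only.

```python
from urllib.parse import urljoin, urlsplit

# Hypothetical stand-in for pack://<authority>/word/document.xml.
# http is used only because urljoin ignores bases with unknown schemes.
BASE = "http://package.invalid/word/document.xml"


def resolve(reference: str) -> str:
    """Resolve a relative reference against BASE (RFC 3986 reference resolution)."""
    return urljoin(BASE, reference)


for ref in ("media/image1.png", "/docProps/core.xml", "../../../outside.xml"):
    resolved = urlsplit(resolve(ref))
    # The authority stays the same however many ".." segments are supplied;
    # surplus ".." segments are dropped once the root of the path is reached.
    print(ref, "->", resolved.netloc, resolved.path)
```

A reference that carries its own authority (one starting with "//") is a different case and is not covered by this sketch.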

If one has a web-site of resources that reference each other only via relative references, then those resources can be zipped up into an OPC package without tampering with any of those relative references. (Although one would have to explicitly capture the MIME-types of the embedded resources as they would be reported by the original HTTP server.)

[Resolving Relative References]
IMO the benefit of having the pack URI defined in OPC is that it leverages (and complies with) the Reference Resolution spec (RFC 3986, §5), which is much more complicated than just prepending the source part name. If we do not define the base URI for the part as pack://<authority>/<part name>, we’ll have to specify the resolution algorithm in the standard. As it is, we just say in A.3:

10.   Resolve the relative reference against the base URI of the part holding the Unicode string, as it is defined in §5.2 of RFC 3986. The path component of the resulting absolute URI is the part name.

Also, the scheme gives us the standard place to define case-insensitive paths (and part names).
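
To make step 10 of A.3 concrete, here is a minimal Python sketch that covers that step only (none of the earlier steps of A.3, such as percent-encoding, are attempted).  The function name part_name and the host package.invalid are hypothetical; http is used as a stand-in for the pack scheme because urllib.parse.urljoin does not resolve against unknown schemes.

```python
from urllib.parse import urljoin, urlsplit


def part_name(source_part_name: str, target: str) -> str:
    """Return the part name that a relative reference resolves to.

    http://package.invalid is a hypothetical stand-in for pack://<authority>.
    """
    base_uri = "http://package.invalid" + source_part_name
    absolute = urljoin(base_uri, target)
    # Step 10: the path component of the resulting absolute URI is the part name.
    return urlsplit(absolute).path


print(part_name("/word/document.xml", "media/image1.png"))
print(part_name("/word/document.xml", "../customXml/item1.xml"))
print(part_name("/word/document.xml", "/docProps/app.xml"))
```

The three calls cover a relative-path reference, a reference with a ".." segment, and an absolute-path reference; all of the merging and dot-segment handling is delegated to the generic resolver rather than specified separately.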

-----Original Message-----
From: eb2mmrt at gmail.com [mailto:eb2mmrt at gmail.com] On Behalf Of MURATA Makoto
Sent: Wednesday, January 8, 2014 10:56 PM
To: SC34
Subject: Re: OPC part names and referenes

John,

>> 2) Resolution of relative URI references
> I'm not sure I completely understand this.  Are you proposing to eliminate using pack URIs to resolve a relative reference (either an absolute-path reference or a relative-path reference) that is the value of a Target attribute of a Relationship element?

Right.  To resolve relative-path references, we only have to prepend the source part name.  I'm not saying that the pack scheme should not be used anywhere.  I'm just saying that it should not be used for resolving absolute-path or relative-path references.
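
A minimal Python sketch of what such a scheme-free merge might look like, under stated assumptions: the function name merge is hypothetical, this is not the text of any proposed procedure, and query strings, fragments and percent-encoding are ignored.  It uses plain path arithmetic instead of URI resolution.

```python
import posixpath


def merge(base_part_name: str, reference: str) -> str:
    """Merge a part reference with the part name of the containing part."""
    if reference.startswith("/"):
        # Absolute-path reference: the base part name contributes nothing.
        return posixpath.normpath(reference)
    # Relative-path reference: start from the "directory" of the base part.
    base_dir = posixpath.dirname(base_part_name)
    # normpath removes "." and ".." segments after the join.
    return posixpath.normpath(posixpath.join(base_dir, reference))


print(merge("/word/document.xml", "media/image1.png"))
print(merge("/word/document.xml", "../customXml/item1.xml"))
print(merge("/word/document.xml", "/docProps/app.xml"))
```

Even in this small sketch the merge is more than string concatenation: the last segment of the base has to be dropped and dot segments have to be removed, so a procedure written into the standard would need to spell out those steps.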

>> Neither OPC part references (or Unicode strings as specified in Annex A) nor OPC part names contain schemes (e.g., http:).
> If by "OPC part reference" you mean a pack URI, technically it does have a scheme - "pack".  The address of the external package (a part inside of which is being referenced by the relationship) contains a scheme, but that address's scheme+authority+path is munged into a string that can be stored as the pack URI's authority component.

By "OPC part reference" I mean the "Unicode string" in Annex A.
They do not contain schemes, as demonstrated by A.5.

>
>> 3) xml:base
> I think it's already been implemented and is used by XAML or .Net.

I am skeptical.  How can we resolve a relative-path OPC part reference if xml:base="http://www.example.com/foo.html"?  We have to guarantee that the base is an OPC part within the current OPC package.
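
The concern can be seen with generic RFC 3986 resolution.  In this minimal Python sketch the function name resolve_with_base and the reference media/image1.png are hypothetical; only the xml:base value comes from the example above.

```python
from urllib.parse import urljoin, urlsplit


def resolve_with_base(xml_base: str, reference: str):
    """Resolve a part reference against an xml:base value.

    Returns the (authority, path) pair of the resulting absolute URI.
    """
    resolved = urlsplit(urljoin(xml_base, reference))
    return resolved.netloc, resolved.path


authority, path = resolve_with_base("http://www.example.com/foo.html",
                                    "media/image1.png")
# The authority is now an external web server, not the current package,
# so the path can no longer be read as a part name in this package.
print(authority, path)
```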

>> I think that we should limit our concern to MS Office.  The .Net implementation of OPC does not implement Annex A of Part 2 at all.
> Channeling Chris here, I'd want to be careful that we don't change something without looking at a known implementation on the assumption that it won't be affected, even though your testing seems to indicate that.  If we come up with a short list of what appear to be implementation limitations (i.e., implementing a subset of what OPC allows), Chris/Jim/I can try to hunt down confirmation from the relevant product teams and see if our final proposed changes to OPC would create incompatibilities.

I agree that we should not forget .Net.  But I also think that .Net is already very non-conformant, since it does not support Annex A.

Regards,
Makoto

>
> Re: Chris' e-mail:
>> I have a feeling that some of the sticking points we discovered with 
>> regard to relative references were related to XPS
> I think that had to do with one of XPS or Office always generating relationship targets with a leading "/" and the other without.  With our better understanding of relative references (relative-path reference vs. absolute-path reference), we found that both were correct and evaluate as expected.
>
> John
>
> -----Original Message-----
> From: Chris Rae [mailto:Chris.Rae at microsoft.com]
> Sent: Tuesday, January 7, 2014 10:14 AM
> To: MURATA Makoto; SC34
> Subject: RE: OPC part names and referenes
>
> I have a feeling that some of the sticking points we discovered with regard to relative references were related to XPS, but I can't remember exactly what the details were. I'll do some investigation.
>
> We'll have to tread somewhat carefully here, as OPC is the most widely implemented part of ISO/IEC 29500.
>
> Chris
>
> -----Original Message-----
> From: eb2mmrt at gmail.com [mailto:eb2mmrt at gmail.com] On Behalf Of MURATA 
> Makoto
> Sent: Monday, January 6, 2014 6:41 PM
> To: SC34
> Subject: Re: OPC part names and referenes
>
> Here are some further experiments.
>
> Summary:
>
> MS Word 2007 does not allow non-ASCII characters within part names even if they are percent-encoded.  %HH in OPC part references are decoded as long as they represent ASCII characters.
>
> Experiments:
>
> First, I replaced "document.xml" in a WML document by "%E3%81%82.xml".
> Specifically:
>
> - Renamed the file "document.xml" under the directory "word" as "%E3%81%82.xml"
>
> - Renamed the file "document.xml.rels" under the directory "word/_rels" as "%E3%81%82.xml.rels"
>
> - Replaced "word/document.xml" in the file "_rels/.rels" by "word/%E3%81%82.xml"
>
> - Replaced "/word/document.xml" in "[Content_Types].xml" by "/word/%E3%81%82.xml"
>
> Then, MS Word 2007 could not open the revised WML document.
>
> Second, I used "a" instead of "%E3%81%82" in the above four changes.
> Then, the document opened successfully.
>
> Third, I replaced "a.xml" in the file "_rels/.rels" by "%61.xml".  I also percent-encoded some other references (values of Relationship/@Target).
> Then, the document opened successfully.
>
> Fourth, just in case, I tried verbatim U+3042 (HIRAGANA LETTER A) rather than %E3%81%82.  As expected, the document did not open.
>
> My conclusions:
>
> - Non-ASCII characters in part names are not allowed even if they are percent-encoded.
>
> - %HH in values of Relationship/@Target are decoded as long as they represent ASCII characters.
>
> Regards,
> Makoto
>
> -----Original Message-----
> From: eb2mmrt at gmail.com [mailto:eb2mmrt at gmail.com] On Behalf Of MURATA 
> Makoto
> Sent: 06 January 2014 17:20
> To: SC34
> Subject: Re: OPC part names and referenes
>
> I did some more experiments using MS Office 2007 and .Net.
>
> Here is my understanding.
>
> - MS Office 2007 converts %HH to characters at least when %HH represents ASCII characters.
>
> - MS Office 2007 resolves absolute-path references (which begin with "/") correctly.
>
> - MS Office 2007 resolves relative-path references (which do not begin with "/") correctly.
>
> - .Net (Package.GetPart) recognizes neither relative-path references 
> nor %HH
>
> I think that we should limit our concern to MS Office.  The .Net implementation of OPC does not implement Annex A of Part 2 at all.
>
> Regards,
> Makoto
>
> 2013/12/28 MURATA Makoto <eb2m-mrt at asahi-net.or.jp>:
>> The more I think about OPC, the more confused I am.
>>
>> I have thought that references to OPC parts ("Unicode string"
>> in Annex A of OPC) can contain non-ASCII characters and that such 
>> non-ASCII characters are percent-encoded before referenced OPC parts 
>> are located.  I have also thought that references to OPC parts are 
>> resolved relative to containing OPC parts when they do not begin with 
>> "/".
>>
>> However, my experiment with .Net in F# appears to show I am mistaken.
>> It reports errors if references to OPC parts contain non-ASCII 
>> characters.  It also reports errors if references to OPC parts do not 
>> begin with "/".
>>
>> I plan to manually edit OOXML documents and XPS documents and handle 
>> them by MS-Office and XPS viewers.
>>
>> Here is my F# program.
>>
>> open System.IO.Packaging
>> open System
>>
>> let readOPC() =
>>     // Open an existing package and look up one part by its part name.
>>     let package = Package.Open("f:test.opc", IO.FileMode.Open)
>>     // The non-ASCII part name is percent-encoded before the lookup.
>>     let uri = new Uri(Uri.EscapeUriString "/fあ/f1", UriKind.Relative)
>>     let part =  package.GetPart(uri)
>>     // Walk the relationships of that part and try to fetch each target.
>>     let enum = part.GetRelationships().GetEnumerator()
>>     while (enum.MoveNext()) do
>>         let relship = enum.Current
>>         let targetURI = relship.TargetUri
>>         try
>>             let targetPart = package.GetPart(targetURI)
>>             let s = targetPart.GetStream()
>>             System.Console.WriteLine("Success: {0} {1}", targetURI, s.ReadByte())
>>         with
>>             // GetPart rejects target URIs that it does not accept as part names.
>>             | :? System.ArgumentException ->
>>                 System.Console.WriteLine("Error: {0}", targetURI)
>>     package.Close()
>>
>> readOPC()
>>
>>
>> Regards,
>> Makoto
>
> -----Original Message-----
> From: eb2mmrt at gmail.com [mailto:eb2mmrt at gmail.com] On Behalf Of MURATA 
> Makoto
> Sent: Tuesday, December 24, 2013 4:16 AM
> To: SC34
> Subject: OPC part names and referenes
>
> Dear colleagues,
>
> Merry Christmas!
>
> I am trying to implement Section 2 of the Japanese proposal
> (http://kikaku.itscj.ipsj.or.jp/sc34/wg4/archive/sc34-wg4-2011-0207.html)
> for improving part names and references.
>
> While doing so, I studied the conversion from part references (which are relative) to part names again.  Here are some random thoughts.
>
> 1) Leading "/"
>
> In Seattle, we learned from OPC experts that references beginning with "/" and those not beginning with it are resolved differently.
>
>
> Proposal: Explicitly state differences between these two types of references.  A reference to RFC 3986 is not good enough.
>
> 2) Resolution of relative URI references
>
> Neither OPC part references (or Unicode strings as specified in Annex A) nor OPC part names contain schemes (e.g., http:).  Should we nevertheless rely on resolution of relative URI references for the conversion from OPC part references to OPC part names?  In other words, should we first create absolute URIs, thus introducing schemes, and then construct OPC part names by removing schemes?
>
> Proposal: Stop relying on resolution of relative URI references.
> Rather, introduce "base OPC part name", which is the OPC part name of the containing OPC part, and introduce a procedure for merging base OPC part names and OPC part references.  This processing model does not have to touch schemes.
>
> 3) xml:base
>
> Do we really have to allow xml:base (and other similar mechanisms) to change the interpretation of OPC part references?  If such a mechanism specifies irrelevant URIs such as http://www.example.com, how should we interpret OPC part references?
>
> Proposal: Stop using xml:base (and other similar mechanisms).
>
>
> Regards,
> Makoto

-- 

Praying for the victims of the Japan Tohoku earthquake

Makoto