OPC part names and references

MURATA Makoto eb2m-mrt at asahi-net.or.jp
Thu Jan 9 07:56:22 CET 2014


John,


>> 2) Resolution of relative URI references
> I'm not sure I completely understand this.  Are you proposing to eliminate using pack URIs to resolve a relative reference (either an absolute-path reference or a relative-path reference) that is the value of a Target attribute of a Relationship element?

Right.  To resolve relative-path references, we only have
to prepend the source part name.  I'm not saying that
the pack scheme should not be used anywhere.  I'm just
saying that it should not be used for resolving absolute-path
or relative-path references.
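
To make the idea concrete, here is a minimal sketch in Python of
resolving a reference against the source part name without ever
constructing a pack URI.  The function name and the example part
names are hypothetical, and I am assuming that "prepending the source
part name" means prepending everything up to its last "/" (as in the
merge step of RFC 3986) and then removing "." and ".." segments.
Percent-encoding is deliberately left out of the sketch.

```python
import posixpath


def resolve_part_reference(source_part_name: str, reference: str) -> str:
    """Merge a source part name with a scheme-less part reference.

    Hypothetical helper: an illustration of the proposal, not text
    taken from the OPC specification.
    """
    if reference.startswith("/"):
        # Absolute-path reference: the source part name is not used.
        merged = reference
    else:
        # Relative-path reference: prepend the "directory" of the source part.
        base_dir = posixpath.dirname(source_part_name)
        merged = posixpath.join(base_dir, reference)
    # Collapse "." and ".." segments; no scheme is introduced at any point.
    return posixpath.normpath(merged)


print(resolve_part_reference("/word/document.xml", "media/image1.png"))
print(resolve_part_reference("/word/document.xml", "../customXml/item1.xml"))
print(resolve_part_reference("/word/document.xml", "/docProps/app.xml"))
```

The result is always a string beginning with "/", so it can be
compared with part names directly.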

>> Neither OPC part references (or Unicode strings as specified in Annex A) nor OPC part names contain schemes (e.g., http:).
> If by "OPC part reference" you mean a pack URI, technically it does have a scheme - "pack".  The address of the external package, a part inside of which is being referenced, that is being referenced by the relationship contains a scheme, but that address' scheme+authority+path is munged into a string that can be stored as the pack URI's authority component.

By "OPC part reference" I mean the "Unicode string" of Annex A.
Such strings do not contain schemes, as demonstrated by A.5.

>
>> 3) xml:base
> I think it's already been implemented and is used by XAML or .Net.

I am skeptical.  How can we resolve a relative-path OPC part reference
if xml:base="http://www.example.com/foo.html"?  We have to guarantee
that the base is an OPC part within the current OPC package.
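
As a rough illustration of the guarantee I have in mind, the
following Python sketch rejects any base that carries a scheme or an
authority.  The function name is hypothetical, and the check only
looks at the syntax of the base; it does not verify that a part with
that name actually exists in the package.

```python
from urllib.parse import urlsplit


def is_usable_base(base: str) -> bool:
    """Return True if base looks like a part name inside the current package."""
    parts = urlsplit(base)
    # A scheme or an authority means the base points outside the package.
    return parts.scheme == "" and parts.netloc == "" and parts.path.startswith("/")


print(is_usable_base("http://www.example.com/foo.html"))
print(is_usable_base("/word/document.xml"))
```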

>> I think that we should limit our concern to MS Office.  The .Net implementation of OPC does not implement Annex A of Part 2 at all.
> Channeling Chris here, I'd want to be careful we don't change something and omit looking at a known implementation on an assumption it won't be affected, even though your testing seems to indicate that.  If we come up with a short list of what appear to be implementation limitations (i.e., implementing a subset of what OPC allows), Chris/Jim/I can try to hunt down confirmations from the relevant product teams to confirm them and see if our final proposed changes to OPC would create incompatibilities.

I agree that we should not forget .Net.  But I also think that
.Net is already very non-conformant, since it does not
support Annex A.

Regards,
Makoto

>
> Re: Chris' e-mail:
>> I have a feeling that some of the sticking points we discovered with regard to relative references were related to XPS
> I think that was to do with XPS or Office always generating relationship targets with a "/" and the other without.  With our better understanding of relative references (relative-path reference vs. absolute-path reference), we found that both were correct and evaluate as expected.
>
> John
>
> -----Original Message-----
> From: Chris Rae [mailto:Chris.Rae at microsoft.com]
> Sent: Tuesday, January 7, 2014 10:14 AM
> To: MURATA Makoto; SC34
> Subject: RE: OPC part names and references
>
> I have a feeling that some of the sticking points we discovered with regard to relative references were related to XPS, but I can't remember exactly what the details were. I'll do some investigation.
>
> We'll have to tread somewhat carefully here, as OPC is the most widely implemented part of ISO/IEC 29500.
>
> Chris
>
> -----Original Message-----
> From: eb2mmrt at gmail.com [mailto:eb2mmrt at gmail.com] On Behalf Of MURATA Makoto
> Sent: Monday, January 6, 2014 6:41 PM
> To: SC34
> Subject: Re: OPC part names and references
>
> Here are some further experiments.
>
> Summary:
>
> MS Word 2007 does not allow non-ASCII characters within part names even if they are percent-encoded.  %HH sequences in OPC part references are decoded as long as they represent ASCII characters.
>
> Experiments:
>
> First, I replaced "document.xml" in a WML document by "%E3%81%82.xml".
> Specifically:
>
> - Renamed the file "document.xml" under the directory "word" as "%E3%81%82.xml"
>
> - Renamed the file "document.xml.rels" under the directory "word/_rels" as "%E3%81%82.xml.rels"
>
> - Replaced "word/document.xml" in the file "_rels/.rels" by "word/%E3%81%82.xml"
>
> - Replaced "/word/document.xml" in "[Content_Types].xml" by "/word/%E3%81%82.xml"
>
> MS Word 2007 then could not open the revised WML document.
>
> Second, I used "a" instead of "%E3%81%82" in the above four changes.
> Then, the document opened successfully.
>
> Third, I replaced "a.xml" in the file "_rels/.rels" by "%61.xml".  I also percent-encoded some other references (values of Relationship/@Target).
> Then, the document opened successfully.
>
> Fourth, just in case, I tried verbatim U+3042 (HIRAGANA LETTER A) rather than %E3%81%82.  As expected, the document did not open.
>
> My conclusions:
>
> - Non-ASCII characters in part names are not allowed even if they are percent-encoded.
>
> - %HH sequences in values of Relationship/@Target are decoded as long as they represent ASCII characters.
>
> Regards,
> Makoto
>
> -----Original Message-----
> From: eb2mmrt at gmail.com [mailto:eb2mmrt at gmail.com] On Behalf Of MURATA Makoto
> Sent: 06 January 2014 17:20
> To: SC34
> Subject: Re: OPC part names and references
>
> I did some more experiments using MS Office 2007 and .Net.
>
> Here is my understanding.
>
> - MS Office 2007 converts %HH to characters at least when %HH represents ASCII characters.
>
> - MS Office 2007 resolves absolute-path references (which begin with "/") correctly.
>
> - MS Office 2007 resolves relative-path references (which do not begin with "/") correctly.
>
> - .Net (Package.GetPart) recognizes neither relative-path references nor %HH.
>
> I think that we should limit our concern to MS Office.  The .Net implementation of OPC does not implement Annex A of Part 2 at all.
>
> Regards,
> Makoto
>
> 2013/12/28 MURATA Makoto <eb2m-mrt at asahi-net.or.jp>:
>> The more I think about OPC, the more confused I am.
>>
>> I have thought that references to OPC parts ("Unicode string"
>> in Annex A of OPC) can contain non-ASCII characters and that such
>> non-ASCII characters are percent-encoded before referenced OPC parts
>> are located.  I have also thought that references to OPC parts are
>> resolved relative to containing OPC parts when they do not begin with
>> "/".
>>
>> However, my experiment with .Net in F# appears to show I am mistaken.
>> It reports errors if references to OPC parts contain non-ASCII
>> characters.  It also reports errors if references to OPC parts do not
>> begin with "/".
>>
>> I plan to manually edit OOXML documents and XPS documents and handle
>> them by MS-Office and XPS viewers.
>>
>> Here is my F# program.
>>
>> open System.IO.Packaging
>> open System
>>
>> let readOPC() =
>>     let package = Package.Open("f:test.opc", IO.FileMode.Open)
>>     let uri = new Uri(Uri.EscapeUriString "/fあ/f1", UriKind.Relative)
>>     let part =  package.GetPart(uri)
>>     let enum = part.GetRelationships().GetEnumerator()
>>     while (enum.MoveNext()) do
>>         let relship = enum.Current
>>         let targetURI = relship.TargetUri
>>         try
>>             let targetPart = package.GetPart(targetURI)
>>             let s = targetPart.GetStream()
>>             System.Console.WriteLine("Success: {0} {1}", targetURI, s.ReadByte())
>>         with
>>             | :? System.ArgumentException ->
>>                 System.Console.WriteLine("Error: {0}", targetURI)
>>     package.Close()
>>
>> readOPC()
>>
>>
>> Regards,
>> Makoto
>
> -----Original Message-----
> From: eb2mmrt at gmail.com [mailto:eb2mmrt at gmail.com] On Behalf Of MURATA Makoto
> Sent: Tuesday, December 24, 2013 4:16 AM
> To: SC34
> Subject: OPC part names and references
>
> Dear colleagues,
>
> Merry Christmas!
>
> I am trying to implement Section 2 of the Japanese proposal
> (http://kikaku.itscj.ipsj.or.jp/sc34/wg4/archive/sc34-wg4-2011-0207.html)
> for improving part names and references.
>
> While doing so, I studied the conversion from part references (which are relative) to part names again.  Here are some random thoughts.
>
> 1) Leading "/"
>
> In Seattle, we learned from OPC experts that references beginning with "/" and those not beginning with it are resolved differently.
>
>
> Proposal: Explicitly state differences between these two types of references.  A reference to RFC 3986 is not good enough.
>
> 2) Resolution of relative URI references
>
> Neither OPC part references (or Unicode strings as specified in Annex A) nor OPC part names contain schemes (e.g., http:).  Should we nevertheless rely on resolution of relative URI references for the conversion from OPC part references to OPC part names?  In other words, should we first create absolute URIs, thus introducing schemes, and then construct OPC part names by removing the schemes?
>
> Proposal: Stop relying on resolution of relative URI references.
> Rather, introduce "base OPC part name", which is the OPC part name of the containing OPC part, and introduce a procedure for merging base OPC part names and OPC part references.  This processing model does not have to touch schemes.
>
> 3) xml:base
>
> Do we really have to allow xml:base (and other similar mechanisms) to change the interpretation of OPC part references?  If such a mechanism specifies irrelevant URIs such as http://www.example.com, how should we interpret OPC part references?
>
> Proposal: Stop using xml:base (and other similar mechanisms).
>
>
> Regards,
> Makoto



-- 

Praying for the victims of the Japan Tohoku earthquake

Makoto

