OPC part names and references

MURATA Makoto eb2m-mrt at asahi-net.or.jp
Sat Mar 1 06:32:43 CET 2014


Based on the feedback from the original developer of OPC,
I think that three things have to be clearly defined.

1) Part names
2) Pack Scheme
3) Base URIs

The pack scheme provides a big picture, while part names
are pieces.

There is no reason to introduce something like "OPC part
references", since they are nothing but relative references.
What is important is the determination of base URIs within
OPC packages.

Base URIs should be defined in a manner consistent with
RFCs 3986 and 3987.  In particular, we should explicitly
mention which clause allows package-level base URIs.
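
As a rough illustration of the order in which Section 5.1 of RFC 3986
looks for a base URI (5.1.1 embedded in the content, 5.1.2 the
encapsulating entity, 5.1.3 the retrieval URI, 5.1.4 an application
default), the following Python sketch picks the first one available.
The function name and the sample URIs are hypothetical, and the pack
URI authority is only a placeholder, not the encoding required by
29500-2.

```python
from typing import Optional


def choose_base_uri(embedded: Optional[str],
                    encapsulating: Optional[str],
                    retrieval: Optional[str],
                    default: str) -> str:
    """Return the first available base URI in RFC 3986 Section 5.1 order."""
    for candidate in (embedded, encapsulating, retrieval):
        if candidate is not None:
            return candidate
    return default


# Assumed example: the part embeds no base URI (5.1.1 does not apply),
# so the package, as the encapsulating entity, supplies it (5.1.2).
base = choose_base_uri(
    embedded=None,
    encapsulating="pack://example-authority/word/document.xml",
    retrieval="http://www.example.com/doc.docx",
    default="about:blank",
)
print(base)
```

In this reading, a package-level base URI wins over the URI from which
the package was retrieved, which is why relative references stay
inside the package.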

The resolution of relative references to pack URIs
is just a special case of relative reference resolution
as specified in RFCs 3986 and 3987.  29500-2:2012
provides normative Annex A, but what is needed is
a non-normative summary of the special case.
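
To make the special case concrete, here is a minimal Python sketch,
not a conformant implementation.  It assumes that the reference has no
scheme and no authority, that the base is a pack URI whose path is the
part name, and that percent-encoding is already settled.  The function
name and the authority "example-authority" are hypothetical.

```python
from urllib.parse import urljoin, urlsplit, urlunsplit


def resolve_in_package(base_pack_uri: str, reference: str) -> str:
    """Resolve a relative reference against a pack URI base (RFC 3986, 5.2)."""
    base = urlsplit(base_pack_uri)
    ref = urlsplit(reference)
    # urljoin() returns the reference unchanged when the base has a scheme
    # it does not know, such as "pack", so only the paths are merged here;
    # the scheme and authority are copied from the base URI.
    path = urljoin(base.path, ref.path)
    return urlunsplit((base.scheme, base.netloc, path, ref.query, ref.fragment))


base_uri = "pack://example-authority/word/document.xml"

# Relative-path reference: merged with the directory of the base part.
print(resolve_in_package(base_uri, "media/image1.png"))
# Relative-path reference with a dot segment.
print(resolve_in_package(base_uri, "../customXml/item1.xml"))
# Absolute-path reference: replaces the whole path of the base.
print(resolve_in_package(base_uri, "/docProps/core.xml"))
```

The path component of each result, for example
urlsplit(result).path, is then the name of the referenced part.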

In Berlin, I will provide a summary of recent discussions.

Regards,
Makoto


2014-02-25 22:05 GMT+09:00 MURATA Makoto <eb2m-mrt at asahi-net.or.jp>:

> John,
>
>
> Thanks for contacting the original developers.  I now have
> better understanding of the design.  In particular, I now
> understand that OPC part references are nothing but
> relative references and also understand why queries or
> fragment identifiers were not used for referencing OPC
> parts.
>
>
>> [Design Rationale]
>> Several interesting requirements that drove the design of the pack URI
>> scheme:
>>
>> 1)      If a resource embedded with an OPC package is of a pre-existing
>> MIME type that itself embeds relative references, then a MIME-type handler
>> associated with that MIME type could use ordinary text-based
>> relative-reference resolution mechanisms to resolve relative-references
>> into URIs addressing other resources embedded within the same OPC package.
>>
>
> I spent a lot of time on this paragraph.  Here
> is my understanding of his intention:
>
>   The base URI for an OPC part reference within
>   an OPC part shall be a pack URI, which is composed from the URI
>   of the OPC package and the part name of the OPC part.
>
> This is implied by the second paragraph in Clause 9.2.1
> of 29500-2:2012.
>
>   Relative references from a part are interpreted relative
>   to the base URI of that part. By default, the base URI of a
>   part is derived from the name of the part, as defined in §B.3.
>
> This is also implied by the last bullet in A.3, namely:
>
>   Resolve the relative reference against the base URI of the
>   part holding the Unicode string, as it is defined in §5.2 of
>   RFC 3986. The path component of the resulting absolute
>   URI is the part name
>
> Let us review how base URIs are determined in
> RFCs 3986 and 3987.
>
> First, determination of base URIs can be MIME-type dependent.
> 5.1.1 of RFC 3986 (http://tools.ietf.org/html/rfc3986#section-5.1.1)
> allows base URIs to be embedded in a MIME-type dependent
> manner.
>
> Second, specifying base URIs at the OPC package level is
> allowed by 5.1.2 (http://tools.ietf.org/html/rfc3986#section-5.1.2).
> That's why OPC can specify that pack URIs are base URIs.
> (We should explicitly mention 5.1.2)
>
> In the case of relationship parts, it is guaranteed that
> base URIs are pack URIs.  This is because the use
> of xml:base for relationship parts is prohibited by the
> sentence immediately preceding 9.3.2.2 in
> 29500-2:2012, which is:
>   The xml:base attribute shall not be used to specify a
>   base URI for relationship XML content.
>
>
> Perhaps, almost everything (except the last
> bullet in A.3) in Annex A is a restatement of
> the resolution algorithm in RFC 3986 and
> RFC 3987.  A.1 and A.2 in 29500-2:2012
> are such restatements.  I believe that A.3 is
> questionable, because it deviates from
> the generic algorithm in these RFCs and also
> because it is not implemented by MS Office.
>
>> 2)      Allow for the deep-references from outside of a package to an
>> individual embedded resource inside the package, while still supporting
>> MIME-type-specific fragment identifiers to identify subobjects within the
>> addressed resource.
>>
>
> So, fragment identifiers of HTML should be usable for
> locating fragments of HTML documents within OPC packages.
> Thus, we should not use fragment identifiers for referencing
> OPC parts.  We have to provide something as part of URIs
> for referencing OPC parts.  Makes sense.
>
>
>> 3)      Pack-scheme-aware client code could address and efficiently
>> retrieve (via HTTP 1.1 byte-range requests) embedded resources from a
>> package residing on a web-server without having any pack-scheme-aware or
>> OPC-aware code running on the server.  (I don’t recall which, if any, of
>> the multiple OPC implementations across Microsoft might have actually
>> realized this design goal.)
>>
>
> This deserves to be explicitly specified as part of 29500-2.
>
>>
>> The second requirement implied that a fragment identifier couldn’t be
>> used to target an embedded resource, because there is no MIME-type
>> independent standard for composing fragment identifiers (in this case,
>> what would have been composing the fragment identifier identifying an
>> embedded resource together with the fragment identifier identifying some
>> sub-element within that resource).
>>
>
> Yes.
>
>
>> The first requirement implied that query parameters could not be used to
>> address individual embedded resources, because any such mechanism for using
>> query parameters to specify “paths” to individual embedded resources
>> wouldn’t relate in any way to ordinary relative references.
>>
>>
> I agree that query parameters cannot be used for addressing OPC
> parts, if we want to use existing fragment identifiers for OPC parts
> (MIME entities) within OPC packages.
>
>
>> I sometimes think of an OPC package as web-server-in-a-box.
>
>
> Incidentally, I have thought something very similar for
> EPUB packages, which are going to have content-
> negotiation-in-a-box.
>
>
>> The authority component of the URI identifies a self-contained domain.
>
>
> True.
>
>
>> And composing an authority-free relative reference with an absolute URI
>> reference to an embedded resource can never escape the confines of the
>> containing package, just as
>
>
> What I suggested is different.   But I won't go into the details.
> I would like to honor the original design motivation and improve
> the text.  I see a *lot* of room for improvement.
>
>
>> composing an authority-free relative reference with an absolute URI
>> reference to a web-server-hosted resource can never escape the confines of
>> that web-server.  And, contained within the package is sufficient
>> information to report the mime-type of every embedded resource, just as is
>> typical of an HTTP server.
>>
>> If one has a web-site of resources that reference each other only via
>> relative references, then those resources can be zipped up into an OPC
>> package without tampering with any of those relative references. (Although
>> one would have to explicitly capture the MIME-types of the embedded
>> resources as they would be reported by the original HTTP server.)
>>
>>
> I doubt this, since relative references within a relationship part are
> resolved with respect to the source part rather than the relationship part.
>
>
>>
>> [Resolving Relative References]
>> IMO the benefit of having pack uri defined in the OPC is to leverage (and
>> to be compliant with) the Reference Resolution spec, which is much more
>> complicated than just prepending the source part name. If we do not define
>> the base URI for the part as pack://<authority>/<part name>, we’ll have to
>> specify the resolution algorithm in the standard. Now we just say in A.3 -
>>
>
> I think that using pack URIs from both inside and outside is
> a bigger advantage.  I still doubt A.3, since it deviates from
> the resolution algorithm in the two RFCs.
>
> Regards,
> Makoto
>
>
>> 10.   Resolve the relative reference against the base URI of the part
>> holding the Unicode string, as it is defined in §5.2 of RFC 3986. The path
>> component of the resulting absolute URI is the part name.
>>
>> Also the schema gives us the standard place to define case-insensitive
>> path (and part name).
>>
>>
>> -----Original Message-----
>> From: eb2mmrt at gmail.com [mailto:eb2mmrt at gmail.com] On Behalf Of MURATA
>> Makoto
>> Sent: Wednesday, January 8, 2014 10:56 PM
>> To: SC34
>> Subject: Re: OPC part names and references
>>
>> John,
>>
>>
>> >> 2) Resolution of relative URI references
>> > I'm not sure I completely understand this.  Are you proposing to
>> eliminate using pack URIs to resolve a relative reference (either an
>> absolute-path reference or a relative-path reference) that is the value of
>> a Target attribute of a Relationship element?
>>
>> Right.  To resolve relative-path references, we only have to prepend the
>> source part name.  I'm not saying that the pack scheme should not be used
>> anywhere.  I'm just saying that it should not be used for resolving
>> absolute-path or relative-path references.
>>
>> >> Neither OPC part references (or Unicode strings as specified in Annex
>> A) nor OPC part names contain schemes (e.g., http:).
>> > If by "OPC part reference" you mean a pack URI, technically it does
>> have a scheme - "pack".  The address of the external package, a part inside
>> of which is being referenced, that is being referenced by the relationship
>> contains a scheme, but that address' scheme+authority+path is munged into a
>> string that can be stored as the pack URI's authority component.
>>
>> I mean "Unicode string" in Annex A by  "OPC part reference".
>> They do not contain schemes, as demonstrated by A.5.
>>
>> >
>> >> 3) xml:base
>> > I think it's already been implemented and is used by XAML or .Net.
>>
>> I am skeptical.  How can we resolve a relative-path OPC part reference
>> if xml:base="http://www.example.com/foo.html"?  We have to guarantee
>> that the base is an OPC part within the current OPC package.
>>
>> >> I think that we should limit our concern to MS Office.  The .Net
>> implementation of OPC does not implement Annex A of Part 2 at all.
>> > Channeling Chris here, I'd want to be careful we don't change something
>> and omit looking at a known implementation on an assumption it won't be
>> affected, even though your testing seems to indicate that.  If we come up
>> with a short list of what appear to be implementation limitations (i.e.,
>> implementing a subset of what OPC allows), Chris/Jim/I can try to hunt down
>> confirmations from the relevant product teams to confirm them and see if
>> our final proposed changes to OPC would create incompatibilities.
>>
>> I agree that we should not forget .Net.  But I also think that .Net is
>> already very non-conformant, since it does not support Annex A.
>>
>> Regards,
>> Makoto
>>
>> >
>> > Re: Chris' e-mail:
>> >> I have a feeling that some of the sticking points we discovered with
>> >> regard to relative references were related to XPS
>> > I think that was to do with XPS or Office always generating
>> relationship targets with a "/" and the other without.  With our better
>> understanding of relative references (relative-path reference vs.
>> absolute-path reference), we found that both were correct and evaluate as
>> expected.
>> >
>> > John
>> >
>> > -----Original Message-----
>> > From: Chris Rae [mailto:Chris.Rae at microsoft.com]
>> > Sent: Tuesday, January 7, 2014 10:14 AM
>> > To: MURATA Makoto; SC34
>> > Subject: RE: OPC part names and references
>> >
>> > I have a feeling that some of the sticking points we discovered with
>> regard to relative references were related to XPS, but I can't remember
>> exactly what the details were. I'll do some investigation.
>> >
>> > We'll have to tread somewhat carefully here, as OPC is the most widely
>> implemented part of ISO/IEC 29500.
>> >
>> > Chris
>> >
>> > -----Original Message-----
>> > From: eb2mmrt at gmail.com [mailto:eb2mmrt at gmail.com] On Behalf Of MURATA
>> > Makoto
>> > Sent: Monday, January 6, 2014 6:41 PM
>> > To: SC34
>> > Subject: Re: OPC part names and references
>> >
>> > Here are some further experiments.
>> >
>> > Summary:
>> >
>> > MS Word 2007 does not allow non-ASCII characters within part names even
>> if they are percent-encoded.  %HH in OPC part references are decoded as
>> long as they represent ASCII characters.
>> >
>> > Experiments:
>> >
>> > First, I replaced "document.xml" in a WML document by "%E3%81%82.xml".
>> > Specifically:
>> >
>> > - Renamed the file "document.xml" under the directory "word" as
>> "%E3%81%82.xml"
>> >
>> > - Renamed the file "document.xml.rels" under the directory "word/_rels"
>> as "%E3%81%82.xml.rels"
>> >
>> > - Replaced "word/document.xml" in the file "_rels/.rels" by
>> "word/%E3%81%82.xml"
>> >
>> > - Replaced "/word/document.xml" in "[Content_Types].xml" by
>> "/word/%E3%81%82.xml"
>> >
>> > Then, MS Word 2007 cannot open the revised WML document.
>> >
>> > Second, I used "a" instead of "%E3%81%82" in the above four changes.
>> > Then, the document opened successfully.
>> >
>> > Third, I replaced "a.xml" in the file "_rels/.rels" by "%61.xml".  I
>> also percent-encoded some other references (values of Relationship/@Target).
>> > Then, the document opened successfully.
>> >
>> > Fourth, just in case, I tried verbatim U+3042 (HIRAGANA LETTER A)
>> rather than %E3%81%82.  As expected, the document does not open.
>> >
>> > My conclusions:
>> >
>> > - Non-ASCII characters in part names are not allowed even if they are
>> percent-encoded.
>> >
>> > - %HH in values of Relationship/@Target are decoded as long as they
>> represent ASCII characters.
>> >
>> > Regards,
>> > Makoto
>> >
>> > -----Original Message-----
>> > From: eb2mmrt at gmail.com [mailto:eb2mmrt at gmail.com] On Behalf Of MURATA
>> > Makoto
>> > Sent: 06 January 2014 17:20
>> > To: SC34
>> > Subject: Re: OPC part names and references
>> >
>> > I did some more experiments using MS Office 2007 and .Net.
>> >
>> > Here is my understanding.
>> >
>> > - MS Office 2007 converts %HH to characters at least when %HH
>> represents ASCII characters.
>> >
>> > - MS Office 2007 resolves absolute-path references (which begin with
>> "/") correctly.
>> >
>> > - MS Office 2007 resolves relative-path references (which do not
>> begin with "/") correctly.
>> >
>> > - .Net (Package.GetPart) recognizes neither relative-path references
>> > nor %HH
>> >
>> > I think that we should limit our concern to MS Office.  The .Net
>> implementation of OPC does not implement Annex A of Part 2 at all.
>> >
>> > Regards,
>> > Makoto
>> >
>> > 2013/12/28 MURATA Makoto <eb2m-mrt at asahi-net.or.jp>:
>> >> The more I think about OPC, the more confused I am.
>> >>
>> >> I have thought that references to OPC parts ("Unicode string"
>> >> in Annex A of OPC) can contain non-ASCII characters and that such
>> >> non-ASCII characters are percent-encoded before referenced OPC parts
>> >> are located.  I have also thought that references to OPC parts are
>> >> resolved relative to containing OPC parts when they do not begin with
>> >> "/".
>> >>
>> >> However, my experiment with .Net in F# appears to show I am mistaken.
>> >> It reports errors if references to OPC parts contain non-ASCII
>> >> characters.  It also reports errors if references to OPC parts do not
>> >> begin with "/".
>> >>
>> >> I plan to manually edit OOXML documents and XPS documents and handle
>> >> them by MS-Office and XPS viewers.
>> >>
>> >> Here is my F# program.
>> >>
>> >> open System
>> >> open System.IO.Packaging
>> >>
>> >> let readOPC() =
>> >>     let package = Package.Open("f:test.opc", IO.FileMode.Open)
>> >>     // Percent-encode the non-ASCII part name before building the URI.
>> >>     let uri = Uri(Uri.EscapeUriString("/fあ/f1"), UriKind.Relative)
>> >>     let part = package.GetPart(uri)
>> >>     // Try to open the target of every relationship of this part.
>> >>     for relship in part.GetRelationships() do
>> >>         let targetURI = relship.TargetUri
>> >>         try
>> >>             let targetPart = package.GetPart(targetURI)
>> >>             let s = targetPart.GetStream()
>> >>             Console.WriteLine("Success: {0} {1}", targetURI, s.ReadByte())
>> >>         with
>> >>         | :? ArgumentException ->
>> >>             // GetPart rejects URIs that are not valid part URIs.
>> >>             Console.WriteLine("Error: {0}", targetURI)
>> >>     package.Close()
>> >>
>> >> readOPC()
>> >>
>> >>
>> >> Regards,
>> >> Makoto
>> >
>> > -----Original Message-----
>> > From: eb2mmrt at gmail.com [mailto:eb2mmrt at gmail.com] On Behalf Of MURATA
>> > Makoto
>> > Sent: Tuesday, December 24, 2013 4:16 AM
>> > To: SC34
>> > Subject: OPC part names and references
>> >
>> > Dear colleagues,
>> >
>> > Merry Christmas!
>> >
>> > I am trying to implement Section 2 of the Japanese proposal
>> > (http://kikaku.itscj.ipsj.or.jp/sc34/wg4/archive/sc34-wg4-2011-0207.html)
>> > for improving part names and references.
>> >
>> > While doing so, I studied the conversion from part references (which
>> are relative) to part names again.  Here are some random thoughts.
>> >
>> > 1) Leading "/"
>> >
>> > In Seattle, we learned from OPC experts that references beginning with
>> "/" and those not beginning with it are resolved differently.
>> >
>> >
>> > Proposal: Explicitly state differences between these two types of
>> references.  A reference to RFC 3986 is not good enough.
>> >
>> > 2) Resolution of relative URI references
>> >
>> > Neither OPC part references (or Unicode strings as specified in Annex
>> A) nor OPC part names contain schemes (e.g., http:).  Should we
>> nevertheless rely on resolution of relative URI references for the
>> conversion from OPC part references to OPC part names?  In other words,
>> should we first create absolute URIs thus introducing schemes and then
>> construct OPC part names by removing schemes?
>> >
>> > Proposal: Stop relying on resolution of relative URI references.
>> > Rather, introduce "base OPC part name", which is the OPC part name of
>> the containing OPC part, and introduce a procedure for merging base OPC
>> part names and OPC part references.  This processing model does not have to
>> touch schemes.
>> >
>> > 3) xml:base
>> >
>> > Do we really have to allow xml:base (and other similar mechanisms) to
>> change the interpretation of OPC part references?  If such a mechanism
>> specifies irrelevant URIs such as http://www.example.com, how should we
>> interpret OPC part references?
>> >
>> > Proposal: Stop using xml:base (and other similar mechanisms).
>> >
>> >
>> > Regards,
>> > Makoto
>>
>>
>>
>> --
>>
>> Praying for the victims of the Japan Tohoku earthquake
>>
>> Makoto
>>



-- 

Praying for the victims of the Japan Tohoku earthquake

Makoto