OPC part names and references

MURATA Makoto eb2m-mrt at asahi-net.or.jp
Tue Feb 25 14:05:35 CET 2014


John,


Thanks for contacting the original developers.  I now have a
better understanding of the design.  In particular, I now
understand that OPC part references are nothing but
relative references and also understand why queries or
fragment identifiers were not used for referencing OPC
parts.


> [Design Rationale]
> Several interesting requirements that drove the design of the pack URI
> scheme:
>
> 1)      If a resource embedded with an OPC package is of a pre-existing
> MIME type that itself embeds relative references, then a MIME-type handler
> associated with that MIME type could use ordinary text-based
> relative-reference resolution mechanisms to resolve relative-references
> into URIs addressing other resources embedded within the same OPC package.
>

I spent a lot of time on this paragraph.  Here
is my understanding of his intention:

  The base URI for an OPC part reference within
  an OPC part shall be a pack URI, which is composed from the URI
  of the OPC package and the part name of the OPC part.

This is implied by the second paragraph in Clause 9.2.1
of 29500-2:2012, which reads:

  Relative references from a part are interpreted relative
  to the base URI of that part. By default, the base URI of a
  part is derived from the name of the part, as defined in §B.3.

This is also implied by the last bullet in A.3, namely:

  Resolve the relative reference against the base URI of the
  part holding the Unicode string, as it is defined in §5.2 of
  RFC 3986. The path component of the resulting absolute
  URI is the part name.
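
As an illustration only (this is not text from 29500-2), the last bullet
of A.3 can be sketched in Python.  The function name part_name_for and
the stand-in authority "package.invalid" are assumptions made for this
sketch.  An http base is used because Python's urljoin returns the
reference unresolved for schemes it does not know, such as "pack".

```python
from urllib.parse import urljoin, urlsplit

# Assumption: hypothetical stand-in for the pack URI authority.
# Only the generic RFC 3986 section 5.2 resolution is being exercised.
STAND_IN = "http://package.invalid"

def part_name_for(source_part_name, reference):
    """Resolve a reference against the base URI of the part holding it
    and return the path component of the result as the part name."""
    base_uri = STAND_IN + source_part_name
    absolute = urljoin(base_uri, reference)
    return urlsplit(absolute).path

# Relative-path reference: resolved against the containing part.
print(part_name_for("/word/document.xml", "media/image1.png"))        # /word/media/image1.png
# Absolute-path reference: resolved against the package root.
print(part_name_for("/word/document.xml", "/docProps/core.xml"))      # /docProps/core.xml
# Dot segments are removed by the generic algorithm.
print(part_name_for("/word/document.xml", "../customXml/item1.xml"))  # /customXml/item1.xml
```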

Let us review how base URIs are determined in
RFCs 3986 and 3987.

First, determination of base URIs can be MIME-type dependent.
Section 5.1.1 of RFC 3986 (http://tools.ietf.org/html/rfc3986#section-5.1.1)
allows base URIs to be embedded in a MIME-type-dependent
manner.

Second, specifying base URIs at the OPC package level is
allowed by 5.1.2 (http://tools.ietf.org/html/rfc3986#section-5.1.2).
That's why OPC can specify that pack URIs are base URIs.
(We should explicitly mention 5.1.2.)

In the case of relationship parts, it is guaranteed that
base URIs are pack URIs.  This is because the use
of xml:base for relationship parts is prohibited by the
sentence immediately preceding 9.3.2.2 in
29500-2:2012, which is:

  The xml:base attribute shall not be used to specify a
  base URI for relationship XML content.
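
A small sketch of why this prohibition matters: if the base URI could be
overridden with an unrelated URI, a relative reference would no longer
address a part of the package.  The function name and the stand-in
authority "package.invalid" are assumptions for illustration only.

```python
from urllib.parse import urljoin, urlsplit

# Assumption: hypothetical stand-in for the pack URI authority of the package.
PACKAGE_AUTHORITY = "package.invalid"

def resolves_inside_package(base_uri, reference):
    """True if the reference, resolved against base_uri, still has the
    package's authority, i.e. still addresses something in the package."""
    absolute = urljoin(base_uri, reference)
    return urlsplit(absolute).netloc == PACKAGE_AUTHORITY

# Default base URI derived from the part name: the target stays inside.
print(resolves_inside_package(
    "http://package.invalid/word/document.xml", "media/image1.png"))  # True
# Base URI overridden as if by xml:base: the target leaves the package.
print(resolves_inside_package(
    "http://www.example.com/foo.html", "media/image1.png"))           # False
```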


Almost everything in Annex A (except the last
bullet in A.3) appears to be a restatement of
the resolution algorithm in RFC 3986 and
RFC 3987.  A.1 and A.2 in 29500-2:2012
are such restatements.  I believe that A.3 is
questionable, because it deviates from
the generic algorithm in these RFCs and also
because it is not implemented by MS Office.

> 2)      Allow for the deep-references from outside of a package to an
> individual embedded resource inside the package, while still supporting
> MIME-type-specific fragment identifiers to identify subobjects within the
> addressed resource.
>

So, fragment identifiers of HTML should be usable for
locating fragments of HTML documents within OPC packages.
Thus, we should not use fragment identifiers for referencing
OPC parts.  We have to provide something as part of URIs
for referencing OPC parts.  Makes sense.
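
To make this concrete, here is a minimal sketch showing that when the
part is addressed by the path component, the fragment component remains
free for the MIME type of the part.  The authority "example-package" is
a placeholder assumed for this sketch; a real pack URI authority is an
escaped form of the package's own URI.

```python
from urllib.parse import urlsplit

def split_part_and_fragment(uri):
    """Return (part name, fragment identifier) of a pack-style URI."""
    parts = urlsplit(uri)
    return parts.path, parts.fragment

# Assumption: placeholder authority, used only to show the URI shape.
uri = "pack://example-package/html/index.html#section2"
print(split_part_and_fragment(uri))  # ('/html/index.html', 'section2')
```

The path selects the OPC part, and "section2" is left to be interpreted
by whatever handles the HTML part.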


> 3)      Pack-scheme-aware client code could address and efficiently
> retrieve (via HTTP 1.1 byte-range requests) embedded resources from a
> package residing on a web-server without having any pack-scheme-aware or
> OPC-aware code running on the server.  (I don’t recall which, if any, of
> the multiple OPC implementations across Microsoft might have actually
> realized this design goal.)
>

This deserves to be explicitly specified as part of 29500-2.
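
As a rough sketch of what such a client would need, assuming the package
is a ZIP archive and the part is stored uncompressed: the byte range of
a part can be derived from ZIP metadata alone.  The function name
part_byte_range is hypothetical, and this sketch reads the whole archive
from memory; a real client would fetch the central directory and the
local file header with further range requests.

```python
import io
import struct
import zipfile

def part_byte_range(zip_bytes, part_name):
    """Return (first, last) byte offsets of a part's data in the archive."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        info = zf.getinfo(part_name.lstrip("/"))
    offset = info.header_offset
    # Local file header: 30 fixed bytes, with the file name length at
    # offset 26 and the extra field length at offset 28.
    name_len, extra_len = struct.unpack("<HH", zip_bytes[offset + 26:offset + 30])
    first = offset + 30 + name_len + extra_len
    return first, first + info.compress_size - 1

# Build a tiny stand-in package in memory (stored, not deflated).
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w", zipfile.ZIP_STORED) as zf:
    zf.writestr("word/document.xml", b"<w:document/>")
    zf.writestr("[Content_Types].xml", b"<Types/>")
package_bytes = buffer.getvalue()

first, last = part_byte_range(package_bytes, "/word/document.xml")
# Value a client could send in an HTTP Range header for this part.
range_header = "bytes={}-{}".format(first, last)
print(range_header)
print(package_bytes[first:last + 1])
```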

>
> The second requirement implied that a fragment identifier couldn’t be used
> to target an embedded resource, because there is no MIME-type independent
> standard for composing fragment identifiers (in this case, what would have
> been composing the fragment identifier identifying an embedded resource
> together with the fragment identifier identifying some sub-element within
> that resource).
>

Yes.


> The first requirement implied that query parameters could not be used to
> address individual embedded resources, because any such mechanism for using
> query parameters to specify “paths” to individual embedded resources
> wouldn’t relate in any way to ordinary relative references.
>
>
I agree that query parameters cannot be used for addressing OPC
parts, if we want to use existing fragment identifiers for OPC parts
(MIME entities) within OPC packages.
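
The quoted point about relative references can be seen with any generic
RFC 3986 resolver.  In the sketch below, both base URIs are hypothetical:
the first addresses a part through a "part" query parameter invented for
this example, and the second addresses it through the path.

```python
from urllib.parse import urljoin

# Assumption: invented query-based addressing of a part inside a package.
query_base = "http://example.invalid/package.zip?part=/html/index.html"
# Path-based addressing, as with pack URIs (stand-in authority).
path_base = "http://package.invalid/html/index.html"

reference = "images/logo.png"

# The query of the base is discarded, so the result no longer refers
# to anything inside the package.
query_based = urljoin(query_base, reference)
print(query_based)  # http://example.invalid/images/logo.png

# The result is another path inside the same package.
path_based = urljoin(path_base, reference)
print(path_based)   # http://package.invalid/html/images/logo.png
```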


> I sometimes think of an OPC package as web-server-in-a-box.


Incidentally, I have thought of something very similar for
EPUB packages, which are going to have
content-negotiation-in-a-box.


> The authority component of the URI identifies a self-contained domain.


True.


> And composing an authority-free relative reference with an absolute URI
> reference to an embedded resource can never escape the confines of the
> containing package, just as


What I suggested is different, but I won't go into the details.
I would like to honor the original design motivation and improve
the text.  I see a *lot* of room for improvement.


> composing an authority-free relative reference with an absolute URI
> reference to web-server-hosted resource can never escape the confines of
> that web-server.  And, contained within the package is sufficient
> information to report the mime-type of every embedded resource, just as is
> typical of an HTTP server.
>
> If one has a web-site of resources that reference each other only via
> relative references, then those resources can be zipped up into an OPC
> package without tampering with any of those relative references. (Although
> one would have to explicitly capture the MIME-types of the embedded
> resources as they would be reported by the original HTTP server.)
>
>
I doubt this, since relative references within a relationship part are
resolved with respect to the source part rather than the relationship part.
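
The difference can be sketched as follows.  The helper names and the
stand-in authority "package.invalid" are assumptions for illustration;
the derivation of the source part name from the relationships part name
follows the usual "_rels" naming convention.

```python
import posixpath
from urllib.parse import urljoin, urlsplit

# Assumption: hypothetical stand-in for the pack URI authority.
STAND_IN = "http://package.invalid"

def source_part_name(rels_part_name):
    """Map a relationships part name to the name of its source part,
    e.g. /word/_rels/document.xml.rels -> /word/document.xml."""
    rels_dir, rels_file = posixpath.split(rels_part_name)
    parent, rels_segment = posixpath.split(rels_dir)
    if rels_segment != "_rels" or not rels_file.endswith(".rels"):
        raise ValueError("not a relationships part name: " + rels_part_name)
    return posixpath.join(parent, rels_file[:-len(".rels")])

def resolve_target(base_part_name, target):
    """Resolve a Target value against the given base part name."""
    return urlsplit(urljoin(STAND_IN + base_part_name, target)).path

rels = "/word/_rels/document.xml.rels"
# Resolved against the source part, as OPC requires.
print(resolve_target(source_part_name(rels), "media/image1.png"))  # /word/media/image1.png
# Resolved against the relationships part itself, as a plain web site would.
print(resolve_target(rels, "media/image1.png"))  # /word/_rels/media/image1.png
```

The two results differ, which is why zipping up a web site unchanged
does not automatically give correct relationship targets.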


>
> [Resolving Relative References]
> IMO the benefit of having pack uri defined in the OPC is to leverage (and
> to be compliant with) the Reference Resolution spec, which is much more
> complicated than just prepending the source part name. If we do not define
> the base URI for the part as pack://<authority>/<part name>, we’ll have to
> specify the resolution algorithm in the standard. Now we just say in A.3 -
>

I think that using pack URIs from both inside and outside is
a bigger advantage.  I still doubt A.3, since it deviates from
the resolution algorithm in the two RFCs.

Regards,
Makoto


> 10.   Resolve the relative reference against the base URI of the part
> holding the Unicode string, as it is defined in §5.2 of RFC 3986. The path
> component of the resulting absolute URI is the part name.
>
> Also the scheme gives us the standard place to define case-insensitive path
> (and part name).
>
>
> -----Original Message-----
> From: eb2mmrt at gmail.com [mailto:eb2mmrt at gmail.com] On Behalf Of MURATA
> Makoto
> Sent: Wednesday, January 8, 2014 10:56 PM
> To: SC34
> Subject: Re: OPC part names and referenes
>
> John,
>
>
> >> 2) Resolution of relative URI references
> > I'm not sure I completely understand this.  Are you proposing to
> eliminate using pack URIs to resolve a relative reference (either an
> absolute-path reference or a relative-path reference) that is the value of
> a Target attribute of a Relationship element?
>
> Right.  To resolve relative-path references, we only have to prepend the
> source part name.  I'm not saying that the pack scheme should not be used
> anywhere.  I'm just saying that it should not be used for resolving
> absolute-path or relative-path references.
>
> >> Neither OPC part references (or Unicode strings as specified in Annex
> A) nor OPC part names contain schemes (e.g., http:).
> > If by "OPC part reference" you mean a pack URI, technically it does have
> a scheme - "pack".  The address of the external package, a part inside of
> which is being referenced, that is being referenced by the relationship
> contains a scheme, but that address' scheme+authority+path is munged into a
> string that can be stored as the pack URI's authority component.
>
> I mean "Unicode string" in Annex A by  "OPC part reference".
> They do not contain schemes, as demonstrated by A.5.
>
> >
> >> 3) xml:base
> > I think it's already been implemented and is used by XAML or .Net.
>
> I am skeptical.  How can we resolve a relative-path OPC part reference if
> xml:base="http://www.example.com/foo.html"?  We have to guarantee that
> the base is an OPC part within the current OPC package.
>
> >> I think that we should limit our concern to MS Office.  The .Net
> implementation of OPC does not implement Annex A of Part 2 at all.
> > Channeling Chris here, I'd want to be careful we don't change something
> and omit looking at a known implementation on an assumption it won't be
> affected, even though your testing seems to indicate that.  If we come up
> with a short list of what appear to be implementation limitations (i.e.,
> implementing a subset of what OPC allows), Chris/Jim/I can try to hunt down
> confirmations from the relevant product teams to confirm them and see if
> our final proposed changes to OPC would create incompatibilities.
>
> I agree that we should not forget .Net.  But I also think that .Net is
> already very non-conformant, since it does not support Annex A.
>
> Regards,
> Makoto
>
> >
> > Re: Chris' e-mail:
> >> I have a feeling that some of the sticking points we discovered with
> >> regard to relative references were related to XPS
> > I think that was to do with XPS or Office always generating relationship
> targets with a "/" and the other without.  With our better understanding of
> relative references (relative-path reference vs. absolute-path reference),
> we found that both were correct and evaluate as expected.
> >
> > John
> >
> > -----Original Message-----
> > From: Chris Rae [mailto:Chris.Rae at microsoft.com]
> > Sent: Tuesday, January 7, 2014 10:14 AM
> > To: MURATA Makoto; SC34
> > Subject: RE: OPC part names and referenes
> >
> > I have a feeling that some of the sticking points we discovered with
> regard to relative references were related to XPS, but I can't remember
> exactly what the details were. I'll do some investigation.
> >
> > We'll have to tread somewhat carefully here, as OPC is the most widely
> implemented part of ISO/IEC 29500.
> >
> > Chris
> >
> > -----Original Message-----
> > From: eb2mmrt at gmail.com [mailto:eb2mmrt at gmail.com] On Behalf Of MURATA
> > Makoto
> > Sent: Monday, January 6, 2014 6:41 PM
> > To: SC34
> > Subject: Re: OPC part names and referenes
> >
> > Here are some further experiments.
> >
> > Summary:
> >
> > MS Word 2007 does not allow non-ASCII characters within part names even
> if they are percent-encoded.  %HH in OPC part references are decoded as
> long as they represent ASCII characters.
> >
> > Experiments:
> >
> > First, I replaced "document.xml" in a WML document by "%E3%81%82.xml".
> > Specifically:
> >
> > - Renamed the file "document.xml" under the directory "word" as
> "%E3%81%82.xml"
> >
> > - Renamed the file "document.xml.rels" under the directory "word/_rels"
> as "%E3%81%82.xml.rels"
> >
> > - Replaced "word/document.xml" in the file "_rels/.rels" by
> "word/%E3%81%82.xml"
> >
> > - Replaced "/word/document.xml" in "[Content_Types].xml" by
> "/word/%E3%81%82.xml"
> >
> > Then, MS Word 2007 cannot open the revised WML document.
> >
> > Second, I used "a" instead of "%E3%81%82" in the above four changes.
> > Then, the document opened successfully.
> >
> > Third, I replaced "a.xml" in the file "_rels/.rels" by "%61.xml".  I
> also percent-encoded some other references (values of Relationship/@Target).
> > Then, the document opened successfully.
> >
> > Fourth, just in case, I tried verbatim U+3042 (HIRAGANA LETTER A) rather
> than %E3%81%82.  As expected, the document does not open.
> >
> > My conclusions:
> >
> > - Non-ASCII characters in part names are not allowed even if they are
> percent-encoded.
> >
> > - %HH in values of Relationship/@Target are decoded as long as they
> represent ASCII characters.
> >
> > Regards,
> > Makoto
> >
> > -----Original Message-----
> > From: eb2mmrt at gmail.com [mailto:eb2mmrt at gmail.com] On Behalf Of MURATA
> > Makoto
> > Sent: 06 January 2014 17:20
> > To: SC34
> > Subject: Re: OPC part names and referenes
> >
> > I did some more experiments using MS Office 2007 and .Net.
> >
> > Here is my understanding.
> >
> > - MS Office 2007 converts %HH to characters at least when %HH represents
> ASCII characters.
> >
> > - MS Office 2007 resolves absolute-path references (which begin with
> "/") correctly.
> >
> > - MS Office 2007 resolves relative-path references (which do not begin
> with "/") correctly.
> >
> > - .Net (Package.GetPart) recognizes neither relative-path references
> > nor %HH
> >
> > I think that we should limit our concern to MS Office.  The .Net
> implementation of OPC does not implement Annex A of Part 2 at all.
> >
> > Regards,
> > Makoto
> >
> > 2013/12/28 MURATA Makoto <eb2m-mrt at asahi-net.or.jp>:
> >> The more I think about OPC, the more confused I am.
> >>
> >> I have thought that references to OPC parts ("Unicode string"
> >> in Annex A of OPC) can contain non-ASCII characters and that such
> >> non-ASCII characters are percent-encoded before referenced OPC parts
> >> are located.  I have also thought that references to OPC parts are
> >> resolved relative to containing OPC parts when they do not begin with
> >> "/".
> >>
> >> However, my experiment with .Net in F# appears to show I am mistaken.
> >> It reports errors if references to OPC parts contain non-ASCII
> >> characters.  It also reports errors if references to OPC parts do not
> >> begin with "/".
> >>
> >> I plan to manually edit OOXML documents and XPS documents and handle
> >> them by MS-Office and XPS viewers.
> >>
> >> Here is my F# program.
> >>
> >> open System.IO.Packaging
> >> open System
> >>
> >> let readOPC() =
> >>     let package = Package.Open("f:test.opc", IO.FileMode.Open)
> >>     let uri = new Uri(Uri.EscapeUriString "/fあ/f1", UriKind.Relative)
> >>     let part =  package.GetPart(uri)
> >>     let enum = part.GetRelationships().GetEnumerator()
> >>     while (enum.MoveNext()) do
> >>         let relship = enum.Current
> >>         let targetURI = relship.TargetUri
> >>         try
> >>             let targetPart = package.GetPart(targetURI)
> >>             let s = targetPart.GetStream()
> >>             System.Console.WriteLine("Success: {0} {1}", targetURI, s.ReadByte())
> >>         with
> >>             | :? System.ArgumentException ->
> >>                 System.Console.WriteLine("Error: {0}", targetURI)
> >>     package.Close()
> >>
> >> readOPC()
> >>
> >>
> >> Regards,
> >> Makoto
> >
> > -----Original Message-----
> > From: eb2mmrt at gmail.com [mailto:eb2mmrt at gmail.com] On Behalf Of MURATA
> > Makoto
> > Sent: Tuesday, December 24, 2013 4:16 AM
> > To: SC34
> > Subject: OPC part names and referenes
> >
> > Dear colleagues,
> >
> > Merry Christmas!
> >
> > I am trying to implement Section 2 of the Japanese proposal
> > (http://kikaku.itscj.ipsj.or.jp/sc34/wg4/archive/sc34-wg4-2011-0207.html)
> > for improving part names and references.
> >
> > While doing so, I studied the conversion from part references (which are
> relative) to part names again.  Here are some random thoughts.
> >
> > 1) Leading "/"
> >
> > In Seattle, we learned from OPC experts that references beginning with
> "/" and those not beginning with it are resolved differently.
> >
> >
> > Proposal: Explicitly state differences between these two types of
> references.  A reference to RFC 3986 is not good enough.
> >
> > 2) Resolution of relative URI references
> >
> > Neither OPC part references (or Unicode strings as specified in Annex A)
> nor OPC part names contain schemes (e.g., http:).  Should we nevertheless
> rely on resolution of relative URI references for the conversion from OPC
> part references to OPC part names?  In other words, should we first create
> absolute URIs thus introducing schemes and then construct OPC part names by
> removing schemes?
> >
> > Proposal: Stop relying on resolution of relative URI references.
> > Rather, introduce "base OPC part name", which is the OPC part name of
> the containing OPC part, and introduce a procedure for merging base OPC
> part names and OPC part references.  This processing model does not have to
> touch schemes.
> >
> > 3) xml:base
> >
> > Do we really have to allow xml:base (and other similar mechanisms) to
> change the interpretation of OPC part references?  If such a mechanism
> specifies irrelevant URIs such as http://www.example.com, how should we
> interpret OPC part references?
> >
> > Proposal: Stop using xml:base (and other similar mechanisms).
> >
> >
> > Regards,
> > Makoto
>
>
>
> --
>
> Praying for the victims of the Japan Tohoku earthquake
>
> Makoto
>



-- 

Praying for the victims of the Japan Tohoku earthquake

Makoto