OPC part names and references

MURATA Makoto eb2m-mrt at asahi-net.or.jp
Tue Jan 7 03:40:42 CET 2014


Here are some further experiments.

Summary:

MS Word 2007 does not allow non-ASCII characters within part names,
even if they are percent-encoded.  %HH sequences in OPC part references
are decoded as long as they represent ASCII characters.

Experiments:

First, I replaced "document.xml" in a WML document by "%E3%81%82.xml".
Specifically:

- Renamed the file "document.xml" under the directory "word" as
  "%E3%81%82.xml"

- Renamed the file "document.xml.rels" under the directory "word/_rels" as
  "%E3%81%82.xml.rels"

- Replaced "word/document.xml" in the file "_rels/.rels"
   by "word/%E3%81%82.xml"

- Replaced "/word/document.xml" in "[Content_Types].xml"
   by "/word/%E3%81%82.xml"

After these changes, MS Word 2007 could not open the revised WML document.
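
For anyone who wants to repeat this without editing the package by hand,
the four changes above could be scripted roughly as follows.  This is only
a sketch in Python using the standard zipfile module, not the procedure I
actually followed.  The names rename_part and _demo_package are made up for
illustration, the demo package holds stand-in XML fragments rather than a
real WML document, and the byte-level replace assumes that the old part
name occurs in "_rels/.rels" and "[Content_Types].xml" only as a reference
to that part.

```python
import io
import zipfile


def rename_part(src_bytes, old_name, new_name, folder="word"):
    """Return a copy of the package with folder/old_name renamed to folder/new_name.

    Hypothetical helper: it renames the part and its .rels file, and rewrites
    references in "_rels/.rels" and "[Content_Types].xml".
    """
    old_part = f"{folder}/{old_name}"
    new_part = f"{folder}/{new_name}"
    old_rels = f"{folder}/_rels/{old_name}.rels"
    new_rels = f"{folder}/_rels/{new_name}.rels"

    out = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(src_bytes)) as src, \
            zipfile.ZipFile(out, "w", zipfile.ZIP_DEFLATED) as dst:
        for info in src.infolist():
            name = info.filename
            data = src.read(name)
            if name == old_part:
                name = new_part            # the part itself
            elif name == old_rels:
                name = new_rels            # its relationships part
            elif name in ("_rels/.rels", "[Content_Types].xml"):
                # Assumption: a plain byte replace is enough for this experiment.
                data = data.replace(old_part.encode("utf-8"),
                                    new_part.encode("utf-8"))
            dst.writestr(name, data)
    return out.getvalue()


def _demo_package():
    """Build a tiny stand-in package in memory (not a valid WML document)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as z:
        z.writestr("[Content_Types].xml",
                   '<Override PartName="/word/document.xml"/>')
        z.writestr("_rels/.rels", '<Relationship Target="word/document.xml"/>')
        z.writestr("word/document.xml", "<document/>")
        z.writestr("word/_rels/document.xml.rels", "<Relationships/>")
    return buf.getvalue()


revised = rename_part(_demo_package(), "document.xml", "%E3%81%82.xml")
names = zipfile.ZipFile(io.BytesIO(revised)).namelist()
```

Passing "a.xml" instead of "%E3%81%82.xml" gives the package used in the
second experiment.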

Second, I used "a" instead of "%E3%81%82" in the above four changes.
Then, the document opened successfully.

Third, I replaced "a.xml" in the file "_rels/.rels" by "%61.xml".  I also
percent-encoded some other references (values of Relationship/@Target).
Then, the document opened successfully.

Fourth, just in case, I tried verbatim U+3042 (HIRAGANA LETTER A)
rather than %E3%81%82.  As expected, the document did not open.
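
As a quick check that %E3%81%82 really is the percent-encoded UTF-8 form of
U+3042, something like the following could be run.  It is a small sketch
using Python's urllib.parse, which encodes and decodes as UTF-8 by default;
the variable names are arbitrary.

```python
from urllib.parse import quote, unquote

# Percent-encode the UTF-8 bytes of U+3042 (HIRAGANA LETTER A).
encoded = quote("\u3042")

# Decode the escapes again, interpreting the bytes as UTF-8.
decoded = unquote("%E3%81%82")

print(encoded, decoded == "\u3042")
```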

My conclusions:

- Non-ASCII characters in part names are not allowed even if they
  are percent-encoded.

- %HH sequences in values of Relationship/@Target are decoded as long
  as they represent ASCII characters.
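
The two conclusions can be written down as a small model.  This is only my
reading of the observed behaviour expressed in Python, not a description of
how MS Word 2007 is implemented.  The function names are made up, and the
sketch decodes every ASCII escape, whereas the experiments covered only %61
and a few other references.

```python
import re

_ESCAPE = re.compile(r"%([0-9A-Fa-f]{2})")


def decode_ascii_escapes(target):
    """Decode %HH only when HH is below 0x80; leave other escapes untouched."""
    def repl(match):
        value = int(match.group(1), 16)
        return chr(value) if value < 0x80 else match.group(0)
    return _ESCAPE.sub(repl, target)


def has_non_ascii(name):
    """True if name holds a non-ASCII character, verbatim or as %HH >= 0x80."""
    if any(ord(ch) > 0x7F for ch in name):
        return True
    return any(int(hh, 16) >= 0x80 for hh in _ESCAPE.findall(name))


# Third experiment: an ASCII escape in a reference resolves to the part "a.xml".
print(decode_ascii_escapes("%61.xml"))

# First and fourth experiments: both spellings count as non-ASCII part names.
print(has_non_ascii("%E3%81%82.xml"), has_non_ascii("\u3042.xml"))
```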

Regards,
Makoto

2014/1/7 MURATA Makoto <eb2m-mrt at asahi-net.or.jp>:
> I did some more experiments using MS Office 2007 and .Net.
>
> Here is my understanding.
>
> - MS Office 2007 converts %HH to characters at
>   least when %HH represents ASCII characters.
>
> - MS Office 2007 resolves absolute-path references
>   (which begin with "/") correctly.
>
> - MS Office 2007 resolves relative-path references
>   (which do not begin with "/") correctly.
>
> - .Net (Package.GetPart) recognizes neither relative-path
>   references nor %HH.
>
> I think that we should limit our concern to MS Office.  The .Net
> implementation of OPC does not implement Annex A of Part 2
> at all.
>
> Regards,
> Makoto
>
> 2013/12/28 MURATA Makoto <eb2m-mrt at asahi-net.or.jp>:
>> The more I think about OPC, the more confused I am.
>>
>> I have thought that references to OPC parts ("Unicode string"
>> in Annex A of OPC) can contain non-ASCII characters and
>> that such non-ASCII characters are percent-encoded before
>> referenced OPC parts are located.  I have also thought that
>> references to OPC parts are resolved relative to containing
>> OPC parts when they do not begin with "/".
>>
>> However, my experiment with .Net in F# appears to
>> show I am mistaken.  It reports errors if references to
>> OPC parts contain non-ASCII characters.  It also
>> reports errors if references to OPC parts do not begin
>> with "/".
>>
>> I plan to manually edit OOXML documents and XPS
>> documents and handle them by MS-Office and XPS
>> viewers.
>>
>> Here is my F# program.
>>
>> open System.IO.Packaging
>> open System
>>
>> let readOPC() =
>>     let package = Package.Open("f:test.opc", IO.FileMode.Open)
>>     let uri = new Uri(Uri.EscapeUriString "/fあ/f1", UriKind.Relative)
>>     let part =  package.GetPart(uri)
>>     let enum = part.GetRelationships().GetEnumerator()
>>     while (enum.MoveNext()) do
>>         let relship = enum.Current
>>         let targetURI = relship.TargetUri
>>         try
>>             let targetPart = package.GetPart(targetURI)
>>             let s = targetPart.GetStream()
>>             System.Console.WriteLine("Success: {0} {1}", targetURI, s.ReadByte())
>>         with
>>             | :? System.ArgumentException ->
>>                 System.Console.WriteLine("Error: {0}", targetURI)
>>     package.Close()
>>
>> readOPC()
>>
>>
>> Regards,
>> Makoto
>
>
>
> --
>
> Praying for the victims of the Japan Tohoku earthquake
>
> Makoto



-- 

Praying for the victims of the Japan Tohoku earthquake

Makoto

