Design choices: (was Re: OPC part names and references)

MURATA Makoto eb2m-mrt at asahi-net.or.jp
Tue Jan 7 04:27:29 CET 2014


Dear colleagues,

I can think of four design choices.  Which do you prefer?

Option 1:

No non-ASCII characters.  No %HH.  No I18N.

Option 2 (Office 2007):

No non-ASCII characters.  %HH is allowed in OPC part references, but
only to represent ASCII characters.  (Note: What's the point of allowing
%HH only for representing ASCII?)  No I18N.
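
To make the Option 2 restriction concrete, here is a minimal F# sketch of
a check that a reference contains no verbatim non-ASCII characters and
that every %HH denotes an ASCII octet.  The function name
isOption2Reference is hypothetical; it is not part of System.IO.Packaging
or of 29500-2, and it only illustrates the rule as worded above.

```fsharp
open System
open System.Text.RegularExpressions

// Hypothetical helper (an assumption for illustration, not an OPC API).
// True when the reference has no verbatim non-ASCII character and every
// %HH escape denotes an octet below 0x80.
let isOption2Reference (reference: string) =
    let allCharsAscii =
        reference |> Seq.forall (fun c -> int c < 0x80)
    let escapesAscii =
        Regex.Matches(reference, "%([0-9A-Fa-f]{2})")
        |> Seq.cast<Match>
        |> Seq.forall (fun m -> Convert.ToInt32(m.Groups.[1].Value, 16) < 0x80)
    allCharsAscii && escapesAscii

printfn "%b" (isOption2Reference "word/%61.xml")        // true
printfn "%b" (isOption2Reference "word/%E3%81%82.xml")  // false
printfn "%b" (isOption2Reference "word/\u3042.xml")     // false
```

The sketch does not reject malformed escapes such as "%G1"; whether those
should be errors is a separate question.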

Option 3:

Verbatim non-ASCII characters everywhere.  Note: it remains to be
decided when and where %HH should be allowed.  I18N based on the latest
version of ZIP.

Option 4 (close to current wording in 29500-2):

Verbatim non-ASCII characters are allowed as part of OPC part names
and OPC part references.  But ZIP item names are ASCII.  Conversion
from OPC part names to ZIP item names invokes percent-encoding.  I18N
without requiring the latest version of ZIP.  Note: it remains to be
decided when and where %HH should be allowed.
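
As an illustration of the conversion in Option 4, here is a minimal F#
sketch that maps an OPC part name to an ASCII-only ZIP item name.  The
function name toZipItemName is hypothetical.  I assume that the leading
"/" is dropped and that each non-ASCII character is replaced by the %HH
form of its UTF-8 octets; ASCII characters, including any "%" already
present, are passed through unchanged, since the treatment of %HH is
still open.

```fsharp
open System.Text

// Hypothetical helper (an assumption for illustration, not an OPC API).
// Drops the leading "/" and percent-encodes every UTF-8 octet >= 0x80.
let toZipItemName (partName: string) =
    Encoding.UTF8.GetBytes(partName.TrimStart('/'))
    |> Array.map (fun b ->
        if b < 0x80uy then string (char b)   // ASCII octet: keep as is
        else sprintf "%%%02X" b)             // non-ASCII octet: %HH
    |> String.concat ""

// U+3042 (HIRAGANA LETTER A) is E3 81 82 in UTF-8.
printfn "%s" (toZipItemName "/word/\u3042.xml")   // word/%E3%81%82.xml
printfn "%s" (toZipItemName "/word/document.xml") // word/document.xml
```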

Regards,
Makoto

2014/1/7 MURATA Makoto <eb2m-mrt at asahi-net.or.jp>:
> Here are some further experiments.
>
> Summary:
>
> MS Word 2007 does not allow non-ASCII characters within part names
> even if they are percent-encoded.  %HH in OPC part references is
> decoded as long as it represents an ASCII character.
>
> Experiments:
>
> First, I replaced "document.xml" in a WML document by "%E3%81%82.xml".
> Specifically:
>
> - Renamed the file "document.xml" under the directory "word" as
>   "%E3%81%82.xml"
>
> - Renamed the file "document.xml.rels" under the directory "word/_rels" as
>   "%E3%81%82.xml.rels"
>
> - Replaced "word/document.xml" in the file "_rels/.rels"
>    by "word/%E3%81%82.xml"
>
> - Replaced "/word/document.xml" in "[Content_Types].xml"
>    by "/word/%E3%81%82.xml"
>
> After these changes, MS Word 2007 could not open the revised WML document.
>
> Second, I used "a" instead of "%E3%81%82" in the above four changes.
> Then, the document opened successfully.
>
> Third, I replaced "a.xml" in the file "_rels/.rels" by "%61.xml".  I also
> percent-encoded some other references (values of Relationship/@Target).
> Then, the document opened successfully.
>
> Fourth, just in case, I tried verbatim U+3042 (HIRAGANA LETTER A)
> rather than %E3%81%82.  As expected, the document did not open.
>
> My conclusions:
>
> - Non-ASCII characters in part names are not allowed even if they
>   are percent-encoded.
>
> - %HH in values of Relationship/@Target is decoded as long as it
>    represents an ASCII character.
>
> Regards,
> Makoto
>
> 2014/1/7 MURATA Makoto <eb2m-mrt at asahi-net.or.jp>:
>> I did some more experiments using MS Office 2007 and .Net.
>>
>> Here is my understanding.
>>
>> - MS Office 2007 converts %HH to characters at
>>   least when %HH represents ASCII characters.
>>
>> - MS Office 2007 resolves absolute-path references (
>>   which begin with "/") correctly.
>>
>> - MS Office 2007 resolves relative-path references (
>>   which do not begin with "/") correctly.
>>
>> - .Net (Package.GetPart) recognizes neither relative-path
>>   references nor %HH.
>>
>> I think that we should limit our concern to MS Office.  The .Net
>> implementation of OPC does not implement Annex A of Part 2
>> at all.
>>
>> Regards,
>> Makoto
>>
>> 2013/12/28 MURATA Makoto <eb2m-mrt at asahi-net.or.jp>:
>>> The more I think about OPC, the more confused I am.
>>>
>>> I had thought that references to OPC parts ("Unicode string"
>>> in Annex A of OPC) can contain non-ASCII characters and
>>> that such non-ASCII characters are percent-encoded before
>>> referenced OPC parts are located.  I had also thought that
>>> references to OPC parts are resolved relative to containing
>>> OPC parts when they do not begin with "/".
>>>
>>> However, my experiment with .Net in F# appears to
>>> show I am mistaken.  It reports errors if references to
>>> OPC parts contain non-ASCII characters.  It also
>>> reports errors if references to OPC parts do not begin
>>> with "/".
>>>
>>> I plan to manually edit OOXML documents and XPS
>>> documents and open them with MS Office and XPS
>>> viewers.
>>>
>>> Here is my F# program.
>>>
>>> // Requires a reference to WindowsBase.dll (System.IO.Packaging).
>>> open System
>>> open System.IO.Packaging
>>>
>>> let readOPC() =
>>>     let package = Package.Open("f:test.opc", IO.FileMode.Open)
>>>     // Percent-encode the non-ASCII character before building the part URI.
>>>     let uri = new Uri(Uri.EscapeUriString "/fあ/f1", UriKind.Relative)
>>>     let part = package.GetPart(uri)
>>>     // Try to fetch the target part of each relationship of this part.
>>>     for relship in part.GetRelationships() do
>>>         let targetURI = relship.TargetUri
>>>         try
>>>             let targetPart = package.GetPart(targetURI)
>>>             let s = targetPart.GetStream()
>>>             Console.WriteLine("Success: {0} {1}", targetURI, s.ReadByte())
>>>         with
>>>             | :? ArgumentException ->
>>>                 Console.WriteLine("Error: {0}", targetURI)
>>>     package.Close()
>>>
>>> readOPC()
>>>
>>>
>>> Regards,
>>> Makoto



-- 

Praying for the victims of the Japan Tohoku earthquake

Makoto


More information about the sc34wg4 mailing list