[sc34wg4] Design choices: (was Re: OPC part names and references)

suzuki toshiya mpsuzuki at hiroshima-u.ac.jp
Tue Jan 7 05:18:23 CET 2014


Dear Murata-san,

Thank you very much for these difficult experiments.
Considering the results, I think that Option 1 (pure ASCII,
no %HH) is the best if the standard prioritizes document
interchange among existing implementations (including their
bugs). Option 3 is the second best, because it is the
behavior most users would expect.

Regards,
suzuki toshiya, Hiroshima University, Japan


MURATA Makoto wrote:
> Dear colleagues,
> 
> I can think of four design choices.  Which do you prefer?
> 
> Option 1:
> 
> No non-ASCII characters.  No %HH.  No I18N.
> 
> Option 2 (Office 2007):
> 
> No non-ASCII characters.  %HH in OPC part references are allowed but
> they are required to represent ASCII characters.  (Note: What's the point
> of allowing %HH only for representing ASCII?)  No I18N.
> 
> Option 3:
> 
> Verbatim non-ASCII characters everywhere.  Note: it remains to decide
> when and where %HH should be allowed.  I18N based on the latest
> version of ZIP.
> 
> Option 4 (close to current wording in 29500-2):
> 
> Verbatim non-ASCII characters are allowed as part of OPC part names
> and OPC part references.  But ZIP item names are ASCII.  Conversion
> from OPC part names to ZIP item names invokes percent encoding.  I18N
> without requiring the latest version of ZIP.  Note: it remains to decide
> when and where %HH should be allowed.
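To make Option 4 concrete, here is a minimal Python sketch of the conversion from OPC part names to ZIP item names. It assumes that the leading "/" is dropped and that only non-ASCII characters are percent-encoded, as %HH for each UTF-8 byte; the function names are hypothetical and are not taken from 29500-2. Since the note above leaves open when %HH is allowed, the reverse mapping here simply decodes every %HH.

```python
from urllib.parse import unquote


def part_name_to_zip_item_name(part_name: str) -> str:
    """Option 4 sketch: '/word/あ.xml' -> 'word/%E3%81%82.xml'.

    Assumption: ASCII characters are copied unchanged and each non-ASCII
    character becomes the %HH form of its UTF-8 bytes.
    """
    if not part_name.startswith("/"):
        raise ValueError("an OPC part name begins with '/'")
    out = []
    for ch in part_name[1:]:
        if ord(ch) < 0x80:
            out.append(ch)
        else:
            out.extend(f"%{b:02X}" for b in ch.encode("utf-8"))
    return "".join(out)


def zip_item_name_to_part_name(item_name: str) -> str:
    """Reverse sketch: decode every %HH as UTF-8 and restore the leading '/'."""
    return "/" + unquote(item_name, encoding="utf-8", errors="strict")


print(part_name_to_zip_item_name("/word/\u3042.xml"))  # word/%E3%81%82.xml
```

The resulting item name is exactly the one used in the first experiment further down, so under Option 4 that package would be the expected serialization of a part named "/word/あ.xml".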
> 
> Regards,
> Makoto
> 
> 2014/1/7 MURATA Makoto <eb2m-mrt at asahi-net.or.jp>:
>> Here are some further experiments.
>>
>> Summary:
>>
>> MS Word 2007 does not allow non-ASCII characters within part names
>> even if they are percent-encoded.  %HH in OPC part references are
>> decoded as long as they represent ASCII characters.
>>
>> Experiments:
>>
>> First, I replaced "document.xml" in a WML document with "%E3%81%82.xml".
>> Specifically:
>>
>> - Renamed the file "document.xml" under the directory "word" as
>>   "%E3%81%82.xml"
>>
>> - Renamed the file "document.xml.rels" under the directory "word/_rels" as
>>   "%E3%81%82.xml.rels"
>>
>> - Replaced "word/document.xml" in the file "_rels/.rels"
>>    by "word/%E3%81%82.xml"
>>
>> - Replaced "/word/document.xml" in "[Content_Types].xml"
>>    by "/word/%E3%81%82.xml"
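The four edits above can be reproduced with Python's zipfile module. This is a hypothetical helper, not the tool used in the experiment: it copies every entry, renames the two document parts, and rewrites the two references by plain substring replacement. Passing "a" or the verbatim "\u3042" as `new_stem` gives the second and fourth experiments.

```python
import io
import zipfile


def rewrite_package(data: bytes, new_stem: str) -> bytes:
    """Return a copy of a WML package with word/document.xml renamed to
    word/<new_stem>.xml and the two references to it updated."""
    renames = {
        "word/document.xml": f"word/{new_stem}.xml",
        "word/_rels/document.xml.rels": f"word/_rels/{new_stem}.xml.rels",
    }
    # Plain substring replacement inside the two XML files.
    text_edits = {
        "_rels/.rels": ("word/document.xml", f"word/{new_stem}.xml"),
        "[Content_Types].xml": ("/word/document.xml", f"/word/{new_stem}.xml"),
    }
    out = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(data)) as src, \
            zipfile.ZipFile(out, "w", zipfile.ZIP_DEFLATED) as dst:
        for info in src.infolist():
            payload = src.read(info.filename)
            if info.filename in text_edits:
                old, new = text_edits[info.filename]
                payload = payload.decode("utf-8").replace(old, new).encode("utf-8")
            # zipfile stores non-ASCII names as UTF-8 and sets the ZIP
            # language-encoding flag (general purpose bit 11) for them.
            dst.writestr(renames.get(info.filename, info.filename), payload)
    return out.getvalue()
```

With a stem such as "%E3%81%82" the ZIP item name is plain ASCII (the percent signs are literal characters), whereas a verbatim "\u3042" stem relies on the UTF-8 flag of newer ZIP versions, presumably the mechanism Option 3 has in mind.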
>>
>> Then, MS Word 2007 could not open the revised WML document.
>>
>> Second, I used "a" instead of "%E3%81%82" in the above four changes.
>> Then, the document opened successfully.
>>
>> Third, I replaced "a.xml" in the file "_rels/.rels" with "%61.xml".  I also
>> percent-encoded some other references (values of Relationship/@Target).
>> Then, the document opened successfully.
>>
>> Fourth, just in case, I tried verbatim U+3042 (HIRAGANA LETTER A)
>> rather than %E3%81%82.  As expected, the document did not open.
>>
>> My conclusions:
>>
>> - Non-ASCII characters in part names are not allowed even if they
>>   are percent-encoded.
>>
>> - %HH in values of Relationship/@Target are decoded as long as they
>>    represent ASCII characters.
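The observed behavior can be summarized as a small model. The sketch below is a hypothetical Python function, not Microsoft's code: it treats a Relationship/@Target value the way Word 2007 appeared to, decoding %HH when the octet is ASCII (so "%61.xml" becomes "a.xml") and rejecting a reference that contains a non-ASCII character, whether verbatim or percent-encoded.

```python
import re

_PCT_HH = re.compile(r"%([0-9A-Fa-f]{2})")


def decode_target_like_word2007(target: str) -> str:
    """Model of the observed behavior: decode %HH only for ASCII octets."""
    if any(ord(ch) >= 0x80 for ch in target):
        raise ValueError(f"verbatim non-ASCII character in {target!r}")

    def decode(match) -> str:
        octet = int(match.group(1), 16)
        if octet >= 0x80:
            # e.g. %E3: the first byte of a UTF-8 encoded non-ASCII character
            raise ValueError(f"%{match.group(1)} is not an ASCII octet in {target!r}")
        return chr(octet)

    return _PCT_HH.sub(decode, target)


print(decode_target_like_word2007("%61.xml"))  # a.xml
```

This is also exactly Option 2: %HH is permitted, but only as another spelling of an ASCII character.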
>>
>> Regards,
>> Makoto
>>
>> 2014/1/7 MURATA Makoto <eb2m-mrt at asahi-net.or.jp>:
>>> I did some more experiments using MS Office 2007 and .Net.
>>>
>>> Here is my understanding.
>>>
>>> - MS Office 2007 converts %HH to characters, at
>>>   least when %HH represents an ASCII character.
>>>
>>> - MS Office 2007 correctly resolves absolute-path references
>>>   (those beginning with "/").
>>>
>>> - MS Office 2007 correctly resolves relative-path references
>>>   (those not beginning with "/").
>>>
>>> - .Net (Package.GetPart) recognizes neither relative-path
>>>   references nor %HH.
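The resolution step that Office 2007 handles correctly can be sketched as follows: a relative Relationship/@Target is resolved against the source part name (the package root "/" for /_rels/.rels), while a target beginning with "/" is used as-is. This minimal Python sketch uses `urljoin` as a stand-in for RFC 3986 reference resolution, and the function name is hypothetical. In .NET, `PackUriHelper.ResolvePartUri` performs this step; the F# test further down passes `TargetUri` straight to `GetPart` without it.

```python
from urllib.parse import urljoin, urlsplit


def resolve_target(source_part_name: str, target: str) -> str:
    """Resolve a Relationship/@Target against its source part name.

    source_part_name is '/' for package relationships (/_rels/.rels).
    """
    if urlsplit(target).scheme:
        # e.g. 'http://...': an external target, not a part in the package
        raise ValueError(f"external target: {target!r}")
    return urljoin(source_part_name, target)


print(resolve_target("/word/document.xml", "media/image1.png"))  # /word/media/image1.png
print(resolve_target("/word/document.xml", "../customXml/item1.xml"))  # /customXml/item1.xml
print(resolve_target("/", "word/document.xml"))  # /word/document.xml
```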
>>>
>>> I think that we should limit our concern to MS Office.  The .Net
>>> implementation of OPC does not implement Annex A of Part 2
>>> at all.
>>>
>>> Regards,
>>> Makoto
>>>
>>> 2013/12/28 MURATA Makoto <eb2m-mrt at asahi-net.or.jp>:
>>>> The more I think about OPC, the more confused I am.
>>>>
>>>> I have thought that references to OPC parts ("Unicode string"
>>>> in Annex A of OPC) can contain non-ASCII characters and
>>>> that such non-ASCII characters are percent-encoded before
>>>> referenced OPC parts are located.  I have also thought that
>>>> references to OPC parts are resolved relative to containing
>>>> OPC parts when they do not begin with "/".
>>>>
>>>> However, my experiment with .Net in F# appears to
>>>> show I am mistaken.  It reports errors if references to
>>>> OPC parts contain non-ASCII characters.  It also
>>>> reports errors if references to OPC parts do not begin
>>>> with "/".
>>>>
>>>> I plan to manually edit OOXML documents and XPS
>>>> documents and open them with MS Office and XPS
>>>> viewers.
>>>>
>>>> Here is my F# program.
>>>>
>>>> open System
>>>> open System.IO
>>>> open System.IO.Packaging
>>>>
>>>> let readOPC() =
>>>>     // Open the test package (path as used in the experiment).
>>>>     use package = Package.Open("f:test.opc", FileMode.Open)
>>>>     // Source part whose name contains U+3042; the name is escaped
>>>>     // before it is passed to Package.GetPart.
>>>>     let uri = Uri(Uri.EscapeUriString "/fあ/f1", UriKind.Relative)
>>>>     let part = package.GetPart(uri)
>>>>     // Try to open the target of each relationship of the source part.
>>>>     // TargetUri is passed to GetPart as-is, without resolving it
>>>>     // against the source part.
>>>>     for relship in part.GetRelationships() do
>>>>         let targetURI = relship.TargetUri
>>>>         try
>>>>             let targetPart = package.GetPart(targetURI)
>>>>             use s = targetPart.GetStream()
>>>>             Console.WriteLine("Success: {0} {1}", targetURI, s.ReadByte())
>>>>         with
>>>>         | :? ArgumentException ->
>>>>             // GetPart throws ArgumentException for a URI it does not
>>>>             // accept as a part name.
>>>>             Console.WriteLine("Error: {0}", targetURI)
>>>>
>>>> readOPC()
>>>>
>>>>
>>>> Regards,
>>>> Makoto
>>>
>>>
>>> --
>>>
>>> Praying for the victims of the Japan Tohoku earthquake
>>>
>>> Makoto
>>
>>
> 
> 
> 


