[sc34wg4] Design choices: (was Re: OPC part names and references)
suzuki toshiya
mpsuzuki at hiroshima-u.ac.jp
Tue Jan 7 05:18:23 CET 2014
Dear Murata-san,
Thank you very much for carrying out these difficult experiments.
Considering the results, I think option 1 (pure ASCII, no %HH) is
the best if the standard prioritizes document interchange between
the existing implementations (i.e. including their bugs).
Option 3 is the second best, because it is the behavior most users
would expect.
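
To make option 1 concrete, here is a minimal Python sketch of the check
it would imply for part names and part references. The function name is
hypothetical, and the rule (ASCII only, no "%" at all) is only my reading
of the option as stated, not wording taken from 29500-2.

```python
def is_option1_name(name: str) -> bool:
    """Return True if `name` satisfies option 1 as read here.

    Assumption: "pure ASCII, no %HH" means every character is ASCII
    and the percent sign never appears.
    """
    return name.isascii() and "%" not in name


if __name__ == "__main__":
    for candidate in ["/word/document.xml", "/word/%61.xml", "/word/\u3042.xml"]:
        print(candidate, is_option1_name(candidate))
```

Such a check is trivially cheap, which is part of why option 1 is the
safest choice for interchange.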
Regards,
suzuki toshiya, Hiroshima University, Japan
MURATA Makoto wrote:
> Dear colleagues,
>
> I can think of four design choices. Which do you prefer?
>
> Option 1:
>
> No non-ASCII characters. No %HH. No I18N.
>
> Option 2 (Office 2007):
>
> No non-ASCII characters. %HH in OPC part references are allowed but
> they are required to represent ASCII characters. (Note: What's the point
> of allowing %HH only for representing ASCII?) No I18N.
>
> Option 3:
>
> Verbatim non-ASCII characters everywhere. Note: it remains to decide
> when and where %HH should be allowed. I18N based on the latest
> version of ZIP.
>
> Option 4 (close to current wording in 29500-2):
>
> Verbatim non-ASCII characters are allowed as part of OPC part names
> and OPC part references. But ZIP item names are ASCII. Conversion
> from OPC part names to ZIP item names invokes percent encoding. I18N
> without requiring the latest version of ZIP. Note: it remains to decide
> when and where %HH should be allowed.
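
The difference between options 3 and 4 can be sketched in Python as
follows. Both helper names are hypothetical. The option 4 helper assumes
that the conversion drops the leading "/" and percent-encodes the UTF-8
bytes of each non-ASCII character; urllib's quote() also escapes "%" and
some other ASCII characters, which may or may not be what the standard
should require. The option 3 helper only reports whether Python's zipfile
module sets the UTF-8 name flag (general purpose bit 11, 0x800) when it
stores a given item name.

```python
import io
import zipfile
from urllib.parse import quote


def part_name_to_zip_item_name(part_name: str) -> str:
    """Option 4 sketch: keep the ZIP item name ASCII via percent encoding."""
    # Assumption: the leading "/" of the part name is not stored in the ZIP.
    return quote(part_name.lstrip("/"), safe="/")


def utf8_flag_set(item_name: str) -> bool:
    """Option 3 sketch: store the name verbatim and inspect the UTF-8 flag."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as archive:
        archive.writestr(item_name, b"")
    with zipfile.ZipFile(buf) as archive:
        # Bit 11 (0x800) marks the item name as UTF-8 encoded.
        return bool(archive.infolist()[0].flag_bits & 0x800)


if __name__ == "__main__":
    print(part_name_to_zip_item_name("/word/\u3042.xml"))
    print(utf8_flag_set("word/\u3042.xml"), utf8_flag_set("word/a.xml"))
```

With option 4 any ZIP tool can read the item names. With option 3 the
reader has to honor the UTF-8 flag.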
>
> Regards,
> Makoto
>
> 2014/1/7 MURATA Makoto <eb2m-mrt at asahi-net.or.jp>:
>> Here are some further experiments.
>>
>> Summary:
>>
>> MS Word 2007 does not allow non-ASCII characters within part names
>> even if they are percent-encoded. %HH in OPC part references are
>> decoded as long as they represent ASCII characters.
>>
>> Experiments:
>>
>> First, I replaced "document.xml" in a WML document by "%E3%81%82.xml".
>> Specifically:
>>
>> - Renamed the file "document.xml" under the directory "word" as
>> "%E3%81%82.xml"
>>
>> - Renamed the file "document.xml.rels" under the directory "word/_rels" as
>> "%E3%81%82.xml.rels"
>>
>> - Replaced "word/document.xml" in the file "_rels/.rels"
>> by "word/%E3%81%82.xml"
>>
>> - Replaced "/word/document.xml" in "[Content_Types].xml"
>> by "/word/%E3%81%82.xml"
>>
>> Then, MS Word 2007 could not open the revised WML document.
>>
>> Second, I used "a" instead of "%E3%81%82" in the above four changes.
>> Then, the document opened successfully.
>>
>> Third, I replaced "a.xml" in the file "_rels/.rels" by "%61.xml". I also
>> percent-encoded some other references (values of Relationship/@Target).
>> Then, the document opened successfully.
>>
>> Fourth, just in case, I tried verbatim U+3042 (HIRAGANA LETTER A)
>> rather than %E3%81%82. As expected, the document did not open.
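
For anyone who wants to repeat the renaming experiments without editing
the ZIP by hand, the four changes could be scripted roughly as below.
This is only a sketch: rename_part is a hypothetical helper, it assumes
the part lives in a subdirectory, and it rewrites "_rels/.rels" and
"[Content_Types].xml" by plain byte substitution, which is good enough
for a test document but not for general use.

```python
import io
import zipfile


def rename_part(package_bytes: bytes, old: str, new: str) -> bytes:
    """Return a copy of the package with ZIP item `old` renamed to `new`.

    `old` and `new` are ZIP item names such as "word/document.xml".
    """
    old_dir, old_base = old.rsplit("/", 1)
    new_dir, new_base = new.rsplit("/", 1)
    old_rels = f"{old_dir}/_rels/{old_base}.rels"
    new_rels = f"{new_dir}/_rels/{new_base}.rels"

    out = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(package_bytes)) as src, \
            zipfile.ZipFile(out, "w", zipfile.ZIP_DEFLATED) as dst:
        for item in src.infolist():
            name = item.filename
            data = src.read(name)
            if name == old:
                name = new              # the part itself
            elif name == old_rels:
                name = new_rels         # its relationships item
            elif name in ("_rels/.rels", "[Content_Types].xml"):
                # Naive substitution of every occurrence of the old name.
                data = data.replace(old.encode("utf-8"), new.encode("utf-8"))
            dst.writestr(name, data)
    return out.getvalue()
```

For example, rename_part(data, "word/document.xml", "word/%E3%81%82.xml")
reproduces the first experiment, and using "word/a.xml" reproduces the
second.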
>>
>> My conclusions:
>>
>> - Non-ASCII characters in part names are not allowed even if they
>> are percent-encoded.
>>
>> - %HH in values of Relationship/@Target are decoded as long as they
>> represent ASCII characters.
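
The observed behavior could be modelled as below. This is only a model
of what the experiments suggest, not a description of how MS Word is
actually implemented, and decode_ascii_escapes is a hypothetical name.

```python
import re

_ESCAPE = re.compile(r"%([0-9A-Fa-f]{2})")


def decode_ascii_escapes(reference: str) -> str:
    """Decode %HH escapes, rejecting anything outside ASCII.

    Raises ValueError for verbatim non-ASCII characters and for %HH
    escapes whose value is 0x80 or above.
    """
    if not reference.isascii():
        raise ValueError(f"verbatim non-ASCII character in {reference!r}")

    def replace(match):
        value = int(match.group(1), 16)
        if value >= 0x80:
            raise ValueError(f"non-ASCII escape {match.group(0)} in {reference!r}")
        return chr(value)

    return _ESCAPE.sub(replace, reference)


if __name__ == "__main__":
    print(decode_ascii_escapes("%61.xml"))
```

Under this model "%61.xml" resolves to "a.xml", while both
"%E3%81%82.xml" and the verbatim U+3042 form are rejected, which
matches the third, first and fourth experiments respectively.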
>>
>> Regards,
>> Makoto
>>
>> 2014/1/7 MURATA Makoto <eb2m-mrt at asahi-net.or.jp>:
>>> I did some more experiments using MS Office 2007 and .Net.
>>>
>>> Here is my understanding.
>>>
>>> - MS Office 2007 converts %HH to characters at
>>> least when %HH represents ASCII characters.
>>>
>>> - MS Office 2007 resolves absolute-path references (
>>> which begin with "/") correctly.
>>>
>>> - MS Office 2007 resolves relative-path references (
>>> which do not begin with "/") correctly.
>>>
>>> - .Net (Package.GetPart) recognizes neither relative-path
>>> references nor %HH.
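
For clarity, the resolution that MS Office 2007 appears to perform can
be sketched like this. The function name is hypothetical, and the
assumption is that a relative-path reference is resolved against the
directory of the source part, with "/" standing for the package root.

```python
import posixpath


def resolve_target(source_part: str, target: str) -> str:
    """Resolve a Relationship/@Target value to an absolute part name.

    `source_part` is the part that owns the relationship, e.g.
    "/word/document.xml", or "/" for package-level relationships.
    """
    if target.startswith("/"):
        # Absolute-path reference: already relative to the package root.
        return posixpath.normpath(target)
    # Relative-path reference: resolve against the source part's directory.
    base = posixpath.dirname(source_part)
    return posixpath.normpath(posixpath.join(base, target))


if __name__ == "__main__":
    print(resolve_target("/word/document.xml", "media/image1.png"))
    print(resolve_target("/word/document.xml", "../customXml/item1.xml"))
    print(resolve_target("/", "word/document.xml"))
```

Package.GetPart, by contrast, seems to expect the caller to have done
this resolution already.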
>>>
>>> I think that we should limit our concern to MS Office. The .Net
>>> implementation of OPC does not implement Annex A of Part 2
>>> at all.
>>>
>>> Regards,
>>> Makoto
>>>
>>> 2013/12/28 MURATA Makoto <eb2m-mrt at asahi-net.or.jp>:
>>>> The more I think about OPC, the more confused I am.
>>>>
>>>> I had thought that references to OPC parts ("Unicode string"
>>>> in Annex A of OPC) can contain non-ASCII characters and
>>>> that such non-ASCII characters are percent-encoded before
>>>> referenced OPC parts are located. I had also thought that
>>>> references to OPC parts are resolved relative to containing
>>>> OPC parts when they do not begin with "/".
>>>>
>>>> However, my experiment with .Net in F# appears to
>>>> show I am mistaken. It reports errors if references to
>>>> OPC parts contain non-ASCII characters. It also
>>>> reports errors if references to OPC parts do not begin
>>>> with "/".
>>>>
>>>> I plan to manually edit OOXML documents and XPS
>>>> documents and handle them by MS-Office and XPS
>>>> viewers.
>>>>
>>>> Here is my F# program.
>>>>
>>>> open System.IO.Packaging
>>>> open System
>>>>
>>>> let readOPC() =
>>>>     // Open the test package (path kept as in the experiment).
>>>>     let package = Package.Open("f:test.opc", IO.FileMode.Open)
>>>>     // Percent-encode the non-ASCII part name before building the URI.
>>>>     let uri = new Uri(Uri.EscapeUriString("/fあ/f1"), UriKind.Relative)
>>>>     let part = package.GetPart(uri)
>>>>     let enum = part.GetRelationships().GetEnumerator()
>>>>     // Try to fetch the target part of each relationship.
>>>>     while (enum.MoveNext()) do
>>>>         let relship = enum.Current
>>>>         let targetURI = relship.TargetUri
>>>>         try
>>>>             let targetPart = package.GetPart(targetURI)
>>>>             let s = targetPart.GetStream()
>>>>             System.Console.WriteLine("Success: {0} {1}", targetURI, s.ReadByte())
>>>>         with
>>>>         | :? System.ArgumentException ->
>>>>             System.Console.WriteLine("Error: {0}", targetURI)
>>>>     package.Close()
>>>>
>>>> readOPC()
>>>>
>>>>
>>>> Regards,
>>>> Makoto
>>>
>>>
>>> --
>>>
>>> Praying for the victims of the Japan Tohoku earthquake
>>>
>>> Makoto