[sc34wg4] Design choices: (was Re: OPC part names and referenes)

John Haug johnhaug at exchange.microsoft.com
Wed Jan 8 01:12:19 CET 2014


Yes, coming in the mail I'm sending shortly.  I also needed to get a better understanding of the big picture before getting lost in the details, which may change based on a better understanding of the goals and past discussions.

-----Original Message-----
From: Jim Thatcher [mailto:Jim.Thatcher at microsoft.com] 
Sent: Tuesday, January 7, 2014 11:22 AM
To: Chris Rae; suzuki toshiya; MURATA Makoto
Cc: SC34
Subject: RE: [sc34wg4] Design choices: (was Re: OPC part names and referenes)

It would help me consider which of the options proposed by Murata-san I would advocate if I had a good idea of what use cases we are trying to cover. Does anyone want to take a stab at describing use cases that our work on Part 2 is intended to enable?

Jim

-----Original Message-----
From: Chris Rae [mailto:Chris.Rae at microsoft.com]
Sent: Tuesday, January 7, 2014 10:08 AM
To: suzuki toshiya; MURATA Makoto
Cc: SC34
Subject: RE: [sc34wg4] Design choices: (was Re: OPC part names and referenes)

Hello gents (and happy new year!). Am I right in understanding that option 1 effectively subsumes option 2? I.e. "hello%20there" is valid in both cases, but in the second case %20 represents a space and in the first case it's just the literal characters?
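
To make the two readings concrete, here is a minimal sketch in Python. It is not from the thread, and the function names are illustrative assumptions. It only shows the same string taken as literal characters and then as a percent-encoded reference; it does not settle which option permits what.

```python
from urllib.parse import unquote

def literal_reading(reference: str) -> str:
    # First case: "%" has no special meaning, so the name is exactly
    # the characters that were written.
    return reference

def decoded_reading(reference: str) -> str:
    # Second case: each %HH triplet stands for the byte HH,
    # so %20 becomes a space.
    return unquote(reference)

print(repr(literal_reading("hello%20there")))
print(repr(decoded_reading("hello%20there")))
```

Under the first reading the name keeps its three characters "%20"; under the second it contains a space instead.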

Chris

-----Original Message-----
From: suzuki toshiya [mailto:mpsuzuki at hiroshima-u.ac.jp]
Sent: 06 January 2014 20:18
To: MURATA Makoto
Cc: SC34
Subject: Re: [sc34wg4] Design choices: (was Re: OPC part names and referenes)

Dear Murata-san,

Thank you very much for the difficult experiments.
Considering the results, I think:
Option 1 (pure ASCII, no %HH) is the best if the standard prioritizes document interchange between the existing implementations (i.e. including some bugs).
Option 3 is the second best, because it would be the most widely expected behaviour.

Regards,
suzuki toshiya, Hiroshima University, Japan


MURATA Makoto wrote:
> Dear  colleagues,
> 
> I can think of four design choices.  Which do you prefer?
> 
> Option 1:
> 
> No non-ASCII characters.  No %HH.  No I18N.
> 
> Option 2 (Office 2007):
> 
> No non-ASCII characters.  %HH in OPC part references are allowed but 
> they are required to represent ASCII characters.  (Note: What's the 
> point of allowing %HH only for representing ASCII?)  No I18N.
> 
> Option 3:
> 
> Verbatim non-ASCII characters everywhere.  Note: it remains to decide 
> when and where %HH should be allowed.  I18N based on the latest 
> version of ZIP.
> 
> Option 4 (close to current wording in 29500-2):
> 
> Verbatim non-ASCII characters are allowed as part of OPC part names 
> and OPC part references.  But ZIP item names are ASCII.  Conversion of 
> OPC part names to ZIP item names invokes percent encoding.  I18N 
> without requiring the latest version of ZIP.  Note: it remains to 
> decide when and where %HH should be allowed.
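
A rough sketch of the conversion that Option 4 implies, written in Python. It assumes that non-ASCII characters are encoded as UTF-8 with each resulting byte written as %HH, and that the leading "/" of the part name is dropped to form the ZIP item name. The helper name is hypothetical, and this is not the normative algorithm of 29500-2.

```python
def part_name_to_zip_item_name(part_name: str) -> str:
    # Assumption: the ZIP item name has no leading "/".
    out = []
    for byte in part_name.lstrip("/").encode("utf-8"):
        # ASCII bytes are kept verbatim; every other byte becomes %HH.
        out.append(chr(byte) if byte < 0x80 else f"%{byte:02X}")
    return "".join(out)

print(part_name_to_zip_item_name("/word/\u3042.xml"))
```

With this mapping, a part name containing U+3042 is stored under a pure-ASCII ZIP item name, which is why Option 4 does not require the latest version of ZIP.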
> 
> Regards,
> Makoto
> 
> 2014/1/7 MURATA Makoto <eb2m-mrt at asahi-net.or.jp>:
>> Here are some further experiments.
>>
>> Summary:
>>
>> MS Word 2007 does not allow non-ASCII characters within part names 
>> even if they are percent-encoded.  %HH in OPC part references are 
>> decoded as long as they represent ASCII characters.
>>
>> Experiments:
>>
>> First, I replaced "document.xml" in a WML document by "%E3%81%82.xml".
>> Specifically:
>>
>> - Renamed the file "document.xml" under the directory "word" as
>>   "%E3%81%82.xml"
>>
>> - Renamed the file "document.xml.rels" under the directory "word/_rels" as
>>   "%E3%81%82.xml.rels"
>>
>> - Replaced "word/document.xml" in the file "_rels/.rels"
>>    by "word/%E3%81%82.xml"
>>
>> - Replaced "/word/document.xml" in "[Content_Types].xml"
>>    by "/word/%E3%81%82.xml"
>>
>> Then, MS Word 2007 could not open the revised WML document.
>>
>> Second, I used "a" instead of "%E3%81%82" in the above four changes.
>> Then, the document opened successfully.
>>
>> Third, I replaced "a.xml" in the file "_rels/.rels" by "%61.xml".  I 
>> also percent-encoded some other references (values of Relationship/@Target).
>> Then, the document opened successfully.
>>
>> Fourth, just in case, I tried verbatim U+3042 (HIRAGANA LETTER A) 
>> rather than %E3%81%82.  As expected, the document does not open.
>>
>> My conclusions:
>>
>> - Non-ASCII characters in part names are not allowed even if they
>>   are percent-encoded.
>>
>> - %HH in values of Relationship/@Target are decoded as long as they
>>    represent ASCII characters.
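
The observed behaviour can be summarised as a small check, sketched here in Python. It only restates the experimental results above for Word 2007. The function name is hypothetical, and this is not a description of how Word is implemented.

```python
from urllib.parse import unquote_to_bytes

def target_decodes_to_ascii(target: str) -> bool:
    # Turn every %HH triplet into its byte (verbatim non-ASCII characters
    # are first encoded as UTF-8), then test that all bytes are ASCII.
    return all(byte < 0x80 for byte in unquote_to_bytes(target))

for target in ["word/%61.xml", "word/%E3%81%82.xml", "word/\u3042.xml"]:
    print(target, target_decodes_to_ascii(target))
```

In the experiments, the document opened only in the case where this check would return True.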
>>
>> Regards,
>> Makoto
>>
>> 2014/1/7 MURATA Makoto <eb2m-mrt at asahi-net.or.jp>:
>>> I did some more experiments using MS Office 2007 and .Net.
>>>
>>> Here is my understanding.
>>>
>>> - MS Office 2007 converts %HH to characters at
>>>   least when %HH represents ASCII characters.
>>>
>>> - MS Office 2007 resolves absolute-path references (
>>>   which begin with "/") correctly.
>>>
>>> - MS Office 2007 resolves relative-path references (
>>>   which do not begin with "/") correctly.
>>>
>>> - .Net (Package.GetPart) recognizes neither relative-path
>>>   references nor %HH
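
For reference, the difference between absolute-path and relative-path references can be sketched in Python as follows. The source part name and the targets are illustrative values, and the helper is a hypothetical simplification that ignores percent-decoding altogether.

```python
import posixpath

def resolve_reference(source_part: str, reference: str) -> str:
    if reference.startswith("/"):
        # Absolute-path reference: resolved from the package root.
        return posixpath.normpath(reference)
    # Relative-path reference: resolved against the folder that
    # contains the source part.
    base = posixpath.dirname(source_part)
    return posixpath.normpath(posixpath.join(base, reference))

print(resolve_reference("/word/document.xml", "media/image1.png"))
print(resolve_reference("/word/document.xml", "/docProps/app.xml"))
```

An implementation that accepts only the first branch would reject the relative form, as reported for Package.GetPart.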
>>>
>>> I think that we should limit our concern to MS Office.  The .Net 
>>> implementation of OPC does not implement Annex A of Part 2 at all.
>>>
>>> Regards,
>>> Makoto
>>>
>>> 2013/12/28 MURATA Makoto <eb2m-mrt at asahi-net.or.jp>:
>>>> The more I think about OPC, the more confused I am.
>>>>
>>>> I have thought that references to OPC parts ("Unicode string"
>>>> in Annex A of OPC) can contain non-ASCII characters and that such 
>>>> non-ASCII characters are percent-encoded before referenced OPC 
>>>> parts are located.  I have also thought that references to OPC 
>>>> parts are resolved relative to containing OPC parts when they do 
>>>> not begin with "/".
>>>>
>>>> However, my experiment with .Net in F# appears to show I am 
>>>> mistaken.  It reports errors if references to OPC parts contain 
>>>> non-ASCII characters.  It also reports errors if references to OPC 
>>>> parts do not begin with "/".
>>>>
>>>> I plan to manually edit OOXML documents and XPS documents and 
>>>> handle them by MS-Office and XPS viewers.
>>>>
>>>> Here is my F# program.
>>>>
>>>> open System.IO.Packaging
>>>> open System
>>>>
>>>> let readOPC() =
>>>>     let package = Package.Open("f:test.opc", IO.FileMode.Open)
>>>>     // Percent-encode the non-ASCII character before building the part URI.
>>>>     let uri = new Uri(Uri.EscapeUriString("/fあ/f1"), UriKind.Relative)
>>>>     let part = package.GetPart(uri)
>>>>     // Try to open the target of every relationship of that part.
>>>>     let enum = part.GetRelationships().GetEnumerator()
>>>>     while (enum.MoveNext()) do
>>>>         let relship = enum.Current
>>>>         let targetURI = relship.TargetUri
>>>>         try
>>>>             // GetPart throws ArgumentException when the target
>>>>             // is not a valid part URI.
>>>>             let targetPart = package.GetPart(targetURI)
>>>>             let s = targetPart.GetStream()
>>>>             System.Console.WriteLine("Success: {0} {1}", targetURI, s.ReadByte())
>>>>         with
>>>>             | :? System.ArgumentException ->
>>>>                 System.Console.WriteLine("Error: {0}", targetURI)
>>>>     package.Close()
>>>>
>>>> readOPC()
>>>>
>>>>
>>>> Regards,
>>>> Makoto
>>>
>>>
>>> --
>>>
>>> Praying for the victims of the Japan Tohoku earthquake
>>>
>>> Makoto
>>
> 
> 
> 


