DR-16-0022: Shared ML: Escaping strings in ST_Xstring
Francis Cave
francis at franciscave.com
Fri Nov 10 11:40:31 CET 2017
Murata-san
I think you meant _[0-9a-fA-F]{4}_
I have tested this by creating a simple spreadsheet in LibreOffice. It seems that LibreOffice does not support this feature. See attached. Here is the string table:
<sst xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main" count="4" uniqueCount="4">
<si>
<t xml:space="preserve">_xaaaa_</t>
</si>
<si>
<t xml:space="preserve">_xaaaa</t>
</si>
<si>
<t xml:space="preserve">_xgggg_</t>
</si>
<si>
<t xml:space="preserve">_xaaa_</t>
</si>
</sst>
If I open the document in LibreOffice, the first string is displayed as ‘_xaaaa_’. If I open the same document in Excel, the first string is converted to the Unicode character #xaaaa.
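To make the difference concrete, here is a minimal Python sketch of the decoding that Excel appears to apply to the first string. The pattern (a literal "_x", exactly four hexadecimal digits in either case, then "_") is an assumption based on the _xHHHH_ format in §22.9.2.19, and the name decode_xstring is hypothetical. Under this assumed pattern the other three test strings are left unchanged; how Excel displays those three is not reported here.

```python
import re

# Assumed escape pattern: "_x", exactly four hex digits (either case), "_".
_ESCAPE = re.compile(r"_x([0-9a-fA-F]{4})_")


def decode_xstring(text: str) -> str:
    """Replace each _xHHHH_ sequence with the character U+HHHH.

    Matching is left to right and non-overlapping, so the trailing "_" of
    one escape cannot also serve as the leading "_" of the next one.
    """
    return _ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), text)


if __name__ == "__main__":
    # The four strings from the string table above.
    for s in ["_xaaaa_", "_xaaaa", "_xgggg_", "_xaaa_"]:
        print(repr(s), "->", repr(decode_xstring(s)))
```

A consumer that skips this step, as LibreOffice seems to, would show the raw text '_xaaaa_' instead.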
Kind regards,
Francis
From: eb2mmrt at gmail.com [mailto:eb2mmrt at gmail.com] On Behalf Of MURATA Makoto
Sent: 10 November 2017 02:00
To: SC 34 WG4 <e-SC34-WG4 at ecma-international.org>
Subject: Re: DR-16-0022: Shared ML: Escaping strings in ST_Xstring
Francis,
Thanks for your comments. I now understand.
I did some experiments. Excel appears to escape an underscore
only when it is the first character of a string matching _[0-9a-zA-F]{4}_
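The rule I am describing can be sketched in Python as follows. This is only an illustration of my reading of the results, not Excel's actual code: the function name escape_literal_underscores is hypothetical, and the pattern assumes a literal "x" after the underscore and hexadecimal digits in either case.

```python
import re

# Assumption: an underscore is escaped only when it starts something a
# consumer would otherwise read as an escape, i.e. it is followed by "x",
# exactly four hex digits, and a closing "_". The lookahead leaves the
# rest of the sequence in place.
_LOOKS_LIKE_ESCAPE = re.compile(r"_(?=x[0-9a-fA-F]{4}_)")


def escape_literal_underscores(text: str) -> str:
    """Replace each such underscore with its own escape, _x005F_."""
    return _LOOKS_LIKE_ESCAPE.sub("_x005F_", text)


if __name__ == "__main__":
    for s in ["SW_x3850_CPU", "_x3000", "_xFFF_", "_x2000B_", "_xffff_"]:
        print(repr(s), "->", repr(escape_literal_underscores(s)))
```

With this rule, "SW_x3850_CPU" becomes "SW_x005F_x3850_CPU", while strings such as "_x3000", "_xFFF_" and "_x2000B_" are left as they are, which agrees with the string table below.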
Regards,
Makoto
<si>
<t>SW_x005F_x3850_CPU</t>
<phoneticPr fontId="1"/>
</si>
<si>
<t>_x005F_x3850_CPU</t>
<phoneticPr fontId="1"/>
</si>
<si>
<t>_</t>
<phoneticPr fontId="1"/>
</si>
<si>
<t>_xzxcv</t>
<phoneticPr fontId="1"/>
</si>
<si>
<t>_xzxcv_</t>
<phoneticPr fontId="1"/>
</si>
<si>
<t>_xzxcwev_</t>
<phoneticPr fontId="1"/>
</si>
<si>
<t>_xFFFFFF_</t>
<phoneticPr fontId="1"/>
</si>
<si>
<t>_x2000B_</t>
<phoneticPr fontId="1"/>
</si>
<si>
<t>_x3000</t>
<phoneticPr fontId="1"/>
</si>
<si>
<t>_x005F_x3000_</t>
<phoneticPr fontId="1"/>
</si>
<si>
<t>_x005F_xFFFF_</t>
<phoneticPr fontId="1"/>
</si>
<si>
<t>_xFF_</t>
<phoneticPr fontId="1"/>
</si>
<si>
<t>_x0F_</t>
<phoneticPr fontId="1"/>
</si>
<si>
<t>_xF_</t>
<phoneticPr fontId="1"/>
</si>
<si>
<t>_xG_</t>
<phoneticPr fontId="1"/>
</si>
<si>
<t>_xGG_</t>
<phoneticPr fontId="1"/>
</si>
<si>
<t>_xGGG_</t>
<phoneticPr fontId="1"/>
</si>
<si>
<t>_xGGGG_</t>
<phoneticPr fontId="1"/>
</si>
<si>
<t>_x000G_</t>
<phoneticPr fontId="1"/>
</si>
<si>
<t>_xFFF_</t>
<phoneticPr fontId="1"/>
</si>
<si>
<t>_x005F_xffff_</t>
<phoneticPr fontId="1"/>
</si>
<si>
<t>_x005F_xf3f2_</t>
<phoneticPr fontId="1"/>
</si>
2017-11-09 23:50 GMT+09:00 Francis Cave <francis at franciscave.com>:
Murata-san
I think that this DR is asking how to serialise the literal string “SW_x3850_CPU”, not “SW㡐CPU”. If “_xHHHH_” is interpreted as the Unicode character #xHHHH, any literal string in the form “_xHHHH_” has to have the initial “_” escaped, which is what Charlie is saying that Excel does. But does this mean that “_” is always escaped by Excel, or only escaped in certain contexts, such as if followed by “x”? Does this need to be tested?
Kind regards,
Francis
From: eb2mmrt at gmail.com [mailto:eb2mmrt at gmail.com] On Behalf Of MURATA Makoto
Sent: 09 November 2017 00:55
To: SC 34 WG4 <e-SC34-WG4 at ecma-international.org>
Subject: Re: DR-16-0022: Shared ML: Escaping strings in ST_Xstring
>§22.9.2.19, “ST_Xstring (Escaped String)” says:
>
>For all characters that cannot be represented in XML as defined by the
>XML 1.0 specification, the characters are escaped using the Unicode
>numerical character representation escape character format _xHHHH_, where H
>represents a hexadecimal character in the character's value.
>[Example: The Unicode character 8 is not permitted
> in an XML 1.0 document, so it must be escaped as _x0008_. end example]
> But it's not clear from this if all such combinations should be escaped?
> or just those in the range [001-031]. Excel itself handles such sequences by
> escaping the first underscore but unfortunately other consumers such as
> OpenOffice do not remove the escaping so I think this needs clarifying.
W3C XML clearly defines which characters are legal. We should
mention the well-formedness constraint: Legal Character.
https://www.w3.org/TR/2006/REC-xml-20060816/#wf-Legalchar
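For illustration, the Char production behind that constraint can be checked with a few lines of Python. This is a sketch under assumptions: the names is_xml10_char and escape_illegal are hypothetical, and writing the hex digits in upper case is my choice, since the quoted text only shows _x0008_.

```python
def is_xml10_char(ch: str) -> bool:
    """True if ch matches the XML 1.0 Char production:
    #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
    """
    cp = ord(ch)
    return (
        cp in (0x9, 0xA, 0xD)
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )


def escape_illegal(text: str) -> str:
    """Write every character outside Char as _xHHHH_.

    All excluded code points are at most U+FFFF, so four hex digits
    are always enough.
    """
    return "".join(
        ch if is_xml10_char(ch) else f"_x{ord(ch):04X}_" for ch in text
    )


if __name__ == "__main__":
    print(escape_illegal("a\x08b"))  # the character 8 from the quoted example
```

Under this reading, the characters that need escaping are not limited to the control range: the surrogate block and U+FFFE/U+FFFF are excluded from Char as well.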
Or, does this DR ask how we can represent a literal such as "_x2345"?
Regards,
Makoto
2016-12-07 5:20 GMT+09:00 Rex Jaeschke <rex at rexjaeschke.com>:
Here's a new DR from Charlie.
Rex
--
Praying for the victims of the Japan Tohoku earthquake
Makoto
-------------- next part --------------
A non-text attachment was scrubbed...
Name: escaped characters LO.XLSX
Type: application/vnd.openxmlformats-officedocument.spreadsheetml.sheet
Size: 4756 bytes
Desc: not available
URL: <http://mailman.vse.cz/pipermail/sc34wg4/attachments/20171110/c7a4c28d/attachment-0001.xlsx>