DR-16-0022: Shared ML: Escaping strings in ST_Xstring
MURATA Makoto
eb2m-mrt at asahi-net.or.jp
Fri Nov 10 02:59:40 CET 2017
Francis,
Thanks for your comments. I now understand.
I did some experiments. Excel appears to escape an underscore
only when it is the first character of a substring matching _x[0-9a-fA-F]{4}_.
Regards,
Makoto
<si>
<t>SW_x005F_x3850_CPU</t>
<phoneticPr fontId="1"/>
</si>
<si>
<t>_x005F_x3850_CPU</t>
<phoneticPr fontId="1"/>
</si>
<si>
<t>_</t>
<phoneticPr fontId="1"/>
</si>
<si>
<t>_xzxcv</t>
<phoneticPr fontId="1"/>
</si>
<si>
<t>_xzxcv_</t>
<phoneticPr fontId="1"/>
</si>
<si>
<t>_xzxcwev_</t>
<phoneticPr fontId="1"/>
</si>
<si>
<t>_xFFFFFF_</t>
<phoneticPr fontId="1"/>
</si>
<si>
<t>_x2000B_</t>
<phoneticPr fontId="1"/>
</si>
<si>
<t>_x3000</t>
<phoneticPr fontId="1"/>
</si>
<si>
<t>_x005F_x3000_</t>
<phoneticPr fontId="1"/>
</si>
<si>
<t>_x005F_xFFFF_</t>
<phoneticPr fontId="1"/>
</si>
<si>
<t>_xFF_</t>
<phoneticPr fontId="1"/>
</si>
<si>
<t>_x0F_</t>
<phoneticPr fontId="1"/>
</si>
<si>
<t>_xF_</t>
<phoneticPr fontId="1"/>
</si>
<si>
<t>_xG_</t>
<phoneticPr fontId="1"/>
</si>
<si>
<t>_xGG_</t>
<phoneticPr fontId="1"/>
</si>
<si>
<t>_xGGG_</t>
<phoneticPr fontId="1"/>
</si>
<si>
<t>_xGGGG_</t>
<phoneticPr fontId="1"/>
</si>
<si>
<t>_x000G_</t>
<phoneticPr fontId="1"/>
</si>
<si>
<t>_xFFF_</t>
<phoneticPr fontId="1"/>
</si>
<si>
<t>_x005F_xffff_</t>
<phoneticPr fontId="1"/>
</si>
<si>
<t>_x005F_xf3f2_</t>
<phoneticPr fontId="1"/>
</si>
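
The behaviour seen in the samples above can be sketched in a few lines of Python. This is only an illustration of the rule inferred from these experiments, not Excel's actual implementation: the function names are hypothetical, the pattern is an assumption drawn from the samples, and escaping of characters that are illegal in XML 1.0 is left out.

```python
import re

# Assumption from the samples: an underscore is escaped only when it is
# followed by "x", exactly four hex digits (either case), and "_".
# The lookahead also catches back-to-back candidates such as "_x0041_x0042_".
_NEEDS_ESCAPE = re.compile(r"_(?=x[0-9a-fA-F]{4}_)")

# An escape sequence as a consumer would see it when reading the file.
_ESCAPE_SEQUENCE = re.compile(r"_x([0-9a-fA-F]{4})_")


def escape_literal_underscores(text):
    """Writer side (hypothetical helper): protect literal "_xHHHH_" runs
    by replacing their leading underscore with "_x005F_"."""
    return _NEEDS_ESCAPE.sub("_x005F_", text)


def unescape_xstring(text):
    """Reader side (hypothetical helper): turn each "_xHHHH_" sequence
    back into the character with code point HHHH."""
    return _ESCAPE_SEQUENCE.sub(lambda m: chr(int(m.group(1), 16)), text)


if __name__ == "__main__":
    for sample in ["SW_x3850_CPU", "_x3000", "_xFFFFFF_", "_xGGGG_", "_xffff_"]:
        written = escape_literal_underscores(sample)
        print(sample, "->", written, "->", unescape_xstring(written))
```

A consumer that skips the reader-side step would display the literal text "_x005F_", which is the interoperability problem described in the DR.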
2017-11-09 23:50 GMT+09:00 Francis Cave <francis at franciscave.com>:
> Murata-san
>
> I think that this DR is asking how to serialise the literal string
> “SW_x3850_CPU”, not “SW㡐CPU”. If “_xHHHH_” is interpreted as the Unicode
> character #xHHHH, any literal string in the form “_xHHHH_” has to have the
> initial “_” escaped, which is what Charlie is saying that Excel does. But
> does this mean that “_” is always escaped by Excel, or only escaped in
> certain contexts, such as if followed by “x”? Does this need to be tested?
>
> Kind regards,
>
> Francis
>
> *From:* eb2mmrt at gmail.com [mailto:eb2mmrt at gmail.com] *On Behalf Of* MURATA Makoto
> *Sent:* 09 November 2017 00:55
> *To:* SC 34 WG4 <e-SC34-WG4 at ecma-international.org>
> *Subject:* Re: DR-16-0022: Shared ML: Escaping strings in ST_Xstring
>
> >§22.9.2.19, “ST_Xstring (Escaped String)” says:
> >
> >For all characters that cannot be represented in XML as defined by the
> >XML 1.0 specification, the characters are escaped using the Unicode
> >numerical character representation escape character format _xHHHH_, where
> >H represents a hexadecimal character in the character's value.
> >
> >[Example: The Unicode character 8 is not permitted
> > in an XML 1.0 document, so it must be escaped as _x0008_. end example]
> >
> > But it's not clear from this if all such combinations should be escaped?
> > or just those in the range [001-031]. Excel itself handles such sequences by
> > escaping the first underscore but unfortunately other consumers such as
> > OpenOffice do not remove the escaping so I think this needs clarifying.
>
> W3C XML clearly defines which characters are legal. We should
> mention *Well-formedness constraint: Legal Character.*
>
> https://www.w3.org/TR/2006/REC-xml-20060816/#wf-Legalchar
>
> Or, does this DR ask how we can represent a literal such as "_x2345"?
>
> Regards,
>
> Makoto
>
> 2016-12-07 5:20 GMT+09:00 Rex Jaeschke <rex at rexjaeschke.com>:
>
> Here's a new DR from Charlie.
>
> Rex
>
> --
> Praying for the victims of the Japan Tohoku earthquake
> Makoto
>
--
Praying for the victims of the Japan Tohoku earthquake
Makoto