DR-16-0018: WML: need sort method for special characters [for today's call]

MURATA Makoto eb2m-mrt at asahi-net.or.jp
Tue Jul 18 14:43:00 CEST 2017


ISO/IEC 14651 defines a reference comparison method and a common
template table for ordering text data.  It is necessary to tailor the
common template table for a given language's ordering.  For example,
the untailored order of CJK ideographic characters is based on UCS code
points, and thus looks meaningless to human CJK users.

Meanwhile, a Japanese standard (JIS X 4061:1996) defines the ordering of
Japanese text.  JIS X 4061 is NOT based on ISO/IEC 14651.  Its
ordering is based on code points in JIS X 0208 and JIS X 0201 (which is
roughly equal to US-ASCII).  This ordering makes sense for Japanese
users, since code points in these standards are based on the Japanese
alphabetical order of the kana transcription of each Kanji.
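
As a rough illustration of the difference (a minimal sketch only, not an
implementation of JIS X 4061, which defines further rules), the Python
fragment below sorts three Kanji first by UCS code point and then by
their position in JIS X 0208.  The function names are hypothetical, and
using the EUC-JP byte sequence as a stand-in for the JIS X 0208 code
point is an assumption made for brevity.

```python
def by_code_point(words):
    """Sort by Unicode (UCS) code point, which is Python's default for str."""
    return sorted(words)


def by_jis_x_0208(words):
    """Sort by JIS X 0208 position.

    Assumption: the EUC-JP encoding of a JIS X 0208 character is its
    two-byte JIS code with the high bit set on each byte, so comparing
    the encoded bytes preserves the JIS X 0208 order.
    """
    return sorted(words, key=lambda w: w.encode("euc_jp"))


# Representative readings: 山 = san, 川 = sen, 愛 = ai
kanji = ["山", "川", "愛"]

print(by_code_point(kanji))   # ['山', '川', '愛']
print(by_jis_x_0208(kanji))   # ['愛', '山', '川']
```

The second result follows the kana order of the readings (ai, san, sen),
while the first follows nothing a reader would recognise.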

I do not believe that referencing ISO/IEC 14651 from OOXML is useful,
unless we provide our own variation of the common template table.
This variation should cover ordering in JIS X 0208.  I suppose that it
should also cover orderings in mainland China, Taiwan, Hong Kong, and
many other areas.  It might be technically possible to do so,
but I do not think that it is practically possible.

Regards,
Makoto

2016-12-21 6:37 GMT+09:00 Francis Cave <francis at franciscave.com>:

>
> Hi Keld
>
> Thanks for your suggestions. We could certainly consider passing these on
> to the DR submitter as a possible approach to implementing the sort method
> for special characters. Whether or not they would match how existing
> implementers have tackled this issue is for those implementers to say,
> should they choose to do so.
>
> Kind regards,
>
> Francis
>
>
>
> -----Original Message-----
> From: keld at keldix.com [mailto:keld at keldix.com]
> Sent: 20 December 2016 19:24
> To: Francis Cave <francis at franciscave.com>
> Cc: 'MURATA Makoto' <eb2m-mrt at asahi-net.or.jp>; 'SC 34 WG4'
> <e-SC34-WG4 at ecma-international.org>
> Subject: Re: DR-16-0018: WML: need sort method for special characters [for
> today's call]
>
> Hi Francis
>
> I understand that there are 3 valid values for sortMethod, Strokes, PinYin
> and none.
> The DR asks for what sorting to use for special characters.
> I propose to use the one defined by the null tailoring of ISO 14651.
>
> As strings can have all UCS characters in them, the ordering just mentioned
> fits that bill nicely as it is defined on all UCS characters. ISO 14651
> also addresses other codesets than UCS, if that is relevant.
>
> It would make a difference on the first accent, all other characters
> considered equivalent, corresponding to the expected ordering in English
> and many other languages.
>
> Maybe one could use extLst to further describe the sorting method. I would
> advise using normal internationalisation mechanisms, such as given by a
> locale, as this is a well-known concept, and readily available in most
> operating systems. This is not a new feature, so there is no need to have
> an amendment or revision of the standard; it could be handled by a DR with
> some guidance on how to do it.
>
> Some suggested advice could be: if the extLst is empty, then the
> associated locale of the current environment should be used.
> By this I mean the language setting for the text in question; it could be
> an English part of a Spanish document.
>
> extLst could also be a name; in that case it should be the name of an
> implementation-defined locale of the operating system, whose sorting
> spec is to be used.
>
> I don't know if a new DR is needed or this could be part of the answer to
> DR-16-0018.
>
> Best regards
> keld
>
> On Tue, Dec 20, 2016 at 04:09:41PM -0000, Francis Cave wrote:
> >
> > Hi Keld
> >
> > My main concern is whether the sort method, if specified, affects what
> > the user sees when they open a document. Suppose that, for the sake of
> > argument, the default sort method of implementation A sorts accented
> > text according to the "normal" approach (first accent difference
> > determines the order), while implementation B sorts accented text
> > according to the "French" approach (last accent difference determines
> > the order). Suppose a spreadsheet is created by implementation A and
> > has an auto-filter applied that sorts according to its default sort
> > method.  If this spreadsheet is subsequently opened by implementation
> > B, will it appear the same as if it was re-opened by implementation A,
> > or will implementation B's default sort method be automatically
> > applied? My guess is that generally the latter will be the case, but
> > this is obviously implementation-dependent.
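
The two accent orderings contrasted above can be sketched as follows.
This is a toy model under stated assumptions, not ISO/IEC 14651 or
UTS#10: base letters are compared first, and accents are used only to
break ties, scanned either from the start ("normal") or from the end
("French").  All function names are hypothetical.

```python
import unicodedata


def split_accents(word):
    """Return (base letters, tuple of combining marks per base letter)."""
    bases, marks = [], []
    for ch in unicodedata.normalize("NFD", word):
        if unicodedata.combining(ch) and marks:
            marks[-1] += ch        # attach the mark to the preceding letter
        else:
            bases.append(ch)
            marks.append("")
    return "".join(bases), tuple(marks)


def forward_key(word):
    """'Normal' approach: the first accent difference decides."""
    base, marks = split_accents(word)
    return base, marks


def backward_key(word):
    """'French' approach: the last accent difference decides."""
    base, marks = split_accents(word)
    return base, marks[::-1]


words = ["côté", "cote", "côte", "coté"]
print(sorted(words, key=forward_key))    # cote, coté, côte, côté
print(sorted(words, key=backward_key))   # cote, côte, coté, côté
```

A sorted range saved by an implementation using one key and re-sorted
by an implementation using the other would swap the two middle entries.
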
> >
> > I suspect that this is a case where implementations must be free to
> > choose alternative approaches that aren't fully interoperable. Both
> > implementations A and B in my example will have had good reasons for
> > choosing different sort methods, e.g. based upon market demands.
> >
> > It might be nice if the sort method were spelt out in the document,
> > but this would definitely be a new feature. For now, in response to DR
> > 16-0018, I think we should simply ensure that the specification is
> > consistent with the schema and is clear about what are meant by the
> > existing values of ST_SortMethod, i.e. 'none', 'pinYin' and 'stroke'.
> >
> > Interestingly, I note that the content model of sortState includes the
> > application-defined extension element extLst, which could be used to
> > specify the sort state in more detail, using MCE. So, in theory, it
> > would be possible to define an extension to OOXML without having to
> > amend the base standard. However, I'm not sure that there'd be
> > sufficient demand for this.
> >
> > Kind regards,
> >
> > Francis
> >
> >
> >
> > -----Original Message-----
> > From: keld at keldix.com [mailto:keld at keldix.com]
> > Sent: 20 December 2016 09:37
> > To: Francis Cave <francis at franciscave.com>
> > Cc: 'MURATA Makoto' <eb2m-mrt at asahi-net.or.jp>; 'SC 34 WG4'
> > <e-SC34-WG4 at ecma-international.org>
> > Subject: Re: DR-16-0018: WML: need sort method for special characters
> > [for today's call]
> >
> > Hi Francis
> >
> > Yes, I understand that new requirements need an amendment or revision.
> >
> > However, this is not what I recommend as the immediate solution to
> > sorting special characters. I just propose to use the ISO 14651/UTS#10
> > kind of default spec.
> > And no locale choice.
> >
> > Best regards
> > keld
> >
> > On Mon, Dec 19, 2016 at 05:27:25PM -0000, Francis Cave wrote:
> > >
> > > Try again...
> > >
> > > In the first paragraph, for "cannot be done" read "can only be done".
> > > Clearly a serious finger malfunction...
> > >
> > > Francis
> > >
> > >
> > >
> > > -----Original Message-----
> > > From: Francis Cave [mailto:francis at franciscave.com]
> > > Sent: 19 December 2016 16:41
> > > To: keld at keldix.com; 'MURATA Makoto' <eb2m-mrt at asahi-net.or.jp>
> > > Cc: 'SC 34 WG4' <e-SC34-WG4 at ecma-international.org>
> > > Subject: RE: DR-16-0018: WML: need sort method for special
> > > characters [for today's call]
> > >
> > >
> > > > In the first paragraph or "cannot be done" ready "can only be done".
> > > > Sorry!
> > >
> > > Francis
> > >
> > >
> > >
> > > -----Original Message-----
> > > From: Francis Cave [mailto:francis at franciscave.com]
> > > Sent: 19 December 2016 16:33
> > > To: keld at keldix.com; 'MURATA Makoto' <eb2m-mrt at asahi-net.or.jp>
> > > Cc: 'SC 34 WG4' <e-SC34-WG4 at ecma-international.org>
> > > Subject: RE: DR-16-0018: WML: need sort method for special
> > > characters [for today's call]
> > >
> > >
> > > Keld
> > >
> > > You may or may not be correct in your view that there is a user
> > > requirement that fields be sortable according to the current locale.
> > > However, if there is such a user requirement, it isn't currently met
> > > by OOXML or by implementations. Support for specifying the current
> > > locale as the sort method would involve an extension to OOXML and as
> > > such cannot be done by amendment or revision of the standard, not by
> > > Technical Corrigendum, so this is a big deal. Although this might be
> > > a relatively simple change, e.g. by changing ST_SortMethod to allow
> > > any string (§18.18.73 of ISO/IEC 29500-1:2016), it could not be made
> > > mandatory for implementations to support these new values without
> > > > breaking existing implementations, and if they do not already do so,
> > > > that suggests that there hasn't been much market pressure that would
> > > > persuade them to implement the enhancement.
> > >
> > > > I note that ODF doesn't appear to have this feature either (although
> > > > it has a similar feature – §19.865 text:sort-algorithm – for sorting
> > > > text, e.g. bibliographies), and in §19.685 table:order there is the
> > > > following note:
> > >
> > >     Note: Sorting is locale and implementation-dependent.
> > >
> > > It is hard to avoid the conclusion that demand for this feature is
> > > very limited, at least in office document applications.
> > >
> > > Francis
> > >
> > >
> > >
> > > -----Original Message-----
> > > From: keld at keldix.com [mailto:keld at keldix.com]
> > > Sent: 18 December 2016 20:19
> > > To: MURATA Makoto <eb2m-mrt at asahi-net.or.jp>
> > > Cc: SC 34 WG4 <e-SC34-WG4 at ecma-international.org>
> > > Subject: Re: DR-16-0018: WML: need sort method for special
> > > characters [for today's call]
> > >
> > > I am not aware of the reason why this is so.
> > >
> > > > But anyway, why not then use the 14651 tailorable ordering in its
> > > > template form, which is equivalent to UTS#10 - as the universal
> > > > sorting in OOXML?
> > >
> > > I would think it was a user requirement that fields are sortable
> > > according to the current locale, eg a list of names.
> > >
> > > Having the sorting order not being changeable creates troubles for
> > > users too.
> > >
> > > best regards
> > > keld
> > >
> > >
> > > On Mon, Dec 19, 2016 at 04:48:21AM +0900, MURATA Makoto wrote:
> > > > The sort order of Excel cannot be changed without causing troubles
> > > > to users.
> > > >
> > > > Regards,
> > > > Makoto
> > > >
> > > > 2016-12-08 3:17 GMT+09:00 Keld Simonsen <keld at keldix.com>:
> > > >
> > > > > > I propose that you use the locale of the current process, and
> > > > > > the implied sorting sequence for special characters there, or at
> > > > > > least the sorting specified in ISO/IEC 14651
> > > > > > or the equivalent Unicode specification.
> > > > >
> > > > > best regards
> > > > > keld Simonsen
> > > > >
> > > >
> > > >
> > > >
> > > > --
> > > >
> > > > Praying for the victims of the Japan Tohoku earthquake
> > > >
> > > > Makoto
> > >
> > >
> > >
> > >
> > >
> >
>
>
>


-- 

Praying for the victims of the Japan Tohoku earthquake

Makoto