DR-16-0018: WML: need sort method for special characters [for today's call]

keld at keldix.com
Wed Jul 19 11:00:45 CEST 2017


Dear Makoto-san

I have asked a sorting expert, who is the editor of 14651, and he says that
for Japanese and Chinese, there surely is a need for tailoring the sorting template
in 14651. I also know that for my own language, Danish, a tailoring of 14651
is required.
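
As a rough illustration of what tailoring means for Danish, here is a toy
sketch in Python. It is not the ISO 14651 algorithm and does not use the
real template: the untailored order below is plain code point order (a
stand-in I chose for simplicity), and the function names are hypothetical.
The only Danish rule it encodes is that æ, ø, å come after z, in that order.

```python
# Toy illustration only: real tailorings are multi-level weight tables,
# not a single per-character key like this one.
DANISH_TAIL = "æøå"  # letters that Danish orders after "z", in this order


def untailored_key(word: str) -> list[tuple[int, int]]:
    """Plain code point order, a stand-in for an order without Danish rules."""
    return [(0, ord(ch)) for ch in word.lower()]


def danish_key(word: str) -> list[tuple[int, int]]:
    """Same as above, except that æ, ø, å are moved behind all other letters."""
    key = []
    for ch in word.lower():
        if ch in DANISH_TAIL:
            # Group 1 sorts after every group-0 character, including "z".
            key.append((1, DANISH_TAIL.index(ch)))
        else:
            key.append((0, ord(ch)))
    return key


words = ["øl", "zebra", "åben", "æble", "abe"]
print(sorted(words, key=untailored_key))
# ['abe', 'zebra', 'åben', 'æble', 'øl']
print(sorted(words, key=danish_key))
# ['abe', 'zebra', 'æble', 'øl', 'åben']
```

The two results differ only in the relative order of æ, ø and å, which is
exactly the kind of language-specific adjustment a tailoring supplies.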

Many tailorings are already available, for many languages. They are in use
in the marketplace and readily available as open source specifications;
there are many hundreds of them. I can come back with more information,
but as a first step I can refer to the glibc specifications and the CLDR collections
of sorting specifications. They both build on the ISO 14651 template.

So I think a normative reference to ISO 14651 is very relevant, as this is what is done
in the marketplace, and glibc and CLDR sorting could then also be mentioned
in the bibliography.

Best regards
keld

On Tue, Jul 18, 2017 at 09:43:00PM +0900, MURATA Makoto wrote:
> ISO/IEC 14651 defines a reference comparison method and a common
> template table for ordering text data.  It is necessary to tailor the
> common template table for a given language's ordering.  For example,
> the order of CJK ideographic characters is based on UCS code points,
> and thus looks meaningless to human CJK users.
> 
> Meanwhile, a Japanese standard (JIS X 4061:1996) defines ordering of
> Japanese text.  JIS X 4061 is NOT based on ISO/IEC 14651.  Its
> ordering is based on code points in JIS X 0208 and JIS X 0201 (which is
> roughly equal to US-ASCII).  This ordering makes sense for Japanese
> users, since code points in these standards are based on the Japanese
> alphabetical order of the kana transcription of each Kanji.
> 
> I do not believe that referencing ISO/IEC 14651 from OOXML is useful,
> unless we provide our own variation of the common template table.
> This variation should cover ordering in JIS X 0208.  I suppose that it
> should also cover orderings in mainland China, Taiwan, Hong Kong, and
> many other areas.  It might be technically possible to do so,
> but I do not think that it is practically possible.
> 
> Regards,
> Makoto
> 
> 2016-12-21 6:37 GMT+09:00 Francis Cave <francis at franciscave.com>:
> 
> >
> > Hi Keld
> >
> > Thanks for your suggestions. We could certainly consider passing these on
> > to the DR submitter as a possible approach to implementing the sort method
> > for special characters. Whether or not they would match how existing
> > implementers have tackled this issue is for those implementers to say,
> > should they choose to do so.
> >
> > Kind regards,
> >
> > Francis
> >
> >
> >
> > -----Original Message-----
> > From: keld at keldix.com [mailto:keld at keldix.com]
> > Sent: 20 December 2016 19:24
> > To: Francis Cave <francis at franciscave.com>
> > Cc: 'MURATA Makoto' <eb2m-mrt at asahi-net.or.jp>; 'SC 34 WG4'
> > <e-SC34-WG4 at ecma-international.org>
> > Subject: Re: DR-16-0018: WML: need sort method for special characters [for
> > today's call]
> >
> > Hi Francis
> >
> > I understand that there are 3 valid values for sortMethod: stroke, pinYin
> > and none.
> > The DR asks what sorting to use for special characters.
> > I propose to use the one defined by the null tailoring of ISO 14651.
> >
> > As strings can contain any UCS characters, the ordering just mentioned
> > fits that bill nicely, as it is defined on all UCS characters. ISO 14651
> > also addresses codesets other than UCS, if that is relevant.
> >
> > It would order by the first accent difference when all other characters
> > are considered equivalent, corresponding to the expected ordering in
> > English and many other languages.
> >
> > Maybe one could use extLst to further describe the sorting method. I would
> > advise using normal internationalisation mechanisms, such as those given by
> > a locale, as this is a well-known concept and readily available in most
> > operating systems. This is not a new feature, so there is no need for an
> > amendment or revision of the standard; it could be handled by a DR with
> > some guidance on how to do it.
> >
> > Some suggested advice could be: if extLst is empty, then the locale
> > associated with the current environment should be used.
> > By this I mean the language setting for the text in question; it could be
> > an English part of a Spanish document.
> >
> > extLst could also contain a name; in that case it should be the name of an
> > implementation-defined locale of the operating system, whose sorting
> > spec is to be used.
> >
> > I don't know if a new DR is needed or this could be part of the answer to
> > DR-16-0018.
> >
> > Best regards
> > keld
> >
> > On Tue, Dec 20, 2016 at 04:09:41PM -0000, Francis Cave wrote:
> > >
> > > Hi Keld
> > >
> > > My main concern is whether the sort method, if specified, affects what
> > > the user sees when they open a document. Suppose that, for the sake of
> > > argument, the default sort method of implementation A sorts accented
> > > text according to the "normal" approach (first accent difference
> > > determines the order), while implementation B sorts accented text
> > > according to the "French" approach (last accent difference determines
> > > the order). Suppose a spreadsheet is created by implementation A and
> > > has an auto-filter applied that sorts according to its default sort
> > > method.  If this spreadsheet is subsequently opened by implementation
> > > B, will it appear the same as if it was re-opened by implementation A,
> > > or will implementation B's default sort method be automatically
> > > applied? My guess is that generally the latter will be the case, but
> > > this is obviously implementation-dependent.
> > >
> > > I suspect that this is a case where implementations must be free to
> > > choose alternative approaches that aren't fully interoperable. Both
> > > implementations A and B in my example will have had good reasons for
> > > choosing different sort methods, e.g. based upon market demands.
> > >
> > > It might be nice if the sort method were spelt out in the document,
> > > but this would definitely be a new feature. For now, in response to DR
> > > 16-0018, I think we should simply ensure that the specification is
> > > consistent with the schema and is clear about what are meant by the
> > > existing values of ST_SortMethod, i.e. 'none', 'pinYin' and 'stroke'.
> > >
> > > Interestingly, I note that the content model of sortState includes the
> > > application-defined extension element extLst, which could be used to
> > > specify the sort state in more detail, using MCE. So, in theory, it
> > > would be possible to define an extension to OOXML without having to
> > > amend the base standard. However, I'm not sure that there'd be
> > > sufficient demand for this.
> > >
> > > Kind regards,
> > >
> > > Francis
> > >
> > >
> > >
> > > -----Original Message-----
> > > From: keld at keldix.com [mailto:keld at keldix.com]
> > > Sent: 20 December 2016 09:37
> > > To: Francis Cave <francis at franciscave.com>
> > > Cc: 'MURATA Makoto' <eb2m-mrt at asahi-net.or.jp>; 'SC 34 WG4'
> > > <e-SC34-WG4 at ecma-international.org>
> > > Subject: Re: DR-16-0018: WML: need sort method for special characters
> > > [for today's call]
> > >
> > > Hi Francis
> > >
> > > Yes, I understand that new requirements need an amendment or revision.
> > >
> > > However, this is not what I recommend as the immediate solution to
> > > sorting special characters. I just propose to use the ISO 14651/UTS#10
> > > kind of default spec.
> > > And no locale choice.
> > >
> > > Best regards
> > > keld
> > >
> > > On Mon, Dec 19, 2016 at 05:27:25PM -0000, Francis Cave wrote:
> > > >
> > > > Try again...
> > > >
> > > > In the first paragraph, for "cannot be done" read "can only be done".
> > > > Clearly a serious finger malfunction...
> > > >
> > > > Francis
> > > >
> > > >
> > > >
> > > > -----Original Message-----
> > > > From: Francis Cave [mailto:francis at franciscave.com]
> > > > Sent: 19 December 2016 16:41
> > > > To: keld at keldix.com; 'MURATA Makoto' <eb2m-mrt at asahi-net.or.jp>
> > > > Cc: 'SC 34 WG4' <e-SC34-WG4 at ecma-international.org>
> > > > Subject: RE: DR-16-0018: WML: need sort method for special
> > > > characters [for today's call]
> > > >
> > > >
> > > > In the first paragraph or "cannot be done" ready "can only be done".
> > > > Sorry!
> > > >
> > > > Francis
> > > >
> > > >
> > > >
> > > > -----Original Message-----
> > > > From: Francis Cave [mailto:francis at franciscave.com]
> > > > Sent: 19 December 2016 16:33
> > > > To: keld at keldix.com; 'MURATA Makoto' <eb2m-mrt at asahi-net.or.jp>
> > > > Cc: 'SC 34 WG4' <e-SC34-WG4 at ecma-international.org>
> > > > Subject: RE: DR-16-0018: WML: need sort method for special
> > > > characters [for today's call]
> > > >
> > > >
> > > > Keld
> > > >
> > > > You may or may not be correct in your view that there is a user
> > > > requirement that fields be sortable according to the current locale.
> > > > However, if there is such a user requirement, it isn't currently met
> > > > by OOXML or by implementations. Support for specifying the current
> > > > locale as the sort method would involve an extension to OOXML and as
> > > > such cannot be done by amendment or revision of the standard, not by
> > > > Technical Corrigendum, so this is a big deal. Although this might be
> > > > a relatively simple change, e.g. by changing ST_SortMethod to allow
> > > > any string (§18.18.73 of ISO/IEC 29500-1:2016), it could not be made
> > > > mandatory for implementations to support these new values without
> > > > breaking existing implementations, and if they do not already do so,
> > > > that suggests that there hasn't been much market pressure that would
> > > > persuade them to implement the enhancement.
> > > >
> > > > I note that ODF doesn't appear to have this feature either (although
> > > > it has a similar feature, §19.865 text:sort-algorithm, for sorting
> > > > text, e.g. bibliographies), and in §19.685 table:order there is the
> > > > following note:
> > > >
> > > >     Note: Sorting is locale and implementation-dependent.
> > > >
> > > > It is hard to avoid the conclusion that demand for this feature is
> > > > very limited, at least in office document applications.
> > > >
> > > > Francis
> > > >
> > > >
> > > >
> > > > -----Original Message-----
> > > > From: keld at keldix.com [mailto:keld at keldix.com]
> > > > Sent: 18 December 2016 20:19
> > > > To: MURATA Makoto <eb2m-mrt at asahi-net.or.jp>
> > > > Cc: SC 34 WG4 <e-SC34-WG4 at ecma-international.org>
> > > > Subject: Re: DR-16-0018: WML: need sort method for special
> > > > characters [for today's call]
> > > >
> > > > I am not aware of the reason why this is so.
> > > >
> > > > But anyway, why not then use the 14651 tailorable ordering in its
> > > > template form, which is equivalent to UTS#10, as the universal
> > > > sorting in OOXML?
> > > >
> > > > I would think it is a user requirement that fields are sortable
> > > > according to the current locale, e.g. a list of names.
> > > >
> > > > A sorting order that cannot be changed creates trouble for
> > > > users too.
> > > >
> > > > best regards
> > > > keld
> > > >
> > > >
> > > > On Mon, Dec 19, 2016 at 04:48:21AM +0900, MURATA Makoto wrote:
> > > > > The sort order of Excel cannot be changed without causing troubles
> > > > > to users.
> > > > >
> > > > > Regards,
> > > > > Makoto
> > > > >
> > > > > 2016-12-08 3:17 GMT+09:00 Keld Simonsen <keld at keldix.com>:
> > > > >
> > > > > > I propose that you use the locale of the current process, and
> > > > > > the implied sorting sequence for special characters there, or at
> > > > > > least the sorting specified in ISO/IEC 14651
> > > > > > or the equivalent Unicode specification.
> > > > > >
> > > > > > best regards
> > > > > > keld Simonsen
> > > > > >
> > > > >
> > > > >
> > > > >
> > > > > --
> > > > >
> > > > > Praying for the victims of the Japan Tohoku earthquake
> > > > >
> > > > > Makoto
> > > >
> > > >
> > > >
> > > >
> > > >
> > >
> >
> >
> >
> 
> 
> -- 
> 
> Praying for the victims of the Japan Tohoku earthquake
> 
> Makoto

