DR-16-0018: WML: need sort method for special characters [for today's call]

keld at keldix.com keld at keldix.com
Wed Jul 19 11:33:03 CEST 2017


I think that MS Excel uses 14651 or the equivalent Unicode UTS #10.
Microsoft normally uses Unicode specifications in many of their
products, and 14651/UTS #10 is readily available in Microsoft's
operating systems, as far as I know.

But we can find out. 

Many applications nowadays use ISO 10646/Unicode, and just having the
sorting specifications for the characters of X0208 and X0201 is not enough in
today's Japanese environments, especially if you run modern
Microsoft operating systems and use the MS Office applications.

But you can have a tailoring of the 14651 template that conforms to
JIS X 4061, so that all of the characters of X0208 and X0201 are sorted
correctly, and then the rest of the Japanese characters, including those in
X0212, are also sorted in a culturally acceptable order.
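
As a rough illustration of the layering idea only (this is neither ISO 14651
nor JIS X 4061), here is a small Python sketch. It assumes that Python's
"shift_jis" codec can stand in for "is this character in X0201/X0208", and
that plain code point order can stand in for the template's order of the
remaining characters. The function name jis_first_key is made up for this
example.

```python
def jis_first_key(text):
    """Toy sort key, NOT a real tailoring.

    Characters that Python's shift_jis codec can encode (assumed here to
    approximate JIS X 0201 / JIS X 0208) sort first, in Shift_JIS byte
    order. All other characters sort after them, in code point order,
    which is only a placeholder for the template's default order.
    """
    key = []
    for ch in text:
        try:
            key.append((0, ch.encode("shift_jis")))
        except UnicodeEncodeError:
            # Big-endian UTF-32 bytes compare in code point order.
            key.append((1, ch.encode("utf-32-be")))
    return key


words = ["\u4e00", "\u4e9c", "\U00020bb7"]  # U+4E00, U+4E9C, U+20BB7
by_code_point = sorted(words)
by_toy_tailoring = sorted(words, key=jis_first_key)
```

With this key, U+4E9C (the first kanji in X0208) sorts before U+4E00, the
reverse of code point order, and U+20BB7, which is outside X0208, still gets
a defined position after both.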

I remember from our previous meetings, Makoto-san, that you seem to be fond 
of Unicode specifications. In this case the ISO and Unicode specs are technically the
same, so I hope that this fact will make you more positive about referencing
the ISO 14651 standard and template.

Best regards
Keld



On Wed, Jul 19, 2017 at 06:07:06PM +0900, MURATA Makoto wrote:
> But in Japan, we did not rewrite JIS X 4061 on the basis of ISO 14651.
> I do not know if China has created their tailoring of 14651.  I do not
> think
> that MS Excel relies on 14651.
> 
> I agree that 14651 is good for future projects.  But I do not
> think that it is practically possible to document what MS Excel
> does using 14651.
> 
> Regards,
> Makoto
> 
> 2017-07-19 18:00 GMT+09:00 <keld at keldix.com>:
> 
> > Dear Makoto-san
> >
> > I have asked a sorting expert, who is the editor of 14651, and he says that
> > for Japanese and Chinese, there surely is a need for tailoring the sorting
> > template
> > in 14651. I also know that for my own language, Danish, a tailoring of
> > 14651
> > is required.
> >
> > A number of tailorings are already available for this, for many languages,
> > and also in use in the marketplace, and readily available as open source
> > specifications.
> > There are many hundreds of specifications. I can come back with more
> > information.
> > But in the first place I can refer to the glibc specifications and the
> > CLDR collections
> > of sorting specifications. They both build on the ISO 14651 template.
> >
> > So I think a normative reference to ISO 14651 is very relevant, as this is
> > what is done in the marketplace, and then glibc and CLDR sorting
> > could also be mentioned in the bibliography.
> >
> > Best regards
> > keld
> >
> > On Tue, Jul 18, 2017 at 09:43:00PM +0900, MURATA Makoto wrote:
> > > ISO/IEC 14651 defines a reference comparison method and a common
> > > template table for ordering text data.  It is necessary to tailor the
> > > common template table for a given language's ordering.  For example,
> > > the order of CJK ideographic characters is based on UCS code points,
> > > and thus looks meaningless to human CJK users.
> > >
> > > Meanwhile, the Japanese standard (JIS X 4061:1996) defines ordering of
> > > Japanese text.  JIS X 4061 is NOT based on ISO/IEC 14651.  Its
> > > ordering is based on code points in JIS X 0208 and X0201 (which is
> > > roughly equal to US-ASCII).  This ordering makes sense for Japanese
> > > users, since code points in these standards are based on the Japanese
> > > alphabetical order of the kana transcription of each Kanji.
> > >
> > > I do not believe that referencing ISO/IEC 14651 from OOXML is useful,
> > > unless we provide our own variation of the common template table.
> > > This variation should cover ordering in JIS X 0208.  I suppose that it
> > > should also cover orderings in mainland China, Taiwan, Hong Kong, and
> > > many other areas.  It might be technically possible to do so,
> > > but I do not think that it is practically possible.
> > >
> > > Regards,
> > > Makoto
> > >
> > > 2016-12-21 6:37 GMT+09:00 Francis Cave <francis at franciscave.com>:
> > >
> > > >
> > > > Hi Keld
> > > >
> > > > Thanks for your suggestions. We could certainly consider passing these
> > > > on to the DR submitter as a possible approach to implementing the sort
> > > > method for special characters. Whether or not they would match how
> > > > existing implementers have tackled this issue is for those implementers
> > > > to say, should they choose to do so.
> > > >
> > > > Kind regards,
> > > >
> > > > Francis
> > > >
> > > >
> > > >
> > > > -----Original Message-----
> > > > From: keld at keldix.com [mailto:keld at keldix.com]
> > > > Sent: 20 December 2016 19:24
> > > > To: Francis Cave <francis at franciscave.com>
> > > > Cc: 'MURATA Makoto' <eb2m-mrt at asahi-net.or.jp>; 'SC 34 WG4'
> > > > <e-SC34-WG4 at ecma-international.org>
> > > > Subject: Re: DR-16-0018: WML: need sort method for special characters
> > > > [for today's call]
> > > >
> > > > Hi Francis
> > > >
> > > > I understand that there are 3 valid values for sortMethod: stroke,
> > > > pinYin and none.
> > > > The DR asks for what sorting to use for special characters.
> > > > I propose to use the one defined by the null tailoring of ISO 14651.
> > > >
> > > > As strings can contain any UCS characters, the ordering just mentioned
> > > > fits the bill nicely, as it is defined on all UCS characters. ISO 14651
> > > > also addresses other codesets than UCS, if that is relevant.
> > > >
> > > > It would make a difference on the first accent, all other characters
> > > > considered equivalent, corresponding to the expected ordering in
> > > > English and many other languages.
> > > >
> > > > Maybe one could use extLst to further describe the sorting method. I
> > > > would advise to use normal internationalisation mechanisms, such as
> > > > given by a locale, as this is a well-known concept, and readily
> > > > available in most operating systems. This is not a new feature, so
> > > > there is no need to have an amendment or revision of the standard, it
> > > > could be handled by a DR with some guidance on how to do it.
> > > >
> > > > Some suggested advice could be: if the extLst is empty, then the
> > > > associated locale of the current environment should be used.
> > > > By this I mean the language setting for the text in question; it could
> > > > be an English part of a Spanish document.
> > > >
> > > > extLst could also be a name, in that case it should be the name of an
> > > > implementation-defined locale of the operating system, of which the
> > > > sorting spec is to be used.
> > > >
> > > > I don't know if a new DR is needed or this could be part of the answer
> > > > to DR-16-0018.
> > > >
> > > > Best regards
> > > > keld
> > > >
> > > > On Tue, Dec 20, 2016 at 04:09:41PM -0000, Francis Cave wrote:
> > > > >
> > > > > Hi Keld
> > > > >
> > > > > My main concern is whether the sort method, if specified, affects what
> > > > > the user sees when they open a document. Suppose that, for the sake of
> > > > > argument, the default sort method of implementation A sorts accented
> > > > > text according to the "normal" approach (first accent difference
> > > > > determines the order), while implementation B sorts accented text
> > > > > according to the "French" approach (last accent difference determines
> > > > > the order). Suppose a spreadsheet is created by implementation A and
> > > > > has an auto-filter applied that sorts according to its default sort
> > > > > method.  If this spreadsheet is subsequently opened by implementation
> > > > > B, will it appear the same as if it was re-opened by implementation A,
> > > > > or will implementation B's default sort method be automatically
> > > > > applied? My guess is that generally the latter will be the case, but
> > > > > this is obviously implementation-dependent.
> > > > >
> > > > > I suspect that this is a case where implementations must be free to
> > > > > choose alternative approaches that aren't fully interoperable. Both
> > > > > implementations A and B in my example will have had good reasons for
> > > > > choosing different sort methods, e.g. based upon market demands.
> > > > >
> > > > > It might be nice if the sort method were spelt out in the document,
> > > > > but this would definitely be a new feature. For now, in response to DR
> > > > > 16-0018, I think we should simply ensure that the specification is
> > > > > consistent with the schema and is clear about what is meant by the
> > > > > existing values of ST_SortMethod, i.e. 'none', 'pinYin' and 'stroke'.
> > > > >
> > > > > Interestingly, I note that the content model of sortState includes the
> > > > > application-defined extension element extLst, which could be used to
> > > > > specify the sort state in more detail, using MCE. So, in theory, it
> > > > > would be possible to define an extension to OOXML without having to
> > > > > amend the base standard. However, I'm not sure that there'd be
> > > > > sufficient demand for this.
> > > > >
> > > > > Kind regards,
> > > > >
> > > > > Francis
> > > > >
> > > > >
> > > > >
> > > > > -----Original Message-----
> > > > > From: keld at keldix.com [mailto:keld at keldix.com]
> > > > > Sent: 20 December 2016 09:37
> > > > > To: Francis Cave <francis at franciscave.com>
> > > > > Cc: 'MURATA Makoto' <eb2m-mrt at asahi-net.or.jp>; 'SC 34 WG4'
> > > > > <e-SC34-WG4 at ecma-international.org>
> > > > > Subject: Re: DR-16-0018: WML: need sort method for special characters
> > > > > [for today's call]
> > > > >
> > > > > Hi Francis
> > > > >
> > > > > Yes, I understand that new requirements need an amendment or revision.
> > > > >
> > > > > However, this is not what I recommend as the immediate solution to
> > > > > sorting special characters. I just propose to use the ISO 14651/UTS#10
> > > > > kind of default spec.
> > > > > And no locale choice.
> > > > >
> > > > > Best regards
> > > > > keld
> > > > >
> > > > > On Mon, Dec 19, 2016 at 05:27:25PM -0000, Francis Cave wrote:
> > > > > >
> > > > > > Try again...
> > > > > >
> > > > > > In the first paragraph, for "cannot be done" read "can only be done".
> > > > > > Clearly a serious finger malfunction...
> > > > > >
> > > > > > Francis
> > > > > >
> > > > > >
> > > > > >
> > > > > > -----Original Message-----
> > > > > > From: Francis Cave [mailto:francis at franciscave.com]
> > > > > > Sent: 19 December 2016 16:41
> > > > > > To: keld at keldix.com; 'MURATA Makoto' <eb2m-mrt at asahi-net.or.jp>
> > > > > > Cc: 'SC 34 WG4' <e-SC34-WG4 at ecma-international.org>
> > > > > > Subject: RE: DR-16-0018: WML: need sort method for special
> > > > > > characters [for today's call]
> > > > > >
> > > > > >
> > > > > > In the first paragraph or "cannot be done" ready "can only be
> > done".
> > > > > Sorry!
> > > > > >
> > > > > > Francis
> > > > > >
> > > > > >
> > > > > >
> > > > > > -----Original Message-----
> > > > > > From: Francis Cave [mailto:francis at franciscave.com]
> > > > > > Sent: 19 December 2016 16:33
> > > > > > To: keld at keldix.com; 'MURATA Makoto' <eb2m-mrt at asahi-net.or.jp>
> > > > > > Cc: 'SC 34 WG4' <e-SC34-WG4 at ecma-international.org>
> > > > > > Subject: RE: DR-16-0018: WML: need sort method for special
> > > > > > characters [for today's call]
> > > > > >
> > > > > >
> > > > > > Keld
> > > > > >
> > > > > > You may or may not be correct in your view that there is a user
> > > > > > requirement that fields be sortable according to the current locale.
> > > > > > However, if there is such a user requirement, it isn't currently met
> > > > > > by OOXML or by implementations. Support for specifying the current
> > > > > > locale as the sort method would involve an extension to OOXML and as
> > > > > > such cannot be done by amendment or revision of the standard, not by
> > > > > > Technical Corrigendum, so this is a big deal. Although this might be
> > > > > > a relatively simple change, e.g. by changing ST_SortMethod to allow
> > > > > > any string (§18.18.73 of ISO/IEC 29500-1:2016), it could not be made
> > > > > > mandatory for implementations to support these new values without
> > > > > > breaking existing implementations, and if they do not already do so,
> > > > > > that suggests that there hasn't been much market pressure that would
> > > > > > persuade them to implement the enhancement.
> > > > > >
> > > > > > I note that ODF doesn't appear to have this feature either (although
> > > > > > it has a similar feature, §19.865 text:sort-algorithm, for sorting
> > > > > > text, e.g. bibliographies), and in §19.685 table:order there is the
> > > > > > following note:
> > > > > >
> > > > > >     Note: Sorting is locale and implementation-dependent.
> > > > > >
> > > > > > It is hard to avoid the conclusion that demand for this feature is
> > > > > > very limited, at least in office document applications.
> > > > > >
> > > > > > Francis
> > > > > >
> > > > > >
> > > > > >
> > > > > > -----Original Message-----
> > > > > > From: keld at keldix.com [mailto:keld at keldix.com]
> > > > > > Sent: 18 December 2016 20:19
> > > > > > To: MURATA Makoto <eb2m-mrt at asahi-net.or.jp>
> > > > > > Cc: SC 34 WG4 <e-SC34-WG4 at ecma-international.org>
> > > > > > Subject: Re: DR-16-0018: WML: need sort method for special
> > > > > > characters [for today's call]
> > > > > >
> > > > > > I am not aware of the reason why this is so.
> > > > > >
> > > > > > But anyway, why not then use the 14651 tailorable ordering in its
> > > > > > template form, which is equivalent to UTS#10 - as the universal
> > > > > > sorting in OOXML?
> > > > > >
> > > > > > I would think it was a user requirement that fields are sortable
> > > > > > according to the current locale, eg a list of names.
> > > > > >
> > > > > > Having the sorting order not being changeable creates troubles for
> > > > > > users too.
> > > > > >
> > > > > > best regards
> > > > > > keld
> > > > > >
> > > > > >
> > > > > > On Mon, Dec 19, 2016 at 04:48:21AM +0900, MURATA Makoto wrote:
> > > > > > > The sort order of Excel cannot be changed without causing troubles
> > > > > > > to users.
> > > > > > >
> > > > > > > Regards,
> > > > > > > Makoto
> > > > > > >
> > > > > > > 2016-12-08 3:17 GMT+09:00 Keld Simonsen <keld at keldix.com>:
> > > > > > >
> > > > > > > > I propose that you use the locale of the current process, and
> > > > > > > > the implied sorting sequence for special characters there, or at
> > > > > > > > least the sorting specified in ISO/IEC 14651
> > > > > > > > or the equivalent Unicode specification.
> > > > > > > >
> > > > > > > > best regards
> > > > > > > > keld Simonsen
> > > > > > > >
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > --
> > > > > > >
> > > > > > > Praying for the victims of the Japan Tohoku earthquake
> > > > > > >
> > > > > > > Makoto
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > >
> > > >
> > > >
> > > >
> > >
> > >
> > > --
> > >
> > > Praying for the victims of the Japan Tohoku earthquake
> > >
> > > Makoto
> >
> 
> 
> 
> -- 
> 
> Praying for the victims of the Japan Tohoku earthquake
> 
> Makoto