DR-16-0018: WML: need sort method for special characters [for today's call]
keld at keldix.com
Wed Jul 19 12:04:25 CEST 2017
Dear Makoto-san
I have found that JIS X 4061 is indeed supported in the Unicode collection of sorting specifications,
which are tailorings of the ISO 14651 template:
https://stackoverflow.com/questions/29874198/which-japanese-sorting-collation-orders-are-supported-by-icu-cldr-uca
Best regards
keld
On Wed, Jul 19, 2017 at 11:33:03AM +0200, keld at keldix.com wrote:
> I think that MS Excel uses 14651 or the equivalent Unicode UTS #10.
> Microsoft normally uses Unicode specifications in many of their
> products, and 14651/UTS #10 is readily available in Microsoft's
> operating systems, as far as I know.
>
> But we can find out.
>
> Many applications nowadays use ISO 10646/Unicode, and just having the
> sorting specifications for the characters of X0208 and X0201 is not enough in
> today's Japanese environments, especially if you run modern
> Microsoft operating systems and use MS Office applications.
>
> But you can have a tailoring of the 14651 template that is conforming to
> JIS X 4061, so that all of the characters of X0208 and X0201 are sorted
> correctly - and then the rest of the Japanese characters, including those in X0212,
> are also sorted in a culturally acceptable order.
>
> I remember from our previous meetings, Makoto-san, that you seem to be fond
> of Unicode specifications. In this case the ISO and Unicode specs are technically the
> same, so I hope that this fact will make you more positive about referencing
> the ISO 14651 standard and template.
>
> Best regards
> Keld
>
>
>
> On Wed, Jul 19, 2017 at 06:07:06PM +0900, MURATA Makoto wrote:
> > But in Japan, we did not rewrite JIS X 4061 on the basis of ISO 14651.
> > I do not know if China has created their tailoring of 14651. I do not
> > think that MS Excel relies on 14651.
> >
> > I agree that 14651 is good for future projects. But I do not
> > think that it is practically possible to document what MS Excel
> > does using 14651.
> >
> > Regards,
> > Makoto
> >
> > 2017-07-19 18:00 GMT+09:00 <keld at keldix.com>:
> >
> > > Dear Makoto-san
> > >
> > > I have asked a sorting expert, who is the editor of 14651, and he says that
> > > for Japanese and Chinese, there surely is a need for tailoring the sorting
> > > template in 14651. I also know that for my own language, Danish, a
> > > tailoring of 14651 is required.
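To make the idea of a tailoring concrete: a tailoring starts from the template's weights and overrides a few of them. The Python sketch below is a toy illustration, not ISO 14651 or CLDR data. It assumes plain code-point order as a stand-in for the untailored template and uses a hypothetical `DANISH_TAILORING` table that moves æ, ø and å after z, as Danish requires.

```python
# Toy illustration of tailoring (not real ISO 14651 / CLDR data).
# Assumption: plain code-point order stands in for the untailored template.

# Hypothetical tailoring: place æ, ø, å (in that order) after z.
DANISH_TAILORING = {"æ": 1, "ø": 2, "å": 3}


def template_key(word: str) -> tuple:
    """Untailored stand-in: compare characters by code point."""
    return tuple((ord(ch), 0) for ch in word)


def danish_key(word: str) -> tuple:
    """Tailored key: the listed letters get weights just above 'z'."""
    key = []
    for ch in word:
        if ch in DANISH_TAILORING:
            key.append((ord("z"), DANISH_TAILORING[ch]))
        else:
            key.append((ord(ch), 0))
    return tuple(key)


words = ["ål", "zebra", "ørn", "æble", "abe"]
print(sorted(words, key=template_key))  # ['abe', 'zebra', 'ål', 'æble', 'ørn']
print(sorted(words, key=danish_key))    # ['abe', 'zebra', 'æble', 'ørn', 'ål']
```

A real tailoring, such as those in CLDR or glibc, works on multi-level collation elements rather than single integers, and Danish also treats "aa" like "å", which this sketch ignores.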
> > >
> > > A number of tailorings are already available for this, for many
> > > languages; they are in use in the marketplace and readily available as
> > > open source specifications. There are many hundreds of such
> > > specifications, and I can come back with more information.
> > > To begin with, I can refer to the glibc specifications and the CLDR
> > > collections of sorting specifications. Both build on the ISO 14651
> > > template.
> > >
> > > So I think a normative reference to ISO 14651 is very relevant, as this
> > > is what is done in the marketplace, and glibc and CLDR sorting could
> > > also be mentioned in the bibliography.
> > >
> > > Best regards
> > > keld
> > >
> > > On Tue, Jul 18, 2017 at 09:43:00PM +0900, MURATA Makoto wrote:
> > > > ISO/IEC 14651 defines a reference comparison method and a common
> > > > template table for ordering text data. The common template table must
> > > > be tailored to obtain a given language's ordering. For example, the
> > > > order of CJK ideographic characters in the template is based on UCS
> > > > code points, and thus looks meaningless to human CJK users.
> > > >
> > > > Meanwhile, the Japanese standard JIS X 4061:1996 defines the ordering
> > > > of Japanese text. JIS X 4061 is NOT based on ISO/IEC 14651. Its
> > > > ordering is based on code points in JIS X 0208 and JIS X 0201 (which
> > > > is roughly equal to US-ASCII). This ordering makes sense for Japanese
> > > > users, since the code points in these standards follow the Japanese
> > > > alphabetical order of the kana reading of each kanji (for the
> > > > first-level kanji of JIS X 0208).
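The difference between UCS code-point order and an order based on JIS X 0208 code points shows up with just three kanji. The Python sketch below is a crude approximation and not an implementation of JIS X 4061 itself. It relies on Python's standard `euc_jp` codec to recover each character's JIS X 0201 or JIS X 0208 position; the function name `jis_sort_key` and the choice to place every other character after JIS X 0208, in code-point order, are assumptions made for illustration.

```python
def jis_sort_key(text: str) -> tuple:
    """Hypothetical key: JIS X 0201 characters first, then JIS X 0208
    characters by (row, cell), then everything else by UCS code point."""
    key = []
    for ch in text:
        try:
            b = ch.encode("euc_jp")  # EUC-JP exposes the JIS X 0201 / X 0208 positions
        except UnicodeEncodeError:
            b = b""
        if len(b) == 1 and b[0] < 0x80:
            key.append((0, b[0]))  # JIS X 0201 Roman (roughly US-ASCII)
        elif len(b) == 2 and b[0] == 0x8E:
            key.append((0, b[1]))  # JIS X 0201 half-width katakana (0xA1-0xDF)
        elif len(b) == 2 and b[0] >= 0xA1:
            key.append((1, b[0] - 0xA0, b[1] - 0xA0))  # JIS X 0208 (row, cell)
        else:
            key.append((2, ord(ch)))  # outside X 0201/X 0208: fall back to code point
    return tuple(key)


kanji = ["一", "愛", "亜"]  # U+4E00, U+611B, U+4E9C
print(sorted(kanji))                    # UCS order: ['一', '亜', '愛']
print(sorted(kanji, key=jis_sort_key))  # JIS X 0208 order: ['亜', '愛', '一']
```

Because the first-level kanji of JIS X 0208 are arranged by reading, 亜 (a), 愛 (ai) and 一 (ichi) come out in kana order, whereas UCS code-point order puts 一 first.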
> > > >
> > > > I do not believe that referencing ISO/IEC 14651 from OOXML is useful,
> > > > unless we provide our own variation of the common template table.
> > > > This variation should cover ordering in JIS X 0208. I suppose that it
> > > > should also cover orderings in mainland China, Taiwan, Hong Kong, and
> > > > many other areas. It might be technically possible to do so,
> > > > but I do not think that it is practically possible.
> > > >
> > > > Regards,
> > > > Makoto
> > > >
> > > > 2016-12-21 6:37 GMT+09:00 Francis Cave <francis at franciscave.com>:
> > > >
> > > > >
> > > > > Hi Keld
> > > > >
> > > > > Thanks for your suggestions. We could certainly consider passing these
> > > > > on to the DR submitter as a possible approach to implementing the sort
> > > > > method for special characters. Whether or not they would match how
> > > > > existing implementers have tackled this issue is for those implementers
> > > > > to say, should they choose to do so.
> > > > >
> > > > > Kind regards,
> > > > >
> > > > > Francis
> > > > >
> > > > >
> > > > >
> > > > > -----Original Message-----
> > > > > From: keld at keldix.com [mailto:keld at keldix.com]
> > > > > Sent: 20 December 2016 19:24
> > > > > To: Francis Cave <francis at franciscave.com>
> > > > > Cc: 'MURATA Makoto' <eb2m-mrt at asahi-net.or.jp>; 'SC 34 WG4'
> > > > > <e-SC34-WG4 at ecma-international.org>
> > > > > Subject: Re: DR-16-0018: WML: need sort method for special characters
> > > > > [for today's call]
> > > > >
> > > > > Hi Francis
> > > > >
> > > > > I understand that there are three valid values for sortMethod: stroke,
> > > > > pinYin and none.
> > > > > The DR asks which sorting to use for special characters.
> > > > > I propose to use the one defined by the null tailoring of ISO 14651.
> > > > >
> > > > > As strings can contain any UCS character, the ordering just mentioned
> > > > > fits that bill nicely, as it is defined for all UCS characters.
> > > > > ISO 14651 also addresses coded character sets other than UCS, if that
> > > > > is relevant.
> > > > >
> > > > > The first accent difference would decide the order when all other
> > > > > characters are considered equivalent, which corresponds to the
> > > > > ordering expected in English and many other languages.
> > > > >
> > > > > Maybe one could use extLst to further describe the sorting method. I
> > > > > would advise using normal internationalisation mechanisms, such as
> > > > > those given by a locale, as this is a well-known concept and readily
> > > > > available in most operating systems. This is not a new feature, so
> > > > > there is no need for an amendment or revision of the standard; it
> > > > > could be handled by a DR with some guidance on how to do it.
> > > > >
> > > > > Some suggested advice could be: if extLst is empty, then the
> > > > > associated locale of the current environment should be used.
> > > > > By this I mean the language setting for the text in question; it
> > > > > could be an English part of a Spanish document.
> > > > >
> > > > > extLst could also contain a name; in that case it should be the name
> > > > > of an implementation-defined locale of the operating system whose
> > > > > sorting specification is to be used.
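Keld's proposed rule for choosing the sorting specification can be written down as a small decision function. This is a sketch only: `resolve_collation_locale` and its parameters are hypothetical (OOXML defines no such mechanism), and the final fallback to the process locale is an added assumption for the case where the text carries no language setting.

```python
from typing import Optional


def resolve_collation_locale(
    ext_locale_name: Optional[str],
    text_language: Optional[str],
    process_locale: str,
) -> str:
    """Hypothetical resolution order for the locale whose sorting rules apply."""
    if ext_locale_name:
        # extLst names an implementation-defined operating-system locale.
        return ext_locale_name
    if text_language:
        # extLst empty: use the language setting of the text itself,
        # e.g. the English part of a Spanish document.
        return text_language
    # Assumption: no language on the text, so use the current process locale.
    return process_locale


print(resolve_collation_locale(None, "en-GB", "es_ES.UTF-8"))     # en-GB
print(resolve_collation_locale("da_DK", "en-GB", "es_ES.UTF-8"))  # da_DK
```

The returned name would then be handed to the platform's locale-aware collation (for example POSIX `strcoll` under `LC_COLLATE`), which is where the behaviour becomes implementation-defined.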
> > > > >
> > > > > I don't know if a new DR is needed or this could be part of the answer
> > > > > to DR-16-0018.
> > > > >
> > > > > Best regards
> > > > > keld
> > > > >
> > > > > On Tue, Dec 20, 2016 at 04:09:41PM -0000, Francis Cave wrote:
> > > > > >
> > > > > > Hi Keld
> > > > > >
> > > > > > My main concern is whether the sort method, if specified, affects
> > > > > > what the user sees when they open a document. Suppose that, for the
> > > > > > sake of argument, the default sort method of implementation A sorts
> > > > > > accented text according to the "normal" approach (first accent
> > > > > > difference determines the order), while implementation B sorts
> > > > > > accented text according to the "French" approach (last accent
> > > > > > difference determines the order). Suppose a spreadsheet is created
> > > > > > by implementation A and has an auto-filter applied that sorts
> > > > > > according to its default sort method. If this spreadsheet is
> > > > > > subsequently opened by implementation B, will it appear the same as
> > > > > > if it was re-opened by implementation A, or will implementation B's
> > > > > > default sort method be automatically applied? My guess is that
> > > > > > generally the latter will be the case, but this is obviously
> > > > > > implementation-dependent.
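Francis's two approaches differ only in the direction in which accent (secondary) differences are compared once the base letters are equal. The Python sketch below is a simplified two-level comparison, not the full ISO 14651/UCA algorithm. It assumes that NFD decomposition is good enough to separate base letters from accents, and it ignores case and all further levels; `two_level_key` is a hypothetical helper name.

```python
import unicodedata


def two_level_key(text: str, backward_accents: bool = False) -> tuple:
    """Simplified two-level sort key: base letters first, then accents.

    backward_accents=False compares accents from the start of the string
    (first accent difference decides); True compares them from the end,
    as in the traditional French ordering.
    """
    bases, accents = [], []
    for ch in unicodedata.normalize("NFD", text):
        if unicodedata.combining(ch) and accents:
            accents[-1] += (ord(ch),)  # attach the mark to the preceding base letter
        else:
            bases.append(ch)
            accents.append(())  # unaccented letters sort before accented ones
    if backward_accents:
        accents.reverse()
    return (tuple(bases), tuple(accents))


words = ["côté", "cote", "côte", "coté"]
print(sorted(words, key=two_level_key))
# ['cote', 'coté', 'côte', 'côté']
print(sorted(words, key=lambda w: two_level_key(w, backward_accents=True)))
# ['cote', 'côte', 'coté', 'côté']
```

With the "normal" direction coté sorts before côte; with the French direction that pair is reversed, which is the kind of visible difference between implementations A and B described above.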
> > > > > >
> > > > > > I suspect that this is a case where implementations must be free to
> > > > > > choose alternative approaches that aren't fully interoperable. Both
> > > > > > implementations A and B in my example will have had good reasons for
> > > > > > choosing different sort methods, e.g. based upon market demands.
> > > > > >
> > > > > > It might be nice if the sort method were spelt out in the document,
> > > > > > but this would definitely be a new feature. For now, in response to
> > > > > > DR-16-0018, I think we should simply ensure that the specification
> > > > > > is consistent with the schema and is clear about what is meant by
> > > > > > the existing values of ST_SortMethod, i.e. 'none', 'pinYin' and
> > > > > > 'stroke'.
> > > > > >
> > > > > > Interestingly, I note that the content model of sortState includes
> > > > > > the application-defined extension element extLst, which could be
> > > > > > used to specify the sort state in more detail, using MCE. So, in
> > > > > > theory, it would be possible to define an extension to OOXML
> > > > > > without having to amend the base standard. However, I'm not sure
> > > > > > that there'd be sufficient demand for this.
> > > > > >
> > > > > > Kind regards,
> > > > > >
> > > > > > Francis
> > > > > >
> > > > > >
> > > > > >
> > > > > > -----Original Message-----
> > > > > > From: keld at keldix.com [mailto:keld at keldix.com]
> > > > > > Sent: 20 December 2016 09:37
> > > > > > To: Francis Cave <francis at franciscave.com>
> > > > > > Cc: 'MURATA Makoto' <eb2m-mrt at asahi-net.or.jp>; 'SC 34 WG4'
> > > > > > <e-SC34-WG4 at ecma-international.org>
> > > > > > Subject: Re: DR-16-0018: WML: need sort method for special characters
> > > > > > [for today's call]
> > > > > >
> > > > > > Hi Francis
> > > > > >
> > > > > > Yes, I understand that new requirements need an amendment or
> > > > > > revision.
> > > > > >
> > > > > > However, this is not what I recommend as the immediate solution to
> > > > > > sorting special characters. I just propose to use the ISO
> > > > > > 14651/UTS #10 kind of default spec, and no locale choice.
> > > > > >
> > > > > > Best regards
> > > > > > keld
> > > > > >
> > > > > > On Mon, Dec 19, 2016 at 05:27:25PM -0000, Francis Cave wrote:
> > > > > > >
> > > > > > > Try again...
> > > > > > >
> > > > > > > In the first paragraph, for "cannot be done" read "can only be done".
> > > > > > > Clearly a serious finger malfunction...
> > > > > > >
> > > > > > > Francis
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > -----Original Message-----
> > > > > > > From: Francis Cave [mailto:francis at franciscave.com]
> > > > > > > Sent: 19 December 2016 16:41
> > > > > > > To: keld at keldix.com; 'MURATA Makoto' <eb2m-mrt at asahi-net.or.jp>
> > > > > > > Cc: 'SC 34 WG4' <e-SC34-WG4 at ecma-international.org>
> > > > > > > Subject: RE: DR-16-0018: WML: need sort method for special
> > > > > > > characters [for today's call]
> > > > > > >
> > > > > > >
> > > > > > > In the first paragraph or "cannot be done" ready "can only be done".
> > > > > > > Sorry!
> > > > > > >
> > > > > > > Francis
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > -----Original Message-----
> > > > > > > From: Francis Cave [mailto:francis at franciscave.com]
> > > > > > > Sent: 19 December 2016 16:33
> > > > > > > To: keld at keldix.com; 'MURATA Makoto' <eb2m-mrt at asahi-net.or.jp>
> > > > > > > Cc: 'SC 34 WG4' <e-SC34-WG4 at ecma-international.org>
> > > > > > > Subject: RE: DR-16-0018: WML: need sort method for special
> > > > > > > characters [for today's call]
> > > > > > >
> > > > > > >
> > > > > > > Keld
> > > > > > >
> > > > > > > You may or may not be correct in your view that there is a user
> > > > > > > requirement that fields be sortable according to the current
> > > > > > > locale. However, if there is such a user requirement, it isn't
> > > > > > > currently met by OOXML or by implementations. Support for
> > > > > > > specifying the current locale as the sort method would involve an
> > > > > > > extension to OOXML and as such cannot be done by amendment or
> > > > > > > revision of the standard, not by Technical Corrigendum, so this is
> > > > > > > a big deal. Although this might be a relatively simple change,
> > > > > > > e.g. by changing ST_SortMethod to allow any string (§18.18.73 of
> > > > > > > ISO/IEC 29500-1:2016), it could not be made mandatory for
> > > > > > > implementations to support these new values without breaking
> > > > > > > existing implementations, and if they do not already do so, that
> > > > > > > suggests that there hasn't been much market pressure that would
> > > > > > > persuade them to implement the enhancement.
> > > > > > >
> > > > > > > I note that ODF doesn't appear to have this feature either
> > > > > > > (although it has a similar feature, §19.865 text:sort-algorithm,
> > > > > > > for sorting text, e.g. bibliographies), and in §19.685 table:order
> > > > > > > there is the following note:
> > > > > > >
> > > > > > > Note: Sorting is locale and implementation-dependent.
> > > > > > >
> > > > > > > It is hard to avoid the conclusion that demand for this feature is
> > > > > > > very limited, at least in office document applications.
> > > > > > >
> > > > > > > Francis
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > -----Original Message-----
> > > > > > > From: keld at keldix.com [mailto:keld at keldix.com]
> > > > > > > Sent: 18 December 2016 20:19
> > > > > > > To: MURATA Makoto <eb2m-mrt at asahi-net.or.jp>
> > > > > > > Cc: SC 34 WG4 <e-SC34-WG4 at ecma-international.org>
> > > > > > > Subject: Re: DR-16-0018: WML: need sort method for special
> > > > > > > characters [for today's call]
> > > > > > >
> > > > > > > I am not aware of the reason why this is so.
> > > > > > >
> > > > > > > But anyway, why not then use the 14651 tailorable ordering in its
> > > > > > > template form, which is equivalent to UTS #10, as the universal
> > > > > > > sorting in OOXML?
> > > > > > >
> > > > > > > I would think it was a user requirement that fields are sortable
> > > > > > > according to the current locale, e.g. a list of names.
> > > > > > >
> > > > > > > Not being able to change the sort order creates trouble for users
> > > > > > > too.
> > > > > > >
> > > > > > > best regards
> > > > > > > keld
> > > > > > >
> > > > > > >
> > > > > > > On Mon, Dec 19, 2016 at 04:48:21AM +0900, MURATA Makoto wrote:
> > > > > > > > The sort order of Excel cannot be changed without causing trouble
> > > > > > > > for users.
> > > > > > > >
> > > > > > > > Regards,
> > > > > > > > Makoto
> > > > > > > >
> > > > > > > > 2016-12-08 3:17 GMT+09:00 Keld Simonsen <keld at keldix.com>:
> > > > > > > >
> > > > > > > > > I propose that you use the locale of the current process, and
> > > > > > > > > the implied sorting sequence for special characters there, or
> > > > > > > > > at least the sorting specified in ISO/IEC 14651 or the
> > > > > > > > > equivalent Unicode specification.
> > > > > > > > >
> > > > > > > > > best regards
> > > > > > > > > keld Simonsen
> > > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > --
> > > > > > > >
> > > > > > > > Praying for the victims of the Japan Tohoku earthquake
> > > > > > > >
> > > > > > > > Makoto
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > > >
> > > > >
> > > >
> > > >
> > >
> >
> >
> >
More information about the sc34wg4 mailing list