DR-16-0018: WML: need sort method for special characters [for today's call]

MURATA Makoto eb2m-mrt at asahi-net.or.jp
Wed Jul 19 13:44:10 CEST 2017


But the sort order for Japanese shown in this document is different from
what MS Excel does and what X 4061 says.

Regards,
Makoto
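
The difference between UCS code point order and JIS X 0208 code order
for kanji, which the 18 July message quoted below describes, can be seen
in a small sketch. It assumes that sorting on EUC-JP bytes is an
acceptable stand-in for JIS X 0208 row/cell order; the three sample
characters and the helper names are chosen only for illustration. It is
not an implementation of JIS X 4061, of ISO/IEC 14651, or of what MS
Excel does.

```python
# Three sample kanji, written as escapes so the file encoding does not matter.
# Readings are given for orientation only.
kanji = ["\u4e00", "\u611b", "\u4e9c"]  # ichi, ai, a


def ucs_key(ch):
    """Sort key: the UCS (ISO/IEC 10646) code point."""
    return ord(ch)


def jis_key(ch):
    """Sort key: EUC-JP bytes, assumed here to mirror JIS X 0208 order."""
    return ch.encode("euc_jp")


ucs_order = sorted(kanji, key=ucs_key)
jis_order = sorted(kanji, key=jis_key)

print("UCS code point order:", ucs_order)
print("JIS X 0208 order:    ", jis_order)
```

With these three characters the two orders differ: only the JIS X 0208
order follows the kana readings (a, ai, ichi).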

2017-07-19 20:38 GMT+09:00 <keld at keldix.com>:

> Dear Makoto-san
>
> Microsoft has some information:
> https://msdn.microsoft.com/en-us/library/cc194863.aspx
>
> It does not refer to 14651 or UTS #10, but I think it is built on these
> specs.
> I believe all Microsoft sorting specs are built on 14651/UTS #10.
>
> It does mention a default system locale, and it then refers to various
> ways of sorting Chinese characters, including Unicode (10646) code point
> order, strokes, phonetic order, and a Japanese Shift JIS order, which I
> think conforms to X4061.
>
> Best regards
> keld
>
> On Wed, Jul 19, 2017 at 07:57:50PM +0900, MURATA Makoto wrote:
> > Keld,
> >
> > Thank you.  I did not know that.  I am still looking for the tailoring
> > for JIS X 4061, but it is good to know it is somewhere in CLDR.  Is
> > there a tailoring for Chinese?
> >
> > Regards,
> > Makoto
> >
> >
> > 2017-07-19 19:04 GMT+09:00 <keld at keldix.com>:
> >
> > > Dear Makoto-san
> > >
> > > I found out that JIS X 4061 is indeed supported in the Unicode
> > > collection of sorting specifications, which are tailorings of the
> > > 14651 specs:
> > >
> > > https://stackoverflow.com/questions/29874198/which-japanese-sorting-collation-orders-are-supported-by-icu-cldr-uca
> > >
> > > Best regards
> > > keld
> > >
> > > On Wed, Jul 19, 2017 at 11:33:03AM +0200, keld at keldix.com wrote:
> > > > I think that MS Excel uses 14651 or the equivalent Unicode UTS #10.
> > > > Microsoft normally uses Unicode specifications in many of their
> > > > products, and 14651/UTS #10 is readily available in Microsoft's
> > > > operating systems, as far as I know.
> > > >
> > > > But we can find out.
> > > >
> > > > Many applications nowadays use ISO 10646/Unicode, and just having
> > > > the sorting specifications for the characters of X0208 and X0201 is
> > > > not enough in today's Japanese environments, especially if you run
> > > > modern Microsoft operating systems and use the MS Office
> > > > applications.
> > > >
> > > > But you can have a tailoring of the 14651 template that conforms to
> > > > JIS X 4061, so that all of the characters of X0208 and X0201 are
> > > > sorted correctly, and then the rest of the Japanese characters,
> > > > including those in X0212, are also sorted in a culturally acceptable
> > > > order.
> > > >
> > > > I remember from our previous meetings, Makoto-san, that you seem to
> > > > be fond of Unicode specifications. In this case the ISO and Unicode
> > > > specs are technically the same, so I hope that this fact will make
> > > > you more positive about referencing the ISO 14651 standard and
> > > > template.
> > > >
> > > > Best regards
> > > > Keld
> > > >
> > > >
> > > >
> > > > On Wed, Jul 19, 2017 at 06:07:06PM +0900, MURATA Makoto wrote:
> > > > > But in Japan, we did not rewrite JIS X 4061 on the basis of ISO
> > > > > 14651.  I do not know if China has created their tailoring of
> > > > > 14651.  I do not think that MS Excel relies on 14651.
> > > > >
> > > > > I agree that 14651 is good for future projects.  But I do not
> > > > > think that it is practically possible to document what MS Excel
> > > > > does using 14651.
> > > > >
> > > > > Regards,
> > > > > Makoto
> > > > >
> > > > > 2017-07-19 18:00 GMT+09:00 <keld at keldix.com>:
> > > > >
> > > > > > Dear Makoto-san
> > > > > >
> > > > > > I have asked a sorting expert, who is the editor of 14651, and he
> > > > > > says that for Japanese and Chinese, there surely is a need for
> > > > > > tailoring the sorting template in 14651. I also know that for my
> > > > > > own language, Danish, a tailoring of 14651 is required.
> > > > > >
> > > > > > A number of tailorings are already available for this, for many
> > > > > > languages; they are in use in the marketplace and readily
> > > > > > available as open source specifications. There are many hundreds
> > > > > > of specifications. I can come back with more information.
> > > > > > But in the first place I can refer to the glibc specifications
> > > > > > and the CLDR collections of sorting specifications. They both
> > > > > > build on the ISO 14651 template.
> > > > > >
> > > > > > So I think a normative reference to ISO 14651 is very relevant,
> > > > > > as this is what is done in the marketplace, and glibc and CLDR
> > > > > > sorting could then also be mentioned in the bibliography.
> > > > > >
> > > > > > Best regards
> > > > > > keld
> > > > > >
> > > > > > On Tue, Jul 18, 2017 at 09:43:00PM +0900, MURATA Makoto wrote:
> > > > > > > ISO/IEC 14651 defines a reference comparison method and a
> > > > > > > common template table for ordering text data.  It is necessary
> > > > > > > to tailor the common template table for a given language's
> > > > > > > ordering.  For example, the order of CJK ideographic characters
> > > > > > > is based on UCS code points, and thus looks meaningless to
> > > > > > > human CJK users.
> > > > > > >
> > > > > > > Meanwhile, the Japanese standard (JIS X 4061:1996) defines the
> > > > > > > ordering of Japanese text.  JIS X 4061 is NOT based on ISO/IEC
> > > > > > > 14651.  Its ordering is based on code points in JIS X 0208 and
> > > > > > > X0201 (which is roughly equal to US-ASCII).  This ordering
> > > > > > > makes sense for Japanese users, since code points in these
> > > > > > > standards are based on the Japanese alphabetical order of the
> > > > > > > kana transcription of each Kanji.
> > > > > > >
> > > > > > > I do not believe that referencing ISO/IEC 14651 from OOXML is
> > > > > > > useful, unless we provide our own variation of the common
> > > > > > > template table.  This variation should cover ordering in JIS X
> > > > > > > 0208.  I suppose that it should also cover orderings in
> > > > > > > mainland China, Taiwan, Hong Kong, and many other areas.  It
> > > > > > > might be technically possible to do so, but I do not think that
> > > > > > > it is practically possible.
> > > > > > >
> > > > > > > Regards,
> > > > > > > Makoto
> > > > > > >
> > > > > > > 2016-12-21 6:37 GMT+09:00 Francis Cave <francis at franciscave.com>:
> > > > > > >
> > > > > > > >
> > > > > > > > Hi Keld
> > > > > > > >
> > > > > > > > Thanks for your suggestions. We could certainly consider
> > > > > > > > passing these on to the DR submitter as a possible approach
> > > > > > > > to implementing the sort method for special characters.
> > > > > > > > Whether or not they would match how existing implementers
> > > > > > > > have tackled this issue is for those implementers to say,
> > > > > > > > should they choose to do so.
> > > > > > > >
> > > > > > > > Kind regards,
> > > > > > > >
> > > > > > > > Francis
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > -----Original Message-----
> > > > > > > > From: keld at keldix.com [mailto:keld at keldix.com]
> > > > > > > > Sent: 20 December 2016 19:24
> > > > > > > > To: Francis Cave <francis at franciscave.com>
> > > > > > > > Cc: 'MURATA Makoto' <eb2m-mrt at asahi-net.or.jp>; 'SC 34 WG4'
> > > > > > > > <e-SC34-WG4 at ecma-international.org>
> > > > > > > > Subject: Re: DR-16-0018: WML: need sort method for special
> > > > > > > > characters [for today's call]
> > > > > > > >
> > > > > > > > Hi Francis
> > > > > > > >
> > > > > > > > I understand that there are 3 valid values for sortMethod:
> > > > > > > > Strokes, PinYin and none.
> > > > > > > > The DR asks what sorting to use for special characters.
> > > > > > > > I propose to use the one defined by the null tailoring of ISO
> > > > > > > > 14651.
> > > > > > > >
> > > > > > > > As strings can contain any UCS characters, the ordering just
> > > > > > > > mentioned fits that bill nicely, as it is defined on all UCS
> > > > > > > > characters. ISO 14651 also addresses other codesets than UCS,
> > > > > > > > if that is relevant.
> > > > > > > >
> > > > > > > > It would make a difference on the first accent, all other
> > > > > > > > characters considered equivalent, corresponding to the
> > > > > > > > expected ordering in English and many other languages.
> > > > > > > >
> > > > > > > > Maybe one could use extLst to further describe the sorting
> > > > > > > > method. I would advise using normal internationalisation
> > > > > > > > mechanisms, such as those given by a locale, as this is a
> > > > > > > > well-known concept, readily available in most operating
> > > > > > > > systems. This is not a new feature, so there is no need for
> > > > > > > > an amendment or revision of the standard; it could be handled
> > > > > > > > by a DR with some guidance on how to do it.
> > > > > > > >
> > > > > > > > Some suggested advice could be: if the extLst is empty, then
> > > > > > > > the associated locale of the current environment should be
> > > > > > > > used.
> > > > > > > > By this I mean the language setting for the text in question;
> > > > > > > > it could be an English part of a Spanish document.
> > > > > > > >
> > > > > > > > extLst could also be a name; in that case it should be the
> > > > > > > > name of an implementation-defined locale of the operating
> > > > > > > > system, of which the sorting spec is to be used.
> > > > > > > >
> > > > > > > > I don't know if a new DR is needed or this could be part of
> > > > > > > > the answer to DR-16-0018.
> > > > > > > >
> > > > > > > > Best regards
> > > > > > > > keld
> > > > > > > >
> > > > > > > > On Tue, Dec 20, 2016 at 04:09:41PM -0000, Francis Cave wrote:
> > > > > > > > >
> > > > > > > > > Hi Keld
> > > > > > > > >
> > > > > > > > > My main concern is whether the sort method, if specified,
> > > > > > > > > affects what the user sees when they open a document.
> > > > > > > > > Suppose that, for the sake of argument, the default sort
> > > > > > > > > method of implementation A sorts accented text according to
> > > > > > > > > the "normal" approach (first accent difference determines
> > > > > > > > > the order), while implementation B sorts accented text
> > > > > > > > > according to the "French" approach (last accent difference
> > > > > > > > > determines the order). Suppose a spreadsheet is created by
> > > > > > > > > implementation A and has an auto-filter applied that sorts
> > > > > > > > > according to its default sort method.  If this spreadsheet
> > > > > > > > > is subsequently opened by implementation B, will it appear
> > > > > > > > > the same as if it was re-opened by implementation A, or
> > > > > > > > > will implementation B's default sort method be
> > > > > > > > > automatically applied? My guess is that generally the
> > > > > > > > > latter will be the case, but this is obviously
> > > > > > > > > implementation-dependent.
> > > > > > > > >
> > > > > > > > > I suspect that this is a case where implementations must be
> > > > > > > > > free to choose alternative approaches that aren't fully
> > > > > > > > > interoperable. Both implementations A and B in my example
> > > > > > > > > will have had good reasons for choosing different sort
> > > > > > > > > methods, e.g. based upon market demands.
> > > > > > > > >
> > > > > > > > > It might be nice if the sort method were spelt out in the
> > > > > > > > > document, but this would definitely be a new feature. For
> > > > > > > > > now, in response to DR 16-0018, I think we should simply
> > > > > > > > > ensure that the specification is consistent with the schema
> > > > > > > > > and is clear about what is meant by the existing values of
> > > > > > > > > ST_SortMethod, i.e. 'none', 'pinYin' and 'stroke'.
> > > > > > > > >
> > > > > > > > > Interestingly, I note that the content model of sortState
> > > > > > > > > includes the application-defined extension element extLst,
> > > > > > > > > which could be used to specify the sort state in more
> > > > > > > > > detail, using MCE. So, in theory, it would be possible to
> > > > > > > > > define an extension to OOXML without having to amend the
> > > > > > > > > base standard. However, I'm not sure that there'd be
> > > > > > > > > sufficient demand for this.
> > > > > > > > >
> > > > > > > > > Kind regards,
> > > > > > > > >
> > > > > > > > > Francis
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > -----Original Message-----
> > > > > > > > > From: keld at keldix.com [mailto:keld at keldix.com]
> > > > > > > > > Sent: 20 December 2016 09:37
> > > > > > > > > To: Francis Cave <francis at franciscave.com>
> > > > > > > > > Cc: 'MURATA Makoto' <eb2m-mrt at asahi-net.or.jp>; 'SC 34 WG4'
> > > > > > > > > <e-SC34-WG4 at ecma-international.org>
> > > > > > > > > Subject: Re: DR-16-0018: WML: need sort method for special
> > > > > > > > > characters [for today's call]
> > > > > > > > >
> > > > > > > > > Hi Francis
> > > > > > > > >
> > > > > > > > > Yes, I understand that new requirements need an amendment
> > > > > > > > > or revision.
> > > > > > > > >
> > > > > > > > > However, this is not what I recommend as the immediate
> > > > > > > > > solution to sorting special characters. I just propose to
> > > > > > > > > use the ISO 14651/UTS#10 kind of default spec.
> > > > > > > > > And no locale choice.
> > > > > > > > >
> > > > > > > > > Best regards
> > > > > > > > > keld
> > > > > > > > >
> > > > > > > > > On Mon, Dec 19, 2016 at 05:27:25PM -0000, Francis Cave wrote:
> > > > > > > > > >
> > > > > > > > > > Try again...
> > > > > > > > > >
> > > > > > > > > > In the first paragraph, for "cannot be done" read "can
> > > > > > > > > > only be done".
> > > > > > > > > > Clearly a serious finger malfunction...
> > > > > > > > > >
> > > > > > > > > > Francis
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > -----Original Message-----
> > > > > > > > > > From: Francis Cave [mailto:francis at franciscave.com]
> > > > > > > > > > Sent: 19 December 2016 16:41
> > > > > > > > > > To: keld at keldix.com; 'MURATA Makoto' <eb2m-mrt at asahi-net.or.jp>
> > > > > > > > > > Cc: 'SC 34 WG4' <e-SC34-WG4 at ecma-international.org>
> > > > > > > > > > Subject: RE: DR-16-0018: WML: need sort method for special
> > > > > > > > > > characters [for today's call]
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > In the first paragraph or "cannot be done" ready "can
> > > > > > > > > > only be done".
> > > > > > > > > > Sorry!
> > > > > > > > > >
> > > > > > > > > > Francis
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > -----Original Message-----
> > > > > > > > > > From: Francis Cave [mailto:francis at franciscave.com]
> > > > > > > > > > Sent: 19 December 2016 16:33
> > > > > > > > > > To: keld at keldix.com; 'MURATA Makoto' <eb2m-mrt at asahi-net.or.jp>
> > > > > > > > > > Cc: 'SC 34 WG4' <e-SC34-WG4 at ecma-international.org>
> > > > > > > > > > Subject: RE: DR-16-0018: WML: need sort method for special
> > > > > > > > > > characters [for today's call]
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > Keld
> > > > > > > > > >
> > > > > > > > > > You may or may not be correct in your view that there is
> > > > > > > > > > a user requirement that fields be sortable according to
> > > > > > > > > > the current locale. However, if there is such a user
> > > > > > > > > > requirement, it isn't currently met by OOXML or by
> > > > > > > > > > implementations. Support for specifying the current
> > > > > > > > > > locale as the sort method would involve an extension to
> > > > > > > > > > OOXML and as such cannot be done by amendment or revision
> > > > > > > > > > of the standard, not by Technical Corrigendum, so this is
> > > > > > > > > > a big deal. Although this might be a relatively simple
> > > > > > > > > > change, e.g. by changing ST_SortMethod to allow any
> > > > > > > > > > string (§18.18.73 of ISO/IEC 29500-1:2016), it could not
> > > > > > > > > > be made mandatory for implementations to support these
> > > > > > > > > > new values without breaking existing implementations, and
> > > > > > > > > > if they do not already do so, that suggests that there
> > > > > > > > > > hasn't been much market pressure that would persuade them
> > > > > > > > > > to implement the enhancement.
> > > > > > > > > >
> > > > > > > > > > I note that ODF doesn't appear to have this feature
> > > > > > > > > > either (although it has a similar feature, §19.865
> > > > > > > > > > text:sort-algorithm, for sorting text, e.g.
> > > > > > > > > > bibliographies), and in §19.685 table:order there is the
> > > > > > > > > > following note:
> > > > > > > > > >
> > > > > > > > > >     Note: Sorting is locale and implementation-dependent.
> > > > > > > > > >
> > > > > > > > > > It is hard to avoid the conclusion that demand for this
> > > feature is
> > > > > > > > > > very limited, at least in office document applications.
> > > > > > > > > >
> > > > > > > > > > Francis
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > -----Original Message-----
> > > > > > > > > > From: keld at keldix.com [mailto:keld at keldix.com]
> > > > > > > > > > Sent: 18 December 2016 20:19
> > > > > > > > > > To: MURATA Makoto <eb2m-mrt at asahi-net.or.jp>
> > > > > > > > > > Cc: SC 34 WG4 <e-SC34-WG4 at ecma-international.org>
> > > > > > > > > > Subject: Re: DR-16-0018: WML: need sort method for special
> > > > > > > > > > characters [for today's call]
> > > > > > > > > >
> > > > > > > > > > I am not aware of the reason why this is so.
> > > > > > > > > >
> > > > > > > > > > But anyway, why not then use the 14651 tailorable
> > > > > > > > > > ordering in its template form, which is equivalent to
> > > > > > > > > > UTS#10, as the universal sorting in OOXML?
> > > > > > > > > >
> > > > > > > > > > I would think it was a user requirement that fields are
> > > > > > > > > > sortable according to the current locale, e.g. a list of
> > > > > > > > > > names.
> > > > > > > > > >
> > > > > > > > > > Having the sorting order not being changeable creates
> > > > > > > > > > troubles for users too.
> > > > > > > > > >
> > > > > > > > > > best regards
> > > > > > > > > > keld
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > On Mon, Dec 19, 2016 at 04:48:21AM +0900, MURATA Makoto wrote:
> > > > > > > > > > > The sort order of Excel cannot be changed without
> > > > > > > > > > > causing troubles to users.
> > > > > > > > > > >
> > > > > > > > > > > Regards,
> > > > > > > > > > > Makoto
> > > > > > > > > > >
> > > > > > > > > > > 2016-12-08 3:17 GMT+09:00 Keld Simonsen <keld at keldix.com>:
> > > > > > > > > > >
> > > > > > > > > > > > I propose that you use the locale of the current
> > > > > > > > > > > > process, and the implied sorting sequence for special
> > > > > > > > > > > > characters there, or at least the sorting specified
> > > > > > > > > > > > in ISO/IEC 14651 or the equivalent Unicode
> > > > > > > > > > > > specification.
> > > > > > > > > > > >
> > > > > > > > > > > > best regards
> > > > > > > > > > > > keld Simonsen
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > --
> > > > > > > > > > >
> > > > > > > > > > > Praying for the victims of the Japan Tohoku earthquake
> > > > > > > > > > >
> > > > > > > > > > > Makoto
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > --
> > > > > > >
> > > > > > > Praying for the victims of the Japan Tohoku earthquake
> > > > > > >
> > > > > > > Makoto
> > > > > >
> > > > >
> > > > >
> > > > >
> > > > > --
> > > > >
> > > > > Praying for the victims of the Japan Tohoku earthquake
> > > > >
> > > > > Makoto
> > >
> >
> >
> >
> > --
> >
> > Praying for the victims of the Japan Tohoku earthquake
> >
> > Makoto
>



-- 

Praying for the victims of the Japan Tohoku earthquake

Makoto
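
The two accent-ordering approaches contrasted in the quoted message of
20 December 2016 (first accent difference decides, versus the "French"
approach where the last accent difference decides) can be sketched as
follows. This is a toy model only: base letters are compared first, then
accents, and the accent weights are simply the code points of the
combining marks, which is an assumption made for illustration and not
the weighting of the ISO/IEC 14651 template or of any implementation
discussed in the thread. The function names are hypothetical.

```python
import unicodedata


def _split(word):
    """Return (base letters, per-letter tuples of combining marks)."""
    bases, accents = [], []
    for ch in unicodedata.normalize("NFD", word):
        if unicodedata.combining(ch) and bases:
            # Attach the combining mark to the preceding base letter.
            accents[-1] += (ord(ch),)
        else:
            bases.append(ch.lower())
            accents.append(())
    return tuple(bases), tuple(accents)


def forward_key(word):
    """Base letters first, then accents compared from the start of the word."""
    bases, accents = _split(word)
    return (bases, accents)


def backward_key(word):
    """Base letters first, then accents compared from the end of the word."""
    bases, accents = _split(word)
    return (bases, accents[::-1])


# cote, cote with acute e, cote with circumflex o, and both accents.
words = ["c\u00f4t\u00e9", "cot\u00e9", "c\u00f4te", "cote"]

forward_order = sorted(words, key=forward_key)
backward_order = sorted(words, key=backward_key)

print("first accent decides:", forward_order)
print("last accent decides: ", backward_order)
```

Under these assumptions the forward rule gives cote, coté, côte, côté,
while the backward rule gives cote, côte, coté, côté, so the same
column could appear in a different order in two implementations.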