DR-16-0018: WML: need sort method for special characters [for today's call]

MURATA Makoto eb2m-mrt at asahi-net.or.jp
Wed Jul 19 12:57:50 CEST 2017


Keld,

Thank you.  I did not know that.  I am still looking for the tailoring
for JIS X 4061, but it is good to know that it is somewhere in CLDR.
Is there a tailoring for Chinese?

Regards,
Makoto


2017-07-19 19:04 GMT+09:00 <keld at keldix.com>:

> Dear Makoto-san
>
> I found out that JIS X 4061 is indeed supported in the Unicode
> collection of sorting specifications, which are tailorings of the
> 14651 specification:
>
> https://stackoverflow.com/questions/29874198/which-japanese-sorting-collation-orders-are-supported-by-icu-cldr-uca
>
> Best regards
> keld
>
> On Wed, Jul 19, 2017 at 11:33:03AM +0200, keld at keldix.com wrote:
> > I think that MS Excel uses 14651 or the equivalent Unicode UTS #10.
> > Microsoft normally uses Unicode specifications in many of its
> > products, and 14651/UTS #10 is readily available in Microsoft's
> > operating systems, as far as I know.
> >
> > But we can find out.
> >
> > Many applications nowadays use ISO 10646/Unicode, and just having
> > the sorting specifications for the characters of X0208 and X0201 is
> > not enough in today's Japanese environments, especially if you run
> > modern Microsoft operating systems and use MS Office.
> >
> > But you can have a tailoring of the 14651 template that conforms to
> > JIS X 4061, so that all of the characters of X0208 and X0201 are
> > sorted correctly, and then the rest of the Japanese characters,
> > including those in X0212, are also sorted in a culturally acceptable
> > order.
> >
> > I remember from our previous meetings, Makoto-san, that you seem to
> > be fond of Unicode specifications. In this case the ISO and Unicode
> > specs are technically the same, so I hope that this fact will make
> > you more positive about referencing the ISO 14651 standard and
> > template.
> >
> > Best regards
> > Keld
> >
> >
> >
> > On Wed, Jul 19, 2017 at 06:07:06PM +0900, MURATA Makoto wrote:
> > > > But in Japan, we did not rewrite JIS X 4061 on the basis of ISO 14651.
> > > > I do not know if China has created their tailoring of 14651.  I do
> > > > not think that MS Excel relies on 14651.
> > >
> > > I agree that 14651 is good for future projects.  But I do not
> > > think that it is practically possible to document what MS Excel
> > > does using 14651.
> > >
> > > Regards,
> > > Makoto
> > >
> > > 2017-07-19 18:00 GMT+09:00 <keld at keldix.com>:
> > >
> > > > > Dear Makoto-san
> > > >
> > > > > I have asked a sorting expert, who is the editor of 14651, and he
> > > > > says that for Japanese and Chinese there surely is a need for
> > > > > tailoring the sorting template in 14651. I also know that for my
> > > > > own language, Danish, a tailoring of 14651 is required.
> > > >
> > > > > A number of tailorings are already available for this, for many
> > > > > languages. They are in use in the marketplace and readily
> > > > > available as open source specifications. There are many hundreds
> > > > > of specifications, and I can come back with more information.
> > > > > But in the first place I can refer to the glibc specifications
> > > > > and the CLDR collections of sorting specifications. Both build on
> > > > > the ISO 14651 template.
> > > >
> > > > > So I think a normative reference to ISO 14651 is very relevant, as
> > > > > this is what is done in the marketplace, and glibc and CLDR
> > > > > sorting could then also be mentioned in the bibliography.
> > > >
> > > > Best regards
> > > > keld
> > > >
> > > > On Tue, Jul 18, 2017 at 09:43:00PM +0900, MURATA Makoto wrote:
> > > > > > ISO/IEC 14651 defines a reference comparison method and a common
> > > > > > template table for ordering text data.  It is necessary to tailor
> > > > > > the common template table for a given language's ordering.  For
> > > > > > example, the order of CJK ideographic characters is based on UCS
> > > > > > code points, and thus looks meaningless to human CJK users.
> > > > >
> > > > > > Meanwhile, a Japanese standard (JIS X 4061:1996) defines the
> > > > > > ordering of Japanese text.  JIS X 4061 is NOT based on ISO/IEC
> > > > > > 14651.  Its ordering is based on code points in JIS X 0208 and
> > > > > > X0201 (which is roughly equal to US-ASCII).  This ordering makes
> > > > > > sense for Japanese users, since code points in these standards
> > > > > > are based on the Japanese alphabetical order of the kana
> > > > > > transcription of each Kanji.
> > > > >
> > > > > > I do not believe that referencing ISO/IEC 14651 from OOXML is
> > > > > > useful, unless we provide our own variation of the common template
> > > > > > table.  This variation should cover ordering in JIS X 0208.  I
> > > > > > suppose that it should also cover orderings in mainland China,
> > > > > > Taiwan, Hong Kong, and many other areas.  It might be technically
> > > > > > possible to do so, but I do not think that it is practically
> > > > > > possible.
> > > > >
> > > > > Regards,
> > > > > Makoto
> > > > >
> > > > > 2016-12-21 6:37 GMT+09:00 Francis Cave <francis at franciscave.com>:
> > > > >
> > > > > >
> > > > > > Hi Keld
> > > > > >
> > > > > > > Thanks for your suggestions. We could certainly consider passing
> > > > > > > these on to the DR submitter as a possible approach to
> > > > > > > implementing the sort method for special characters. Whether or
> > > > > > > not they would match how existing implementers have tackled this
> > > > > > > issue is for those implementers to say, should they choose to do
> > > > > > > so.
> > > > > >
> > > > > > Kind regards,
> > > > > >
> > > > > > Francis
> > > > > >
> > > > > >
> > > > > >
> > > > > > -----Original Message-----
> > > > > > From: keld at keldix.com [mailto:keld at keldix.com]
> > > > > > Sent: 20 December 2016 19:24
> > > > > > To: Francis Cave <francis at franciscave.com>
> > > > > > Cc: 'MURATA Makoto' <eb2m-mrt at asahi-net.or.jp>; 'SC 34 WG4'
> > > > > > <e-SC34-WG4 at ecma-international.org>
> > > > > > > Subject: Re: DR-16-0018: WML: need sort method for special
> > > > > > > characters [for today's call]
> > > > > >
> > > > > > Hi Francis
> > > > > >
> > > > > > > I understand that there are 3 valid values for sortMethod:
> > > > > > > stroke, pinYin and none.
> > > > > > > The DR asks what sorting to use for special characters.
> > > > > > > I propose to use the one defined by the null tailoring of ISO
> > > > > > > 14651.
> > > > > >
> > > > > > > As strings can contain any UCS characters, the ordering just
> > > > > > > mentioned fits that bill nicely, as it is defined on all UCS
> > > > > > > characters. ISO 14651 also addresses other codesets than UCS, if
> > > > > > > that is relevant.
> > > > > >
> > > > > > > It would order by the first accent difference, all other
> > > > > > > characters being considered equivalent, which corresponds to the
> > > > > > > expected ordering in English and many other languages.
> > > > > >
> > > > > > > Maybe one could use extLst to further describe the sorting
> > > > > > > method. I would advise using normal internationalisation
> > > > > > > mechanisms, such as those given by a locale, as this is a
> > > > > > > well-known concept and readily available in most operating
> > > > > > > systems. This is not a new feature, so there is no need for an
> > > > > > > amendment or revision of the standard; it could be handled by a
> > > > > > > DR with some guidance on how to do it.
> > > > > >
> > > > > > > Some suggested advice could be: if the extLst is empty, then the
> > > > > > > associated locale of the current environment should be used.
> > > > > > > By this I mean the language setting for the text in question; it
> > > > > > > could be an English part of a Spanish document.
> > > > > >
> > > > > > > extLst could also be a name; in that case it should be the name
> > > > > > > of an implementation-defined locale of the operating system,
> > > > > > > whose sorting spec is to be used.
> > > > > >
> > > > > > > I don't know if a new DR is needed or whether this could be part
> > > > > > > of the answer to DR-16-0018.
> > > > > >
> > > > > > Best regards
> > > > > > keld
> > > > > >
> > > > > > On Tue, Dec 20, 2016 at 04:09:41PM -0000, Francis Cave wrote:
> > > > > > >
> > > > > > > > Hi Keld
> > > > > > >
> > > > > > > > My main concern is whether the sort method, if specified,
> > > > > > > > affects what the user sees when they open a document. Suppose
> > > > > > > > that, for the sake of argument, the default sort method of
> > > > > > > > implementation A sorts accented text according to the "normal"
> > > > > > > > approach (first accent difference determines the order), while
> > > > > > > > implementation B sorts accented text according to the "French"
> > > > > > > > approach (last accent difference determines the order). Suppose
> > > > > > > > a spreadsheet is created by implementation A and has an
> > > > > > > > auto-filter applied that sorts according to its default sort
> > > > > > > > method.  If this spreadsheet is subsequently opened by
> > > > > > > > implementation B, will it appear the same as if it was re-opened
> > > > > > > > by implementation A, or will implementation B's default sort
> > > > > > > > method be automatically applied? My guess is that generally the
> > > > > > > > latter will be the case, but this is obviously
> > > > > > > > implementation-dependent.
> > > > > > >
> > > > > > > > I suspect that this is a case where implementations must be free
> > > > > > > > to choose alternative approaches that aren't fully
> > > > > > > > interoperable. Both implementations A and B in my example will
> > > > > > > > have had good reasons for choosing different sort methods, e.g.
> > > > > > > > based upon market demands.
> > > > > > >
> > > > > > > > It might be nice if the sort method were spelt out in the
> > > > > > > > document, but this would definitely be a new feature. For now,
> > > > > > > > in response to DR 16-0018, I think we should simply ensure that
> > > > > > > > the specification is consistent with the schema and is clear
> > > > > > > > about what is meant by the existing values of ST_SortMethod,
> > > > > > > > i.e. 'none', 'pinYin' and 'stroke'.
> > > > > > >
> > > > > > > > Interestingly, I note that the content model of sortState
> > > > > > > > includes the application-defined extension element extLst, which
> > > > > > > > could be used to specify the sort state in more detail, using
> > > > > > > > MCE. So, in theory, it would be possible to define an extension
> > > > > > > > to OOXML without having to amend the base standard. However, I'm
> > > > > > > > not sure that there'd be sufficient demand for this.
> > > > > > >
> > > > > > > Kind regards,
> > > > > > >
> > > > > > > Francis
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > -----Original Message-----
> > > > > > > From: keld at keldix.com [mailto:keld at keldix.com]
> > > > > > > Sent: 20 December 2016 09:37
> > > > > > > To: Francis Cave <francis at franciscave.com>
> > > > > > > Cc: 'MURATA Makoto' <eb2m-mrt at asahi-net.or.jp>; 'SC 34 WG4'
> > > > > > > <e-SC34-WG4 at ecma-international.org>
> > > > > > > > Subject: Re: DR-16-0018: WML: need sort method for special
> > > > > > > > characters [for today's call]
> > > > > > >
> > > > > > > Hi Francis
> > > > > > >
> > > > > > > > Yes, I understand that new requirements need an amendment or
> > > > > > > > revision.
> > > > > > > >
> > > > > > > > However, this is not what I recommend as the immediate solution
> > > > > > > > to sorting special characters. I just propose to use the ISO
> > > > > > > > 14651/UTS#10 kind of default spec.
> > > > > > > > And no locale choice.
> > > > > > >
> > > > > > > Best regards
> > > > > > > keld
> > > > > > >
> > > > > > > On Mon, Dec 19, 2016 at 05:27:25PM -0000, Francis Cave wrote:
> > > > > > > >
> > > > > > > > Try again...
> > > > > > > >
> > > > > > > > > In the first paragraph, for "cannot be done" read "can only be
> > > > > > > > > done".
> > > > > > > > > Clearly a serious finger malfunction...
> > > > > > > >
> > > > > > > > Francis
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > -----Original Message-----
> > > > > > > > From: Francis Cave [mailto:francis at franciscave.com]
> > > > > > > > Sent: 19 December 2016 16:41
> > > > > > > > > To: keld at keldix.com; 'MURATA Makoto'
> > > > > > > > > <eb2m-mrt at asahi-net.or.jp>
> > > > > > > > Cc: 'SC 34 WG4' <e-SC34-WG4 at ecma-international.org>
> > > > > > > > Subject: RE: DR-16-0018: WML: need sort method for special
> > > > > > > > characters [for today's call]
> > > > > > > >
> > > > > > > >
> > > > > > > > > In the first paragraph or "cannot be done" ready "can only be
> > > > > > > > > done". Sorry!
> > > > > > > >
> > > > > > > > Francis
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > -----Original Message-----
> > > > > > > > From: Francis Cave [mailto:francis at franciscave.com]
> > > > > > > > Sent: 19 December 2016 16:33
> > > > > > > > > To: keld at keldix.com; 'MURATA Makoto'
> > > > > > > > > <eb2m-mrt at asahi-net.or.jp>
> > > > > > > > Cc: 'SC 34 WG4' <e-SC34-WG4 at ecma-international.org>
> > > > > > > > Subject: RE: DR-16-0018: WML: need sort method for special
> > > > > > > > characters [for today's call]
> > > > > > > >
> > > > > > > >
> > > > > > > > Keld
> > > > > > > >
> > > > > > > > > You may or may not be correct in your view that there is a user
> > > > > > > > > requirement that fields be sortable according to the current
> > > > > > > > > locale. However, if there is such a user requirement, it isn't
> > > > > > > > > currently met by OOXML or by implementations. Support for
> > > > > > > > > specifying the current locale as the sort method would involve
> > > > > > > > > an extension to OOXML and as such cannot be done by amendment or
> > > > > > > > > revision of the standard, not by Technical Corrigendum, so this
> > > > > > > > > is a big deal. Although this might be a relatively simple
> > > > > > > > > change, e.g. by changing ST_SortMethod to allow any string
> > > > > > > > > (§18.18.73 of ISO/IEC 29500-1:2016), it could not be made
> > > > > > > > > mandatory for implementations to support these new values
> > > > > > > > > without breaking existing implementations, and if they do not
> > > > > > > > > already do so, that suggests that there hasn't been much market
> > > > > > > > > pressure that would persuade them to implement the enhancement.
> > > > > > > >
> > > > > > > > > I note that ODF doesn't appear to have this feature either
> > > > > > > > > (although it has a similar feature, §19.865 text:sort-algorithm,
> > > > > > > > > for sorting text, e.g. bibliographies), and in §19.685
> > > > > > > > > table:order there is the following note:
> > > > > > > > >
> > > > > > > > >     Note: Sorting is locale and implementation-dependent.
> > > > > > > >
> > > > > > > > It is hard to avoid the conclusion that demand for this
> feature is
> > > > > > > > very limited, at least in office document applications.
> > > > > > > >
> > > > > > > > Francis
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > -----Original Message-----
> > > > > > > > From: keld at keldix.com [mailto:keld at keldix.com]
> > > > > > > > Sent: 18 December 2016 20:19
> > > > > > > > To: MURATA Makoto <eb2m-mrt at asahi-net.or.jp>
> > > > > > > > Cc: SC 34 WG4 <e-SC34-WG4 at ecma-international.org>
> > > > > > > > Subject: Re: DR-16-0018: WML: need sort method for special
> > > > > > > > characters [for today's call]
> > > > > > > >
> > > > > > > > I am not aware of the reason why this is so.
> > > > > > > >
> > > > > > > > > But anyway, why not then use the 14651 tailorable ordering in
> > > > > > > > > its template form, which is equivalent to UTS#10, as the
> > > > > > > > > universal sorting in OOXML?
> > > > > > > > >
> > > > > > > > > I would think it is a user requirement that fields are sortable
> > > > > > > > > according to the current locale, e.g. a list of names.
> > > > > > > > >
> > > > > > > > > A sorting order that cannot be changed creates trouble for
> > > > > > > > > users too.
> > > > > > > >
> > > > > > > > best regards
> > > > > > > > keld
> > > > > > > >
> > > > > > > >
> > > > > > > > > On Mon, Dec 19, 2016 at 04:48:21AM +0900, MURATA Makoto wrote:
> > > > > > > > > > The sort order of Excel cannot be changed without causing
> > > > > > > > > > trouble to users.
> > > > > > > > >
> > > > > > > > > Regards,
> > > > > > > > > Makoto
> > > > > > > > >
> > > > > > > > > 2016-12-08 3:17 GMT+09:00 Keld Simonsen <keld at keldix.com>:
> > > > > > > > >
> > > > > > > > > > > I propose that you use the locale of the current process,
> > > > > > > > > > > and the implied sorting sequence for special characters
> > > > > > > > > > > there, or at least the sorting specified in ISO/IEC 14651
> > > > > > > > > > > or the equivalent Unicode specification.
> > > > > > > > > >
> > > > > > > > > > best regards
> > > > > > > > > > keld Simonsen
> > > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > --
> > > > > > > > >
> > > > > > > > > Praying for the victims of the Japan Tohoku earthquake
> > > > > > > > >
> > > > > > > > > Makoto
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > >
> > > > >
> > > > > --
> > > > >
> > > > > Praying for the victims of the Japan Tohoku earthquake
> > > > >
> > > > > Makoto
> > > >
> > >
> > >
> > >
> > > --
> > >
> > > Praying for the victims of the Japan Tohoku earthquake
> > >
> > > Makoto
>



-- 

Praying for the victims of the Japan Tohoku earthquake

Makoto