DR-16-0018: WML: need sort method for special characters [for today's call]

MURATA Makoto eb2m-mrt at asahi-net.or.jp
Wed Jul 19 11:07:06 CEST 2017


But in Japan, we did not rewrite JIS X 4061 on the basis of ISO 14651.
I do not know if China has created its own tailoring of 14651.  I do not
think that MS Excel relies on 14651.

I agree that 14651 is good for future projects.  But I do not
think that it is practically possible to document what MS Excel
does using 14651.

Regards,
Makoto

2017-07-19 18:00 GMT+09:00 <keld at keldix.com>:

> Dear Makoto-san
>
> I have asked a sorting expert, who is the editor of 14651, and he says that
> for Japanese and Chinese there surely is a need for tailoring the sorting
> template in 14651. I also know that for my own language, Danish, a
> tailoring of 14651 is required.
>
> A number of tailorings are already available for this, for many languages;
> they are in use in the marketplace and readily available as open source
> specifications. There are many hundreds of such specifications, and I can
> come back with more information. But in the first place I can refer to
> the glibc specifications and the CLDR collections of sorting
> specifications. They both build on the ISO 14651 template.
>
> So I think a normative reference to ISO 14651 is very relevant, as this is
> what is done in the marketplace, and glibc and CLDR sorting could also be
> mentioned in the bibliography.
>
> Best regards
> keld
>
> On Tue, Jul 18, 2017 at 09:43:00PM +0900, MURATA Makoto wrote:
> > ISO/IEC 14651 defines a reference comparison method and a common
> > template table for ordering text data.  It is necessary to tailor the
> > common template table for a given language's ordering.  For example,
> > the order of CJK ideographic characters is based on UCS code points,
> > and thus looks meaningless to human CJK users.
> >
> > Meanwhile, the Japanese standard JIS X 4061:1996 defines the ordering of
> > Japanese text.  JIS X 4061 is NOT based on ISO/IEC 14651.  Its
> > ordering is based on code points in JIS X 0208 and JIS X 0201 (the latter
> > is roughly equal to US-ASCII).  This ordering makes sense for Japanese
> > users, since code points in these standards are based on the Japanese
> > alphabetical order of the kana transcription of each Kanji.
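The difference between the two orderings described above can be seen with three kanji. The Python sketch below is an illustration only, not part of the original discussion: it sorts once by UCS code point and once by EUC-JP bytes. The assumption is that EUC-JP byte order is a convenient stand-in for JIS X 0208 code order (for JIS X 0208 characters it preserves row/cell order); it is not an implementation of JIS X 4061.

```python
# Sketch: UCS code-point order vs. JIS X 0208 code order for three kanji.
# Assumption: sorting by EUC-JP bytes approximates JIS X 0208 code order
# for characters in JIS X 0208. This is NOT a JIS X 4061 implementation.

kanji = ["一", "愛", "亜"]  # readings: ichi, ai, a

# UCS code point order: U+4E00 < U+4E9C < U+611B
by_ucs = sorted(kanji)

# JIS X 0208 order (via EUC-JP bytes): level-1 kanji follow kana reading order
by_jis = sorted(kanji, key=lambda ch: ch.encode("euc_jp"))

print("UCS order:", by_ucs)
print("JIS order:", by_jis)
```

Here the code-point order puts 一 (ichi) first, while the JIS order follows the readings a, ai, ichi, which is the order a Japanese user would expect.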
> >
> > I do not believe that referencing ISO/IEC 14651 from OOXML is useful,
> > unless we provide our own variation of the common template table.
> > This variation should cover ordering in JIS X 0208.  I suppose that it
> > should also cover orderings in mainland China, Taiwan, Hong Kong, and
> > many other areas.  It might be technically possible to do so, but I do
> > not think that it is practically possible.
> >
> > Regards,
> > Makoto
> >
> > 2016-12-21 6:37 GMT+09:00 Francis Cave <francis at franciscave.com>:
> >
> > >
> > > Hi Keld
> > >
> > > Thanks for your suggestions. We could certainly consider passing these
> > > on to the DR submitter as a possible approach to implementing the sort
> > > method for special characters. Whether or not they would match how
> > > existing implementers have tackled this issue is for those implementers
> > > to say, should they choose to do so.
> > >
> > > Kind regards,
> > >
> > > Francis
> > >
> > >
> > >
> > > -----Original Message-----
> > > From: keld at keldix.com [mailto:keld at keldix.com]
> > > Sent: 20 December 2016 19:24
> > > To: Francis Cave <francis at franciscave.com>
> > > Cc: 'MURATA Makoto' <eb2m-mrt at asahi-net.or.jp>; 'SC 34 WG4'
> > > <e-SC34-WG4 at ecma-international.org>
> > > Subject: Re: DR-16-0018: WML: need sort method for special characters
> > > [for today's call]
> > >
> > > Hi Francis
> > >
> > > I understand that there are three valid values for sortMethod: stroke,
> > > pinYin and none.
> > > The DR asks for what sorting to use for special characters.
> > > I propose to use the one defined by the null tailoring of ISO 14651.
> > >
> > > As strings can contain any UCS character, the ordering just mentioned
> > > fits that bill nicely, as it is defined on all UCS characters. ISO 14651
> > > also addresses codesets other than UCS, if that is relevant.
> > >
> > > The first accent difference would determine the order, all other
> > > characters being considered equivalent, which corresponds to the
> > > expected ordering in English and many other languages.
> > >
> > > Maybe one could use extLst to further describe the sorting method. I
> > > would advise using normal internationalisation mechanisms, such as
> > > those given by a locale, as this is a well-known concept and readily
> > > available in most operating systems. This is not a new feature, so
> > > there is no need for an amendment or revision of the standard; it
> > > could be handled by a DR with some guidance on how to do it.
> > >
> > > Some suggested advice could be: if the extLst is empty, then the
> > > associated locale of the current environment should be used. By this
> > > I mean the language setting for the text in question; it could be an
> > > English part of a Spanish document.
> > >
> > > extLst could also contain a name; in that case it should be the name of
> > > an implementation-defined locale of the operating system whose sorting
> > > specification is to be used.
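Keld's suggested advice amounts to a simple order of precedence for choosing a collation. The Python sketch below spells it out under stated assumptions: the function and parameter names are hypothetical, OOXML defines no locale field in extLst, and the final fall-back to the untailored ISO 14651 template combines this advice with the null-tailoring suggestion earlier in the same message.

```python
from typing import Optional


def choose_sort_locale(ext_lst_locale: Optional[str],
                       text_language: Optional[str]) -> Optional[str]:
    """Hypothetical sketch of the proposed precedence (not an OOXML API).

    ext_lst_locale: an implementation-defined OS locale name carried in
                    extLst (hypothetical; no such field is defined).
    text_language:  the language setting of the text being sorted.
    Returns None to mean "use the untailored ISO 14651 template".
    """
    if ext_lst_locale:   # an explicit locale name in extLst wins
        return ext_lst_locale
    if text_language:    # otherwise the language setting of the text itself
        return text_language
    return None          # otherwise the null tailoring of ISO 14651
```

For the English part of a Spanish document with an empty extLst, `choose_sort_locale(None, "en")` would select the English collation rather than the Spanish one.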
> > >
> > > I don't know if a new DR is needed or if this could be part of the
> > > answer to DR-16-0018.
> > >
> > > Best regards
> > > keld
> > >
> > > On Tue, Dec 20, 2016 at 04:09:41PM -0000, Francis Cave wrote:
> > > >
> > > > Hi Keld
> > > >
> > > > My main concern is whether the sort method, if specified, affects what
> > > > the user sees when they open a document. Suppose that, for the sake of
> > > > argument, the default sort method of implementation A sorts accented
> > > > text according to the "normal" approach (first accent difference
> > > > determines the order), while implementation B sorts accented text
> > > > according to the "French" approach (last accent difference determines
> > > > the order). Suppose a spreadsheet is created by implementation A and
> > > > has an auto-filter applied that sorts according to its default sort
> > > > method.  If this spreadsheet is subsequently opened by implementation
> > > > B, will it appear the same as if it was re-opened by implementation A,
> > > > or will implementation B's default sort method be automatically
> > > > applied? My guess is that generally the latter will be the case, but
> > > > this is obviously implementation-dependent.
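The "normal" versus "French" distinction above can be made concrete with a toy two-level comparison. In the Python sketch below (an illustration only, not the ISO 14651 or UCA algorithm, whose weights are table-driven), letters are compared first with accents and case ignored, and ties are broken on accents, scanned either from the first letter or from the last.

```python
import unicodedata


def accent_levels(word):
    """Split a word into base letters (level 1) and, per base letter, the
    accents attached to it (level 2), using NFD decomposition.
    A toy model only: real ISO 14651 / UCA weights come from tables."""
    bases, accents = [], []
    for ch in unicodedata.normalize("NFD", word.lower()):
        if unicodedata.combining(ch) and accents:
            accents[-1] += ch  # accent belongs to the previous letter
        else:
            bases.append(ch)
            accents.append("")
    return bases, accents


def sort_key(word, french=False):
    bases, accents = accent_levels(word)
    # Level 1 compares letters ignoring accents. Level 2 breaks ties on
    # accents, scanning from the first letter ("normal") or from the last
    # letter ("French").
    return (bases, accents[::-1] if french else accents)


words = ["côté", "cote", "coté", "côte"]
normal_order = sorted(words, key=sort_key)
french_order = sorted(words, key=lambda w: sort_key(w, french=True))
print(normal_order)
print(french_order)
```

With these four words the normal order is cote, coté, côte, côté, while the French order is cote, côte, coté, côté: exactly the kind of difference that would make the same auto-filtered list look different in implementations A and B.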
> > > >
> > > > I suspect that this is a case where implementations must be free to
> > > > choose alternative approaches that aren't fully interoperable. Both
> > > > implementations A and B in my example will have had good reasons for
> > > > choosing different sort methods, e.g. based upon market demands.
> > > >
> > > > It might be nice if the sort method were spelt out in the document,
> > > > but this would definitely be a new feature. For now, in response to
> > > > DR-16-0018, I think we should simply ensure that the specification is
> > > > consistent with the schema and is clear about what is meant by the
> > > > existing values of ST_SortMethod, i.e. 'none', 'pinYin' and 'stroke'.
> > > >
> > > > Interestingly, I note that the content model of sortState includes the
> > > > application-defined extension element extLst, which could be used to
> > > > specify the sort state in more detail, using MCE. So, in theory, it
> > > > would be possible to define an extension to OOXML without having to
> > > > amend the base standard. However, I'm not sure that there'd be
> > > > sufficient demand for this.
> > > >
> > > > Kind regards,
> > > >
> > > > Francis
> > > >
> > > >
> > > >
> > > > -----Original Message-----
> > > > From: keld at keldix.com [mailto:keld at keldix.com]
> > > > Sent: 20 December 2016 09:37
> > > > To: Francis Cave <francis at franciscave.com>
> > > > Cc: 'MURATA Makoto' <eb2m-mrt at asahi-net.or.jp>; 'SC 34 WG4'
> > > > <e-SC34-WG4 at ecma-international.org>
> > > > Subject: Re: DR-16-0018: WML: need sort method for special characters
> > > > [for today's call]
> > > >
> > > > Hi Francis
> > > >
> > > > Yes, I understand that new requirements need an amendment or revision.
> > > >
> > > > However, this is not what I recommend as the immediate solution to
> > > > sorting special characters. I just propose to use the ISO 14651/UTS#10
> > > > kind of default spec.
> > > > And no locale choice.
> > > >
> > > > Best regards
> > > > keld
> > > >
> > > > On Mon, Dec 19, 2016 at 05:27:25PM -0000, Francis Cave wrote:
> > > > >
> > > > > Try again...
> > > > >
> > > > > In the first paragraph, for "cannot be done" read "can only be done".
> > > > > Clearly a serious finger malfunction...
> > > > >
> > > > > Francis
> > > > >
> > > > >
> > > > >
> > > > > -----Original Message-----
> > > > > From: Francis Cave [mailto:francis at franciscave.com]
> > > > > Sent: 19 December 2016 16:41
> > > > > To: keld at keldix.com; 'MURATA Makoto' <eb2m-mrt at asahi-net.or.jp>
> > > > > Cc: 'SC 34 WG4' <e-SC34-WG4 at ecma-international.org>
> > > > > Subject: RE: DR-16-0018: WML: need sort method for special
> > > > > characters [for today's call]
> > > > >
> > > > >
> > > > > In the first paragraph or "cannot be done" ready "can only be done".
> > > > > Sorry!
> > > > >
> > > > > Francis
> > > > >
> > > > >
> > > > >
> > > > > -----Original Message-----
> > > > > From: Francis Cave [mailto:francis at franciscave.com]
> > > > > Sent: 19 December 2016 16:33
> > > > > To: keld at keldix.com; 'MURATA Makoto' <eb2m-mrt at asahi-net.or.jp>
> > > > > Cc: 'SC 34 WG4' <e-SC34-WG4 at ecma-international.org>
> > > > > Subject: RE: DR-16-0018: WML: need sort method for special
> > > > > characters [for today's call]
> > > > >
> > > > >
> > > > > Keld
> > > > >
> > > > > You may or may not be correct in your view that there is a user
> > > > > requirement that fields be sortable according to the current locale.
> > > > > However, if there is such a user requirement, it isn't currently met
> > > > > by OOXML or by implementations. Support for specifying the current
> > > > > locale as the sort method would involve an extension to OOXML and as
> > > > > such cannot be done by amendment or revision of the standard, not by
> > > > > Technical Corrigendum, so this is a big deal. Although this might be
> > > > > a relatively simple change, e.g. by changing ST_SortMethod to allow
> > > > > any string (§18.18.73 of ISO/IEC 29500-1:2016), it could not be made
> > > > > mandatory for implementations to support these new values without
> > > > > breaking existing implementations, and if they do not already do so,
> > > > > that suggests that there hasn't been much market pressure that would
> > > > > persuade them to implement the enhancement.
> > > > >
> > > > > I note that ODF doesn't appear to have this feature either (although
> > > > > it has a similar feature, §19.865 text:sort-algorithm, for sorting
> > > > > text, e.g. bibliographies), and in §19.685 table:order there is the
> > > > > following note:
> > > > >
> > > > >     Note: Sorting is locale and implementation-dependent.
> > > > >
> > > > > It is hard to avoid the conclusion that demand for this feature is
> > > > > very limited, at least in office document applications.
> > > > >
> > > > > Francis
> > > > >
> > > > >
> > > > >
> > > > > -----Original Message-----
> > > > > From: keld at keldix.com [mailto:keld at keldix.com]
> > > > > Sent: 18 December 2016 20:19
> > > > > To: MURATA Makoto <eb2m-mrt at asahi-net.or.jp>
> > > > > Cc: SC 34 WG4 <e-SC34-WG4 at ecma-international.org>
> > > > > Subject: Re: DR-16-0018: WML: need sort method for special
> > > > > characters [for today's call]
> > > > >
> > > > > I am not aware of the reason why this is so.
> > > > >
> > > > > But anyway, why not then use the 14651 tailorable ordering in its
> > > > > template form, which is equivalent to UTS#10, as the universal
> > > > > sorting in OOXML?
> > > > >
> > > > > I would think it was a user requirement that fields are sortable
> > > > > according to the current locale, e.g. a list of names.
> > > > >
> > > > > Having a sort order that cannot be changed creates trouble for
> > > > > users too.
> > > > >
> > > > > best regards
> > > > > keld
> > > > >
> > > > >
> > > > > On Mon, Dec 19, 2016 at 04:48:21AM +0900, MURATA Makoto wrote:
> > > > > > The sort order of Excel cannot be changed without causing trouble
> > > > > > for users.
> > > > > >
> > > > > > Regards,
> > > > > > Makoto
> > > > > >
> > > > > > 2016-12-08 3:17 GMT+09:00 Keld Simonsen <keld at keldix.com>:
> > > > > >
> > > > > > > I propose that you use the locale of the current process, and
> > > > > > > the implied sorting sequence for special characters there, or
> > > > > > > at least the sorting specified in ISO/IEC 14651 or the
> > > > > > > equivalent Unicode specification.
> > > > > > >
> > > > > > > best regards
> > > > > > > keld Simonsen
> > > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > > --
> > > > > >
> > > > > > Praying for the victims of the Japan Tohoku earthquake
> > > > > >
> > > > > > Makoto
> > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > >
> > >
> > >
> > >
> >
> >
>





More information about the sc34wg4 mailing list