DR-16-0018: WML: need sort method for special characters [for today's call]

keld at keldix.com keld at keldix.com
Tue Dec 20 20:24:18 CET 2016


Hi Francis

I understand that there are three valid values for sortMethod: stroke, pinYin and none.
The DR asks for what sorting to use for special characters.
I propose to use the one defined by the null tailoring of ISO 14651.

As strings can contain any UCS character, the ordering just mentioned fits the
bill nicely, as it is defined for all UCS characters. ISO 14651 also addresses codesets other than UCS,
if that is relevant.

The order would be decided by the first accent difference when all other characters are
equal, which corresponds to the expected ordering in English and many other languages.
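
To illustrate the difference between deciding on the first accent and the "French"
approach of deciding on the last accent (mentioned in the quoted message below), here
is a minimal Python sketch. It is not an implementation of ISO 14651 or UTS#10: it is
a toy two-level comparison in which accents are ranked simply by the code point of the
combining mark, and the names split_levels and sort_key are made up for this example.

```python
import unicodedata

def split_levels(word):
    """Split a word into base letters and the accents attached to each letter."""
    bases, accents = [], []
    for ch in unicodedata.normalize("NFD", word):
        if unicodedata.combining(ch):
            # A combining mark belongs to the preceding base letter.
            if accents:
                accents[-1] += ch
        else:
            bases.append(ch.casefold())
            accents.append("")
    return bases, accents

def sort_key(word, backward_accents=False):
    """Two-level key: base letters first, accents only as a tie-breaker.

    backward_accents=True scans the accents from the end of the word,
    so the LAST accent difference decides (the "French" approach).
    """
    bases, accents = split_levels(word)
    if backward_accents:
        accents = accents[::-1]
    return (bases, accents)

words = ["côté", "coté", "côte", "cote"]

# First accent difference decides: cote, coté, côte, côté
forward = sorted(words, key=sort_key)

# Last accent difference decides: cote, côte, coté, côté
french = sorted(words, key=lambda w: sort_key(w, backward_accents=True))

print(forward)
print(french)
```

The two results differ only in the middle pair, which is exactly the kind of
difference a user could see when the same sorted data is opened in two
implementations with different defaults.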

Maybe one could use extLst to describe the sorting method further. I would advise using
normal internationalisation mechanisms, such as those given by a locale, as this is a well-known
concept and readily available in most operating systems. This is not a new feature,
so there is no need for an amendment or revision of the standard; it could
be handled by a DR with some guidance on how to do it.

Some suggested advice could be: if extLst is empty, then the
locale associated with the current environment should be used.
By this I mean the language setting for the text in question; it could be an English
part of a Spanish document.

extLst could also contain a name; in that case it should be the name of an implementation-defined
locale of the operating system, whose sorting specification is to be used.
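
As a rough sketch of that guidance, the selection logic could look like the Python
below. Everything here is an assumption made for illustration: the function names
resolve_sort_locale and sort_with_locale are hypothetical, the way a locale name would
actually be carried in extLst is not defined anywhere, and the fallback to code point
order when the named locale is not installed is my own choice, not part of the proposal.

```python
import locale

def resolve_sort_locale(ext_lst_name, text_language):
    """Pick the locale whose collation rules should be used.

    ext_lst_name:  hypothetical locale name taken from extLst ("" or None if empty)
    text_language: locale name for the language setting of the text being sorted
    """
    return ext_lst_name if ext_lst_name else text_language

def sort_with_locale(strings, locale_name):
    """Sort strings using the collation of an operating system locale.

    Note: setlocale changes process-wide state and is not thread-safe,
    so this is only suitable for a single-threaded sketch.
    """
    saved = locale.setlocale(locale.LC_COLLATE)  # query the current setting
    try:
        locale.setlocale(locale.LC_COLLATE, locale_name)
        return sorted(strings, key=locale.strxfrm)
    except locale.Error:
        # Assumed fallback: the named locale is not installed on this system.
        return sorted(strings)
    finally:
        locale.setlocale(locale.LC_COLLATE, saved)

# An English passage in a Spanish document, with nothing given in extLst:
chosen = resolve_sort_locale("", "en_US.UTF-8")
names = sort_with_locale(["Zoe", "Adam", "Mia"], chosen)
```

Which locale names are available, and how each one orders accented characters,
depends on the operating system, which is the point of calling it implementation-defined.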

I don't know if a new DR is needed or this could be part of the answer to DR-16-0018.

Best regards
keld

On Tue, Dec 20, 2016 at 04:09:41PM -0000, Francis Cave wrote:
> 
> Hi Keld
> 
> My main concern is whether the sort method, if specified, affects what the
> user sees when they open a document. Suppose that, for the sake of argument,
> the default sort method of implementation A sorts accented text according to
> the "normal" approach (first accent difference determines the order), while
> implementation B sorts accented text according to the "French" approach
> (last accent difference determines the order). Suppose a spreadsheet is
> created by implementation A and has an auto-filter applied that sorts
> according to its default sort method.  If this spreadsheet is subsequently
> opened by implementation B, will it appear the same as if it was re-opened
> by implementation A, or will implementation B's default sort method be
> automatically applied? My guess is that generally the latter will be the
> case, but this is obviously implementation-dependent. 
> 
> I suspect that this is a case where implementations must be free to choose
> alternative approaches that aren't fully interoperable. Both implementations
> A and B in my example will have had good reasons for choosing different sort
> methods, e.g. based upon market demands.
> 
> It might be nice if the sort method were spelt out in the document, but this
> would definitely be a new feature. For now, in response to DR 16-0018, I
> think we should simply ensure that the specification is consistent with the
> schema and is clear about what are meant by the existing values of
> ST_SortMethod, i.e. 'none', 'pinYin' and 'stroke'.
> 
> Interestingly, I note that the content model of sortState includes the
> application-defined extension element extLst, which could be used to specify
> the sort state in more detail, using MCE. So, in theory, it would be
> possible to define an extension to OOXML without having to amend the base
> standard. However, I'm not sure that there'd be sufficient demand for this.
> 
> Kind regards,
> 
> Francis
> 
> 
> 
> -----Original Message-----
> From: keld at keldix.com [mailto:keld at keldix.com] 
> Sent: 20 December 2016 09:37
> To: Francis Cave <francis at franciscave.com>
> Cc: 'MURATA Makoto' <eb2m-mrt at asahi-net.or.jp>; 'SC 34 WG4'
> <e-SC34-WG4 at ecma-international.org>
> Subject: Re: DR-16-0018: WML: need sort method for special characters [for
> today's call]
> 
> Hi Francis
> 
> Yes, I understand that new requirements need an amendment or revision.
> 
> However, this is not what I recommend as the immediate solution to sorting
> special characters. I just propose to use the ISO 14651/UTS#10 kind of
> default spec.
> And no locale choice. 
> 
> Best regards
> keld
> 
> On Mon, Dec 19, 2016 at 05:27:25PM -0000, Francis Cave wrote:
> > 
> > Try again...
> > 
> > In the first paragraph, for "cannot be done" read "can only be done".
> > Clearly a serious finger malfunction...
> > 
> > Francis
> > 
> > 
> > 
> > -----Original Message-----
> > From: Francis Cave [mailto:francis at franciscave.com]
> > Sent: 19 December 2016 16:41
> > To: keld at keldix.com; 'MURATA Makoto' <eb2m-mrt at asahi-net.or.jp>
> > Cc: 'SC 34 WG4' <e-SC34-WG4 at ecma-international.org>
> > Subject: RE: DR-16-0018: WML: need sort method for special characters 
> > [for today's call]
> > 
> > 
> > In the first paragraph or "cannot be done" ready "can only be done".
> > Sorry!
> > 
> > Francis
> > 
> > 
> > 
> > -----Original Message-----
> > From: Francis Cave [mailto:francis at franciscave.com]
> > Sent: 19 December 2016 16:33
> > To: keld at keldix.com; 'MURATA Makoto' <eb2m-mrt at asahi-net.or.jp>
> > Cc: 'SC 34 WG4' <e-SC34-WG4 at ecma-international.org>
> > Subject: RE: DR-16-0018: WML: need sort method for special characters 
> > [for today's call]
> > 
> > 
> > Keld
> > 
> > You may or may not be correct in your view that there is a user 
> > requirement that fields be sortable according to the current locale. 
> > However, if there is such a user requirement, it isn't currently met 
> > by OOXML or by implementations. Support for specifying the current 
> > locale as the sort method would involve an extension to OOXML and as 
> > such cannot be done by amendment or revision of the standard, not by 
> > Technical Corrigendum, so this is a big deal. Although this might be a 
> > relatively simple change, e.g. by changing ST_SortMethod to allow any 
> > string (§18.18.73 of ISO/IEC 29500-1:2016), it could not be made 
> > mandatory for implementations to support these new values without 
> > breaking existing implementations, and if they do not already do so, 
> > that suggests that there hasn't been much market pressure that would
> > persuade them to implement the enhancement.
> > 
> > I note that ODF doesn't appear to have this feature either (although 
> > it has a similar feature – §19.865 text:sort-algorithm – for sorting text, e.g.
> > bibliographies), and in §19.685 table:order there is the following note:
> > 
> > 	Note: Sorting is locale and implementation-dependent.
> > 
> > It is hard to avoid the conclusion that demand for this feature is 
> > very limited, at least in office document applications.
> > 
> > Francis
> > 
> > 
> > 
> > -----Original Message-----
> > From: keld at keldix.com [mailto:keld at keldix.com]
> > Sent: 18 December 2016 20:19
> > To: MURATA Makoto <eb2m-mrt at asahi-net.or.jp>
> > Cc: SC 34 WG4 <e-SC34-WG4 at ecma-international.org>
> > Subject: Re: DR-16-0018: WML: need sort method for special characters 
> > [for today's call]
> > 
> > I am not aware of the reason why this is so.
> > 
> > But anyway, why not then use the 14651 tailorable ordering in its 
> > template form, which is equivalent to UTS#10, as the universal sorting in
> > OOXML?
> > 
> > I would think it was a user requirement that fields are sortable 
> > according to the current locale, eg a list of names.
> > 
> > A sorting order that cannot be changed creates trouble for
> > users too.
> > 
> > best regards
> > keld
> > 
> > 
> > On Mon, Dec 19, 2016 at 04:48:21AM +0900, MURATA Makoto wrote:
> > > The sort order of Excel cannot be changed without causing troubles 
> > > to users.
> > > 
> > > Regards,
> > > Makoto
> > > 
> > > 2016-12-08 3:17 GMT+09:00 Keld Simonsen <keld at keldix.com>:
> > > 
> > > > I propose that you use the locale of the current process, and the
> > > > implied sorting sequence for special characters there, or at least
> > > > the sorting specified in ISO/IEC 14651
> > > > or the equivalent Unicode specification.
> > > >
> > > > best regards
> > > > keld Simonsen
> > > >
> > > 
> > > 
> > > 
> > > --
> > > 
> > > Praying for the victims of the Japan Tohoku earthquake
> > > 
> > > Makoto
> > 
> > 
> > 
> > 
> > 
> 

