DR 09-0040: WML/DML: Complex scripts

Arms, Caroline caar at loc.gov
Thu Feb 10 15:17:37 CET 2011


A small typo within 17.3.2.26 could be fixed at the same time.

In the table of attributes, under asciiTheme, "thisrange" needs to be "this range".

   Caroline

Caroline Arms
Library of Congress Contractor
Co-compiler of Sustainability of Digital Formats resource
http://www.digitalpreservation.gov/formats/

** Views expressed are personal and not necessarily those of the institution **
________________________________________
From: Arms, Caroline [caar at loc.gov]
Sent: Thursday, February 10, 2011 8:41 AM
To: Chris Rae; MURATA Makoto (FAMILY Given)
Cc: e-SC34-WG4 at ecma-international.org
Subject: RE: DR 09-0040: WML/DML: Complex scripts

A few small things noted:

I find "this element" ambiguous.  In particular, deleting the paragraph just after the list of font slots appears to me to change the meaning of "this element" -- before it related to <cs>, now to <rFonts>.  Which is right?

In the table, you use "ascii" and "Ascii" inconsistently.  Unless there is a reason not to, shouldn't we use "ASCII" throughout?

Talking of ASCII, I'm puzzled at the emphasis on the scope of ASCII (i.e., the first 128 Unicode code points), when the table uses "ascii" for ranges outside basic ASCII.

In several places, "Unicode" is used, but I think it should be "Unicode code point" or "Unicode character value".

These points are somewhat editorial.  I'm still trying to figure out if I think the substance is clear.

     Caroline

________________________________________
From: Chris Rae [Chris.Rae at microsoft.com]
Sent: Monday, January 24, 2011 1:33 PM
To: MURATA Makoto (FAMILY Given)
Cc: Arms, Caroline; e-SC34-WG4 at ecma-international.org
Subject: RE: DR 09-0040: WML/DML: Complex scripts

Hello all - attached is another version of my proposed response to this DR, hopefully addressing Murata-san's concerns.

The changes are:

* I've switched from a four-step algorithm to a two-step one (where the second step has a couple of paths). I think this makes it a little easier to follow, and I agree with Murata-san that it wasn't really four steps before.
* I've clarified all the various references to "hint" and "eastAsia" - as you surmise, these are values for the w:hint attribute. I've also made clearer the link between the first step and the second. Or, at least, I hope I have.
* Regarding <w:cs/> and <w:rtl/>, there are two different things here, and my original document did not make them very clear. The methodology determines which font slot should be used for the contents of a run. After that, if the methodology determines that the content uses the Complex Script font, then the complex script formatting elements, such as bCs (§17.3.2.2), iCs (§17.3.2.17), and szCs (§17.3.2.39), should affect the content. Otherwise, the non-complex script formatting elements, such as b (§17.3.2.1), i (§17.3.2.16), and sz (§17.3.2.38), should affect the content. I've tried to make that a little clearer in the text. This should also answer the main question in the original DR.
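
To illustrate the selection rule in the last bullet, here is a minimal sketch in Python. It assumes the font slot has already been determined by the first part of the methodology. The slot name "cs", the function name, and the dictionary keys are illustrative assumptions, not text from the standard; only the element names (bCs, iCs, szCs, b, i, sz) come from the message above.

```python
# Hypothetical sketch: the slot name "cs" and the keys "bold", "italic",
# "size" are assumptions made for illustration only.
COMPLEX_SCRIPT_ELEMENTS = {"bold": "bCs", "italic": "iCs", "size": "szCs"}
NON_COMPLEX_SCRIPT_ELEMENTS = {"bold": "b", "italic": "i", "size": "sz"}


def formatting_elements(font_slot):
    """Return the run formatting elements that should affect content
    whose font slot has already been resolved to `font_slot`."""
    if font_slot == "cs":
        return COMPLEX_SCRIPT_ELEMENTS
    # Any other slot falls back to the non-complex script elements.
    return NON_COMPLEX_SCRIPT_ELEMENTS
```

Under these assumptions, content resolved to the complex script slot would be sized by szCs, while content resolved to any other slot would be sized by sz.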

Chris

-----Original Message-----
From: MURATA Makoto (FAMILY Given) [mailto:eb2m-mrt at asahi-net.or.jp]
Sent: 19 January 2011 05:36
To: Chris Rae
Cc: Arms, Caroline; e-SC34-WG4 at ecma-international.org
Subject: Re: DR 09-0040: WML/DML: Complex scripts

> Attached is a further-updated version of this with some text changes from Jesper.
> Please note that these were grammatical/editorial changes only - none
> of the meaning of the document is different.

I am afraid that I still do not understand.

First, I do not think that the proposed solution uses a "four-step methodology".  To me, it is a two-step algorithm, where the first step classifies character contents while the second step derives font slots from classifications.

The first step "decide(s) the classification of the content, based on its Unicode" AS WELL AS A HINT.  This classification contains Ascii, hAnsi, and eastAsia.  What is "hint" here?  The value of the w:hint attribute?  (BTW, <w:hint="eastAsia"> in the second item is not XML.)
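
The point about well-formedness can be checked mechanically. The sketch below uses Python's standard xml.etree.ElementTree parser. It assumes that the intended markup is a w:hint attribute on the rFonts element and that the w prefix is bound to the WordprocessingML main namespace; both are assumptions for illustration, as are the helper names.

```python
import xml.etree.ElementTree as ET

# Assumed namespace binding for the "w" prefix.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

# The string quoted from the draft: an element name cannot contain '="...'.
malformed = '<w:hint="eastAsia">'

# Assumed intended form: hint as an attribute of rFonts.
well_formed = '<w:rFonts xmlns:w="%s" w:hint="eastAsia"/>' % W_NS


def is_well_formed(text):
    """Return True if `text` parses as a well-formed XML document."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


def hint_value(text):
    """Return the value of the w:hint attribute on the root element."""
    return ET.fromstring(text).get("{%s}hint" % W_NS)
```

Here is_well_formed(malformed) is False, while hint_value(well_formed) yields the string "eastAsia".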

The second step decides the font slot from the classification computed in the first step.  (If I am not mistaken, this step never introduces "latin".)

The interaction between the first item and second item in the itemized list looks unclear.  Both use hint.  Are they talking about the same thing?

The third item mentions <w:cs/> and <w:rtl/>.  The original JP DR mentioned more elements.  Are they not covered by the "four-step methodology"?

Part 1, §17.3.2.2, "bCs (Complex Script Bold)", p. 281
Part 1, §17.3.2.7, "cs (Use Complex Script Formatting on Run)", p. 289
Part 1, §17.3.2.17, "iCs (Complex Script Italics)", p. 304
Part 1, §17.3.2.20, "lang (Languages for Run Content)", p. 304
Part 1, §17.3.2.39, "szCs (Complex Script Font Size)", p. 335
Part 1, §21.1.2.3.1, "cs (Complex Script Font)", p. 3596
Part 1, §21.1.2.3.3, "ea (East Asian Font)", p. 3605


Caroline wrote:

> I have no doubt that the "algorithm" addresses issues raised by this
> DR, but it's not clear to me that the precise question has been
> answered directly.  The DR mentions several specific sections that use
> the phrases "complex script characters" or "complex script contents
> [of a run]" with apparently different interpretations for the range of
> characters included.  Your proposed change to 17.3.2.26 certainly
> doesn't address that directly.

I agree.  I think that for each of the sub-clauses mentioned above, we should show how we use font slots for the grouping used in that sub-clause.

Cheers,
Makoto

