DR 09-0040: a small test to clarify "chinese language"

Thu Sep 1 23:08:47 CEST 2011

Hello Suzuki-san and WG4,

I've been investigating this question regarding DR 09-0040 and now have some further information. I believe that the language component of the language tag (e.g. the "zh" of "zh-CN") should determine the font slot, not the region component (e.g. the "CN"). I've updated my draft resolution of DR 09-0040 to reflect this and will circulate it this week.

As Suzuki-san observes, however, this is not what Microsoft Word does. Word treats many combinations (zh-CN, zh-TW, etc.) as Chinese for layout purposes, but not others (e.g. zh-MY). I believe this to be an implementation error. Internally, we work from a manually created lookup list of complete language codes; certain unusual combinations are simply not in that list and therefore do not have the classification logic applied. I don't think this is the correct behaviour, and I don't think it should be codified in the standard. Instead, the standard should specify the correct logic (i.e. base text layout on the language, not the region), and Microsoft should document its differing logic in its implementer notes.
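The difference between the two approaches can be sketched in Python. This is an illustrative model only, assuming BCP 47-style tags where the primary language subtag precedes the first hyphen: `LOOKUP_CHINESE` is a hypothetical stand-in for Word's internal list, filled with the zh-* codes Suzuki-san reports as recognized in the message quoted below, and the real list may differ. The function names are hypothetical and not taken from the standard or from Word.

```python
# Hypothetical stand-in for a manually maintained list of complete codes.
# Populated with the zh-* codes reported as recognized in the test results;
# Word's real internal list may differ.
LOOKUP_CHINESE = {t.lower() for t in ("zh-CN", "zh-TW", "zh-HK", "zh-MO", "zh-SG")}


def is_chinese_by_lookup(tag: str) -> bool:
    """Lookup-list rule: only complete codes present in the list qualify."""
    return tag.lower() in LOOKUP_CHINESE


def is_chinese_by_language(tag: str) -> bool:
    """Proposed rule: only the primary language subtag ("zh") matters."""
    return tag.split("-", 1)[0].lower() == "zh"


def divergent_tags(tags):
    """Return the tags that the two rules classify differently."""
    return [t for t in tags if is_chinese_by_lookup(t) != is_chinese_by_language(t)]


# Territories taken from the test document described in the quoted message.
territories = ["CN", "TW", "HK", "MO", "SG", "JP", "KR", "MY", "TH", "VN"]
print(divergent_tags([f"zh-{t}" for t in territories]))
# prints ['zh-JP', 'zh-KR', 'zh-MY', 'zh-TH', 'zh-VN']
```

Under the language-only rule every zh-* tag is treated as Chinese; the lookup model misses any combination that was never entered in the list, which is the zh-MY case described above.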

Suzuki-san, would it be an acceptable solution for you if we modified the standard to use the language only, and I got Microsoft to document our differing behaviour in implementer notes?

Chris

-----Original Message-----
From: suzuki toshiya [mailto:mpsuzuki at hiroshima-u.ac.jp] 
Sent: 16 May 2011 20:38
To: e-SC34-WG4 at ecma-international.org
Cc: Chris Rae
Subject: DR 09-0040: a small test to clarify "chinese language"

Dear Chris,

Sorry for my late response, and thank you for your comment on my figure of the font slot selection algorithm. If anybody has suggestions for improving the figure before its official inclusion, please let me know.

In my comment posted for DR 09-0040, I requested clarification of which Chinese locales change the font slot used for accented Latin characters (as used in Pinyin): "zh-CN" and "zh-TW" are included, but what about "zh-HK", "zh-MO", "zh-SG", etc.?

The attached .docx is a test document that repeats the same text with various locale specifications, for example:

<w:p>
 <w:r w:rsidRPr="008B6412">
  <w:rPr>
   <w:rFonts w:ascii="Courier New" w:eastAsia="SimSun-18030" w:hAnsi="Arial" w:cs="Arabic Typesetting" w:hint="default"/>
   <w:sz w:val="16"/>
   <w:szCs w:val="16"/>
   <w:lang w:val="en-US" w:eastAsia="en-US" w:bidi="en-US"/>
  </w:rPr>
  <w:t>(en-US,hint=default)  Hànyǔ Pīnyīn,uüÜ,úǘǗ‘’‚‛“”„…()[]ΑΒΓЀЁЂ</w:t>
 </w:r>
</w:p>

Three different settings of the <w:lang> element are tested:
    <w:lang w:val="zh-CN" w:eastAsia="en-US" w:bidi="en-US"/>  (questionable usage)
    <w:lang w:val="en-US" w:eastAsia="zh-CN" w:bidi="en-US"/>  (designed usage)
    <w:lang w:val="en-US" w:eastAsia="en-US" w:bidi="zh-CN"/>  (maybe invalid usage)
Each setting is tested with various East and Southeast Asian territories (CN, TW, HK, MO, SG, as well as JP, KR, MY, TH, VN).
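The test matrix can be generated programmatically. The Python sketch below assumes each listed territory is combined with the "zh" language subtag and that the two attributes not under test stay "en-US", as in the three settings above; the helper names `lang_element` and `build_test_matrix` are hypothetical.

```python
from itertools import product

# Attributes of <w:lang> that, one at a time, receive the Chinese tag;
# the other two attributes stay "en-US", matching the settings above.
ATTRIBUTES = ("w:val", "w:eastAsia", "w:bidi")
TERRITORIES = ("CN", "TW", "HK", "MO", "SG", "JP", "KR", "MY", "TH", "VN")


def lang_element(chinese_attr: str, tag: str) -> str:
    """Build one <w:lang> element with `tag` on `chinese_attr` and en-US elsewhere."""
    values = {attr: (tag if attr == chinese_attr else "en-US") for attr in ATTRIBUTES}
    attrs = " ".join(f'{attr}="{value}"' for attr, value in values.items())
    return f"<w:lang {attrs}/>"


def build_test_matrix():
    """Return all 3 x 10 combinations of attribute and zh-<territory> tag."""
    return [lang_element(attr, f"zh-{terr}") for attr, terr in product(ATTRIBUTES, TERRITORIES)]


matrix = build_test_matrix()
print(len(matrix))  # 30
print(matrix[10])   # <w:lang w:val="en-US" w:eastAsia="zh-CN" w:bidi="en-US"/>
```

Each generated element would then be placed in a run like the one shown above, with the locale noted in the run text.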

From the results, the Chinese language setting is meaningful only when it is set in the eastAsia attribute, and the set of recognized Chinese languages appears to be defined by a registered list:
zh-CN, zh-TW, zh-HK, zh-MO and zh-SG are treated as Chinese, but other combinations with different territories (such as zh-MY) are not.
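A minimal Python model of this observed behaviour, assuming (as the results suggest) that only the w:eastAsia attribute is consulted and that recognition is an exact match against the codes listed above. `OBSERVED_CHINESE` and `word_observed_chinese_layout` are hypothetical names; this models the test results, not Word's actual implementation.

```python
from typing import Mapping

# Hypothetical name; the codes reported as recognized in the test above.
OBSERVED_CHINESE = {t.lower() for t in ("zh-CN", "zh-TW", "zh-HK", "zh-MO", "zh-SG")}


def word_observed_chinese_layout(lang: Mapping[str, str]) -> bool:
    """Model of the test results, not of Word's code: only w:eastAsia is
    consulted, and only exact matches against the observed list count."""
    return lang.get("w:eastAsia", "").lower() in OBSERVED_CHINESE


# The three tested settings, plus one unlisted territory.
settings = {
    "questionable": {"w:val": "zh-CN", "w:eastAsia": "en-US", "w:bidi": "en-US"},
    "designed": {"w:val": "en-US", "w:eastAsia": "zh-CN", "w:bidi": "en-US"},
    "maybe invalid": {"w:val": "en-US", "w:eastAsia": "en-US", "w:bidi": "zh-CN"},
    "unlisted territory": {"w:val": "en-US", "w:eastAsia": "zh-MY", "w:bidi": "en-US"},
}
for name, lang in settings.items():
    print(f"{name}: {word_observed_chinese_layout(lang)}")
# only the "designed" setting prints True
```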
I would like the list of registered Chinese languages to be provided in the spec, even if it is classified as non-normative. For the Bidi support clarification, a list of registered languages would be helpful too (e.g. w:bidi="ar-IQ" or w:bidi="fa-IR" switches the paragraph to right-to-left mode, but w:bidi="ar-IR" does not).

Regards,
suzuki toshiya, Hiroshima University, Japan