DR 09-0045 ― WML, Fonts: Character encodings of font names

mpsuzuki at hiroshima-u.ac.jp
Wed Oct 27 16:07:32 CEST 2010


Dear Chris,

The proposed change is basically OK. But regarding the file system
restriction, I propose a small change:

replacing

	", and is often not supported for file names in file systems"

by

	", and often different character encodings are used to
	 localize the file systems"

--

The background is as follows:


* It is true that ISO/IEC 14496-22 cannot hold a UTF-8 or UTF-16
  string, and this is worth noting, because XML is commonly
  encoded in UTF-8 or UTF-16.

* About the file systems:

- it is probably true that many applications confuse the file
  name of the font file with the font family or face name
  written in the font file.

- and, even if a system handles a file system with UTF-8
  capability, it is not unusual to store the font with a
  localized file name in a legacy character encoding
   (e.g. Microsoft Windows for the Japanese market uses
    Microsoft Codepage 932 for small removable devices
    like flash memory cards by default, for backward compatibility).

- so, it is a good idea to note that assuming a UTF-8
  pathname for the font is not recommended.
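
To illustrate the file name point, here is a minimal sketch in Python.
The file name "字.ttf" and the helper decode_filename are hypothetical,
chosen only for illustration. The same name yields different bytes in
Codepage 932 and in UTF-8, so code that assumes a UTF-8 pathname can
fail on a name stored in CP932.

```python
# Hypothetical font file name containing one Kanji (U+5B57).
name = "字.ttf"

# The same name as raw bytes under two encodings.
utf8_bytes = name.encode("utf-8")    # E5 AD 97 followed by ".ttf"
cp932_bytes = name.encode("cp932")   # 8E 9A followed by ".ttf"


def decode_filename(raw: bytes, candidates=("utf-8", "cp932")) -> str:
    """Return the first successful decode of raw among the candidates.

    Hypothetical helper: trying UTF-8 first and then CP932 is only a
    heuristic, not a rule taken from any specification.
    """
    for encoding in candidates:
        try:
            return raw.decode(encoding)
        except UnicodeDecodeError:
            continue
    raise ValueError("file name is not valid in any candidate encoding")


# Assuming UTF-8 for a CP932 file name fails outright.
try:
    cp932_bytes.decode("utf-8")
    assumed_utf8_ok = True
except UnicodeDecodeError:
    assumed_utf8_ok = False

# With a fallback, both byte forms map back to the same name.
recovered_from_cp932 = decode_filename(cp932_bytes)
recovered_from_utf8 = decode_filename(utf8_bytes)
```

The fallback order is a guess. Where the file system or the locale
reports its encoding, that information should be used in preference
to guessing from the bytes.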

Regards,
mpsuzuki

On Mon, 4 Oct 2010 23:49:56 +0000
Chris Rae <Chris.Rae at microsoft.com> wrote:

>Hi Suzuki-san - just getting back onto this now. I've made a reworked version incorporating your comments - what do you think of the attached?
>
>Regarding guiding people towards the right sort of character encoding - are you thinking about this only in terms of font name encoding, or are you meaning a wider span (i.e. encoding anything that has to go into an IS 29500 file)?
>
>Your thoughts,
>
>Chris
>
>-----Original Message-----
>From: suzuki toshiya [mailto:mpsuzuki at hiroshima-u.ac.jp] 
>Sent: 06 September 2010 21:57
>To: Chris Rae
>Cc: e-SC34-WG4 at ecma-international.org; MURATA Makoto (mmurata at japan.email.ne.jp)
>Subject: Re: DR 09-0045 ― WML, Fonts: Character encodings of font names
>
>Dear Chris,
>
>Thank you for quick drafting.
>I think it's far better than the original version, but I want to make the following points clearer:
>1) a font file (embedded or external) may not have UTF-8 font names
>2) copying font names from a font file is not recommended if their character encoding is incompatible with the XML part.
>3) code conversion is usually required to handle such localized font names.
>
>I drafted as:
>--------------------------------------------------------------
>Localized font names stored in embedded font resources or external font resources may be encoded in a character encoding that is incompatible with the character encodings accepted by the XML parser.
>To use these font names in the values of this attribute in the XML part, they should be converted to a character encoding suitable for the XML parser. Copying raw byte sequences from font files should be avoided. [Note: ISO/IEC 14496-22:2007 does not permit storing a UTF-8 font name in the font file. end note]
>--------------------------------------------------------------
>Please give me your comments.
>
>I also want to ask about the phrase "converted to the character encoding". What I have in mind is a conversion from "a Kanji expressed by a Shift-JIS codepoint" to "a Kanji expressed by a Unicode codepoint".
>
>I want to exclude the ASCII-fying approach of converting "字" (Shift-JIS = 8E9A, UCS-2 = U+5B57) to "x8Ex9A" (worst) or "x5Bx57" (worse) etc. This is because restoring such an "escaped byte sequence" to the original font name is not easy for most font management systems (they check for the existence of "x8Ex9A", if not found, try "U+8E9A", if not found, ...).
>Is there a good term to exclude such conversions?
>
>Regards,
>suzuki toshiya
>
>
>Chris Rae wrote (2010/09/07 13:19):
>> http://cid-c8ba0861dc5e4adc.office.live.com/view.aspx/Public%20Documents/2009/DR-09-0045.docx
>> 
>> I've updated the wording to incorporate a note explaining that intra-font names will often need alterations to appear in XML. Suzuki-san and Murata-san, does this look reasonable?
>> 
>> Changes attached.
>> 		
>> Chris
>
>

