RE: DR 09-0045 ― WML, Fonts: Character encodings of font names

Wed Oct 27 16:14:42 CEST 2010

I'm not sure that I understand what is meant by "to localize the file
systems". Is this clear to others, or is some more wordsmithing needed here?

Francis

> -----Original Message-----
> From: mpsuzuki at hiroshima-u.ac.jp [mailto:mpsuzuki at hiroshima-u.ac.jp]
> Sent: 27 October 2010 15:08
> To: Chris Rae
> Cc: e-SC34-WG4 at ecma-international.org; mmurata at japan.email.ne.jp;
> mpsuzuki at hiroshima-u.ac.jp; eb2m-mrt at asahi-net.or.jp
> Subject: Re: DR 09-0045 ― WML, Fonts: Character encodings of font names
>
> Dear Chris,
>
> The proposed change is basically OK. But regarding the file system
> restriction, I propose a small change:
>
> replacing
>
> 	", and is often not supported for file names in file systems"
>
> by
>
> 	", and often different character encodings are used to
> 	 localize the file systems"
>
> --
>
> The background is as follows:
>
>
> * ISO/IEC 14496-22 cannot hold a UTF-8 or UTF-16 string in it.
>   This is true and worth noting, because XML is commonly
>   encoded in UTF-8 or UTF-16.
>
> * About the file systems:
>
> - it is probably true that many applications confuse the file
>   name of the font file with the font family or face name
>   written in the font file.
>
> - and, even if a system handles a file system with UTF-8
>   capability, it is not unusual to store the font with a
>   localized file name in a legacy character encoding.
>    (e.g. Microsoft Windows for the Japanese market uses
>     Microsoft Codepage 932 for small removable devices
>     like flash memory cards by default, for backward compatibility)
>
> - so, it is a good idea to note that assuming a UTF-8
>   pathname for the font is not recommended.
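
To make the file system point concrete, here is a minimal sketch in Python, assuming a font file name that was stored as Codepage 932 bytes. The helper name `decode_font_path`, the candidate encoding list, and the sample file name are hypothetical and are not taken from the DR text. Trying UTF-8 first is only a heuristic; a real application would normally need to know the volume's encoding.

```python
def decode_font_path(raw: bytes, encodings=("utf-8", "cp932")) -> str:
    """Decode a raw path by trying each candidate encoding in order.

    Hypothetical helper: the candidate list is an assumption, not
    something the standard or the DR prescribes.
    """
    for enc in encodings:
        try:
            return raw.decode(enc)
        except UnicodeDecodeError:
            continue
    raise ValueError("path is not valid in any candidate encoding")


# Hypothetical file name as it might be stored on a CP932 volume.
raw_name = "字.ttf".encode("cp932")

# Decoding these bytes as UTF-8 fails, so the helper falls back to CP932.
decoded_name = decode_font_path(raw_name)
```

The same file name encoded as UTF-8 produces different bytes, which is why a consumer cannot assume a UTF-8 pathname.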
>
> Regards,
> mpsuzuki
>
> On Mon, 4 Oct 2010 23:49:56 +0000
> Chris Rae <Chris.Rae at microsoft.com> wrote:
>
> >Hi Suzuki-san - just getting back onto this now. I've made a reworked
> version incorporating your comments - what do you think of the
> attached?
> >
> >Regarding guiding people towards the right sort of character encoding
> - are you thinking about this only in terms of font name encoding, or
> are you meaning a wider span (i.e. encoding anything that has to go
> into an IS 29500 file)?
> >
> >Your thoughts,
> >
> >Chris
> >
> >-----Original Message-----
> >From: suzuki toshiya [mailto:mpsuzuki at hiroshima-u.ac.jp]
> >Sent: 06 September 2010 21:57
> >To: Chris Rae
> >Cc: e-SC34-WG4 at ecma-international.org; MURATA Makoto
> (mmurata at japan.email.ne.jp)
> >Subject: Re: DR 09-0045 ― WML, Fonts: Character encodings of font
> names
> >
> >Dear Chris,
> >
> >Thank you for the quick drafting.
> >I think it's far better than the original version, but I want to
> clarify the following points:
> >1) font file (embedded or external) may not have UTF-8 font names
> >2) copying font names from font file is not recommended, if its
> character encoding is incompatible with XML part.
> >3) code conversion is usually required to handle such localized font
> names.
> >
> >I drafted as:
> >--------------------------------------------------------------
> >Localized font names stored in the embedded font resources or external
> font resources may be coded by the character encoding that is
> incompatible with the character encodings for XML parser.
> >To use these font names in the values of this attribute in the XML
> part, they should be converted to the character encoding fitting to XML
> parser. Copying raw byte sequences from font files should be avoided.
> [Note: ISO/IEC 14496-22:2007 does not permit to store UTF-8 font name
> in the font file. end note]
> >--------------------------------------------------------------
> >Please give me your comments.
> >
> >I want to ask a question about "converted to the character
> encoding". What I have in mind is a conversion from "a Kanji expressed by
> a Shift-JIS codepoint" to "a Kanji expressed by a Unicode codepoint".
> >
> >I want to exclude the ASCII-fying approach of converting "字" (ShiftJIS = 8E9A,
> >UCS2 = U+5B57) to "x8Ex9A" (worst) or "x5Bx57" (worse) etc. This is
> because restoring such an "escaped byte sequence" to the original font name is
> not easy for most font management systems (they check for the existence of
> "x8Ex9A", if not found, try "U+8E9A", if not found, ...).
> >Is there a good term to exclude such conversions?
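
The distinction drawn above can be sketched as follows, assuming Python's built-in `shift_jis` codec. The function name `font_name_to_unicode` is hypothetical; the byte values 8E9A and the code point U+5B57 are the ones quoted in the message. This is an illustration of the two approaches, not proposed wording for the standard.

```python
def font_name_to_unicode(raw: bytes, source_encoding: str = "shift_jis") -> str:
    """Convert a legacy-encoded font name to a Unicode string.

    Hypothetical helper: the default source encoding is an assumption
    chosen to match the example in the message.
    """
    return raw.decode(source_encoding)


# Shift-JIS bytes for the Kanji quoted in the message.
raw = bytes.fromhex("8E9A")

# Intended conversion: the same Kanji, now as a Unicode code point,
# which can then be serialized in the XML part's own encoding.
name = font_name_to_unicode(raw)
xml_bytes = name.encode("utf-8")

# The approach to exclude: escaping the raw bytes as ASCII text.
# The result is a different string that no longer names the font.
escaped = "".join(f"x{b:02X}" for b in raw)
```

The first result is a single character equal to U+5B57, while the escaped form is a six-character ASCII string that a font management system would have to guess how to reverse.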
> >
> >Regards,
> >suzuki toshiya
> >
> >
> >Chris Rae wrote (2010/09/07 13:19):
> >> http://cid-c8ba0861dc5e4adc.office.live.com/view.aspx/Public%20Documents/2009/DR-09-0045.docx
> >>
> >> I've updated the wording to incorporate a note explaining that
> intra-font names will often need alterations to appear in XML.
> Suzuki-san and Murata-san, does this look reasonable?
> >>
> >> Changes attached.
> >>
> >> Chris
> >
> >