DR 09-0061 _ Shared MLs, Shared Simple Types: Constrain ST_Panose value set

Tue Nov 23 20:13:19 CET 2010

Hi Suzuki-san/Murata-san - I would prefer option (2) or (3) as I think they better represent the state of affairs. Of those two I don't think I have a strong opinion - (2) might be easier to manage from a practical point of view (in that it doesn't matter as much if we forget to update it when the ISO/IEC 14496-22 group comes to a resolution).

Chris

-----Original Message-----
From: suzuki toshiya [mailto:mpsuzuki at hiroshima-u.ac.jp] 
Sent: 18 November 2010 19:45
To: Chris Rae
Cc: e-SC34-WG4 at ecma-international.org; MURATA Makoto
Subject: Re: DR 09-0061 _ Shared MLs, Shared Simple Types: Constrain ST_Panose value set

Dear Chris and Murata-san,

I was able to exchange a few mails with Vladimir Levantovsky, an engineer from MonoType and the chair of SC29/WG11, about the incompatibilities of Panose among the genuine Panose spec, the ISO/IEC 14496-22 (OpenType) spec, and the Windows GDI spec.

A summary of his comments:
* the existence of the incompatibility is agreed.
* the issue lies on the Windows GDI (and OOXML) side;
  the ISO/IEC 14496-22/OpenType/Panose specs cannot solve it,
  and no need to update the ISO/IEC 14496-22/OpenType
  spec is recognized.
* an implementation note in OOXML may be appropriate.
* further discussion on a public mailing list is recommended.
* a revision of the genuine Panose spec should be requested from HP,
  not from MonoType (MonoType is just hosting the document).

My personal opinion is that the current OpenType spec (and its ISO version, ISO/IEC 14496-22) inherits from Windows GDI (judging by the names of each byte), so this should at least be stated in the OpenType spec (with OOXML simply referring to ISO/IEC 14496-22, without an OOXML-specific Panose), but more discussion is needed before insisting on it.

# MPEG-4 has a spec for conformance testing, ISO/IEC 14496-4, and
# one for reference software, ISO/IEC 14496-5. I think they should
# be aware of the incompatibility.

--

Considering the existence of OpenType fonts bundled with Microsoft Windows and Apple Mac OS X whose Panose values are incompatible with the genuine Panose spec, I prefer the loose regex that permits both genuine and Windows GDI Panose values.

Now I have the following options:

1) the loose regex (both genuine & Windows GDI Panose are permitted)
   is included without a note. When the OpenType or ISO/IEC 14496-22
   people reach a resolution (insertion of a note, update of the
   ISO/IEC 14496-22 spec, etc.), it will be reflected in a future
   amendment or corrigendum.

2) the loose regex (both genuine & Windows GDI Panose are permitted)
   is included, with a note something like "the valid values are
   wider than the Panose spec, for compatibility with existing
   implementations."

3) postpone the inclusion of a detailed regex for Panose and leave
   it as it is now (10 bytes in hex-digit notation) until the
   OpenType or ISO/IEC 14496-22 people reach a resolution.
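
For readers comparing these options, the following is a minimal Python sketch, not part of the DR itself. It checks one value against three patterns: an assumed reading of the current constraint (any 10 bytes as 20 hex digits), the MSDN-style regex, and the family-kind-5 branch of the panose.com-strict regex. Both regexes are copied from the 26 October mail quoted further down in this thread. The names ANY_TEN_BYTES, MSDN_STYLE, STRICT_KIND_5 and classify are hypothetical, and re.fullmatch is used on the assumption that the pattern must match the whole value, as an XSD pattern facet does.

```python
import re

# Assumed reading of "10 bytes in hexdigit expression": exactly 20 hex digits.
ANY_TEN_BYTES = re.compile(r"[0-9A-Fa-f]{20}")

# MSDN-style (Windows GDI) pattern, copied from the quoted 26 October mail.
MSDN_STYLE = re.compile(
    r"\s*0?[0-5]\s*0?[0-9A-Fa-f]\s*0?[0-9ABab]\s*0?[0-9]\s*0?[0-9]\s*0?[0-8]"
    r"\s*0?[0-9ABab]\s*0?[0-9A-Fa-f]\s*0?[0-9A-Da-d]\s*0?[0-7]\s*"
)

# Family-kind-5 ("Latin Pictorial") branch of the panose.com-strict pattern,
# copied from the same mail. The 3rd and 5th digits are restricted to 1.
STRICT_KIND_5 = re.compile(
    r"\s*0?5\s*0?[0-9A-Ca-c]\s*0?1\s*0?[0-3]\s*0?1"
    r"\s*0?[0-9]\s*0?[0-9]\s*0?[0-9]\s*0?[0-9]\s*0?[0-9]\s*"
)


def classify(value: str) -> dict:
    """Report which of the three patterns accept the whole value."""
    return {
        "any_ten_bytes": ANY_TEN_BYTES.fullmatch(value) is not None,
        "msdn_style": MSDN_STYLE.fullmatch(value) is not None,
        "strict_kind_5": STRICT_KIND_5.fullmatch(value) is not None,
    }


if __name__ == "__main__":
    # The value under discussion, then a sample value from a Word document.
    for sample in ("05000000000000000000", "020F0502020204030204"):
        print(sample, classify(sample))
```

The value 05000000000000000000 passes the first two checks and fails the strict kind-5 branch, which is the gap that the loose regex in options (1) and (2) is meant to cover.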

Chris and Murata-san, which is the best option?

Regards,
suzuki toshiya, Hiroshima University, Japan

Chris Rae wrote:
> Suzuki-san - we discussed this defect report briefly on the last WG4 call. Many thanks, as usual, for the immensely useful background information and all the work you've done here. It's great to see detail experts participating in WG4 like this!
> 
> I mentioned on the call that my preference would be to take your adjusted RegExp from the later mail in this thread as the preferred version in the standard (I'll confirm that in the next week or two - we're busy testing it at the moment). However, I believe Microsoft didn't ever knowingly contradict the Panose standard, so I'm hoping I can simultaneously convince you that that RegExp is in fact a valid RegExp for testing Panose.
> 
> I think the part of the Panose standard (http://www.panose.com/ProductsServices/pan1.aspx for everyone else) that disagrees with the "Latin Pictorial" page is section 1.5. It reads:
> 
> --
> 1.5 Digit values of 0 and 1
> The reader will notice that the value 0 and 1 are defined as Any and No Fit for every digit in the PANOSE system. These have specific meanings to the mapper. 0 means match that digit with any available digit. This allows the mapper to handle distortable typefaces such as multiple master fonts in which, for example, weights may be variable or serifs may change. 1 means that the item being classified does not fit within the present system. There are two possible causes of this. First is that there has been no work done on that family of faces, for example at the present time an Arabic cursive font would have the PANOSE number 1 1 1 1 1 1 1 1 1 1 as there has as yet been no work done on Arabic fonts.
> --
> 
> I think these global overrides make values such as the 05000000000000000000 we were discussing valid Panose. Do you have any thoughts?
> 
> Chris
> 
> -----Original Message-----
> From: mpsuzuki at hiroshima-u.ac.jp [mailto:mpsuzuki at hiroshima-u.ac.jp]
> Sent: 26 October 2010 22:03
> To: Chris Rae
> Cc: e-SC34-WG4 at ecma-international.org
> Subject: Re: DR 09-0061 _ Shared MLs, Shared Simple Types: Constrain 
> ST_Panose value set
> 
> Dear Chris,
> 
> Excuse me, 05000000000000000000 is invalid Panose, if I follow the panose.com definition.
> 
> Please refer to the "Latin Pictorial" page,
> 
>   http://www.panose.com/ProductsServices/pan5.aspx
> 
> "5.3 Weight" and "5.5 Aspect ratio & contrast".
> In there, the 3rd and 5th digits of the Panose for family kind 5 are restricted to "1".
> "0" is unavailable. Thus my panose.com-strict regex
> 
> \s*0?2\s*0?[0-9A-Fa-f]\s*0?[0-9ABab]\s*0?[0-9]\s*0?[0-9]\s*0?[0-9Aa]\s
> *0?[0-9ABab]\s*0?[0-9A-Fa-f]\s*0?[0-9A-Da-d]\s*0?[0-7]\s*|\s*0?3\s*0?[
> 0-9]\s*0?[0-9ABab]\s*0?[0-3]\s*0?[0-6]\s*0?[0-9]\s*0?[0-9Aa]\s*0?[0-9A
> -Da-d]\s*0?[0-9A-Da-d]\s*0?[0-6]\s*|\s*0?4\s*0?[0-9A-Ca-c]\s*0?[0-9ABa
> b]\s*0?[0-9]\s*0?[0-9A-Da-d]\s*(0?[0-9A-Fa-f]|10)\s*0[0-7]\s*0?[0-8]\s
> *0[0-9A-Fa-f]\s*0[0-5]\s*|\s*0?5\s*0?[0-9A-Ca-c]\s*0?1\s*0?[0-3]\s*0?1\
> s*0?[0-9]\s*0?[0-9]\s*0?[0-9]\s*0?[0-9]\s*0?[0-9]\s*
> 
> refuses 05000000000000000000.
> 
> However, as I mentioned at the Tokyo meeting, the definition of Panose differs between MSDN and panose.com.
> 
> http://msdn.microsoft.com/en-us/library/ms533998.aspx
> http://www.panose.com/
> 
> The MSDN-style regex
> 
> \s*0?[0-5]\s*0?[0-9A-Fa-f]\s*0?[0-9ABab]\s*0?[0-9]\s*0?[0-9]\s*0?[0-8]
> \s*0?[0-9ABab]\s*0?[0-9A-Fa-f]\s*0?[0-9A-Da-d]\s*0?[0-7]\s*
> 
> accepts 05000000000000000000.
> 
> I will check the Panose values in existing TrueType fonts bundled with Microsoft Windows etc.; the number of Panose values that are valid in MSDN syntax but invalid in panose.com syntax is remarkably large.
> 
> Regards,
> mpsuzuki
> 
> On Wed, 27 Oct 2010 11:57:27 +0900
> mpsuzuki at hiroshima-u.ac.jp wrote:
> 
>> Dear Chris,
>>
>> Sorry for the late response. The value 05000000000000000000 must be 
>> accepted, and I thought my proposal accepted it. I will check, 
>> please wait. I will post my replies to other issues within 12 hours.
>>
>> Regards,
>> suzuki toshiya, Hiroshima University, Japan
>>
>> On Tue, 26 Oct 2010 21:24:50 +0000
>> Chris Rae <Chris.Rae at microsoft.com> wrote:
>>
>>> I may have spoken too soon on this one. It would appear there are some values which are acceptable according to the Panose spec which this RegExp does not regard as valid. For example, 05000000000000000000. This seems to be valid according to section 1.5 of the Panose spec (http://www.panose.com/ProductsServices/pan1.aspx) but doesn't match this RegEx. I've pasted the section below.
>>>
>>> Suzuki-san - is there a chance you could check that I'm right in this assertion?
>>>
>>> Chris
>>>
>>> --
>>>
>>> Panose: 1.5 Digit values of 0 and 1
>>> The reader will notice that the value 0 and 1 are defined as Any and No Fit for every digit in the PANOSE system. These have specific meanings to the mapper. 0 means match that digit with any available digit. This allows the mapper to handle distortable typefaces such as multiple master fonts in which, for example, weights may be variable or serifs may change. 1 means that the item being classified does not fit within the present system. There are two possible causes of this. First is that there has been no work done on that family of faces, for example at the present time an Arabic cursive font would have the PANOSE number 1 1 1 1 1 1 1 1 1 1 as there has as yet been no work done on Arabic fonts.
>>>
>>> -----Original Message-----
>>> From: Chris Rae [mailto:Chris.Rae at microsoft.com]
>>> Sent: 25 October 2010 15:37
>>> To: e-SC34-WG4 at ecma-international.org
>>> Cc: suzuki toshiya (mpsuzuki at hiroshima-u.ac.jp)
>>> Subject: DR 09-0061 _ Shared MLs, Shared Simple Types: Constrain 
>>> ST_Panose value set
>>>
>>> http://cid-c8ba0861dc5e4adc.office.live.com/view.aspx/Public%20Documents/2009/DR-09-0061.docx
>>>
>>> This is a very simple one indeed. We talked about this at some length at Tokyo - I was under the impression that certain valid Panose values were not accepted by Suzuki-san's RegEx. This, it turns out, was a mistake - I was truncating unused zeros from the front of the strings and in actual fact this is not done (in Office, or in the Panose specification itself). Panose consists of ten byte couplets denoted in hex where leading zeros are always included, even on the first byte.
>>>
>>> I used http://www.regexplanet.com/simple/index.html to validate Suzuki-san's sample against some Word documents and it looks like this RegEx does indeed work fine (some sample values: 020F0502020204030204, 02010600030101010101, 020B0604020202020204, 02020603050405020304, 02040503050406030204).
>>>
>>> I think we can accept the original solution as proposed by the submitter in the DR. Let's discuss on the next call.
>>>
>>> Chris
>>>
>