DR 09-0061 _ Shared MLs, Shared Simple Types: Constrain ST_Panose value set

mpsuzuki at hiroshima-u.ac.jp mpsuzuki at hiroshima-u.ac.jp
Wed Oct 27 13:27:41 CEST 2010


Dear Chris,

I am revising my proposed regex for Panose as follows:

\s*0[0-5]\s*0[0-9A-Fa-f]\s*0[0-9ABab]\s*0[0-9]\s*0[0-9A-Da-d]\s*(0[0-9A-Fa-f]|10)\s*0[0-9ABab]\s*0[0-9A-Fa-f]\s*0[0-9A-Fa-f]\s*0[0-9]\s*

The background is as follows.
-----------------------------------------------------
1. POSSIBLE VALUES OF PANOSE, SPEC AND IMPLEMENTATION

I attached 3 lists of Panose values in the TTFs
found on Microsoft Windows 7, on Mac OS X, and in
the fonts distributed with Debian GNU/Linux.
In the cases of Microsoft Windows and Mac OS
X, most fonts are bundled/preinstalled fonts,
but some fonts were installed by applications
(e.g. Wolfram's Mathematica, LaTeX, etc).

I also attached 2 perl scripts that validate
Panose values in MSDN style and in Panose.com style.
The check results are as follows:

The fonts on Microsoft Windows:
invalid panose: 04040404050702020202    # CURLZ___.TTF
invalid panose: 04040504061007020d02    # GIGI.TTF
invalid panose: 04040505050a02020702    # HARNGTON.TTF
invalid panose: 04040605051002020d02    # Gabriola.ttf
invalid panose: 04040805050809020602    # RAVIE.TTF
invalid panose: 04040905080b02020502    # BROADW.TTF
invalid panose: 040409050d0802020404    # STENCIL.TTF
invalid panose: 04040a07060a02020202    # SNAP____.TTF
invalid panose: 04060505060202020a04    # FELIXTI.TTF
invalid panose: 04090605060d06020702    # JOKERMAN.TTF
invalid panose: 040b0a09000000000000    # HGRPP1.TTC
invalid panose: 05000000000000000000    # esri_30s.ttf
invalid panose: 05000000000000000000    # wingding.ttf
invalid panose: 05000500000000000000    # REFSPCL.TTF
invalid panose: 05000800060100000101    # Mathematica1b.ttf
invalid panose: 05010000000000000000    # opens___.ttf
invalid panose: 05010100010000000000    # OUTLOOK.TTF
invalid panose: 05010101010101010101    # BSSYM7.TTF
invalid panose: 05010400040101000101    # Mathematica4.ttf
invalid panose: 05010400040101000101    # Mathematica4m.ttf
invalid panose: 05010700040101000101    # Mathematica4mb.ttf
invalid panose: 05010800040101000101    # Mathematica4b.ttf

The fonts on Mac OS X:
invalid panose: 01000500000000020003    # Kokonor.ttf
invalid panose: 01000500000000020004    # Baghdad.ttf
invalid panose: 05000000000000000000    # Apple Braille Outline 6 Dot.ttf
invalid panose: 05000000000000000000    # Apple Braille Outline 8 Dot.ttf
invalid panose: 05000000000000000000    # Apple Braille Pinpoint 6 Dot.ttf
invalid panose: 05000000000000000000    # Apple Braille Pinpoint 8 Dot.ttf
invalid panose: 05000000000000000000    # Apple Braille.ttf
invalid panose: 05000000000000000000    # Wingdings.ttf

The fonts for Debian GNU/Linux:
invalid panose: 01000500000000000000    # homa.ttf
invalid panose: 01000506000000020004    # nazli.ttf
invalid panose: 01000700000000000000    # titr.ttf
invalid panose: 030b0800000101010101    # UnPilgiBold.ttf
invalid panose: 04040905080102020602    # vectroid.ttf
invalid panose: 040b0600000101010101    # UnBatang.ttf
invalid panose: 040b0600000101010101    # UnGraphic.ttf
invalid panose: 040b0600000101010101    # UnJamoBatang.ttf
invalid panose: 040b0600000101010101    # UnJamoNovel.ttf
invalid panose: 040b0600000101010101    # UnJamoSora.ttf
invalid panose: 040b0600000101010101    # UnShinmun.ttf
invalid panose: 040b0600000101010101    # UnVada.ttf
invalid panose: 040b0800000101010101    # UnBatangBold.ttf
invalid panose: 040b0800000101010101    # UnBom.ttf
invalid panose: 040b0800000101010101    # UnGraphicBold.ttf
invalid panose: 040b0800000101010101    # UnYetgul.ttf
invalid panose: 05010000000000000000    # opens___.ttf

MSDN-style validation accepts all of the Panose values above.
I guess most OpenType fonts follow the MSDN spec, not
the genuine Panose.com spec.

Considering that a popular font such as Wingdings violates
the range defined by the Panose.com spec, using a
Panose.com-strict syntax would mark many OOXML (and ODF)
documents as invalid. That would cause a lot of trouble.

Here I propose a new syntax that covers all possible
values in both the MSDN spec and the Panose.com spec.

--
The possible values in the MSDN spec:

  bFamilyType           0x00 - 0x05
  bSerifStyle           0x00 - 0x0F
  bWeight               0x00 - 0x0B
  bProportion           0x00 - 0x09
  bContrast             0x00 - 0x09
  bStrokeVariation      0x00 - 0x08
  bArmStyle             0x00 - 0x0B
  bLetterForm           0x00 - 0x0F
  bMidLine              0x00 - 0x0D
  bXHeight              0x00 - 0x07

The possible defined values in the Panose.com spec:

    LatinText     LatinHandWritten  LatinDecoratives  LatinSymbol
#0  0x02          0x03              0x04              0x05
#1  0x00 - 0x0F   0x00 - 0x09       0x00 - 0x0C       0x00 - 0x0C
#2  0x00 - 0x0B   0x00 - 0x0B       0x00 - 0x0B       0x01
#3  0x00 - 0x09   0x00 - 0x03       0x00 - 0x09       0x00 - 0x03
#4  0x00 - 0x09   0x00 - 0x06       0x00 - 0x0D       0x01
#5  0x00 - 0x0A   0x00 - 0x09       0x00 - 0x10       0x00 - 0x09
#6  0x00 - 0x0B   0x00 - 0x0A       0x00 - 0x07       0x00 - 0x09
#7  0x00 - 0x0F   0x00 - 0x0D       0x00 - 0x08       0x00 - 0x09
#8  0x00 - 0x0D   0x00 - 0x0D       0x00 - 0x0F       0x00 - 0x09
#9  0x00 - 0x07   0x00 - 0x06       0x00 - 0x05       0x00 - 0x09

Combining both ranges, the result is:

#0 0x00 - 0x05 (same as MSDN)
#1 0x00 - 0x0F (same as MSDN)
#2 0x00 - 0x0B (same as MSDN)
#3 0x00 - 0x09 (same as MSDN)
#4 0x00 - 0x0D (0x0A - 0x0D are used by LatinDecoratives in Panose.com, see 4.5)
#5 0x00 - 0x10 (0x0B - 0x10 are used by LatinDecoratives in Panose.com, see 4.6)
#6 0x00 - 0x0B (same as MSDN)
#7 0x00 - 0x0F (same as MSDN)
#8 0x00 - 0x0F (0x0E - 0x0F are used by LatinDecoratives in Panose.com, see 4.9)
#9 0x00 - 0x09 (0x08 - 0x09 are used by LatinSymbol in Panose.com, see 5.10)
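
As a rough cross-check of the combined ranges, the per-digit upper
bounds can be merged mechanically. The Python sketch below is only an
illustration (it is not one of the attached perl scripts). The names
MSDN_MAX, PANOSE_COM_MAX and combined are made up for this example, and
it simplifies by treating every lower bound as 0x00, although the
Panose.com table fixes some digits to a single value.

```python
# Upper bound of each digit (#0..#9), transcribed from the two tables above.
# Simplification (assumption): every lower bound is treated as 0x00.
MSDN_MAX = [0x05, 0x0F, 0x0B, 0x09, 0x09, 0x08, 0x0B, 0x0F, 0x0D, 0x07]

PANOSE_COM_MAX = {
    "LatinText":        [0x02, 0x0F, 0x0B, 0x09, 0x09, 0x0A, 0x0B, 0x0F, 0x0D, 0x07],
    "LatinHandWritten": [0x03, 0x09, 0x0B, 0x03, 0x06, 0x09, 0x0A, 0x0D, 0x0D, 0x06],
    "LatinDecoratives": [0x04, 0x0C, 0x0B, 0x09, 0x0D, 0x10, 0x07, 0x08, 0x0F, 0x05],
    "LatinSymbol":      [0x05, 0x0C, 0x01, 0x03, 0x01, 0x09, 0x09, 0x09, 0x09, 0x09],
}

# For each digit position, take the largest upper bound over all five columns.
combined = [max(column) for column in zip(MSDN_MAX, *PANOSE_COM_MAX.values())]

for index, upper in enumerate(combined):
    print(f"#{index} 0x00 - 0x{upper:02X}")
```

The printed ranges should agree with the combined list above.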

A regex covering this set of ranges would be:

\s*0?[0-5]\s*0?[0-9A-Fa-f]\s*0?[0-9ABab]\s*0?[0-9]\s*0?[0-9A-Da-d]\s*(0?[0-9A-Fa-f]|10)\s*0?[0-9ABab]\s*0?[0-9A-Fa-f]\s*0?[0-9A-Fa-f]\s*0?[0-9]
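
A quick way to try this regex against some of the values listed in
section 1 is sketched below in Python. This is only an illustration,
not the attached perl scripts. It assumes the pattern has to match the
whole value, so it uses re.fullmatch; the names COMBINED, accepts and
samples are hypothetical.

```python
import re

# The combined regex from above, split over two raw strings for readability.
COMBINED = re.compile(
    r"\s*0?[0-5]\s*0?[0-9A-Fa-f]\s*0?[0-9ABab]\s*0?[0-9]\s*0?[0-9A-Da-d]"
    r"\s*(0?[0-9A-Fa-f]|10)\s*0?[0-9ABab]\s*0?[0-9A-Fa-f]\s*0?[0-9A-Fa-f]\s*0?[0-9]"
)

def accepts(value: str) -> bool:
    """True if the whole string matches the combined regex."""
    return COMBINED.fullmatch(value) is not None

# A few values taken from the font lists in section 1.
samples = {
    "05000000000000000000": "wingding.ttf",
    "04040504061007020d02": "GIGI.TTF",
    "040409050d0802020404": "STENCIL.TTF",
    "01000500000000020003": "Kokonor.ttf",
    "040b0600000101010101": "UnBatang.ttf",
}

for value, font in samples.items():
    print(value, font, "accepted" if accepts(value) else "rejected")

# Family kind 0x06 is outside both specs, so this one should be rejected.
print("06000000000000000000", accepts("06000000000000000000"))
```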

-----------------------------------------------------
2. NIBBLES OR OCTETS, HEXDIGITS OR DECIMALS?

I remember there was another issue about the regex for
Panose values. Some people may want to use "10 nibbles in
hex digits" (like "5FB998BFD7") instead of "10
octets in hex digits" (like "050F0B0909080B0F0D07"),
because the ranges defined in the MSDN spec do not exceed
0x0F. I asked Microsoft's staff to investigate whether
there is any existing implementation of OOXML that writes
Panose values as 10 nibbles.

I think 10 nibbles WITHOUT DELIMITER should be strongly
discouraged, even if such implementations exist, because a
range in the Panose.com spec can exceed 0x0F, which makes it
ambiguous how to split the hex digit string into values.
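
The splitting ambiguity can be made concrete with a small Python
sketch. It is an illustration only: the helper name splits and the
constant MAXIMA are hypothetical, MAXIMA holds the combined upper
bounds from section 1, and each value is assumed to be written with
either one or two hex digits and no delimiter.

```python
HEXDIGITS = set("0123456789abcdefABCDEF")

# Combined upper bounds for digits #0..#9 (from section 1).
MAXIMA = (0x05, 0x0F, 0x0B, 0x09, 0x0D, 0x10, 0x0B, 0x0F, 0x0F, 0x09)

def splits(text, maxima=MAXIMA):
    """Return every reading of `text` as len(maxima) in-range values,
    each written with one or two hex digits and no delimiter."""
    results = []

    def walk(pos, index, acc):
        if index == len(maxima):
            if pos == len(text):
                results.append(tuple(acc))
            return
        for width in (1, 2):
            token = text[pos:pos + width]
            if len(token) < width or not set(token) <= HEXDIGITS:
                continue
            value = int(token, 16)
            if value <= maxima[index]:
                walk(pos + width, index + 1, acc + [value])

    walk(0, 0, [])
    return results

# Abbreviated form: digit #5 can be read as 0x10 or as 0x01.
for reading in splits("445461072D2"):
    print(reading)

# Fixed two-digit octets leave only one possible reading.
print(splits("04040504061007020D02"))
```

With the 11-character string, both a reading with 0x10 and a reading
with 0x01 in digit #5 stay within the ranges, so the writer's intent
cannot be recovered. The 20-character form has a single reading.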

In the case of Adobe PDF, Panose values are expressed as
10 octets in hex digits delimited by spaces.
In the case of CSS, Panose values are expressed as 10 decimal
numbers delimited by spaces. Thinking about the interchange
of font-related data between CSS and OOXML, acceptance
of a CSS-like expression in 10 decimal numbers might be
expected, but it is difficult to eliminate the parsing
ambiguity if we try to support both the hex digit expression
and the decimal expression. I think the decimal expression
should not be supported, to avoid the ambiguity, as long as
existing OOXML implementations in Microsoft products have
never supported it.
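
To illustrate the hex/decimal ambiguity, here is a minimal Python
sketch. The helper name read is hypothetical, and the input is simply
the decimal rendering of 020B0604020202020204, one of the sample
values quoted in this thread.

```python
def read(text, base):
    """Parse whitespace-delimited tokens in the given base."""
    return [int(token, base) for token in text.split()]

# Decimal rendering of 020B0604020202020204 (0x0B == 11).
text = "02 11 06 04 02 02 02 02 02 04"

as_decimal = read(text, 10)
as_hex = read(text, 16)

print(as_decimal)  # digit #1 is 11, i.e. 0x0B, which is in range
print(as_hex)      # digit #1 is 0x11 == 17, which exceeds 0x0F
```

A token made only of the characters 0-9 does not reveal which base the
writer meant, so a parser that accepts both forms has to guess.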

If we can refuse the ambiguous abbreviated syntax like
"5FB9..." meaning "05 0F 0B 09 ...", the improved regex
would be:

\s*0[0-5]\s*0[0-9A-Fa-f]\s*0[0-9ABab]\s*0[0-9]\s*0[0-9A-Da-d]\s*(0[0-9A-Fa-f]|10)\s*0[0-9ABab]\s*0[0-9A-Fa-f]\s*0[0-9A-Fa-f]\s*0[0-9]\s*
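
The effect of requiring the leading zero can be checked with a short
Python sketch. It is an illustration under the assumption that the
pattern must match the whole value (hence re.fullmatch); the names
STRICT and is_valid_panose are hypothetical.

```python
import re

# The improved regex from above: every value is exactly two hex digits.
STRICT = re.compile(
    r"\s*0[0-5]\s*0[0-9A-Fa-f]\s*0[0-9ABab]\s*0[0-9]\s*0[0-9A-Da-d]"
    r"\s*(0[0-9A-Fa-f]|10)\s*0[0-9ABab]\s*0[0-9A-Fa-f]\s*0[0-9A-Fa-f]\s*0[0-9]\s*"
)

def is_valid_panose(value: str) -> bool:
    """True if the whole string matches the strict ten-octet regex."""
    return STRICT.fullmatch(value) is not None

checks = [
    "050F0B0909080B0F0D07",           # ten octets, no delimiter
    "05 0F 0B 09 09 08 0B 0F 0D 07",  # ten octets, space delimited
    "5FB998BFD7",                     # abbreviated ten-nibble form
    "020F0502020204030204",           # sample value quoted from Word documents
]

for value in checks:
    print(repr(value), is_valid_panose(value))
```

Only the abbreviated ten-nibble string should be rejected here.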

Regards,
mpsuzuki

On Wed, 27 Oct 2010 14:03:05 +0900
mpsuzuki at hiroshima-u.ac.jp wrote:

>Dear Chris,
>
>Excuse me, 05000000000000000000 is an invalid Panose value
>if I follow the panose.com definition.
>
>Please refer to the "Latin Pictorial" page,
>
>  http://www.panose.com/ProductsServices/pan5.aspx
>
>"5.3 Weight" and "5.5 Aspect ratio & contrast".
>There, the 3rd and 5th digits of the Panose value for
>family kind 5 are restricted to "1";
>"0" is unavailable. Thus my panose.com-strict regex
>
>\s*0?2\s*0?[0-9A-Fa-f]\s*0?[0-9ABab]\s*0?[0-9]\s*0?[0-9]\s*0?[0-9Aa]\s*0?[0-9ABab]\s*0?[0-9A-Fa-f]\s*0?[0-9A-Da-d]\s*0?[0-7]\s*|\s*0?3\s*0?[0-9]\s*0?[0-9ABab]\s*0?[0-3]\s*0?[0-6]\s*0?[0-9]\s*0?[0-9Aa]\s*0?[0-9A-Da-d]\s*0?[0-9A-Da-d]\s*0?[0-6]\s*|\s*0?4\s*0?[0-9A-Ca-c]\s*0?[0-9ABab]\s*0?[0-9]\s*0?[0-9A-Da-d]\s*(0?[0-9A-Fa-f]|10)\s*0[0-7]\s*0?[0-8]\s*0[0-9A-Fa-f]\s*0[0-5]\s*|\s*0?5\s*0?[0-9A-Ca-c]\s*0?1\s*0?[0-3]\s*0?1\s*0?[0-9]\s*0?[0-9]\s*0?[0-9]\s*0?[0-9]\s*0?[0-9]\s*
>
>refuses 05000000000000000000.
>
>However, as I mentioned at the Tokyo meeting, the definition
>of Panose differs between MSDN and panose.com.
>
>http://msdn.microsoft.com/en-us/library/ms533998.aspx
>http://www.panose.com/
>
>The MSDN-style regex
>
>\s*0?[0-5]\s*0?[0-9A-Fa-f]\s*0?[0-9ABab]\s*0?[0-9]\s*0?[0-9]\s*0?[0-8]\s*0?[0-9ABab]\s*0?[0-9A-Fa-f]\s*0?[0-9A-Da-d]\s*0?[0-7]\s*
>
>accepts 05000000000000000000.
>
>I will check the Panose values in existing TrueType fonts
>bundled with Microsoft Windows etc.; I expect the number of
>Panose values that are valid in MSDN syntax but invalid in
>Panose.com syntax to be remarkably large.
>
>Regards,
>mpsuzuki
>
>On Wed, 27 Oct 2010 11:57:27 +0900
>mpsuzuki at hiroshima-u.ac.jp wrote:
>
>>Dear Chris,
>>
>>Sorry for the late response. The value 05000000000000000000
>>must be accepted, and I thought my proposal accepted
>>it. I will check; please wait. I will post my replies to
>>the other issues within 12 hours.
>>
>>Regards,
>>suzuki toshiya, Hiroshima University, Japan
>>
>>On Tue, 26 Oct 2010 21:24:50 +0000
>>Chris Rae <Chris.Rae at microsoft.com> wrote:
>>
>>>I may have spoken too soon on this one. It would appear there are some values which are acceptable according to the Panose spec which this RegExp does not regard as valid. For example, 05000000000000000000. This seems to be valid according to section 1.5 of the Panose spec (http://www.panose.com/ProductsServices/pan1.aspx) but doesn't match this RegEx. I've pasted the section below.
>>>
>>>Suzuki-san - is there a chance you could check that I'm right in this assertion?
>>>
>>>Chris
>>>
>>>--
>>>
>>>Panose: 1.5 Digit values of 0 and 1
>>>The reader will notice that the value 0 and 1 are defined as Any and No Fit for every digit in the PANOSE system. These have specific meanings to the mapper. 0 means match that digit with any available digit. This allows the mapper to handle distortable typefaces such as multiple master fonts in which, for example, weights may be variable or serifs may change. 1 means that the item being classified does not fit within the present system. There are two possible causes of this. First is that there has been no work done on that family of faces, for example at the present time an Arabic cursive font would have the PANOSE number 1 1 1 1 1 1 1 1 1 1 as there has as yet been no work done on Arabic fonts.
>>>
>>>-----Original Message-----
>>>From: Chris Rae [mailto:Chris.Rae at microsoft.com] 
>>>Sent: 25 October 2010 15:37
>>>To: e-SC34-WG4 at ecma-international.org
>>>Cc: suzuki toshiya (mpsuzuki at hiroshima-u.ac.jp)
>>>Subject: DR 09-0061 _ Shared MLs, Shared Simple Types: Constrain ST_Panose value set
>>>
>>>http://cid-c8ba0861dc5e4adc.office.live.com/view.aspx/Public%20Documents/2009/DR-09-0061.docx
>>>
>>>This is a very simple one indeed. We talked about this at some length at Tokyo - I was under the impression that certain valid Panose values were not accepted by Suzuki-san's RegEx. This, it turns out, was a mistake - I was truncating unused zeros from the front of the strings and in actual fact this is not done (in Office, or in the Panose specification itself). Panose consists of ten byte couplets denoted in hex where leading zeros are always included, even on the first byte.
>>>
>>>I used http://www.regexplanet.com/simple/index.html to validate Suzuki-san's sample against some Word documents and it looks like this RegEx does indeed work fine (some sample values: 020F0502020204030204, 02010600030101010101, 020B0604020202020204, 02020603050405020304, 02040503050406030204).
>>>
>>>I think we can accept the original solution as proposed by the submitter in the DR. Let's discuss on the next call.
>>>
>>>Chris
>>>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: panose-jpwin7.tsv
Type: text/tab-separated-values
Size: 20594 bytes
Desc: not available
URL: <http://mailman.vse.cz/pipermail/sc34wg4/attachments/20101027/c18c92f9/attachment-0003.tsv>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: panose-macosx.tsv
Type: text/tab-separated-values
Size: 7448 bytes
Desc: not available
URL: <http://mailman.vse.cz/pipermail/sc34wg4/attachments/20101027/c18c92f9/attachment-0004.tsv>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: panose-debian.tsv
Type: text/tab-separated-values
Size: 74208 bytes
Desc: not available
URL: <http://mailman.vse.cz/pipermail/sc34wg4/attachments/20101027/c18c92f9/attachment-0005.tsv>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: check-invalid-panose-by-msdn.pl
Type: text/x-perl
Size: 288 bytes
Desc: not available
URL: <http://mailman.vse.cz/pipermail/sc34wg4/attachments/20101027/c18c92f9/attachment-0002.pl>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: check-invalid-panose-by-panose-com.pl
Type: text/x-perl
Size: 800 bytes
Desc: not available
URL: <http://mailman.vse.cz/pipermail/sc34wg4/attachments/20101027/c18c92f9/attachment-0003.pl>

