[FYI] charset attribute and signedness

suzuki toshiya mpsuzuki at hiroshima-u.ac.jp
Thu Sep 24 14:31:52 CEST 2015


Today we found that an existing implementation of OOXML
writes charset values in the range 0x80 - 0xFF as negative
decimal values. This is consistent with the description,
"The possible values for this attribute are defined by
the W3C XML Schema byte datatype."
http://www.w3.org/TR/xmlschema-2/#byte
(if we wanted to write the positive values 128-255, the
unsignedByte type would have to be used)
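
To make the range issue concrete, here is a minimal sketch in
Python. It is not taken from any OOXML implementation; the
names XS_BYTE_RANGE, XS_UNSIGNED_BYTE_RANGE and in_range are
made up for illustration. The ranges are the value spaces
given for byte and unsignedByte in XML Schema Part 2.

```python
# Value spaces from W3C XML Schema Part 2 (Datatypes).
# The constant and function names are hypothetical, for illustration only.
XS_BYTE_RANGE = (-128, 127)          # xs:byte
XS_UNSIGNED_BYTE_RANGE = (0, 255)    # xs:unsignedByte


def in_range(value, bounds):
    """Return True if value lies within the inclusive (low, high) bounds."""
    low, high = bounds
    return low <= value <= high


if __name__ == "__main__":
    # 161 and -95 are the two decimal spellings of 0xA1 discussed here.
    for charset in (0, 161, -95):
        print(
            charset,
            "xs:byte:", in_range(charset, XS_BYTE_RANGE),
            "xs:unsignedByte:", in_range(charset, XS_UNSIGNED_BYTE_RANGE),
        )
```

Under these ranges "161" is outside xs:byte and "-95" is inside
it, while the opposite holds for xs:unsignedByte.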

When I change the charset attribute of the Century typeface
from "0" to "161" (the unsigned 8-bit interpretation of 0xA1,
meaning the Greek charset), Office2010 complains that a
resource in the document could not be opened. But when I set
charset to "-95" (the signed 8-bit interpretation of 0xA1),
Office2010 does not complain.
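
The relation between the two spellings of 0xA1 can be sketched
as follows. This is only an illustration in Python, assuming
the usual two's-complement reading of an 8-bit value; the
function names to_signed_byte and to_unsigned_byte are
hypothetical and do not come from Office or from the OOXML
specification.

```python
def to_signed_byte(raw):
    """Reinterpret a raw byte (0..255) as a signed 8-bit value (-128..127)."""
    if not 0 <= raw <= 0xFF:
        raise ValueError("raw byte must be in 0..255")
    # Values with the top bit set (0x80..0xFF) map to negative numbers.
    return raw - 0x100 if raw >= 0x80 else raw


def to_unsigned_byte(signed):
    """Reinterpret a signed 8-bit value (-128..127) as a raw byte (0..255)."""
    if not -128 <= signed <= 127:
        raise ValueError("signed byte must be in -128..127")
    # Masking with 0xFF keeps the low 8 bits of the two's-complement form.
    return signed & 0xFF


if __name__ == "__main__":
    print(to_signed_byte(0xA1))    # -95
    print(to_unsigned_byte(-95))   # 161
```

A writer that starts from the raw byte 0xA1 would therefore
emit "-95" for an attribute typed as byte, and a reader would
mask the parsed value with 0xFF to recover 0xA1.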

So I have to apologize: my experiments on inconsistent
charset values were based on invalid data, which violated
the type definition itself.

Anyway, we agreed to write that the ambiguous cases are to
be treated as application-dependent behaviour, so this
information has no impact on the current draft.

Regards,
mpsuzuki
-------------- next part --------------
A non-text attachment was scrubbed...
Name: office2010-differentFontName-Charset=-95.pptx
Type: application/vnd.openxmlformats-officedocument.presentationml.presentation
Size: 1771786 bytes
Desc: not available
URL: <http://mailman.vse.cz/pipermail/sc34wg4/attachments/20150924/c6f2d6f6/attachment-0001.pptx>

