Draft for review: ISO 8601 date work on IS 29500

Wed Jul 14 08:02:44 CEST 2010

> Hi Rick - we do need to resort to xs:String, because the cell value can
> also actually be a string. I wasn't very clear at the beginning of this
> thread, but the cell type is denoted by its t attribute, which can be one
> of:
>
> b (Boolean): Cell containing a boolean.
> d (Date): Cell contains a date in the ISO 8601 format.
> e (Error): Cell containing an error.
> inlineStr (Inline String): Cell containing an (inline) rich string
> n (Number): Cell containing a number.
> s (Shared String): Cell containing a shared string.
> str (String): Cell containing a formula string.

Even more reason to clearly spell out the story in a schema.

In XML Schema 1.0, you can just have something like the following:

<xs:simpleType>
  <xs:union memberTypes="xs:boolean xs:integer s:emptyType xs:string" />
</xs:simpleType>

However, I think your point is that schema-aware processors are not
supposed to parse the string to determine its type; the type attribute
should be used instead. If the type attribute says "string" and the string
looks like a date, you don't want a type-binding schema processor to make
it a dateTime object. Is the purpose of the schema to allow data binding
rather than validation, since the two have conflicting requirements here?
In that case, just saying xs:string (or better, not assigning any type at
all) might indeed be best for the XSD.
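
To make the conflict concrete, here is a made-up cell (simplified so that
the value is the text content of c, which is not the exact IS 29500
markup) whose type says string but whose content looks like a date:

<c t="str">2010-07-14</c>

A validator is happy either way, but a data-binding processor that infers
the type from the lexical form could hand the application a date object
here, rather than the string that the t attribute asks for.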

For a more specific statement, you could use Schematron, something like:

<sch:pattern>
  <sch:title>Rules for Cell types</sch:title>

  <sch:rule context="c[@t='b']">
    <sch:assert test=". = 'true' or . = 'false' or . = '1' or . = '0'">A cell
    with value type boolean must have a boolean value.</sch:assert>
  </sch:rule>

  <!-- number(.) = number(.) is false only when the value is not a number -->
  <sch:rule context="c[@t='d'][not(@odf:dateTime)]">
    <sch:assert test="number(.) = number(.)">A cell with value type date
    must have a numeric value.</sch:assert>
  </sch:rule>

  <!-- "castable as" assumes an XPath 2.0 query binding and a declared xs prefix -->
  <sch:rule context="c[@t='d'][@odf:dateTime]">
    <sch:assert test="number(.) = number(.) or string-length(.) = 0">A cell
    with value type date must have a numeric value or be empty.</sch:assert>
    <sch:assert test="@odf:dateTime castable as xs:dateTime">
    The dateTime attribute should contain a simple ISO 8601 date</sch:assert>
  </sch:rule>

  <sch:rule context="c[@t='n']">
    <sch:assert test="number(.) = number(.)">A cell with value type number
    must have a numeric value.</sch:assert>
  </sch:rule>

  <sch:rule context="c[@t='s']">
    <sch:assert test="number(.) = number(.)">A cell with value type shared
    must have a numeric reference as its value.</sch:assert>
    <sch:assert test="$SharedStringTable//c[@id = current()]">
    A cell with value type shared must have a numeric reference
    to an existing item in the shared string table as its value.
    </sch:assert>

  </sch:rule>

  <sch:rule context="c[@t='e']">
    <sch:assert test="string-length(normalize-space(.)) > 0">A cell with
    value type error must contain an error value.</sch:assert>
  </sch:rule>

  <sch:rule context="c[@t='inlineStr']">
    <sch:assert test="true()">A cell with value type inlineStr
    must have a string value.</sch:assert>
  </sch:rule>

  <sch:rule context="c[@t='str']">
    <sch:assert test="true()">A cell with value type str
    must have a string value.</sch:assert>
  </sch:rule>

  <sch:rule context="c">
    <sch:report test="true()">A cell must have one of the defined value
types.</sch:report>
  </sch:rule>

</sch:pattern>

I think RELAX NG would be excellent for many of these, too.
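
As a rough sketch of what I mean (RELAX NG XML syntax; simplified so that
c is in no namespace and the value is its text content, which is not the
exact IS 29500 markup), the link between the t attribute and the content
can be stated directly as a choice:

<element name="c" xmlns="http://relaxng.org/ns/structure/1.0"
         datatypeLibrary="http://www.w3.org/2001/XMLSchema-datatypes">
  <choice>
    <!-- t="b": content must be a boolean -->
    <group>
      <attribute name="t"><value>b</value></attribute>
      <data type="boolean"/>
    </group>
    <!-- t="n": content must be a number -->
    <group>
      <attribute name="t"><value>n</value></attribute>
      <data type="double"/>
    </group>
    <!-- t="str" or t="inlineStr": any text is allowed -->
    <group>
      <attribute name="t">
        <choice><value>str</value><value>inlineStr</value></choice>
      </attribute>
      <text/>
    </group>
  </choice>
</element>

The remaining types (d, e, s) would just be further branches of the
choice.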

> The actual content is in the v attribute, which is xs:String in the
> schema, with the restrictions in the prose. Unless we break back
> compatibility with applications which use IS 29500:1, we'll have to keep
> this somewhat schema-unfriendly link between these attributes. If I follow
> it right, adding the MustUnderstand namespace (a-la ODF) would cause any
> files with the new narrower-scope date formats to be unreadable in
> applications which were only compliant with IS 29500:1 - I'd be less keen
> to do that given that the changes Gareth and I are making here are only
> reductive, so that application would have been able to understand these
> files fine if we left the markup structure the same and just tightened
> down the date values we could accept in there.

MustUnderstand would only apply where the odf:dateTime attribute is given
without a fast-load numeric version, not where the numeric version is
present. In the date-only case, the user should be alerted if the
consuming application cannot understand it.

What would this achieve?

1) No change required for existing documents.

2) Generating applications can decide to use numbers, or numbers + dates
or just dates. (I presume very few would choose just dates, initially.)

3) It would remove the either/or problem, where people have to upgrade or
else. Instead, it would allow a transition or on-ramp period in which
more spreadsheets would supply the date. This period might take 5 or 10
years; it can be left to the market. (At some appropriate stage in the
future, when applications that accept dates have critical mass, some
stronger wording can be added to deprecate the numeric form, if that were
considered desirable.)

4) The files might be larger if both are used (as happens with ODF), but
the speed penalty from parsing dates to numbers would not need to be
incurred even if the document also has the dateTime attribute.
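
For illustration, the three options in point 2 might look something like
this. The odf prefix binding is omitted, the number is just an
illustrative serial value, and the markup is simplified so that the value
is the text content of c rather than the exact IS 29500 structure:

<!-- number only, as in existing documents -->
<c t="d">40373</c>

<!-- number plus date, for the transition period -->
<c t="d" odf:dateTime="2010-07-14T00:00:00">40373</c>

<!-- date only: the case where MustUnderstand would apply -->
<c t="d" odf:dateTime="2010-07-14T00:00:00"/>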

Cheers
Rick Jelliffe