Full validation of ODF/OOXML/EPUB, etc. with DFDL and NVDL?
robert_weir at us.ibm.com
Wed Nov 17 14:50:21 CET 2010
I recently ran across Data Format Description Language (DFDL), pronounced
"Daffodil", a draft standard in the Open Grid Forum. It is a validation
language for non-markup data, including text and binary files. It is
oriented toward record-based formats, commonly used in scientific and
industrial applications, including modern and legacy formats. DFDL is
expressed as annotations on an XML Schema, using XML Schema (and
Datatypes) to express the logical format of the data, and using
annotations on the XSD to express physical aspects like byte ordering,
etc.
I'm doing a little hand waving, but consider this:
The formats of interest for this study group are binary on the outside and
XML and binary (images, etc.) on the inside. And the XML on the inside is
in a variety of languages, generally expressed in multiple schema
definition languages.
So NVDL has a role to play. But to date that would only work for the XML
pieces. What about the binary?
What if we brought DFDL into SC34/WG1 as a new part of DSDL? (This is
within the realm of possibility, based on my conversations with a
colleague of mine who chairs the DFDL WG).
Could we then express ZIP formally using DFDL? And by doing so in DFDL,
enable the kind of modularization we're also seeking?
What if we then enhanced NVDL to allow outside-in validation of such
ZIP+XML+binary packages? We could then express validation not only of the
ZIP, but also of the contents of the ZIP, both markup and binaries.
Imagine a DFDL description of PNG, for example.
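To make "outside-in" concrete, here is a minimal Python sketch of the
control flow only. It is not DFDL or NVDL: the names validate_package,
check_xml, check_png and CHECKS are hypothetical, the XML check is
well-formedness only (where NVDL and the schemas would sit), and the PNG
check looks only at the 8-byte file signature (where a DFDL description
would sit).

```python
import io
import os
import xml.dom.minidom
import zipfile
from xml.parsers.expat import ExpatError

# Fixed 8-byte signature that starts every PNG file.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def check_xml(data):
    """Stand-in for schema validation: tests well-formedness only."""
    try:
        xml.dom.minidom.parseString(data)
        return True
    except ExpatError:
        return False


def check_png(data):
    """Stand-in for a binary format description: tests the signature only."""
    return data.startswith(PNG_SIGNATURE)


# Hypothetical dispatch table: file suffix -> check for that kind of content.
CHECKS = {".xml": check_xml, ".png": check_png}


def validate_package(raw):
    """Validate the container first, then each entry inside it.

    Returns {entry name: True, False, or None}; None means no check is
    registered for that entry. Raises zipfile.BadZipFile if the outer
    container cannot be read as a ZIP at all.
    """
    results = {}
    with zipfile.ZipFile(io.BytesIO(raw)) as zf:
        for name in zf.namelist():
            suffix = os.path.splitext(name)[1].lower()
            check = CHECKS.get(suffix)
            results[name] = check(zf.read(name)) if check else None
    return results
```

The point of the sketch is the shape: one pass that rejects a bad container
outright and then dispatches each entry to a format-specific check. In the
idea above, the container and binary checks would come from DFDL
descriptions and the dispatch from NVDL, rather than from hand-written
functions.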
Obviously this is not the shortest path to getting a normative ZIP
reference for ODF 1.2. The shortest path is to do an RER. And given the
timetable that is what the ODF TC will likely end up doing. But I think
there is great value in tackling the general problem here, which is how
binary and general text data and markup relate together in complex
scenarios. Whether we're talking about scientific data collection, legacy
formats or even modern web formats like JSON, it is clear that a "pure"
XML world exists only in the imagination.
Regards,
-Rob