Logical archive model
robert_weir at us.ibm.com
Tue Nov 2 23:17:34 CET 2010
So what is needed, I think, is something like this:
---------------------------------------------------------
An archive is a hierarchical structure containing items. Items may be
directories or files. Directories may contain other items. Files are
terminals and do not contain other items. Directories may be empty.
Items are ordered in the archive, though the order of the items bears no
necessary relationship to the hierarchical structure, e.g., there is no
requirement that a "parent" item appear before a "child" item.
Each item is identified by an IRI path, which conforms to the
"ipath-absolute" production in RFC 3987.
Items may have associated attributes. Attributes defined by this standard
include:
Creation Date (ISO 8601)
Modified Date (ISO 8601)
Size (long integer)
Additional attributes, including implementation-defined attributes, are
also permitted.
An archive is stored in an archive encoding, e.g., ZIP, GZIP, TAR, XML,
etc.
---------------------------------------------------------
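To make the model above concrete, here is a rough sketch in Python. It is
only an illustration: the names Item, Archive, add and children are mine and
not from any standard. The path check enforces only the leading "/" of
"ipath-absolute", not the full RFC 3987 grammar, and the rule that paths must
be unique is my assumption.

```python
from dataclasses import dataclass, field


@dataclass
class Item:
    path: str                    # IRI path, e.g. "/META-INF/manifest.xml"
    is_directory: bool = False   # files are terminals; directories hold items
    attributes: dict = field(default_factory=dict)  # e.g. "Size", "Modified Date"


@dataclass
class Archive:
    # Archive order; deliberately independent of the hierarchy.
    items: list = field(default_factory=list)

    def add(self, item):
        # Simplified check: only the leading "/" of ipath-absolute is enforced.
        if not item.path.startswith("/"):
            raise ValueError("path must be absolute: " + item.path)
        # Assumption: a path identifies at most one item.
        if any(existing.path == item.path for existing in self.items):
            raise ValueError("duplicate path: " + item.path)
        self.items.append(item)
        return item

    def children(self, directory_path):
        # Direct children only: one extra path segment below the directory.
        prefix = directory_path.rstrip("/") + "/"
        return [
            item for item in self.items
            if item.path.startswith(prefix)
            and item.path != prefix
            and "/" not in item.path[len(prefix):]
        ]
```

Note that nothing stops "/META-INF/manifest.xml" from being added before
"/META-INF" itself, which matches the statement that item order bears no
necessary relationship to the hierarchy.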
We don't need a whole file system. For example, we don't need to deal
with locking, symbolic links, permissions or anything like that.
So stopping here, can anyone think of any aspect of ODF, OOXML, EPUB
packaging, or whatever that cannot be expressed in this model?
For example, one of the ODF requirements is that the mimetype file be the
first in the ZIP and that it be uncompressed. We can clearly express
that. Everything can be specified simply by naming items via IRI path.
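As a sketch of how the mimetype rule might be checked against a ZIP encoding,
the following uses Python's standard zipfile module. The function names
write_package and mimetype_rule_holds are hypothetical, and mapping the IRI
path "/mimetype" to the ZIP entry name "mimetype" (leading slash dropped) is
my assumption about how such an encoding would work.

```python
import io
import zipfile

ODT_MIMETYPE = b"application/vnd.oasis.opendocument.text"


def write_package(items):
    """Encode (path, data, compressed) tuples as a ZIP, in the given order."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for path, data, compressed in items:
            method = zipfile.ZIP_DEFLATED if compressed else zipfile.ZIP_STORED
            # Assumed mapping: IRI path "/x/y" becomes ZIP entry name "x/y".
            zf.writestr(path.lstrip("/"), data, compress_type=method)
    return buf.getvalue()


def mimetype_rule_holds(blob):
    """True if the first ZIP entry is an uncompressed item named 'mimetype'."""
    with zipfile.ZipFile(io.BytesIO(blob)) as zf:
        entries = zf.infolist()   # entries in the order they were written
    if not entries:
        return False
    first = entries[0]
    return first.filename == "mimetype" and first.compress_type == zipfile.ZIP_STORED
```

So the requirement reduces to two statements about one item: its position in
the archive order and one of its attributes.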
I'm putting compression aside for a second, since I don't think that is
an essential aspect of packaging. It is, however, an important aspect of
particular encodings, where it would fit in as additional attributes,
e.g.:
Compression Method (enum/string)
Original Size (long integer)
But compression per se does not really carry semantic value at the
application/document level, at least not among formats like ODF, OOXML,
EPUB, etc. Still, a particular software application may be very interested
in setting this attribute on a per-item basis, to optimize storage based
on underlying content types, e.g., don't compress already compressed
images, but do compress XML.
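A rough sketch of that per-item choice, again with Python's zipfile module.
The extension list, the function names and the attribute spellings
"Compression Method" and "Original Size" as dictionary keys are illustrative
assumptions, not anything defined by ODF, OOXML or EPUB.

```python
import io
import posixpath
import zipfile

# Assumed list of content types that are already compressed.
ALREADY_COMPRESSED = {".png", ".jpg", ".jpeg", ".gif"}
METHOD_NAMES = {zipfile.ZIP_STORED: "stored", zipfile.ZIP_DEFLATED: "deflate"}


def choose_method(path):
    """Store already-compressed content as-is; deflate everything else."""
    suffix = posixpath.splitext(path)[1].lower()
    if suffix in ALREADY_COMPRESSED:
        return zipfile.ZIP_STORED
    return zipfile.ZIP_DEFLATED


def encode(items):
    """Encode (path, data) pairs as a ZIP, picking a method per item."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for path, data in items:
            zf.writestr(path.lstrip("/"), data, compress_type=choose_method(path))
    return buf.getvalue()


def compression_attributes(blob):
    """Report the two encoding-level attributes for each item in a ZIP."""
    with zipfile.ZipFile(io.BytesIO(blob)) as zf:
        return {
            "/" + info.filename: {
                "Compression Method": METHOD_NAMES.get(info.compress_type, "other"),
                "Original Size": info.file_size,   # uncompressed size in bytes
            }
            for info in zf.infolist()
        }
```

The application-level content is the same whichever method is picked; only
these encoding-level attributes differ.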
So this isn't rocket science, but if we had this logical archive model, as
well as at least one encoding of it, in ZIP, then I think it would be
possible to cleanly express what we need in document format uses. And by
using this separation of logical model from encoding, we also future-proof
this technology and allow other approaches to encoding to be used in the
future, e.g., ones that are more streaming-friendly.
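To illustrate the separation, here is a small sketch that encodes the same
ordered list of items once as ZIP and once as TAR, using Python's standard
zipfile and tarfile modules. The function names and the rule of dropping and
restoring the leading "/" are assumptions made for the example. It covers
files only, not directories or attributes.

```python
import io
import tarfile
import zipfile


def encode_zip(items):
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for path, data in items:
            zf.writestr(path.lstrip("/"), data)
    return buf.getvalue()


def decode_zip(blob):
    with zipfile.ZipFile(io.BytesIO(blob)) as zf:
        return [("/" + info.filename, zf.read(info)) for info in zf.infolist()]


def encode_tar(items):
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tf:
        for path, data in items:
            info = tarfile.TarInfo(name=path.lstrip("/"))
            info.size = len(data)            # TAR needs the size up front
            tf.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def decode_tar(blob):
    with tarfile.open(fileobj=io.BytesIO(blob), mode="r") as tf:
        return [("/" + m.name, tf.extractfile(m).read()) for m in tf.getmembers()]
```

Decoding either encoding should give back the same ordered (path, data)
pairs, which is the sense in which the logical model is independent of the
encoding.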
-Rob