Logical archive model
Dennis E. Hamilton
dennis.hamilton at acm.org
Wed Nov 3 01:26:32 CET 2010
This is fine at a logical level. In fact, specifying an order would not
make sense. Sniffing for magic numbers is a physical-/storage-structure
level provision, and a MIME TYPE is probably a property separate from the
set of items themselves.
I am not sure what you will do with directories but if you want to have
them, you will stretch APP NOTE 6.2.0 as a carrier.
Also, for ipath-absolute, you need to rule out "." and ".." as isegment-nz
and isegment values.
Note that an ipath-absolute that ends with "/" (that is, with an empty
final isegment) would presumably be the way to name a "directory."
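
A minimal sketch of the two path rules above, assuming Python for
illustration. The function name check_item_path is hypothetical, and the
check is deliberately partial: it does not validate the characters allowed
in an isegment by RFC 3987, only the leading "/", the non-empty first
segment, the exclusion of "." and "..", and the trailing-"/" convention for
directories.

```python
def check_item_path(path: str) -> str:
    """Return 'directory' or 'file' for an acceptable item path.

    Raises ValueError for paths this sketch rules out. Only the
    structural rules are checked, not the ipchar character set.
    """
    if not path.startswith("/"):
        raise ValueError("path must start with '/'")

    # ipath-absolute = "/" [ isegment-nz *( "/" isegment ) ]
    segments = path[1:].split("/")

    # The first segment (isegment-nz) must be non-empty unless the
    # whole path is just "/".
    if path != "/" and segments[0] == "":
        raise ValueError("first segment must be non-empty")

    # Rule out "." and ".." in every segment position.
    if any(seg in (".", "..") for seg in segments):
        raise ValueError("'.' and '..' segments are not allowed")

    # Assumption from the note above: a trailing "/" names a directory.
    return "directory" if path.endswith("/") else "file"
```

For example, check_item_path("/META-INF/") reports a directory, while
"/a/../b" is rejected.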
[Side note: When mapping to Zip, which is not done here yet, the leading "/"
has to disappear because Zip filenames may not begin with "/" (APP NOTE
6.2.0). This also means that "/" by itself can't be the mapping of an
I agree that we are talking about abstracted content so the file items would
be the logical data streams and not be compressed, encrypted, (signed?), or
anything else. Digital signatures might be tricky here or might need to be
at least one layer closer to physical.
From: sc34wg1study-bounces at vse.cz [mailto:sc34wg1study-bounces at vse.cz] On
Behalf Of robert_weir at us.ibm.com
Sent: Tuesday, November 02, 2010 15:18
To: 'ISO Zip'
Subject: Logical archive model
So what is needed, I think, is something like this:
An archive is a hierarchical structure containing items. Items may be
directories or files. Directories may contain other items. Files are
terminals and do not contain other items. Directories may be empty.
Items are ordered in the archive, though the order of the items bears no
necessary relationship to the hierarchical structure, e.g., there is no
requirement that a "parent" item appear before a "child" item.
Items are identified by an IRI path, which conforms to the "ipath-absolute"
production in RFC 3987.
Items may have associated attributes. Attributes defined by this standard
  Creation Date (ISO 8601)
  Modified Date (ISO 8601)
  Size (long integer)
Additional attributes, including implementation-defined attributes, are
An archive is stored in an archive encoding, e.g., ZIP, GZIP, TAR, XML,
We don't need a whole file system. For example, we don't need to deal
with locking, symbolic links, permissions or anything like that.
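
As an illustration only, the model described so far might be sketched like
this in Python. The class and field names (Item, Archive, extra, children)
are hypothetical and not part of any proposal; deriving Size from the data
and rejecting duplicate paths are assumptions of the sketch.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Item:
    path: str                        # IRI path, e.g. "/META-INF/manifest.xml"
    is_directory: bool = False       # directories may contain items; files are terminals
    data: bytes = b""                # logical data stream (uncompressed)
    created: Optional[str] = None    # Creation Date, ISO 8601 string
    modified: Optional[str] = None   # Modified Date, ISO 8601 string
    extra: dict = field(default_factory=dict)  # additional attributes

    @property
    def size(self) -> int:
        # Assumption: Size is the length of the logical data stream.
        return len(self.data)


@dataclass
class Archive:
    items: list = field(default_factory=list)  # items are ordered

    def add(self, item: Item) -> None:
        # Assumption: a path identifies at most one item.
        if any(existing.path == item.path for existing in self.items):
            raise ValueError("duplicate item path: " + item.path)
        self.items.append(item)

    def children(self, directory: str) -> list:
        """Paths of items directly inside `directory` (a path ending in '/')."""
        found = []
        for item in self.items:
            if item.path != directory and item.path.startswith(directory):
                rest = item.path[len(directory):].rstrip("/")
                if "/" not in rest:
                    found.append(item.path)
        return found
```

Nothing here requires a parent to be added before its children; the
hierarchy is recovered from the paths, not from the item order.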
So stopping here, can anyone think of any aspect of ODF, OOXML, EPUB
packaging, or whatever that cannot be expressed in this model?
For example, one of the ODF requirements is that the mimetype file be the
first in the ZIP and that it be uncompressed. We can clearly express
that. Everything can just be specifying items via IRI path.
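
To make the ODF example concrete, here is a rough sketch using Python's
standard zipfile module. The function name write_ordered_zip and the
(path, bytes) input shape are made up for illustration. It writes items in
the given order, stores the item named "/mimetype" uncompressed, and drops
the leading "/" because Zip filenames may not begin with one.

```python
import io
import zipfile


def write_ordered_zip(items):
    """Encode an ordered list of (iri_path, data) pairs as ZIP bytes.

    Items are written in list order, so putting "/mimetype" first in
    the list makes it the first entry in the ZIP.
    """
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as zf:
        for iri_path, data in items:
            # Zip entry names must not start with "/".
            name = iri_path[1:] if iri_path.startswith("/") else iri_path
            # Store the mimetype entry uncompressed; deflate the rest.
            if name == "mimetype":
                method = zipfile.ZIP_STORED
            else:
                method = zipfile.ZIP_DEFLATED
            zf.writestr(name, data, compress_type=method)
    return buffer.getvalue()
```

The ordering requirement lives entirely in the item list, and the
"uncompressed" requirement becomes a per-item attribute chosen at encoding
time.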
I'm putting compression aside, for a second, since I don't think that is
an essential aspect of packaging. It is, however, an important aspect of
particular encodings, where it would fit in as additional attributes,
  Compression Method (enum/string)
  Original Size (long integer)
But compression per se does not really carry semantic value at the
application/document level, at least not among formats like ODF, OOXML,
EPUB, etc. But a particular software application may be very interested
in setting this attribute on a per Item basis, to optimize storage based
on underlying content types, e.g., don't compress already compressed
images, but do compress XML.
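
One way an application might pick the Compression Method per item is
sketched below. The suffix list and the function name choose_compression
are assumptions for illustration; a real application would more likely
decide from the declared content type than from the file name.

```python
import posixpath
import zipfile

# Hypothetical list of suffixes whose content is already compressed.
ALREADY_COMPRESSED = {".png", ".jpg", ".jpeg", ".gif", ".gz", ".zip"}


def choose_compression(item_path: str) -> int:
    """Return a zipfile compression constant for the given item path."""
    suffix = posixpath.splitext(item_path)[1].lower()
    if suffix in ALREADY_COMPRESSED:
        return zipfile.ZIP_STORED      # don't recompress images etc.
    return zipfile.ZIP_DEFLATED        # do compress XML and other text
```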
So this isn't rocket science, but if we had this logical archive model, as
well as at least one encoding of it, in ZIP, then I think it would be
possible to cleanly express what we need in document format uses. And by
using this separation of logical model from encoding, we also future-proof
this technology and allow other approaches to encoding to be used in the
future, e.g., ones that are more streaming-friendly,