Logical archive model

Dennis E. Hamilton dennis.hamilton at acm.org
Wed Nov 3 01:26:32 CET 2010

This is fine at a logical level.  In fact, specifying an order would not
make sense.  Sniffing for magic numbers is a physical-/storage-structure
level provision, and a MIME TYPE is probably a property separate from the
set of items themselves.

I am not sure what you will do with directories, but if you want to have
them, you will stretch APP NOTE 6.2.0 as a carrier.

Also, for ipath-absolute, you need to rule out "." and ".." as isegment-nz
and isegment values.
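
To make the dot-segment restriction concrete, here is a minimal sketch in Python. The function name check_item_path is hypothetical, and the check is deliberately simplified: it looks only at the segment structure of ipath-absolute and does not validate each character against the full ipchar production of RFC 3987.

```python
def check_item_path(path: str) -> bool:
    """Return True if path has the shape of an ipath-absolute with no dot segments.

    Simplification (assumption): characters are not checked against the
    ipchar production; only the "/"-separated segment structure is examined.
    """
    if not path.startswith("/"):
        return False
    if path == "/":
        # A bare "/" matches the grammar; whether to allow it is a policy choice.
        return True
    segments = path[1:].split("/")
    # ipath-absolute requires the first segment to be non-empty (isegment-nz).
    if segments[0] == "":
        return False
    # Rule out "." and ".." in every segment position.
    return all(segment not in (".", "..") for segment in segments)
```

With this sketch, "/a/b" and "/a/" are accepted, while "/a/../b", "/./a", and "//a" are rejected.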

Note that an ipath-absolute ending with "/" (that is, with an empty final
isegment) would presumably be the way to name a "directory."

[Side note: When mapping to Zip, which is not done here yet, the leading "/"
has to disappear because Zip filenames may not begin with "/" (APP NOTE
6.2.0).  This also means that "/" by itself can't be the mapping of an

I agree that we are talking about abstracted content so the file items would
be the logical data streams and not be compressed, encrypted, (signed ?), or
anything else.  Digital signatures might be tricky here or might need to be
at least one layer closer to physical.

-----Original Message-----
From: sc34wg1study-bounces at vse.cz [mailto:sc34wg1study-bounces at vse.cz] On
Behalf Of robert_weir at us.ibm.com
Sent: Tuesday, November 02, 2010 15:18
To: 'ISO Zip'
Subject: Logical archive model

So what is needed, I think, is something like this:


An archive is a hierarchical structure containing items.  Items may be 
directories or files.  Directories may contain other items.  Files are 
terminals and do not contain other items.  Directories may be empty.

Items are ordered in the archive, though the order of the items bears no 
necessary relationship to the hierarchical structure, e.g., there is no 
requirement that a "parent" item appear before a "child" item.

Items are identified by an IRI path, which conforms to the "ipath-absolute" 
production in RFC 3987.

Items may have associated attributes.  Attributes defined by this standard 
include:

Creation Date (ISO 8601)
Modified Date (ISO 8601)
Size (long integer)

Additional attributes, including implementation-defined attributes, are 
also permitted.
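
As a rough illustration only, the model so far could be sketched in Python as below. The class and field names (Item, Archive, extra, children) are hypothetical. Two assumptions go beyond the text: item paths are treated as unique within an archive, and directory paths are assumed to end with "/".

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Dict, List, Optional


@dataclass
class Item:
    path: str                            # IRI path, e.g. "/content.xml"
    is_directory: bool = False
    created: Optional[datetime] = None   # Creation Date (ISO 8601 when serialized)
    modified: Optional[datetime] = None  # Modified Date (ISO 8601 when serialized)
    size: Optional[int] = None           # Size
    extra: Dict[str, str] = field(default_factory=dict)  # additional attributes


@dataclass
class Archive:
    items: List[Item] = field(default_factory=list)  # list order is the archive order

    def add(self, item: Item) -> None:
        # Assumption: a path identifies exactly one item.
        if any(existing.path == item.path for existing in self.items):
            raise ValueError(f"duplicate item path: {item.path}")
        self.items.append(item)

    def children(self, directory_path: str) -> List[Item]:
        """Items directly inside directory_path (assumed to end with "/")."""
        result = []
        for item in self.items:
            if item.path != directory_path and item.path.startswith(directory_path):
                rest = item.path[len(directory_path):].rstrip("/")
                if "/" not in rest:
                    result.append(item)
        return result
```

Note that the hierarchy is derived from the paths alone, so a "child" item can be added before its "parent" directory without any special handling.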

An archive is stored in an archive encoding, e.g., ZIP, GZIP, TAR, XML, 


We don't need a whole file system.  For example, we don't need to deal 
with locking, symbolic links, permissions or anything like that.

So stopping here, can anyone think of any aspect of ODF, OOXML, EPUB 
packaging, or whatever that cannot be expressed in this model?

For example, one of the ODF requirements is that the mimetype file be the 
first in the ZIP and that it be uncompressed.  We can clearly express 
that.  Everything can just be specifying items via IRI path.
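
For what it is worth, here is a small sketch of how the ODF mimetype rule might be carried through to a ZIP encoding using Python's standard zipfile module. The function write_package and its tuple layout are hypothetical, and stripping the leading "/" when forming the ZIP entry name is an assumption about the mapping, which has not been defined yet.

```python
import io
import zipfile


def write_package(items):
    """Write items to an in-memory ZIP, preserving their order.

    items: ordered list of (iri_path, data_bytes, compress) tuples.
    """
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as zf:
        for iri_path, data, compress in items:
            # Assumed mapping: drop the leading "/" to form the ZIP entry name.
            name = iri_path[1:] if iri_path.startswith("/") else iri_path
            method = zipfile.ZIP_DEFLATED if compress else zipfile.ZIP_STORED
            zf.writestr(name, data, compress_type=method)
    return buffer.getvalue()


package = write_package([
    ("/mimetype", b"application/vnd.oasis.opendocument.text", False),  # first, stored
    ("/content.xml", b"<x/>" * 100, True),
])
```

The order and the per-item compression flag both come from the logical description; the ZIP encoding merely honors them.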

I'm putting compression aside, for a second, since I don't think that is 
an essential aspect of packaging.  It is, however, an important aspect of 
particular encodings, where it would fit in as additional attributes, 

Compression Method (enum/string)
Original Size (long integer)

But compression per se does not really carry semantic value at the 
application/document level, at least not among formats like ODF, OOXML, 
EPUB, etc.  But a particular software application may be very interested 
in setting this attribute on a per Item basis, to optimize storage based 
on underlying content types, e.g., don't compress already compressed 
images, but do compress XML.
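
A per-item choice like that could look roughly like the following Python sketch. The function name, the extension list, and the "stored"/"deflate" strings are all hypothetical placeholders for whatever values a Compression Method attribute would actually take.

```python
import posixpath

# Hypothetical list of extensions whose content is usually already compressed.
ALREADY_COMPRESSED = {".png", ".jpg", ".jpeg", ".gif", ".zip", ".gz"}


def choose_compression_method(iri_path: str) -> str:
    """Pick a Compression Method value for an item based on its extension."""
    extension = posixpath.splitext(iri_path)[1].lower()
    return "stored" if extension in ALREADY_COMPRESSED else "deflate"
```

For example, "/Pictures/logo.PNG" would be left uncompressed while "/content.xml" would be deflated. A real application would more likely decide from a declared content type than from the file extension.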

So this isn't rocket science, but if we had this logical archive model, as 
well as at least one encoding of it, in ZIP, then I think it would be 
possible to cleanly express what we need in document format uses.  And by 
using this separation of logical model from encoding, we also future-proof 
this technology and allow other approaches to encoding to be used in the 
future, e.g., ones that are more streaming-friendly,

