Logical archive model

robert_weir at us.ibm.com
Wed Nov 3 02:34:47 CET 2010


"Dennis E. Hamilton" <dennis.hamilton at acm.org> wrote on 11/02/2010 08:26:32 PM:

> RE: Logical archive model
> 
> This is fine at a logical level.  In fact, specifying an order would not
> make sense.  Sniffing for magic numbers is a physical-/storage-structure
> level provision, and a MIME TYPE is probably a property separate from the
> set of items themselves.
> 

Well, ODF does have an order dependency for the mimetype, so I wanted to 
include that in the model.  But we could certainly argue whether or not it 
should be there. It would be interesting to know if any other formats have 
a similar requirement.
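
To make the ODF ordering requirement concrete, here is a minimal sketch 
using Python's standard zipfile module.  The function names and the sample 
item are my own illustration, not anything taken from the ODF 
specification; the only facts assumed are that the "mimetype" entry comes 
first and is stored without compression.

```python
import io
import zipfile

# Media type for ODF text documents; other ODF document kinds use other values.
ODF_TEXT_MIMETYPE = "application/vnd.oasis.opendocument.text"


def write_odf_like_package(items):
    """Build an in-memory ZIP whose first entry is an uncompressed 'mimetype'.

    `items` is a hypothetical dict mapping entry names to bytes.
    """
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        # The order dependency: 'mimetype' is written first and STORED.
        archive.writestr("mimetype", ODF_TEXT_MIMETYPE,
                         compress_type=zipfile.ZIP_STORED)
        for name, data in items.items():
            archive.writestr(name, data, compress_type=zipfile.ZIP_DEFLATED)
    return buffer.getvalue()


def mimetype_is_first_and_stored(package_bytes):
    """Check the ordering and compression rule on an existing package."""
    with zipfile.ZipFile(io.BytesIO(package_bytes)) as archive:
        # Pick the entry that physically comes first in the file.
        first = min(archive.infolist(), key=lambda info: info.header_offset)
        return (first.filename == "mimetype"
                and first.compress_type == zipfile.ZIP_STORED)
```

Because the first entry is stored rather than deflated, its name and 
content sit at predictable offsets near the start of the file, which is 
what makes magic-number sniffing possible at the physical level.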

I could also see some advantages to having ordering dependencies for 
streaming uses, e.g., put low resolution previews early and follow with 
the full document contents.

> I am not sure what you will do with directories but if you want to have
> them, you will stretch APP NOTE 6.2.0 as a carrier.
> 
> Also, for ipath-absolute, you need to rule out "." and ".." as
> isegment-nz and isegment values.
>

Right.
 
> Note, presumably, an ipath-absolute that ends with "/" (with an empty
> isegment) would appear to be the way to name a "directory."
>

Yes.
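
A rough sketch of the two path rules just discussed, in Python.  It only 
follows the overall shape of ipath-absolute ("/" followed by an optional 
non-empty first segment and further segments) plus the exclusion of "." 
and ".."; it does not validate the characters allowed inside a segment, 
and the function names are hypothetical.

```python
def is_acceptable_item_path(path):
    """Shape check for an item path, with '.' and '..' segments ruled out.

    Assumption: character-level validation (ipchar in RFC 3987) is out of
    scope for this sketch.
    """
    if not path.startswith("/"):
        return False
    rest = path[1:]
    if rest == "":
        # "/" alone is allowed by the grammar.
        return True
    segments = rest.split("/")
    if segments[0] == "":
        # The first segment must be non-empty (isegment-nz), so "//a" fails.
        return False
    return all(segment not in (".", "..") for segment in segments)


def names_directory(path):
    """A path ending in '/' (empty final segment) is read as a directory."""
    return is_acceptable_item_path(path) and path.endswith("/")
```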

 
> [Side note: When mapping to Zip, which is not done here yet, the leading
> "/" has to disappear because Zip filenames may not begin with "/" (APP
> NOTE 6.2.0).  This also means that "/" by itself can't be the mapping of
> an item.]
> 

Good catch.
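
As a sketch of that mapping step (the function name is hypothetical, and 
the real mapping has not been written yet), dropping the leading "/" and 
rejecting the bare root could look like this:

```python
def to_zip_entry_name(item_path):
    """Map a logical item path to a ZIP entry name by dropping the leading '/'.

    Assumption: the logical path is already known to be an ipath-absolute.
    """
    if not item_path.startswith("/"):
        raise ValueError("logical item paths must start with '/'")
    name = item_path[1:]
    if name == "":
        # "/" alone would map to an empty ZIP filename, so it cannot be an item.
        raise ValueError("'/' by itself has no ZIP entry name")
    return name
```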

> I agree that we are talking about abstracted content so the file items
> would be the logical data streams and not be compressed, encrypted,
> (signed?), or anything else.  Digital signatures might be tricky here or
> might need to be at least one layer closer to physical.
>

There is an ordering dependency here, at least if you want the best 
performance: compress before encrypting, because encrypted data does not 
compress well.  But with ODF that occurs outside of the ZIP compression.  
It is an application-level compression, right?  So no problem there.
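
A small experiment illustrates the ordering point.  The cipher below is a 
toy XOR keystream built from SHA-256, used only because it needs nothing 
beyond the standard library; it is an assumption for illustration, is not 
secure, and is not what ODF or any ZIP tool actually uses.

```python
import hashlib
import zlib


def toy_stream_cipher(data, key):
    """XOR data with a SHA-256 counter keystream.  Illustration only, NOT secure.

    Applying the function twice with the same key restores the input.
    """
    output = bytearray()
    for offset in range(0, len(data), 32):
        counter = (offset // 32).to_bytes(8, "big")
        keystream = hashlib.sha256(key + counter).digest()
        chunk = data[offset:offset + 32]
        output.extend(b ^ k for b, k in zip(chunk, keystream))
    return bytes(output)


def compress_then_encrypt(data, key):
    """Compress the plaintext first, then encrypt the compressed bytes."""
    return toy_stream_cipher(zlib.compress(data), key)


def encrypt_then_compress(data, key):
    """Encrypt first; the compressor then sees random-looking bytes."""
    return zlib.compress(toy_stream_cipher(data, key))
```

With repetitive input such as XML, the first ordering yields a much 
smaller result than the second, since the compressor finds no redundancy 
left in the ciphertext.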

It isn't clear to me whether there is an ordering dependency on signing 
compressed versus uncompressed data.  Perhaps for performance reasons you 
want to verify a signature without requiring decompression, so the 
signature is best applied last?

I think that any dependency the document application has on the 
post-compressed bytes of the archive items presents a challenge to the 
simplest form of this model.  I need to think about this some more.

-Rob
 
> -----Original Message-----
> From: sc34wg1study-bounces at vse.cz [mailto:sc34wg1study-bounces at vse.cz]
> On Behalf Of robert_weir at us.ibm.com
> Sent: Tuesday, November 02, 2010 15:18
> To: 'ISO Zip'
> Subject: Logical archive model
> 
> So what is needed, I think, is something like this:
> 
> ---------------------------------------------------------
> 
> An archive is a hierarchical structure containing items.  Items may be
> directories or files.  Directories may contain other items.  Files are
> terminals and do not contain other items.  Directories may be empty.
> 
> Items are ordered in the archive, though the order of the items bears no
> necessary relationship to the hierarchical structure, e.g., there is no
> requirement that a "parent" item appear before a "child" item.
> 
> Items are identified by an IRI path, which conforms to the
> "ipath-absolute" production in RFC 3987.
> 
> Items may have associated attributes.  Attributes defined by this
> standard include:
> 
> Creation Date (ISO 8601)
> Modified Date (ISO 8601)
> Size (long integer)
> 
> Additional attributes, including implementation-defined attributes, are 
> also permitted.
> 
> An archive is stored in an archive encoding, e.g., ZIP, GZIP, TAR, XML,
> etc.
> 
> ---------------------------------------------------------
> 
> We don't need a whole file system.  For example, we don't need to deal 
> with locking, symbolic links, permissions or anything like that.
> 
> So stopping here, can anyone think of any aspect of ODF, OOXML, EPUB
> packaging, or whatever that cannot be expressed in this model?
> 
> For example, one of the ODF requirements is that the mimetype file be the
> first in the ZIP and that it be uncompressed.  We can clearly express
> that.  Everything can be specified by referring to items via their IRI
> paths.
> 
> I'm putting compression aside, for a second, since I don't think that is
> an essential aspect of packaging.  It is, however, an important aspect of
> particular encodings, where it would fit in as additional attributes,
> e.g.:
> 
> Compression Method (enum/string)
> Original Size (long integer)
> 
> But compression per se does not really carry semantic value at the
> application/document level, at least not among formats like ODF, OOXML,
> EPUB, etc.  But a particular software application may be very interested
> in setting this attribute on a per-item basis, to optimize storage based
> on underlying content types, e.g., don't compress already compressed
> images, but do compress XML.
> 
> So this isn't rocket science, but if we had this logical archive model,
> as well as at least one encoding of it, in ZIP, then I think it would be
> possible to cleanly express what we need in document format uses.  And by
> using this separation of logical model from encoding, we also
> future-proof this technology and allow other approaches to encoding to be
> used in the future, e.g., ones that are more streaming-friendly.
> 
> -Rob
> 

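The logical model quoted above could be sketched as a data structure along 
the following lines.  This is only one possible reading: the class and 
field names are hypothetical, the two date attributes are held as Python 
datetime values rather than ISO 8601 strings, and Size is derived from the 
item's data instead of being stored separately.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Dict, List, Optional


@dataclass
class Item:
    """One archive item: a file (terminal) or a directory."""
    path: str                           # IRI path, e.g. "/Pictures/a.png"
    is_directory: bool = False
    data: bytes = b""                   # logical, uncompressed content
    created: Optional[datetime] = None
    modified: Optional[datetime] = None
    extra_attributes: Dict[str, str] = field(default_factory=dict)

    @property
    def size(self):
        return len(self.data)


@dataclass
class Archive:
    """An ordered list of items; order is independent of the hierarchy."""
    items: List[Item] = field(default_factory=list)

    def add(self, item):
        if any(existing.path == item.path for existing in self.items):
            raise ValueError("duplicate item path: " + item.path)
        self.items.append(item)

    def children_of(self, directory_path):
        """Direct children of a directory path ending in '/', in archive order."""
        children = []
        for item in self.items:
            if item.path == directory_path:
                continue
            if not item.path.startswith(directory_path):
                continue
            remainder = item.path[len(directory_path):].rstrip("/")
            if remainder and "/" not in remainder:
                children.append(item)
        return children
```

Keeping the items in a flat ordered list, with the hierarchy implied by 
the paths, lets a "child" appear before its "parent" and still lets a 
format state rules such as "this item comes first".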