An outline proposal

Dave Pawson dave.pawson at
Tue Oct 19 17:25:59 CEST 2010

On 19 October 2010 14:19, Bob Jolliffe <bobjolliffe at> wrote:

Re 'package'

> What is a package in the context we are discussing, other than a
> collection (ordered collection? it's serialized) of related streams
> which together provide the means to represent fully a document?

This should not be constrained to a zip 'package' (whatever) which
constitutes a document Bob.

> In a
> zip file (from the appnote) each stream also has a local file header
> which contains, amongst other things, a filename.  These filenames
> allow the package to be also represented exploded as files in a
> filesystem which is useful.  That's a useful quality of zip.  But IMHO
> the streams in the zip file are not inherently files.  But it's useful
> to name them as if they were in order to have this dual
> representation.

Not understood Bob. Could you phrase that in terms a little more
general, or define fully the terms you are using?
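
One concrete reading of the quoted point, as a minimal sketch using
Python's standard zipfile module: named byte streams go into a zip that
is built entirely in memory, so nothing here is ever a file on disk.
The entry names and the helper build_package are hypothetical, chosen
only for illustration.

```python
import io
import zipfile


def build_package(streams):
    """Write (name, bytes) pairs into an in-memory zip and return its bytes."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        for name, data in streams:
            # Each stream gets a name in its local file header,
            # even though it never existed as a file.
            zf.writestr(name, data)
    return buf.getvalue()


# Hypothetical entry names, for illustration only.
package = build_package([
    ("manifest.xml", b"<manifest/>"),
    ("content.xml", b"<doc/>"),
])

with zipfile.ZipFile(io.BytesIO(package)) as zf:
    names = zf.namelist()
```

The names are what would let the same package be exploded into a
directory tree, but the sketch never needs to do so.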

>>> Existing zip tools seem to behave differently in this
>>> respect.  Are we simply interested in the lowest common denominator of
>>> the binary stream plus name?   I know zip files are often composed
>>> from files in a filesystem, but they are also frequently generated
>>> directly from streams without touching the filesystem.
>> What do others call such a collection... without calling it a file?
>> Pragmatically it is likely to end up as a file on disk?
> I really don't agree unless we are maybe misunderstanding each other.

Quite possible!

> The zip file is certainly most likely to end up on a disk and I don't
> really have any gripe about talking about zip as a file format.  But
> the things which are stored inside the zip are not necessarily likely
> to either begin or end life as files on a disk.

Which may be out of scope, if we want to exclude applications
which use this format?

> Unless we are simply
> referring to the conventional, typical, general purpose use of zip.

AFAIK we have no guidance to do otherwise so that was
the scope I assumed.

> But if we are then the appnote maybe suffices.  Perhaps my
> misunderstanding is that I'm seeing the scope, at least partly, as
> defining a minimal zip profile (don't jump - I know we should define
> profile :-) which is sufficient for the packaging of documents.

Until I see some concretion of 'the appnote' or other definitions,
I can't comment on this Bob.

>>>>> 2. The packaged entity shall hold one or more files.
>>>> OK.  This needs further specification:  A file has a name, metadata (date,
>>>> permissions, archive bit?) as well as contents.  Detailed requirements?
>>> yes.  Minor point, but I would have thought the packaged entities are
>>> the things which are packaged within the package entity.  And it's the
>>> nature of these "packaged entities" - files, streams or what have you
>>> - which we should detail.
>> 1. Are we into metadata? Is this going too far into implementation?
>> Again, I'd prefer to stay above a file system and attributes?
>> I'm -1 here (if we can do without it). That's the implementation layer,
>> rather than the specification *(what)* level.
> I think we agree.

OK, unless we get an objection, let's take 'file formats/properties' as
being in the application layer above our work.

Re the storage of a listing/manifest.

>> Why first Bob? So long as it is available? Simply for speed of access?
> Partly. But it's also about not having to unzip to a tmp folder or even
> persist the zip in order for a consumer to decide what to do next -
> because the contents may be very large.

I think you're getting very implementation oriented here?
Can I declare that out of scope?

> I'll give an example from a
> piece of code I'm working on right now.  A web application imports
> files in various formats.  Some of those are zipped formats (eg. xlsx
> and soon odf calc but also some others).  And as I say, some of these
> 'files' can be very large.  The first challenge is to find out what
> kind of file format I'm dealing with so I know whether I can parse it
> or not - the first few bytes of the binary stream indicate it's a zip.
> With binary format files one would typically discover the file 'type'
> by looking at these signature bytes up front.  With a zipped format we
> only know at this point that we are dealing with a zipped format.
> From my perspective that's quite a big disadvantage.  Of course once
> I've seen the manifest or .rels I know what I'm dealing with.  Hence
> I'd like to recommend that producers put these up front where
> possible.
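
The detection step described in the quote can be sketched as follows,
under some assumptions: the consumer reads only the first local file
header from the incoming stream, checks the zip signature, and takes
the first entry's name as a hint about the format. The helper
first_entry_name and the entry names are hypothetical. The 30-byte
header layout follows the zip local file header; anything beyond the
signature and the name is ignored here.

```python
import io
import struct
import zipfile

# Fixed-size part of a zip local file header (30 bytes, little-endian):
# signature, version, flags, method, time, date, crc32,
# compressed size, uncompressed size, name length, extra length.
LOCAL_HEADER = struct.Struct("<4sHHHHHIIIHH")
LOCAL_SIGNATURE = b"PK\x03\x04"


def first_entry_name(stream):
    """Return the first entry's name, or None if no local file header is found."""
    header = stream.read(LOCAL_HEADER.size)
    if len(header) < LOCAL_HEADER.size:
        return None
    fields = LOCAL_HEADER.unpack(header)
    if fields[0] != LOCAL_SIGNATURE:
        return None
    flags, name_length = fields[2], fields[9]
    raw_name = stream.read(name_length)
    # General purpose flag bit 11 marks UTF-8 names; otherwise assume CP437.
    return raw_name.decode("utf-8" if flags & 0x800 else "cp437")


# Build a small sample package in memory (hypothetical entry names).
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("manifest.xml", b"<manifest/>")
    zf.writestr("content.xml", b"<doc/>")
buf.seek(0)

detected = first_entry_name(buf)
```

This only tells the consumer something useful if the producer wrote the
identifying entry first, which is the recommendation being argued for.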

I'm proposing that a manifest be included within the zip, so the procedure
would be to
1. Extract the manifest, examine it and proceed. Would that suffice?
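
As a rough sketch of that step, assuming Python's standard zipfile
module and a hypothetical manifest entry called manifest.xml (the
proposal has not fixed a name), the manifest alone can be read into
memory without unpacking anything else:

```python
import io
import zipfile


def read_manifest(package_bytes, manifest_name="manifest.xml"):
    """Return the manifest entry's bytes, or None if the package has none."""
    with zipfile.ZipFile(io.BytesIO(package_bytes)) as zf:
        try:
            # Only this one entry is decompressed; nothing is written to disk.
            return zf.read(manifest_name)
        except KeyError:
            return None


# Build a small sample package in memory (hypothetical entry names).
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("manifest.xml", b"<manifest/>")
    zf.writestr("content.xml", b"<doc/>")

manifest = read_manifest(buf.getvalue())
missing = read_manifest(buf.getvalue(), manifest_name="absent.xml")
```

Note that zipfile locates entries through the central directory at the
end of the archive, so this approach needs the whole package to be
available and seekable, whatever order the entries were written in.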

> It's only a recommendation.  But if non-naive zip producers used it,
> it would give back to consumers some of what is lost by moving from
> binary representations to random collections of zipped xml documents.

I'd hope the manifest answers your requirement, if not let us know.

>> Define profiling please?
> I doubt I'm the best person to do this.  But my interpretation of
> profiling in this context would be an additional set of constraints on
> top of the general zip structure as described in the appnote

Define 'the appnote'?

> provide a minimal set of characteristics to meet our requirements.
> Which I know begs the next question .. what are these requirements?
> If they are in fact as general as your point 1 above then you can
> discount most of what I have said.  But my understanding of sc34
> interest is that it is primarily interested in describing the features
> of zip required to package documents.  And I do see this as a subtle
> re-purposing of the generic zip requirement to archive stuff from the
> filesystem.

I'll add that definition to rev 3, though it's currently not in use?

If it's 'to meet our requirements' it should be included, not referenced surely?


Dave Pawson
Docbook FAQ.

More information about the sc34wg1study mailing list