An outline proposal

Dave Pawson dave.pawson at gmail.com
Tue Oct 19 17:25:59 CEST 2010


On 19 October 2010 14:19, Bob Jolliffe <bobjolliffe at gmail.com> wrote:

Re 'package'

> What is a package in the context we are discussing, other than a
> collection (ordered collection? it's serialized) of related streams
> which together provide the means to represent fully a document?

This should not be constrained to a zip 'package' (whatever) which
constitutes a document, Bob.


> In a
> zip file (from the appnote) each stream also has a local file header
> which contains, amongst other things, a filename.  These filenames
> allow the package to be also represented exploded as files in a
> filesystem which is useful.  That's a useful quality of zip.  But IMHO
> the streams in the zip file are not inherently files.  But it's useful
> to name them as if they were in order to have this dual
> representation.

Not understood, Bob. Could you phrase that in terms a little more
general, or define fully the terms you are using?
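
One possible reading of the quoted point, sketched in Python: each stream in
the package carries a name, but nothing requires it to have started life as a
file on disk. The helper names (pack_streams, list_names) and the entry names
are illustrative assumptions, not anything proposed in this thread.

```python
import io
import zipfile


def pack_streams(streams):
    """Write named byte streams into a zip held entirely in memory."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        for name, data in streams.items():
            # The name is recorded in the entry's header; no file on disk is involved.
            zf.writestr(name, data)
    return buf.getvalue()


def list_names(package_bytes):
    """Return the entry names recorded in the package, in stored order."""
    with zipfile.ZipFile(io.BytesIO(package_bytes)) as zf:
        return zf.namelist()


# Hypothetical entry names, chosen only for the example.
package = pack_streams({
    "content.xml": b"<doc/>",
    "meta/manifest.xml": b"<manifest/>",
})
names = list_names(package)
```

The same bytes could later be exploded into a directory tree using those
names, which is the "dual representation" described in the quote.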



>>> Existing zip tools seem to behave differently in this
>>> respect.  Are we simply interested in the lowest common denominator of
>>> the binary stream plus name?   I know zip files are often composed
>>> from files in a filesystem, but they are also frequently generated
>>> directly from streams without touching the filesystem.
>>
>> What do others call such a collection... without calling it a file?
>> Pragmatically it is likely to end up as a file on disk?
>
> I really don't agree unless we are maybe misunderstanding each other.

Quite possible!

> The zip file is certainly most likely to end up on a disk and I don't
> really have any gripe about talking about zip as a file format.  But
> the things which are stored inside the zip are not necessarily likely
> to either begin or end life as files on a disk.

Which may be out of scope, if we want to exclude applications
which use this format?

> Unless we are simply
> referring to the conventional, typical, general purpose use of zip.

AFAIK we have no guidance to do otherwise so that was
the scope I assumed.


> But if we are then the appnote maybe suffices.  Perhaps my
> misunderstanding is that I'm seeing the scope, at least partly, as
> defining a minimal zip profile (don't jump - I know we should define
> profile :-) which is sufficient for the packaging of documents.

Until I see some concretion of 'the appnote' or other definitions,
I can't comment on this, Bob.




>>>>> 2. The packaged entity shall hold one or more files.
>>>>>
>>>>
>>>> OK.  This needs further specification:  A file has a name, metadata (date,
>>>> permissions, archive bit?) as well as contents.  Detailed requirements?
>>>
>>> yes.  Minor point, but I would have thought the packaged entities are
>>> the things which are packaged within the package entity.  And it's the
>>> nature of these "packaged entities" - files, streams or what have you
>>> - which we should detail.
>>
>> 1. Are we into metadata? Is this going too far into implementation?
>> Again, I'd prefer to stay above a file system and attributes?
>> I'm -1 here (if we can do without it). That's the implementation layer,
>> rather than the specification *(what)* level.
>
> I think we agree.

OK, unless we get an objection, let's take 'file formats/properties' as
being in the application layer above our work.





Re the storage of a listing/manifest.

>> Why first Bob? So long as it is available? Simply for speed of access?
>
> Partly. But it's also about not having to unzip to a tmp folder or even
> persist the zip in order for a consumer to decide what to do next -
> because the contents may be very large.

I think you're getting very implementation oriented here?
Can I declare that out of scope?

> I'll give an example from a
> piece of code I'm working on right now.  A web application imports
> files in various formats.  Some of those are zipped formats (e.g. xlsx
> and soon odf calc but also some others).  And as I say, some of these
> 'files' can be very large.  The first challenge is to find out what
> kind of file format I'm dealing with so I know whether I can parse it
> or not - the first few bytes of the binary stream indicate it's a zip.
> With binary format files one would typically discover the file 'type'
> by looking at these signature bytes up front.  With a zipped format we
> only know at this point that we are dealing with a zipped format.
> From my perspective that's quite a big disadvantage.  Of course once
> I've seen the manifest or .rels I know what I'm dealing with.  Hence
> I'd like to recommend that producers put these up front where
> possible.
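
A minimal sketch of the signature check described in the quote, assuming a
Python consumer. The function name looks_like_zip is hypothetical; the byte
values are the zip local file header and end-of-central-directory signatures.

```python
import io
import zipfile

ZIP_LOCAL_HEADER = b"PK\x03\x04"   # signature at the start of a non-empty zip
ZIP_EMPTY_ARCHIVE = b"PK\x05\x06"  # an empty zip starts with the end-of-central-directory record


def looks_like_zip(stream):
    """Read the first four bytes and compare them with the zip signatures."""
    head = stream.read(4)
    return head in (ZIP_LOCAL_HEADER, ZIP_EMPTY_ARCHIVE)


# Build a small sample package in memory for the demonstration.
_buf = io.BytesIO()
with zipfile.ZipFile(_buf, "w") as zf:
    zf.writestr("content.xml", b"<doc/>")
sample = _buf.getvalue()

is_zip = looks_like_zip(io.BytesIO(sample))
```

As the quote says, a positive result only shows that the container is a zip.
It says nothing about which document format is inside.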

I'm proposing that a manifest be included within the zip, so the procedure
would be to
1. Extract the manifest, examine it and proceed. Would that suffice?
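
The procedure above might look like the following sketch. The entry name
"manifest.xml" and the function read_manifest are assumptions made for
illustration; no manifest name or format has been agreed in this thread.

```python
import io
import zipfile

MANIFEST_NAME = "manifest.xml"  # hypothetical name, not defined by the proposal


def read_manifest(package, name=MANIFEST_NAME):
    """Return the manifest bytes without extracting any other entry.

    Returns None when the package has no entry with the given name.
    """
    with zipfile.ZipFile(package) as zf:
        try:
            return zf.read(name)
        except KeyError:
            # ZipFile.read raises KeyError for a name that is not in the archive.
            return None


# Build a sample package in memory so the sketch is self-contained.
_buf = io.BytesIO()
with zipfile.ZipFile(_buf, "w") as zf:
    zf.writestr(MANIFEST_NAME, b"<manifest/>")
    zf.writestr("content.xml", b"<doc/>")
sample_package = _buf.getvalue()

manifest = read_manifest(io.BytesIO(sample_package))
```

Note that Python's zipfile locates entries through the central directory at
the end of the archive, so it needs a seekable input. A consumer reading a
non-seekable stream would still depend on where the producer placed the
manifest, which is the concern raised in the quoted text.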



>
> It's only a recommendation.  But if non-naive zip producers used it,
> it would give back to consumers some of what is lost by moving from
> binary representations to random collections of zipped xml documents.

I'd hope the manifest answers your requirement, if not let us know.




>> Define profiling please?
>
> I doubt I'm the best person to do this.  But my interpretation of
> profiling in this context would be an additional set of constraints on
> top of the general zip structure as described in the appnote

Define 'the appnote'?

> which
> provide a minimal set of characteristics to meet our requirements.
> Which I know begs the next question .. what are these requirements?
> If they are in fact as general as your point 1 above then you can
> discount most of what I have said.  But my understanding of sc34
> interest is that it is primarily interested in describing the features
> of zip required to package documents.  And I do see this as a subtle
> re-purposing of the generic zip requirement to archive stuff from the
> filesystem.

I'll add that definition to rev 3, though it's currently not in use?

If it's 'to meet our requirements' it should be included, not referenced, surely?

regards







-- 
Dave Pawson
XSLT XSL-FO FAQ.
Docbook FAQ.
http://www.dpawson.co.uk

