An outline proposal
Dave Pawson
dave.pawson at gmail.com
Tue Oct 19 17:25:59 CEST 2010
On 19 October 2010 14:19, Bob Jolliffe <bobjolliffe at gmail.com> wrote:
Re 'package'
> What is a package in the context we are discussing, other than a
> collection (ordered collection? it's serialized) of related streams
> which together provide the means to represent fully a document?
This should not be constrained to a zip 'package' (whatever) which
constitutes a document Bob.
> In a
> zip file (from the appnote) each stream also has a local file header
> which contains, amongst other things, a filename. These filenames
> allow the package to be also represented exploded as files in a
> filesystem which is useful. That's a useful quality of zip. But IMHO
> the streams in the zip file are not inherently files. But it's useful
> to name them as if they were in order to have this dual
> representation.
Not understood, Bob. Could you phrase that in terms a little more
general, or define fully the terms you are using?
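To make the distinction between named streams and files concrete, here is a minimal sketch, assuming Python's standard zipfile module. The entry names and contents are hypothetical; nothing in this thread prescribes them. It builds a package from byte streams held in memory, so no entry ever exists as a file on disk, yet each still carries a name that would let the package be exploded into a filesystem.

```python
import io
import zipfile


def build_package(streams):
    """Write named byte streams into a zip held entirely in memory."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as package:
        for name, data in streams.items():
            # writestr takes bytes directly; no source file is involved.
            package.writestr(name, data)
    return buffer.getvalue()


def list_entry_names(package_bytes):
    """Return the entry names recorded in the package, in stored order."""
    with zipfile.ZipFile(io.BytesIO(package_bytes)) as package:
        return package.namelist()


# Hypothetical entry names and contents, for illustration only.
streams = {
    "content.xml": b"<doc/>",
    "META-INF/manifest.xml": b"<manifest/>",
}
package_bytes = build_package(streams)
names = list_entry_names(package_bytes)
```

The names look like relative paths, but here they are only labels attached to streams; whether they ever become files is up to whoever unpacks the package.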
>
>>
>>
>>> Existing zip tools seem to behave differently in this
>>> respect. Are we simply interested in the lowest common denominator of
>>> the binary stream plus name? I know zip files are often composed
>>> from files in a filesystem, but they are also frequently generated
>>> directly from streams without touching the filesystem.
>>
>> What do others call such a collection... without calling it a file?
>> Pragmatically it is likely to end up as a file on disk?
>
> I really don't agree unless we are maybe misunderstanding each other.
Quite possible!
> The zip file is certainly most likely to end up on a disk and I don't
> really have any gripe about talking about zip as a file format. But
> the things which are stored inside the zip are not necessarily likely
> to either begin or end life as files on a disk.
Which may be out of scope, if we want to exclude applications
which use this format?
> Unless we are simply
> referring to the conventional, typical, general purpose use of zip.
AFAIK we have no guidance to do otherwise so that was
the scope I assumed.
> But if we are then the appnote maybe suffices. Perhaps my
> misunderstanding is that I'm seeing the scope, at least partly, as
> defining a minimal zip profile (don't jump - I know we should define
> profile :-) which is sufficient for the packaging of documents.
Until I see a concrete definition of 'the appnote' or the other terms,
I can't comment on this, Bob.
>>>>> 2. The packaged entity shall hold one or more files.
>>>>>
>>>>
>>>> OK. This needs further specification: A file has a name, metadata (date,
>>>> permissions, archive bit?) as well as contents. Detailed requirements?
>>>
>>> yes. Minor point, but I would have thought the packaged entities are
>>> the things which are packaged within the package entity. And it's the
>>> nature of these "packaged entities" - files, streams or what have you
>>> - which we should detail.
>>
>> 1. Are we into metadata? Is this going too far into implementation?
>> Again, I'd prefer to stay above a file system and attributes?
>> I'm -1 here (if we can do without it). That's the implementation layer,
>> rather than the specification *(what)* level.
>
> I think we agree.
OK, unless we get an objection, let's take 'file formats/properties' as
being in the application layer above our work.
Re the storage of a listing/manifest.
>> Why first Bob? So long as it is available? Simply for speed of access?
>
> Partly. But it's also about not having to unzip to a tmp folder or even
> persist the zip in order for a consumer to decide what to do next -
> because the contents may be very large.
I think you're getting very implementation oriented here?
Can I declare that out of scope?
> I'll give an example from a
> piece of code I'm working on right now. A web application imports
> files in various formats. Some of those are zipped formats (e.g. xlsx
> and soon odf calc but also some others). And as I say, some of these
> 'files' can be very large. The first challenge is to find out what
> kind of file format I'm dealing with so I know whether I can parse it
> or not - the first few bytes of the binary stream indicate it's a zip.
> With binary format files one would typically discover the file 'type'
> by looking at these signature bytes up front. With a zipped format we
> only know at this point that we are dealing with a zipped format.
> From my perspective that's quite a big disadvantage. Of course once
> I've seen the manifest or .rels I know what I'm dealing with. Hence
> I'd like to recommend that producers put these up front where
> possible.
I'm proposing that a manifest be included within the zip, so the procedure
would be to
1. Extract the manifest, examine it and proceed. Would that suffice?
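As a rough sketch of that procedure, assuming Python's standard zipfile module: check the leading signature bytes, then pull out only the manifest, in memory, without unpacking anything to a temporary folder. The manifest name used here is an assumption borrowed from ODF packages; no name has been agreed in this thread.

```python
import io
import zipfile

# Signature of a zip local file header, as found at the start of a
# non-empty archive.
ZIP_LOCAL_HEADER_SIGNATURE = b"PK\x03\x04"

# Assumed manifest name (the one ODF packages use); purely illustrative.
MANIFEST_NAME = "META-INF/manifest.xml"


def looks_like_zip(data):
    """Return True if the data starts with a zip local file header."""
    return data[:4] == ZIP_LOCAL_HEADER_SIGNATURE


def read_manifest(data, manifest_name=MANIFEST_NAME):
    """Return the manifest bytes, or None if not a zip or no manifest."""
    if not looks_like_zip(data):
        return None
    with zipfile.ZipFile(io.BytesIO(data)) as package:
        try:
            # Only this one entry is decompressed; nothing touches disk.
            return package.read(manifest_name)
        except KeyError:
            return None


# Build a small demonstration package in memory, manifest first.
demo = io.BytesIO()
with zipfile.ZipFile(demo, "w") as package:
    package.writestr(MANIFEST_NAME, b"<manifest/>")
    package.writestr("content.xml", b"<doc/>")
demo_bytes = demo.getvalue()

manifest = read_manifest(demo_bytes)
```

Note that Python's zipfile finds entries through the central directory at the end of the archive, so this sketch needs the whole package to be seekable and gains nothing from the manifest being stored first. A consumer reading the stream sequentially is the case where the ordering recommendation would matter.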
>
> It's only a recommendation. But if non-naive zip producers used it,
> it would give back to consumers some of what is lost by moving from
> binary representations to random collections of zipped xml documents.
I'd hope the manifest answers your requirement, if not let us know.
>> Define profiling please?
>
> I doubt I'm the best person to do this. But my interpretation of
> profiling in this context would be an additional set of constraints on
> top of the general zip structure as described in the appnote
Define 'the appnote'?
> which
> provide a minimal set of characteristics to meet our requirements.
> Which I know begs the next question .. what are these requirements?
> If they are in fact as general as your point 1 above then you can
> discount most of what I have said. But my understanding of sc34
> interest is that it is primarily interested in describing the features
> of zip required to package documents. And I do see this as a subtle
> re-purposing of the generic zip requirement to archive stuff from the
> filesystem.
I'll add that definition to rev 3, though it's currently not in use?
If it's 'to meet our requirements' it should be included, not referenced surely?
regards
--
Dave Pawson
XSLT XSL-FO FAQ.
Docbook FAQ.
http://www.dpawson.co.uk