An outline proposal
dave.pawson at gmail.com
Tue Oct 19 13:43:56 CEST 2010
On 19 October 2010 11:38, Bob Jolliffe <bobjolliffe at gmail.com> wrote:
>>> 1. Provide a compressed archive format for general use.
> I think we probably need to define what we mean by 'archive' format -
> my understanding being:
> Provide a compressed format for representing a collection of files (or
> streams) within a single stream. I am in two minds about the use of
> the term 'file' as it has some platform connotations, for example
> expectations around preservation of permissions and other file
How abstract do we want to be?
IMHO we should keep out of the file system arena?
Leaving it at 'collection' is a bit woolly?
> Existing zip tools seem to behave differently in this
> respect. Are we simply interested in the lowest common denominator of
> the binary stream plus name? I know zip files are often composed
> from files in a filesystem, but they are also frequently generated
> directly from streams without touching the filesystem.
What do others call such a collection... without calling it a file?
Pragmatically it is likely to end up as a file on disk?
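To make the 'name plus binary stream' reading concrete, here is a minimal sketch using Python's standard zipfile module. The helper name pack_streams and the entry names are my own illustrations, not anything we have agreed; the point is only that a package can be built and read back without a filesystem being involved.

```python
import io
import zipfile


def pack_streams(entries):
    """Pack (name, bytes) pairs into a zip held entirely in memory."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for name, payload in entries:
            # Each entry is just a name and a byte string; no file is opened.
            archive.writestr(name, payload)
    return buffer.getvalue()


# Hypothetical entries, chosen only for illustration.
package = pack_streams([("mimetype", b"text/plain"), ("content.xml", b"<doc/>")])

# Read the package back, again without touching the disk.
with zipfile.ZipFile(io.BytesIO(package)) as archive:
    names = archive.namelist()
```

Whether the result is later written to disk is then a separate decision for the application.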
>>> 1.1. A compression algorithm shall be provided which is usable without
>>> infringing any existent patent.
>> The goal should not be that the algorithm is free of IPR but that the
>> users of the standard may practice the algorithm without payment of
>> royalties. For example, the owner of the patent could pledge not to
>> assert their patents for implementors of the standard. Royalty-free and
>> IPR-free are not the same thing, though they are often confused.
> Agreed. But I wouldn't like to throw away the desirability of using
> algorithms which are patent-free. Pledges, covenants, promises etc
> are all a bit second-best. Perhaps the statement 1.2 below (modified
> slightly) is sufficient
How about 'unencumbered'?
I'd be happy with the later one inserted here. I think the aim is
clear, no problem with wordsmithing it.
>>> 2. The packaged entity shall hold one or more files.
>> OK. This needs further specification: A file has a name, metadata (date,
>> permissions, archive bit?) as well as contents. Detailed requirements?
> yes. Minor point, but I would have thought the packaged entities are
> the things which are packaged within the package entity. And it's the
> nature of these "packaged entities" - files, streams or what have you
> - which we should detail.
1. Are we into metadata? Is this going too far into implementation?
Again, I'd prefer to stay above a file system and attributes?
I'm -1 here (if we can do without it). That's the implementation layer,
rather than the specification *(what)* level.
>>> 2.2 Any file hierarchy present when the package is created shall be
>>> duplicated on extraction if requested.
>> So this leads to the requirement that you can store a file hierarchy.
> Again I would not straight away assume we are talking of a file
> hierarchy. The contents of the package may have started off as files.
> And may even be extracted to files. But is this necessarily so? I
> would prefer to think of the zip as simply a container for streams.
Stealing your earlier words, does entity/entities work here?
I read 'streams' and think of the Java file / stream hierarchy?
Agreed this may all be happening in memory (and then optionally
written to disk), but I'm short on words that reflect this.
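One way to picture a hierarchy that never touches a file system: the entities carry path-like names, and any tree is derived from those names alone. A rough sketch in Python follows, assuming '/' as the separator and using made-up entry names.

```python
import io
import zipfile
from pathlib import PurePosixPath

buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    # Hypothetical entity names; the slashes are part of the name, nothing more.
    archive.writestr("report/summary.txt", b"summary")
    archive.writestr("report/data/table.csv", b"a,b\n1,2\n")

with zipfile.ZipFile(buffer) as archive:
    # Recover the implied containers purely from the names.
    parents = sorted({str(PurePosixPath(name).parent) for name in archive.namelist()})
```

Here parents holds the two implied containers, although no directory was ever created. Duplicating that tree on extraction is then an option for the consumer rather than a property of the package.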
>>> 2.3 The package shall hold any combination of binary and/or text
>> Not sure I agree that text files must be distinguishable from binary
>> files. Once you have text files you end up dealing with DOS/Unix CRLF
>> conversions. Better to just store the file as-is, directly, at which
>> point there is no difference between text and binary files.
I'm saying abstract them from the archive, not process them.
That's in the application layer operating on these things.
Anyone see a problem omitting this differentiation? If not
then I'm OK to remove it.
>>> 2.3.1 There shall be no difference between a file prior to being
>>> archived and the corresponding file when extracted from the archive.
>> OK. And per above, if you are not changing a text file on different OS's
>> then there is no difference between text and binary.
Does this address the binary/text differentiation sufficiently?
>>> 2.3.2 No change shall be made to any character encoding by compressing
>>> and decompressing a file. I.e. an input file after decompression must
>>> match its character encoding prior to compression.
>> Again, then why distinguish text from binary?
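A small sketch of what 2.3.1 and 2.3.2 amount to if entries are treated as opaque bytes. This uses Python's zipfile purely as a stand-in for any implementation, and the entry names and contents are invented for the test.

```python
import io
import zipfile

# Invented samples: DOS and Unix line endings, a Latin-1 encoded word, raw bytes.
originals = {
    "dos.txt": b"line one\r\nline two\r\n",
    "unix.txt": b"line one\nline two\n",
    "latin1.txt": "caf\u00e9".encode("latin-1"),
    "blob.bin": bytes(range(256)),
}

buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as archive:
    for name, payload in originals.items():
        archive.writestr(name, payload)

with zipfile.ZipFile(buffer) as archive:
    extracted = {name: archive.read(name) for name in archive.namelist()}

# True only if every entry came back byte for byte identical.
unchanged = extracted == originals
```

If the comparison holds for all four entries, line endings and character encodings survive without the package knowing which entries are 'text'.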
>>> 3. A means of verification of an archive shall be provided.
>> Not sure what is intended here. Do you mean you want a specification of a
>> verification procedure? (A validator?) Or that a conforming
>> ZIP-consumer/ZIP-producer must include a means of verification?
*I* meant, tell me what files are contained?
unzip -l archive.zip gives me the file sizes and names. I think that's
all that's needed, though a full verification, i.e. re-compress and compare...
something (md5sum?) with archived content is perhaps the 200% check.
Is this needed? If it fails then the earlier 'no change' fails, so IMHO
a simple 'tell me what's inside' is sufficient.
Anyone care to wordsmith that?
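To separate the two levels I have in mind, here is a sketch using Python's zipfile: a plain listing, and a fuller check that reads every entry and compares it with the stored CRC. The entry names are made up, and this only illustrates what one existing implementation offers, not proposed wording.

```python
import io
import zipfile

buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as archive:
    archive.writestr("a.txt", b"hello")
    archive.writestr("b.bin", bytes(10))

with zipfile.ZipFile(buffer) as archive:
    # Level 1: 'tell me what's inside' (names and uncompressed sizes).
    listing = [(info.filename, info.file_size) for info in archive.infolist()]
    # Level 2: read every entry and check its CRC; returns the first bad
    # name, or None if all entries pass.
    first_bad = archive.testzip()
```

The listing needs no decompression at all, which is why I see it as the minimum requirement and the CRC pass as the optional extra.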
>>> 4. A means of listing the contents of an archive without extraction
>>> shall be provided.
>> Again, not clear who is providing this, the specification or a
We are requiring it of an implementation which declares itself
compliant to our requirement.
> This is a really interesting issue. Currently in zip appnote we have
> a central directory record (which is supposed to, but does not
> necessarily, reflect the collection of entries). A problem with this
> is that it appears at the end of the zip. This can be quite
> inconvenient when consuming a large incoming zip stream. One of the
> first requirements for a consumer is typically to determine exactly
> what kind of package it is dealing with. Many formats (including odf,
> jar etc) also have a requirement for some form of manifest. OPC has
> .rels which addresses the same problem slightly differently. It would
> certainly be desirable to have this "listing" (I know it's more than
> listing) as the first entity in the packaged collection.
Why first Bob? So long as it is available? Simply for speed of access?
> appnote doesn't say anything about ordering - most probably because
> the original rationale for archiving was quite different to our
> rationale for packaging.
Or it's a how, not a what?
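For anyone wanting to see the layout Bob describes, a rough sketch in Python: write a manifest as the first entry, then look at where things land in the byte stream. The name manifest.txt is hypothetical; the two signatures are the local file header and central directory header markers from appnote.

```python
import io
import zipfile

buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_STORED) as archive:
    # Hypothetical manifest, written first so a streaming reader meets it first.
    archive.writestr("manifest.txt", b"content.xml\n")
    archive.writestr("content.xml", b"<doc/>")
data = buffer.getvalue()

with zipfile.ZipFile(io.BytesIO(data)) as archive:
    infos = archive.infolist()
    manifest_offset = infos[0].header_offset
    last_entry_offset = infos[-1].header_offset

LOCAL_HEADER = b"PK\x03\x04"    # local file header signature
CENTRAL_HEADER = b"PK\x01\x02"  # central directory file header signature

# The first central directory record appears only after all entry data.
central_offset = data.find(CENTRAL_HEADER)
```

In this sketch the manifest starts at offset 0, while the central directory only begins after the last entry, so a consumer reading a stream sees the manifest long before the directory.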
> Final thought - given that most of the formats we refer to make use of
> some form of manifest, how important is it to concern ourselves too
> much with the central directory at all?
Define central directory please? Just a list of items contained in the archive?
I required that, but as an ordinary file (thingy) within the archive.
> Sure, to be compatible with
> existing general purpose zip implementations, it must be there. But
> when profiling a package specification on top of zip (and possibly
> even other low level mechanisms like tar) it might be more important
> that we focus on the manifest.
Define profiling please?