UTF-8 in ZIP

Dennis E. Hamilton dennis.hamilton at acm.org
Tue Nov 2 17:21:14 CET 2010


The main difficulty is that Zip defaults to a single-byte encoding, with the
filename entry presumed to be in a single-byte code page.  This clashes with
UTF-8 for any Unicode code point outside 7-bit ASCII (high bit = 0); only
within that range is UTF-8 identical to single-byte ASCII.
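
To make the clash concrete, here is a minimal Python sketch.  The file names
and the helper functions are hypothetical, and CP437 is only an assumed
example of the code page a reader might presume.  It shows that an ASCII-only
name survives being read as a single-byte code page, while a name with
non-ASCII code points does not.

```python
# Hypothetical file names, chosen only for illustration.
ascii_name = "report.txt"
non_ascii_name = "r\u00e9sum\u00e9.txt"  # contains U+00E9 twice


def utf8_is_single_byte(name: str) -> bool:
    """True when every code point encodes to exactly one UTF-8 byte,
    which is the case only for 7-bit ASCII."""
    return len(name.encode("utf-8")) == len(name)


def read_as_code_page(name: str, code_page: str = "cp437") -> str:
    """Simulate a reader that treats the UTF-8 bytes of a name as if they
    were in a single-byte code page (CP437 is an assumption here)."""
    return name.encode("utf-8").decode(code_page)


if __name__ == "__main__":
    for name in (ascii_name, non_ascii_name):
        print(name, utf8_is_single_byte(name), read_as_code_page(name))
```

Each U+00E9 becomes two bytes in UTF-8, so a single-byte reader sees two
unrelated characters in its place.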

There is an Appendix about this in versions of the App Note more recent than
6.2.0.
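
As I understand that appendix, it ties UTF-8 names to general purpose bit 11
(0x800) of each entry.  The sketch below, which assumes Python 3's standard
zipfile module and uses a hypothetical helper name, writes a few empty entries
into an in-memory archive and reports which ones carry that bit.  It is an
illustration of how one tool behaves, not a statement of what the App Note
requires.

```python
import io
import zipfile

# General purpose bit 11; my reading of the appendix, stated as an assumption.
UTF8_NAME_FLAG = 0x800


def name_flags(names):
    """Write empty entries with the given names to an in-memory archive,
    read it back, and map each name to whether bit 11 is set."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for name in names:
            zf.writestr(name, b"")
    buf.seek(0)
    with zipfile.ZipFile(buf) as zf:
        return {
            info.filename: bool(info.flag_bits & UTF8_NAME_FLAG)
            for info in zf.infolist()
        }


if __name__ == "__main__":
    # Hypothetical names: one pure ASCII, one with non-ASCII code points.
    print(name_flags(["plain.txt", "r\u00e9sum\u00e9.txt"]))
```

With this module, the bit is set only for names that cannot be encoded as
ASCII, so ASCII-only archives look the same as they always have.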

Of course, if we introduced %-encoding of other UTF-8 sequences (say, using
the IRI-to-URI mapping), names would stay within 7-bit ASCII, so it would fit
that practice and we would be within the sweet spot that Zip has traditionally
supported cross-platform.
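
A rough sketch of what such %-encoding could look like, in Python.  The
function names are hypothetical, and this only approximates the IRI-to-URI
step (each non-ASCII code point is replaced by its UTF-8 bytes written as
%HH); it is not a proposal for exact rules.

```python
from urllib.parse import unquote


def percent_encode_non_ascii(name: str) -> str:
    """Replace every non-ASCII code point with the %HH form of its UTF-8
    bytes; ASCII characters, including '/', pass through unchanged."""
    out = []
    for ch in name:
        if ord(ch) < 0x80:
            out.append(ch)
        else:
            out.extend("%{:02X}".format(b) for b in ch.encode("utf-8"))
    return "".join(out)


def percent_decode(name: str) -> str:
    """Reverse the mapping, interpreting the %HH bytes as UTF-8."""
    return unquote(name, encoding="utf-8")


if __name__ == "__main__":
    # Hypothetical entry name, for illustration only.
    original = "docs/r\u00e9sum\u00e9.txt"
    encoded = percent_encode_non_ascii(original)
    print(encoded)  # docs/r%C3%A9sum%C3%A9.txt
    print(percent_decode(encoded) == original)
```

The encoded name is pure 7-bit ASCII, so it reads the same under any
single-byte code page.  One caveat: a literal "%" in an original name is not
escaped here, so real rules would have to say how to handle it.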

 - Dennis

-----Original Message-----
From: sc34wg1study-bounces at vse.cz [mailto:sc34wg1study-bounces at vse.cz] On
Behalf Of MURATA Makoto (FAMILY Given)
Sent: Tuesday, November 02, 2010 07:39
To: ISO Zip
Subject: UTF-8 in ZIP

Dear colleagues,

I am wondering if I have to pay a license fee to PKWARE if
I use UTF-8 file names in a ZIP file.  Note that EPUB and
Widget packaging of W3C use UTF-8 file names in ZIP.  (I plan
to submit an issue about this topic to the IDPF EPUB WG.)


Regards,

MURATA Makoto (FAMILY Given)



