OPC: part names, part references, and ZIP item names
MURATA Makoto (FAMILY Given)
eb2m-mrt at asahi-net.or.jp
Fri Jun 17 07:43:29 CEST 2011
Dear colleagues,
In this mail, I am concerned about problems around part names, part
references, and ZIP item names. These problems have been reported
in DRs 09-0280, 09-0283, 09-0284, 09-0285, 09-0286, 09-0288, 09-0291,
and 09-0292.
Most of the problems are editorial.
- Horrible terms: "Unicode string", "Part IRI", "Part URI"
- Undefined term: "part reference"
- Annex A (Resolving Unicode Strings to Part Names) is detached
from 9.2 (Part Addressing)
But there are two serious technical problems. 10.1.3 should apply the
IRI-to-URI conversion, and Annex A should not apply it. Both problems
arise because non-ASCII part names were introduced at the BRM without
revising 10.1.3 and Annex A.
Here is a list of required changes (sketch).
- Do not use the terms "Part IRI" and "Part URI". Always use "part
name" (or OPC "part name").
- In 9.1.1.1, define "part name" as follows:
A part name shall be an internationalized absolute path, as defined by
the non-terminal ipath-absolute in RFC 3987, such that the following
constraints are satisfied.
1) An isegment shall be non-empty.
2) An isegment shall not contain percent-encoded forward slash ("/") or
backward slash ("\") characters.
3) An isegment shall not contain percent-encoded iunreserved characters.
Note: I deleted other items in 9.1.1.1, since I do not think that they are
enforced by existing implementations. But I may be mistaken.
Note: We might want to disallow the asterisk (“*”) and colon (“:”) in part names.
See the last example in "10.2.5 ZIP Package Limitations".
- Delete 9.1.1.1.1 and 9.1.1.1.2
- Delete 9.1.1.2 (Part IRI and Part URI mapping)
- Define part name equivalence as follows:
Equivalence of two part names is determined by comparing them character-by-character:
pct-encoded and ALPHA characters as case-insensitive ASCII
ucschar characters as case-sensitive Unicode
- Delete 9.1.1.3.1 (Part IRI Equivalence) and 9.1.1.3.2 (Part URI Equivalence)
- Expand 9.1.1.4 by adding two restrictions:
Part names of two different parts within an OPC package shall not be equivalent.
Part names of two different parts within an OPC package should
not become equivalent after NFC or NFD normalization.
Note: This helps Apple.
- In 9.2 "Part Addressing", first define "part reference" as follows:
A part reference shall be a relative LEIRI reference.
Note: One could adopt relative IRI references as the definition, but
some characters (in particular, IVS in UTS #37) would then be disallowed.
- Move Annex A (except A.2 (Creating a URI from an IRI)) into 9.2,
but further change the terminology (s/Unicode string/relative LEIRI reference/g).
- Refer to W3C LEIRI for the definition of the conversion from LEIRIs to IRIs
- Reformulate the itemized list in A.3 (see my mail "OPC Annex A").
- In 10.1.3 (Mapping Part Names to Physical Package Item Names), apply
the IRI-to-URI conversion to part names before creating logical item names.
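To make the proposed equivalence rule and the NFC/NFD restriction in the list
above more concrete, here is a minimal Python sketch. It is only an
illustration under assumptions: the function names (equivalence_key,
equivalent, normalization_collisions) are hypothetical and do not come from
the OPC text, and the input strings are assumed to be already valid part names.

```python
import unicodedata


def equivalence_key(part_name: str) -> str:
    """Fold only ASCII letters A-Z to lower case.

    This covers ALPHA characters and the hex digits of %HH triplets;
    non-ASCII (ucschar) characters are left untouched, so they stay
    case-sensitive.
    """
    return "".join(ch.lower() if "A" <= ch <= "Z" else ch for ch in part_name)


def equivalent(a: str, b: str) -> bool:
    """Character-by-character comparison after ASCII-only case folding."""
    return equivalence_key(a) == equivalence_key(b)


def normalization_collisions(part_names):
    """Return pairs of non-equivalent part names that become equivalent
    after NFC or NFD normalization (the proposed "should not" restriction)."""
    names = list(part_names)
    collisions = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if equivalent(a, b):
                continue  # already covered by the "shall not" restriction
            for form in ("NFC", "NFD"):
                if equivalent(unicodedata.normalize(form, a),
                              unicodedata.normalize(form, b)):
                    collisions.append((a, b))
                    break
    return collisions


if __name__ == "__main__":
    print(equivalent("/doc/a%5bb.xml", "/DOC/A%5Bb.XML"))
    print(normalization_collisions(["/caf\u00e9.xml", "/cafe\u0301.xml"]))
```

The second call reports the precomposed and decomposed spellings of the same
name as a collision, which is the situation the NFC/NFD restriction is meant
to avoid.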
I once thought that one reason for the mess is the %HH hack for
representing non-ASCII characters as part of ZIP packages. I
do not think so any more. We can continue to use US-ASCII
ZIP item names or we can migrate to UTF-8 ZIP item names by
dropping the IRI-to-URI conversion from 10.1.3.
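The two options can be sketched in a few lines of Python. This is only an
illustration: it assumes the IRI-to-URI conversion means percent-encoding the
UTF-8 octets of every non-ASCII character (as in RFC 3987, 3.1), it ignores
interleaving, and the names iri_to_uri, logical_item_name, and
ascii_item_names are hypothetical.

```python
def iri_to_uri(part_name: str) -> str:
    """Percent-encode the UTF-8 octets of each non-ASCII character.

    ASCII characters, including existing %HH triplets, pass through unchanged.
    """
    pieces = []
    for ch in part_name:
        if ord(ch) < 0x80:
            pieces.append(ch)
        else:
            pieces.extend(f"%{octet:02X}" for octet in ch.encode("utf-8"))
    return "".join(pieces)


def logical_item_name(part_name: str, ascii_item_names: bool = True) -> str:
    """Hypothetical helper: choose between the two options.

    ascii_item_names=True  -> apply the conversion (US-ASCII item names)
    ascii_item_names=False -> keep the part name as is (UTF-8 item names)
    """
    return iri_to_uri(part_name) if ascii_item_names else part_name


if __name__ == "__main__":
    name = "/docs/caf\u00e9.xml"
    print(logical_item_name(name))                          # /docs/caf%C3%A9.xml
    print(logical_item_name(name, ascii_item_names=False))  # unchanged
```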
Cheers,
Makoto