OPC: part names, part references, and ZIP item names

MURATA Makoto (FAMILY Given) eb2m-mrt at asahi-net.or.jp
Fri Jun 17 07:43:29 CEST 2011


Dear colleagues,

In this mail, I am concerned with problems around part names, part
references, and ZIP item names.  These problems have been reported
in DRs 09-0280, 09-0283, 09-0284, 09-0285, 09-0286, 09-0288, 09-0291,
and 09-0292.

Most of the problems are editorial.

  - Horrible terms: "Unicode string", "Part IRI", "Part URI"

  - Undefined term: "part reference"

  - Annex A (Resolving Unicode Strings to Part Names) is detached 
    from 9.2 (Part Addressing)

But there are two serious technical problems.  10.1.3 should apply the
IRI-to-URI conversion, and Annex A should not apply it.  This is because 
non-ASCII part names were introduced in the BRM without revising 
10.1.3 and Annex A.

Here is a list of required changes (sketch).

- Do not use the terms "Part IRI" and "Part URI".  Always use "part
   name" (or OPC "part name").

- In 9.1.1.1, define "part name" as follows:

      A part name shall be an internationalized absolute path, as defined by
      the non-terminal ipath-absolute in RFC 3987, such that the following
      constraints are satisfied.

      1) An isegment shall be non-empty.
      2) An isegment shall not contain percent-encoded forward slash ("/"), or 
          backward slash ("\") characters.
      3) An isegment shall not contain percent-encoded iunreserved characters.

      Note:  I deleted other items in 9.1.1.1, since I do not think that they are 
      enforced by existing implementations.  But I may be mistaken.

      Note: We might want to disallow the asterisk ("*") and colon (":") in part names.
      See the last example in "10.2.5 ZIP Package Limitations".
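
The three constraints above can be sketched in Python as follows.  This is
only an illustration, not a full validator: the function name
check_part_name is hypothetical, the ipath-absolute grammar of RFC 3987 is
not checked beyond the leading slash, and constraint 3 is tested only for
the ASCII subset of iunreserved (percent-encoded ucschar characters are
not detected).

```python
import re
import string

# Matches one percent-encoded octet such as "%2F".
_PCT = re.compile(r"%([0-9A-Fa-f]{2})")

# ASCII subset of iunreserved (RFC 3987): ALPHA / DIGIT / "-" / "." / "_" / "~".
# Assumption: ucschar is deliberately left out of this sketch.
_ASCII_UNRESERVED = set(string.ascii_letters + string.digits + "-._~")


def check_part_name(name):
    """Return a list of violated constraints (empty if none were found)."""
    if not name.startswith("/"):
        return ["not an absolute path"]
    problems = []
    for segment in name[1:].split("/"):
        if segment == "":
            problems.append("1: empty isegment")
            continue
        for match in _PCT.finditer(segment):
            decoded = chr(int(match.group(1), 16))
            if decoded in "/\\":
                problems.append("2: percent-encoded slash in " + segment)
            elif decoded in _ASCII_UNRESERVED:
                problems.append("3: percent-encoded unreserved in " + segment)
    return problems
```

For example, check_part_name("/word/document.xml") returns an empty list,
while "/a//b", "/a%2Fb", and "/a%41" each violate one constraint.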

- Delete 9.1.1.1.1 and 9.1.1.1.2

- Delete 9.1.1.2 (Part IRI and Part URI mapping)

- Define part name equivalence as follows:

   Equivalence of two part names is determined by comparing them character by character:
     - pct-encoded and ALPHA characters as case-insensitive ASCII
     - ucschar characters as case-sensitive Unicode
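
A minimal Python sketch of this comparison, assuming both inputs are
already valid part names (the function name part_names_equivalent is
hypothetical).  Only the ASCII letters A-Z are folded, which covers ALPHA
and the hex digits of pct-encoded; str.lower() is avoided because it would
also fold non-ASCII letters.

```python
import string

# Translation table that lowercases ASCII letters only.
_ASCII_FOLD = str.maketrans(string.ascii_uppercase, string.ascii_lowercase)


def part_names_equivalent(a, b):
    """Compare two part names: ASCII case-insensitive, the rest case-sensitive."""
    return a.translate(_ASCII_FOLD) == b.translate(_ASCII_FOLD)
```

Under this sketch "/Word/Doc.xml" and "/word/doc.xml" are equivalent, and
so are "/a%2a" and "/A%2A", but "/\u00c9.xml" and "/\u00e9.xml" are not.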

- Delete 9.1.1.3.1 (Part IRI Equivalence) and 9.1.1.3.2 (Part URI Equivalence)

- Expand 9.1.1.4 by adding two restrictions:

      Part names of two different parts within an OPC package shall not be equivalent.

      Part names of two different parts within an OPC package should
      not become equivalent after NFC or NFD normalization.

      Note: This helps Apple.
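
The second restriction could be checked along the following lines.  This
is a sketch using Python's standard unicodedata module; the function name
normalization_collisions is hypothetical, and the ASCII case-insensitive
comparison of part name equivalence is not applied here.

```python
import unicodedata
from itertools import combinations


def normalization_collisions(part_names):
    """Return (name_a, name_b, form) for distinct names that collide after
    NFC or NFD normalization."""
    collisions = []
    for a, b in combinations(part_names, 2):
        if a == b:
            continue  # identical names are covered by the first restriction
        for form in ("NFC", "NFD"):
            if unicodedata.normalize(form, a) == unicodedata.normalize(form, b):
                collisions.append((a, b, form))
                break  # report each pair once
    return collisions
```

For example, "/caf\u00e9.xml" (precomposed e-acute) and "/cafe\u0301.xml"
(e followed by a combining acute accent) are different part names that
become identical after normalization, so the pair is reported.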

- In 9.2 "Part Addressing", first define "part reference" as follows:

    A part reference shall be a relative LEIRI reference.

    Note: One could adopt relative IRI references as the definition, but
    some characters (in particular, IVS in UTS #37) would then be disallowed.

- Move Annex A (except A.2 (Creating a URI from an IRI)) into 9.2,
  and further change the terminology (s/Unicode string/relative LEIRI reference/g).

- Refer to W3C LEIRI as the definition of the conversion from LEIRIs to IRIs.

- Reformulate the itemized list in A.3 (see my mail "OPC Annex A").

- In 10.1.3 (Mapping Part Names to Physical Package Item Names), apply
  the IRI-to-URI conversion to part names before creating logical item names.
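
As an illustration of the IRI-to-URI conversion meant here, the following
Python sketch percent-encodes the UTF-8 octets of every non-ASCII
character and leaves ASCII characters, including existing %HH escapes,
untouched.  The function name iri_to_uri is hypothetical, and treating all
non-ASCII characters alike (rather than only ucschar and iprivate, as in
RFC 3987) is a simplifying assumption.

```python
def iri_to_uri(part_name):
    """Percent-encode the UTF-8 octets of each non-ASCII character."""
    out = []
    for ch in part_name:
        if ord(ch) < 0x80:
            out.append(ch)  # ASCII, including existing %HH escapes, is kept
        else:
            out.extend("%{:02X}".format(byte) for byte in ch.encode("utf-8"))
    return "".join(out)
```

With this sketch, "/caf\u00e9.xml" becomes "/caf%C3%A9.xml", and a part
name that is already pure ASCII is returned unchanged.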



I once thought that one reason for the mess was the %HH hack for
representing non-ASCII characters as part of ZIP packages.  I
do not think so any more.  We can continue to use US-ASCII
ZIP item names, or we can migrate to UTF-8 ZIP item names by
dropping the IRI-to-URI conversion from 10.1.3.

Cheers,
Makoto

