Strings referencing to OPC parts (1)
MURATA Makoto (FAMILY Given)
eb2m-mrt at asahi-net.or.jp
Sat Aug 21 15:07:23 CEST 2010
Dear colleagues,
I think that the terminology in Part 2 is entirely broken. So, let's
first set aside every term defined in Part 2 and make sure we have the
same understanding.
First, application programmers and format designers use strings for
identifying OPC parts. These strings are for human users rather than
protocols. (The syntax of a programming language might force a
programmer to introduce some escaping to such strings, though.)
Let's call such strings "H-identifiers".
Second, eventually an OPC part is represented by one or more files in
a ZIP archive. Such files also have strings as names. Let's call
such strings "Z-identifiers".
Q0. Do H-identifiers begin with "/" always?
Q1. Can an H-identifier contain "//"?
Q2. Can an H-identifier end with "/"?
Q3. Can an H-identifier contain non-ASCII characters?
Q4. Can an H-identifier contain the space character?
Q5. Can an H-identifier contain "<" (U+003C), ">" (U+003E) and '"' (U+0022)?
Q6. Can an H-identifier contain unwise characters "\" (U+005C), "^" (U+005E),
"`" (U+0060), "{" (U+007B), "|" (U+007C) and "}" (U+007D)?
Q7. Can an H-identifier contain the controls (C0 controls, DEL and C1 controls,
U+0000 - U+001F U+007F - U+009F)?
Q8. Can an H-identifier contain the Bidi formatting characters
(U+200E, U+200F, U+202A-202E)?
Q9. Can an H-identifier contain the Specials (U+FFF0-FFFD)?
Q10. Can an H-identifier contain the Tags (U+E0000-E0FFF)?
Q11. Can an H-identifier contain the Non-characters (U+FDD0-FDEF, U+1FFFE-1FFFF,
U+2FFFE-2FFFF, U+3FFFE-3FFFF, U+4FFFE-4FFFF, U+5FFFE-5FFFF, U+6FFFE-6FFFF,
U+7FFFE-7FFFF, U+8FFFE-8FFFF, U+9FFFE-9FFFF, U+AFFFE-AFFFF, U+BFFFE-BFFFF,
U+CFFFE-CFFFF, U+DFFFE-DFFFF, U+EFFFE-EFFFF, U+FFFFE-FFFFF, U+10FFFE-10FFFF)?
Q12. Can an H-identifier contain the Surrogate code units (U+D800-U+DFFF)?
Q13. Which character is allowed as part of a Z-identifier?
My answers to Q3 thru Q12 are:
Q3: Y, Q4 thru 12: N
But I can live with
Q3: Y, Q4 thru 12: Y
I am against any other combinations.
My answers to Q0, Q1, Q2 are Y, Y, and Y, respectively.
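To make my preferred combination concrete, here is a rough Python sketch
of a checker for H-identifiers. It assumes the answers Q0: Y, Q3: Y and
Q4 thru Q12: N, and takes the code point ranges from the questions above.
The names FORBIDDEN_RANGES and is_valid_h_identifier are made up for
illustration; nothing like this appears in Part 2.

```python
# Code point ranges from Q4-Q12; all are rejected under "Q4 thru Q12: N".
FORBIDDEN_RANGES = [
    (0x0020, 0x0020),                                      # Q4: space
    (0x0022, 0x0022), (0x003C, 0x003C), (0x003E, 0x003E),  # Q5: " < >
    (0x005C, 0x005C), (0x005E, 0x005E), (0x0060, 0x0060),  # Q6: \ ^ `
    (0x007B, 0x007D),                                      # Q6: { | }
    (0x0000, 0x001F), (0x007F, 0x009F),                    # Q7: controls
    (0x200E, 0x200F), (0x202A, 0x202E),                    # Q8: Bidi formatting
    (0xFFF0, 0xFFFD),                                      # Q9: Specials
    (0xE0000, 0xE0FFF),                                    # Q10: Tags
    (0xFDD0, 0xFDEF),                                      # Q11: non-characters
    (0xD800, 0xDFFF),                                      # Q12: surrogates
]
# Q11 also lists the last two code points of planes 1 through 16.
FORBIDDEN_RANGES += [
    ((plane << 16) | 0xFFFE, (plane << 16) | 0xFFFF) for plane in range(1, 17)
]


def is_valid_h_identifier(s: str) -> bool:
    """Check a candidate H-identifier against Q0: Y, Q3: Y, Q4-Q12: N."""
    # Q0: an H-identifier always begins with "/".
    if not s.startswith("/"):
        return False
    # Q1 and Q2 are answered Y, so "//" and a trailing "/" are not rejected.
    # Q3 is answered Y, so non-ASCII characters pass unless listed above.
    return not any(
        lo <= ord(ch) <= hi for ch in s for lo, hi in FORBIDDEN_RANGES
    )


print(is_valid_h_identifier("/word/document.xml"))  # a typical name
print(is_valid_h_identifier("/my file.xml"))        # contains a space (Q4)
```

Switching to the other combination I can live with (Q4 thru Q12: Y)
would amount to emptying FORBIDDEN_RANGES.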
Q13 should be clear from the ZIP specification referenced from Part 2,
but it is not at all clear.
That spec simply says:
> file name: (Variable)
>
> The name of the file, with optional relative path.
> The path stored should not contain a drive or
> device letter, or a leading slash. All slashes
> should be forward slashes '/' as opposed to
> backwards slashes '\' for compatibility with Amiga
> and Unix file systems etc. If input came from standard
> input, there is no file name field. If encrypting
> the central directory and general purpose bit flag 13 is set
> indicating masking, the file name stored in the Local Header
> will not be the actual file name. A masking value consisting
> of a unique hexadecimal value will be stored. This value will
> be sequentially incremented for each file in the archive. See
> the section on the Strong Encryption Specification for details
> on retrieving the encrypted file name.
Some people thought this allows any encoding such as Shift_JIS. The
result is zero interoperability.
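A small Python sketch of the problem: the same file name bytes yield
different strings depending on which encoding the reader guesses. The
file name and the three candidate encodings are only examples chosen
for illustration (cp437 stands in for a reader's default code page),
and the helper decode_name_guesses is hypothetical.

```python
def decode_name_guesses(raw: bytes) -> dict:
    """Decode ZIP file name bytes under several guessed encodings.

    Returns a mapping from encoding name to the decoded string, or to
    None when the bytes are not valid in that encoding.
    """
    guesses = {}
    for encoding in ("shift_jis", "cp437", "utf-8"):
        try:
            guesses[encoding] = raw.decode(encoding)
        except UnicodeDecodeError:
            guesses[encoding] = None
    return guesses


# A writer that stores the name in Shift_JIS without saying so.
original = "\u65e5\u672c\u8a9e.txt"  # "日本語.txt"
raw_name = original.encode("shift_jis")

for encoding, decoded in decode_name_guesses(raw_name).items():
    print(encoding, repr(decoded))
```

Only the reader that happens to guess Shift_JIS recovers the original
name; the cp437 reader gets a different string and the UTF-8 reader
cannot decode the bytes at all.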
I think that we have to create our own definition of Z-identifiers, which
should cause no problems for any important implementation of the ZIP
file format. Part 2 appears to implicitly provide such a definition
by specifying a conversion procedure in Appendix A.3.
Cheers,
Makoto