Strings referencing to OPC parts (1)

Chris Rae Chris.Rae at microsoft.com
Mon Aug 23 23:55:06 CEST 2010


Hi Murata-san - this is a great list of questions and I think it'll help to frame discussion. Quick question: Do your "H-identifiers" ever appear in an IS 29500 file, or are they converted into Z-identifiers by an application before the file is created?

Chris

-----Original Message-----
From: MURATA Makoto (FAMILY Given) [mailto:eb2m-mrt at asahi-net.or.jp] 
Sent: 21 August 2010 06:07
To: e-SC34-WG4 at ecma-international.org
Subject: Strings referencing to OPC parts (1)

Dear colleagues,

I think that the terminology in Part 2 is entirely broken.  So let's first set aside every Part 2 term and make sure we share the same understanding.

First, application programmers and format designers use strings to identify OPC parts.  These strings are intended for human users rather than for protocols.  (The syntax of a programming language might force a programmer to introduce some escaping into such strings, though.)  Let's call such strings "H-identifiers".

Second, an OPC part is eventually represented by one or more files in a ZIP archive.  Such files also have strings as names.  Let's call such strings "Z-identifiers".

Q0. Do H-identifiers always begin with "/"?

Q1. Can an H-identifier contain "//"?

Q2. Can an H-identifier end with "/"?

Q3. Can an H-identifier contain non-ASCII characters?

Q4. Can an H-identifier contain the space character?

Q5. Can an H-identifier contain "<" (U+003C), ">" (U+003E) and '"' (U+0022)?

Q6. Can an H-identifier contain unwise characters "\" (U+005C), "^" (U+005E), 
     "`" (U+0060), "{" (U+007B), "|" (U+007C) and "}" (U+007D)?

Q7. Can an H-identifier contain the controls (C0 controls, DEL and C1 controls, 
    U+0000-U+001F, U+007F-U+009F)?

Q8. Can an H-identifier contain the Bidi formatting characters 
    (U+200E, U+200F, U+202A-202E)?

Q9. Can an H-identifier contain the Specials (U+FFF0-FFFD)?

Q10. Can an H-identifier contain the Tags (U+E0000-E0FFF)?

Q11. Can an H-identifier contain the Non-characters (U+FDD0-FDEF, U+1FFFE-1FFFF, 
    U+2FFFE-2FFFF, U+3FFFE-3FFFF, U+4FFFE-4FFFF, U+5FFFE-5FFFF, U+6FFFE-6FFFF, 
    U+7FFFE-7FFFF, U+8FFFE-8FFFF, U+9FFFE-9FFFF, U+AFFFE-AFFFF, U+BFFFE-BFFFF, 
    U+CFFFE-CFFFF, U+DFFFE-DFFFF, U+EFFFE-EFFFF, U+FFFFE-FFFFF, U+10FFFE-10FFFF)?

Q12. Can an H-identifier contain the Surrogate code units (U+D800-U+DFFF)?

Q13. Which characters are allowed in a Z-identifier?

My answers to Q3 thru Q12 are:

   Q3: Y, Q4 thru 12: N

But I can live with 

   Q3: Y, Q4 thru 12: Y

I am against any other combination.

My answers to Q0, Q1, Q2 are Y, Y, and Y, respectively.
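To make the questions concrete, here is a minimal Python sketch that reports which of the features asked about in Q0 thru Q12 occur in a given H-identifier, and checks a string against the two answer sets above (both combined with Y, Y, Y for Q0-Q2). The function names `features`, `acceptable_preferred` and `acceptable_fallback` are hypothetical. The code point ranges are copied from the questions, with the Bidi range read as U+202A-U+202E; surrogates (Q12) can only appear as lone surrogates in a Python `str`.

```python
# Code point ranges transcribed from Q5-Q12; keys are question numbers.
RANGES = {
    5: [(0x22, 0x22), (0x3C, 0x3C), (0x3E, 0x3E)],          # " < >
    6: [(0x5C, 0x5C), (0x5E, 0x5E), (0x60, 0x60),
        (0x7B, 0x7D)],                                      # \ ^ ` { | }
    7: [(0x00, 0x1F), (0x7F, 0x9F)],                        # C0, DEL, C1
    8: [(0x200E, 0x200F), (0x202A, 0x202E)],                # Bidi formatting
    9: [(0xFFF0, 0xFFFD)],                                  # Specials
    10: [(0xE0000, 0xE0FFF)],                               # Tags
    # Non-characters: U+FDD0-FDEF plus the last two code points of planes 1-16.
    11: [(0xFDD0, 0xFDEF)]
        + [(p * 0x10000 + 0xFFFE, p * 0x10000 + 0xFFFF) for p in range(1, 17)],
    12: [(0xD800, 0xDFFF)],                                 # Surrogates
}


def features(h_id: str) -> set:
    """Return the numbers of Q0-Q12 whose feature occurs in h_id.

    Q0 is reported when h_id does NOT begin with "/".
    """
    found = set()
    if not h_id.startswith("/"):
        found.add(0)
    if "//" in h_id:
        found.add(1)
    if h_id.endswith("/"):
        found.add(2)
    for ch in h_id:
        cp = ord(ch)
        if cp > 0x7F:
            found.add(3)          # non-ASCII
        if ch == " ":
            found.add(4)          # space
        for q, ranges in RANGES.items():
            if any(lo <= cp <= hi for lo, hi in ranges):
                found.add(q)
    return found


def acceptable_preferred(h_id: str) -> bool:
    """Q0: Y (leading "/" required), Q1-Q3: Y, Q4-Q12: N."""
    f = features(h_id)
    return 0 not in f and not (f & set(range(4, 13)))


def acceptable_fallback(h_id: str) -> bool:
    """Q0: Y, Q1-Q12: Y, so only the leading "/" is required."""
    return h_id.startswith("/")
```

For example, `features("/xl/データ.xml")` reports only Q3, so that name is acceptable under both answer sets, while `"/a b"` (Q4) is acceptable only under the fallback.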

Q13 should be clear from the ZIP specification referenced from Part 2, but it is not clear at all.

That spec simply says:

>       file name: (Variable)
> 
>           The name of the file, with optional relative path.
>           The path stored should not contain a drive or
>           device letter, or a leading slash.  All slashes
>           should be forward slashes '/' as opposed to
>           backwards slashes '\' for compatibility with Amiga
>           and Unix file systems etc.  If input came from standard
>           input, there is no file name field.  If encrypting
>           the central directory and general purpose bit flag 13 is set 
>           indicating masking, the file name stored in the Local Header 
>           will not be the actual file name.  A masking value consisting 
>           of a unique hexadecimal value will be stored.  This value will 
>           be sequentially incremented for each file in the archive. See
>           the section on the Strong Encryption Specification for details 
>           on retrieving the encrypted file name. 

Some people read this as allowing any encoding, such as Shift_JIS.  The result is zero interoperability.  
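The interoperability problem can be seen in a few lines of Python: the same name produces different bytes under UTF-8 and Shift_JIS, and a reader that guesses the wrong encoding recovers a different string. Later editions of PKWARE's APPNOTE define general purpose bit 11 (the "language encoding flag") to mark UTF-8 names; whether the edition referenced from Part 2 includes it is not checked here. Python's `zipfile` module is used only as one concrete implementation that sets this bit, and the entry name is a made-up example.

```python
import io
import zipfile

name = "資料/表.xml"  # hypothetical entry name with Japanese characters

# The same name yields different byte sequences under different encodings,
# so a reader must know which encoding the writer used.
utf8_bytes = name.encode("utf-8")
sjis_bytes = name.encode("shift_jis")
print(utf8_bytes != sjis_bytes)

# A reader that assumes another encoding (here cp437) gets a different string.
print(sjis_bytes.decode("cp437"))

# Python's zipfile stores non-ASCII names as UTF-8 and sets general purpose
# bit 11 (0x800) so that readers can tell which encoding was used.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr(name, b"<x/>")
with zipfile.ZipFile(buf) as zf:
    info = zf.infolist()[0]
    print(bool(info.flag_bits & 0x800), info.filename == name)
```

An archive written without bit 11 gives the reader no such signal, which is exactly the situation the quoted text leaves open.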

I think that we have to create our own definition of Z-identifiers, one that causes no problems for any important implementation of the ZIP file format.  Part 2 appears to provide such a definition implicitly by specifying a conversion procedure in Appendix A.3.
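As an illustration of what such a definition might look like, here is a minimal Python sketch of one possible H-to-Z mapping. It is an assumption, not the Appendix A.3 procedure: it drops the leading "/" and percent-encodes every non-ASCII character as its UTF-8 bytes, in the style of the IRI-to-URI mapping of RFC 3987, section 3.1, so that every Z-identifier is plain ASCII. The function name `h_to_z` is hypothetical.

```python
def h_to_z(h_id: str) -> str:
    """Hypothetical H-identifier -> Z-identifier mapping (an assumption,
    not the Appendix A.3 procedure).

    Drops the leading "/", copies ASCII unchanged, and writes each
    non-ASCII character as its UTF-8 bytes in %HH form (uppercase hex).
    Lone surrogates (Q12) make ch.encode() raise UnicodeEncodeError.
    """
    if not h_id.startswith("/"):
        raise ValueError("H-identifier must begin with '/'")
    out = []
    for ch in h_id[1:]:
        if ord(ch) < 0x80:
            out.append(ch)  # ASCII is copied unchanged
        else:
            out.extend(f"%{b:02X}" for b in ch.encode("utf-8"))
    return "".join(out)
```

For example, `h_to_z("/word/\u00e9.xml")` gives `"word/%C3%A9.xml"`. Because ASCII passes through unchanged, this sketch relies on Q4-Q7 characters having been excluded from H-identifiers beforehand.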

Cheers,
Makoto
