"Expert Contribution on OOXML Versioning"

Arms, Caroline caar at loc.gov
Mon Apr 19 15:02:25 CEST 2010


Alex,

Speaking for potential users of transformers and validators, who rely on such tools to keep documents accessible into the future: archival institutions (libraries and other archives trying to preserve content after the active stages of its lifecycle) want the developers of such tools to have the information they need to build reliable tools. I see a fair amount of discussion on lists from people trying to build tools for inspecting collections of documents being ingested into library collections. A common problem that causes inefficiency is: this document doesn't validate, but as far as I can see, the reason it is invalid is not really significant. As I see it, this problem is likely to get worse if the validators and transformers (thinking primarily of migrating forward to later versions) are not robust.

So this may not constitute another use case for distinguishing versions, but rather a larger customer base for the same use case.

   Caroline

Caroline Arms
Library of Congress Contractor
Co-compiler of Sustainability of Digital Formats resource
http://www.digitalpreservation.gov/formats/

** Views expressed are personal and not necessarily those of the institution **
________________________________________
From: Alex Brown [alexb at griffinbrown.co.uk]
Sent: Monday, April 19, 2010 5:13 AM
To: MURATA Makoto (FAMILY Given); Norbert Bollow; Arms, Caroline
Cc: e-SC34-WG4 at ecma-international.org
Subject: RE: "Expert Contribution on OOXML Versioning"

Dear Norbert, all,

Taking a step back from the specific question of a versioning attribute, a more general question for me is: is there a use case for conveying more versioning information in an OOXML document?

Currently there are two versions of OOXML:

1. Ecma 376 1st ed
2. ISO/IEC 29500:2008 (== Ecma 376 2nd Ed)

The second version has two variants (Strict and Transitional).

These seem likely in the next few months to be joined by a third version (Amd. 1), which also has Strict and Transitional variants.

As somebody writing validation software I'm *very* interested in knowing exactly what versions of a spec a document targets. But I'm not sure how typical that interest is!

For the Transitional variant, the WG seems to be working to make it correspond to "the existing corpus of legacy documents" (TECOLD). Since versioning was never part of this corpus I'm not sure about introducing versioning into this family now, especially when it is intended to be a deprecated format coming up for retirement. On the other hand we have the major implementation (MS Office) targeting ISO/IEC 29500:2008 Transitional. If (say) a later service-packed version of MS Office targets Amd. 1 how can we know its emitted files are now targeting this newer spec? -- there are substantive differences between these versions of the T variant. I fear the horse has already bolted in this instance, and we shall just have to rely on vendors being very clear about precisely which version of T they target, when they use that format.

Strict is a different matter. If the amendment passes it seems we will be committed to a route where (rare) updates to the core Strict schemas will be accompanied by a Standard-wide namespace change.

I think the question of extensions is different to the question of versioning, and I am certainly in favour of coming up with mechanisms to convey to consumers precisely what they are getting in a document - perhaps building on the approaches Mohamed outlined in Stockholm.

So I'm wondering - what community of interests would we be serving by introducing more precise versioning information? (other than people developing validators!)

- Alex.

> -----Original Message-----
> From: Norbert Bollow [mailto:nb at bollow.ch]
> Sent: 18 April 2010 17:12
> To: MURATA Makoto (FAMILY Given)
> Cc: e-SC34-WG4 at ecma-international.org
> Subject: Re: "Expert Contribution on OOXML Versioning"
>
> MURATA Makoto <eb2m-mrt at asahi-net.or.jp> wrote:
>
> > I read your document "Expert Contribution on OOXML Versioning".
>
> Thank you! I should perhaps explain that it was not my idea to write this little
> paper, or to call it an "expert contribution", but it was a request from within the
> Swiss mirror committee to become more active in working towards a
> resolution of this issue.
>
> > In the past, some WG4 participants have expressed concerns about the
> > interactions or conflicts between the versioning attribute and MCE.
> > But your document does not mention MCE.  It is still not clear to me
> > whether or not the versioning attribute works for your three use cases
> > when MCE-based extensions are present.
>
> Even in the presence of MCE-based extensions, a simple version attribute
> would suffice to address the complexity that comes from the existence
> of different versions of the OOXML specification. It would, however, obviously
> not help with complexity related to MCE extensions, which in some cases
> might also justify use of separate import filters etc.
>
> My current preference would be for a version attribute which encodes the
> ISO/IEC publication year of the base version of the standard together with a
> single integer number indicating a "patch level" in terms of amendments and
> corrigenda. Obviously it would need to be standardized which integer
> number value corresponds to which set of amendments and corrigenda.
>
> For example, the version attribute could look like this:
>
>   iso-iec:version="2008.3"
>
> and this could be defined to mean something like
>
>   "ISO/IEC 29500-1:2008+COR1+AMD1+COR2; ISO/IEC 29500-2:2008+COR1;
>    ISO/IEC 29500-3:2008+COR1; ISO/IEC 29500-4:2008+COR1+AMD1+COR2"
>
> The namespace prefix iso-iec would be declared as an ignorable namespace
> in order to ensure that conformant implementations of
> Ecma-376 ed.1 will certainly be able to tolerate its presence.
>
> I think that the best place to define values for the versioning attribute would
> be in a new, very short, ISO/IEC 29500-0, which would contain informative
> text giving an overview of ISO/IEC 29500 and normative text defining version
> attribute values. I believe that ISO/IEC has a way in which it could be specially
> designated to allow relatively easy frequent updating (i.e. whenever new
> AMDs and CORs are created.)
>
> The potential presence of MCE does not in any way conflict with this, and it
> does not conflict with validation against the appropriate schema indicated by
> this version attribute, provided that the validation process is carried out as
> indicated in ISO/IEC 29500-3 by ignoring elements and attributes from
> ignorable namespaces.
>
> The above proposal, however, obviously does not help with versioning and
> validation of MCE extensions, which I agree is a worthwhile goal in its own right,
> but which is clearly too complex to be addressed by a simple version
> attribute. If we want such extensions to be carefully versioned, that would
> need a significantly more complex mechanism that somehow also addresses
> the case of extension namespace elements and attributes being preserved
> by an application which does not understand them. At the current stage of
> the evolution of OOXML, I doubt that we have enough experience with this
> kind of use case for it to make sense to standardize this yet.
>
> > At this stage, I do not think that WG4 can reach some consensus even
> > if I add this topic to the agenda for our next phone conference.
>
> On the other hand, I think from the comments of various national bodies
> (not just Switzerland) it is also clear that the current situation without proper
> versioning information also does not reflect any kind of consensus of ISO/IEC
> national bodies.
>
> > Are you willing to study the interactions and possible conflicts
> > between the versioning attribute and MCE?
>
> Yes, absolutely, provided someone puts me in touch with real-life use-cases
> of MCE.
>
> Greetings,
> Norbert
>
> --
> Representative in matters of international standardization of the Swiss Open
> Systems User Group /ch/open - http://ch-open.ch
>
> Owner/CEO, Adaptux GmbH - http://adaptux.com Coaching and Consulting
> in all areas of informatics management including Goal-Setting, Strategy
> Development, Procurement, Day-To-Day Operations, Software Asset
> Management, Risk Management, Benefit Orientation Management.
>
> __________________________________________________________
> ____________
> This email has been scanned by the MessageLabs Email Security System.
> For more information please visit http://www.messagelabs.com/email
> __________________________________________________________
> ____________
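
To make the version attribute proposed in the quoted message concrete, here is a minimal sketch of how a consumer such as a validator might read a value like "2008.3" and look up what it stands for. It is only an illustration under stated assumptions: the namespace URI bound to the iso-iec prefix is hypothetical (the proposal does not name one), the names parse_version, declared_version, describe and PATCH_LEVELS are invented for this example, and the lookup table contains only the single example value given in the proposal.

```python
import re
import xml.etree.ElementTree as ET
from typing import NamedTuple, Optional

# Hypothetical namespace URI for the "iso-iec" prefix; the proposal does not define one.
ISO_IEC_NS = "urn:example:iso-iec-versioning"

# Illustrative registry only. The single entry is the example given in the proposal;
# a real table would have to be standardized (e.g. in the suggested ISO/IEC 29500-0).
PATCH_LEVELS = {
    (2008, 3): (
        "ISO/IEC 29500-1:2008+COR1+AMD1+COR2; ISO/IEC 29500-2:2008+COR1; "
        "ISO/IEC 29500-3:2008+COR1; ISO/IEC 29500-4:2008+COR1+AMD1+COR2"
    ),
}


class SpecVersion(NamedTuple):
    year: int         # ISO/IEC publication year of the base version
    patch_level: int  # integer standing for a set of amendments and corrigenda


_VERSION_RE = re.compile(r"([0-9]{4})\.([0-9]+)")


def parse_version(value: str) -> SpecVersion:
    """Parse a 'YYYY.N' string into (year, patch_level); raise ValueError otherwise."""
    match = _VERSION_RE.fullmatch(value.strip())
    if match is None:
        raise ValueError(f"not a YYYY.N version string: {value!r}")
    return SpecVersion(int(match.group(1)), int(match.group(2)))


def declared_version(xml_text: str) -> Optional[SpecVersion]:
    """Return the version declared on the root element, or None if it is absent."""
    root = ET.fromstring(xml_text)
    raw = root.get(f"{{{ISO_IEC_NS}}}version")
    return None if raw is None else parse_version(raw)


def describe(version: SpecVersion) -> str:
    """Map a parsed version to the set of parts, AMDs and CORs it denotes."""
    return PATCH_LEVELS.get((version.year, version.patch_level), "unknown patch level")


if __name__ == "__main__":
    # Made-up root element; only the attribute follows the proposal's example.
    sample = (
        '<doc xmlns:iso-iec="urn:example:iso-iec-versioning" '
        'iso-iec:version="2008.3"/>'
    )
    version = declared_version(sample)
    if version is not None:
        print(version, "->", describe(version))
```

A document without the attribute yields None, which matches the situation described above for the existing corpus: the consumer then has to fall back on whatever the producing vendor has stated about its target version.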


