Some more thoughts on OPC metadata

Arms, Caroline caar at
Fri Mar 30 16:17:19 CEST 2012


In the last call, Murata-san asked me a question relating to metadata for OPC.  In the interim, he and I had a further exchange on the topic by e-mail.  Since OPC may be discussed again before I am back, it seemed worth getting the substance of the interchange into the e-mail archive.


>>>>  substance of exchange >>>>

Thank you.  Last year, I attended a meeting of a government
metadata project led by Prof. Sugimoto.   I insisted that
we need a generic solution to the Japanese phonetics
issue.  Prof. Miyazawa was also there. Consensus in this
group was that we need a multi-lingual solution to this issue
and Japanese phonetics is just a special case.  In other
words, we need multiple elements for a single author, with
each element in a different language or script.  So far
so good.  But OPC, Atom, ODF, and ONIX do not allow
such multiple elements for a single author.

Incidentally, PRISM people (metadata for magazine articles)
also contacted me yesterday.  It appears that they do
allow multiple dc:title elements (with @xml:lang) for a
single article.  But they are now inclined to introduce an
additional attribute.  I am trying to persuade them.
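
To make the pattern concrete, here is a minimal sketch of reading several dc:title elements that differ only in @xml:lang. The <record> wrapper, the sample titles, and the function name are assumptions made for illustration; they are not taken from PRISM or from any of the other schemas named above.

```python
import xml.etree.ElementTree as ET

DC = "http://purl.org/dc/elements/1.1/"
# ElementTree exposes xml:lang under the predefined XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Hypothetical record: one work, three forms of its title.
SAMPLE = """\
<record xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:title xml:lang="ja">吾輩は猫である</dc:title>
  <dc:title xml:lang="ja-Kana">ワガハイハネコデアル</dc:title>
  <dc:title xml:lang="en">I Am a Cat</dc:title>
</record>
"""


def titles_by_language(xml_text):
    """Return a dict mapping each xml:lang tag to the title text it labels."""
    root = ET.fromstring(xml_text)
    result = {}
    for element in root.iter(f"{{{DC}}}title"):
        lang = element.get(XML_LANG, "")  # "" when no language is declared
        result[lang] = (element.text or "").strip()
    return result


if __name__ == "__main__":
    for lang, title in titles_by_language(SAMPLE).items():
        print(lang, title)
```

The phonetic reading is carried here as just another language/script variant (ja-Kana), which is the "special case of a multi-lingual solution" view rather than a dedicated attribute.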


My response:

I agree that a generic solution to multilingual metadata would be a useful direction.  Whether OPC is the right vehicle for moving in that direction is a different question.  I don't have a strong view on that.

As an example where your problem is addressed:   see the section on "Link for transliterations and translations of the same"

To add to your list of schemas that do not do what you want:   I did notice a trans-title element here.       trans-...   elements here too, but not for <name>.  Both these tag libraries intended for published content are aimed at publications in English -- not surprising since NLM is looking to influence what it gets via mandatory copyright deposit -- and the set of medical publications in Japanese published in the US may be empty.
BTW, both of these schemas do have an attribute that identifies the correct sort and display orders for the names.

In relation to another question you asked, I checked on how bi-directionality is handled in MARC.  The expectation is that within any MARC subfield, character data is entered in the order intended for display given the language/script in use.  I find that stated explicitly.  Display interfaces (i.e. library catalogs) seem to be expected to order subfields within fields as appropriate for the audience and/or language.

My interest in having a recognized place to embed a chunk of metadata in an OPC package is as a generic solution to a different problem.   What constitutes "good" metadata is very context-dependent.  Many communities have developed their own schemas and practices.  Transforming from one schema to another is always fraught with problems.  For example, in some schemes, a name is split into first/last or family/given, with forms of address kept separate; in others, it is just a string.

Geospatial data is sometimes recorded in spreadsheets and there is a relevant ISO standard: ISO 19115.  In some workflows, it could be useful to record important metadata about the time/place/equipment/accuracy of recorded observations, not to mention the necessary coordinate reference scheme for any coordinates.  It makes no sense to think of expanding OPC metadata to cover everything in ISO 19115.  DDI is another such rich domain-specific metadata scheme.  If there were a recognized place in the package to look for metadata chunks in other schemes, that could be useful in many workflows.
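
As a rough sketch of what a "recognized place" could look like in practice, the following builds a minimal package in memory and then locates embedded metadata parts through the package-level relationships. The relationship type URI, the part name /metadata/geo.xml, and the function names are all hypothetical, invented for this illustration; nothing here is defined by OPC today.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

RELS_NS = "http://schemas.openxmlformats.org/package/2006/relationships"
# Hypothetical relationship type, invented for this sketch only.
METADATA_REL = "http://example.org/relationships/embedded-metadata"

CONTENT_TYPES = (
    '<?xml version="1.0" encoding="UTF-8"?>'
    '<Types xmlns="http://schemas.openxmlformats.org/package/2006/content-types">'
    '<Default Extension="rels" '
    'ContentType="application/vnd.openxmlformats-package.relationships+xml"/>'
    '<Default Extension="xml" ContentType="application/xml"/>'
    "</Types>"
)
ROOT_RELS = (
    '<?xml version="1.0" encoding="UTF-8"?>'
    f'<Relationships xmlns="{RELS_NS}">'
    f'<Relationship Id="rMeta1" Type="{METADATA_REL}" Target="/metadata/geo.xml"/>'
    "</Relationships>"
)


def build_package(metadata_xml):
    """Return the bytes of a minimal package carrying one metadata chunk."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as package:
        package.writestr("[Content_Types].xml", CONTENT_TYPES)
        package.writestr("_rels/.rels", ROOT_RELS)
        package.writestr("metadata/geo.xml", metadata_xml)
    return buffer.getvalue()


def find_metadata_parts(package_bytes):
    """Map part name to content for every part linked by METADATA_REL."""
    found = {}
    with zipfile.ZipFile(io.BytesIO(package_bytes)) as package:
        rels = ET.fromstring(package.read("_rels/.rels"))
        for rel in rels.iter(f"{{{RELS_NS}}}Relationship"):
            if rel.get("Type") == METADATA_REL:
                name = rel.get("Target").lstrip("/")
                found[name] = package.read(name).decode("utf-8")
    return found


if __name__ == "__main__":
    # The chunk is a placeholder; a real one might be an ISO 19115 or DDI record.
    pkg = build_package("<placeholder>domain-specific record</placeholder>")
    print(find_metadata_parts(pkg))
```

The point of the sketch is that a consumer does not need to understand the chunk's schema: it only needs an agreed way to find the chunk and hand it on unchanged to a tool that does.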

This does not mean that I don't think there is room for improvement in the "basic" metadata that goes with any OPC package.  Any changes probably need to be marketplace-relevant -- and I don't claim to be an expert on the overall marketplace for "office documents" or OPC.  What I do know is that no one enters metadata at all unless it is easy and they have a reason to do it!!   I think metadata for OPC is an issue that will warrant quite a bit of discussion.


Caroline Arms
Library of Congress Contractor
Co-compiler of Sustainability of Digital Formats resource

** Views expressed are personal and not necessarily those of the institution **

From: Rex Jaeschke [rex at]
Sent: Friday, March 23, 2012 2:14 PM
To: 'SC 34 WG4'
Subject: The draft minutes from Thursday's teleconference are at
