FYI on LoC's new recommendations on formats

Arms, Caroline caar at loc.gov
Thu Jul 3 21:17:11 CEST 2014


All,

On getting back into thinking about formats for the Library of Congress, I discover that a document that has been in the works for about three years has finally seen the light of day, at
  http://www.loc.gov/preservation/resources/rfs/index.html

In particular see http://www.loc.gov/preservation/resources/rfs/textmus.html and scroll down to the Textual Works - Digital section.  You will find EPUB, PDF/UA, and PDF/A in the Preferred list and OOXML and ODF together in the Acceptable list.  In general LoC collects finished works and is not interested in being able to edit what it collects.

As far as WG4 is concerned, I just wanted to highlight the fact that ODF and OOXML are together on the list.  Further reading is entirely optional.

BACKGROUND:  
The document is organized in an odd way because it started life as an attempt to produce an update to an official publication of the U.S. Copyright Office -- http://www.copyright.gov/circs/circ07b.pdf -- the official regulation that declares the Library of Congress's format preferences for content deposited with the Library as required by U.S. Copyright Law.  Circular 7b and the new document are inclusive, in that they are intended to support the continued building of the Library's "universal" collection.

However, for various reasons, including delays because of the serious illness of the original chair and general budget chaos, by the time it was ready, the Copyright Office (with a new leader and congressional pressure on some other fronts) decided that updating Circular 7b was not a high priority.  Even so, the document still reflects the categorization of types of stuff and the organization used in Circular 7b.  In addition, it was developed by a hierarchy of committees of curatorial and reference staff, and I certainly wouldn't vouch for its internal consistency.

So the document has been recast (by giving it a new title and front matter) more as recommendations to guide staff who acquire content for the Library via routes other than copyright deposit.  It has been issued under the Preservation Directorate and preservability is certainly one objective.  But ease of immediate use and the formats used widely for content of high interest also guided the choices.

I played a small direct role in two areas -- early on, as an informal advisor to the chair of the textual works section, and in late 2013 and early 2014, as a resource for the new editor of the document to call on with respect to datasets.  My role was not to establish the preferences, but more to recognize the most blatant errors in description/categorization and serious inconsistencies in the particular sections and suggest ways they might be addressed.

   Caroline

PS I have previously pointed to http://www.digitalpreservation.gov/formats/ and http://www.digitalpreservation.gov/formats/sustain/sustain.shtml 
The latter describes the factors considered in assessing sustainability (preservability) of content in a format.  We also consider the ability of a format to support quality and functionality -- factors that depend on the category of content.  [See http://www.digitalpreservation.gov/formats/fdd/descriptions.shtml to see the categorization in use.]  These sets of factors and current assessments were considered as part of the exercise in expressing preferences.  Producing Format Description Documents (FDDs) to cover OOXML and ODF is on the to-do list for my new contract.  To date, I have only an FDD for OPC -- http://www.digitalpreservation.gov/formats/fdd/fdd000363.shtml

Should you be interested, the proposed revision of OPC will warrant a separate FDD because expanding the characters permitted in part names is significant.

Caroline Arms
Library of Congress Contractor
Co-compiler of Sustainability of Digital Formats resource
http://www.digitalpreservation.gov/formats/

** Views expressed are personal and not necessarily those of the institution **



More information about the sc34wg4 mailing list