Growth hint -- used in file format identification

Arms, Caroline caar at loc.gov
Tue Nov 4 19:25:24 CET 2014


Just FYI.  The OOXML record in the primary database of file format signatures used by the archival community includes:
http://apps.nationalarchives.gov.uk/PRONOM/Format/proFormatSearch.aspx?status=detailReport&id=910&strPageToDisplay=signatures
which takes advantage of the growth hint bytes actually found in .xlsx, .docx, and .pptx (etc.) files to recognize OOXML files from the Zip-based package without unpacking it.
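
To make the idea concrete, here is a rough Python sketch of looking for a growth hint in the raw bytes of a package without unpacking anything. It is not the PRONOM signature itself, which should be read from the record linked above. The function name first_entry_info is hypothetical, and the extra-field header ID 0xA220 and signature 0xA028 are the growth hint values as I recall them from the OPC part of the standard, so please check them against the specification before relying on them. The sketch only inspects the first local file header of the Zip.

```python
import struct

LOCAL_HEADER_SIG = b"PK\x03\x04"   # Zip local file header signature
GROWTH_HINT_ID = 0xA220            # assumed extra-field header ID for the growth hint
GROWTH_HINT_SIG = 0xA028           # assumed signature inside the growth hint field


def first_entry_info(data: bytes):
    """Return (filename, has_growth_hint) for the first Zip entry, or None.

    Only the fixed 30-byte local file header, the file name, and the
    extra field are read; no entry is decompressed.
    """
    if len(data) < 30 or data[:4] != LOCAL_HEADER_SIG:
        return None
    # File name length and extra field length sit at offsets 26 and 28.
    name_len, extra_len = struct.unpack_from("<HH", data, 26)
    name = data[30:30 + name_len]
    extra = data[30 + name_len:30 + name_len + extra_len]

    has_hint = False
    pos = 0
    # The extra field is a sequence of (header ID, size, body) records.
    while pos + 4 <= len(extra):
        header_id, size = struct.unpack_from("<HH", extra, pos)
        body = extra[pos + 4:pos + 4 + size]
        if header_id == GROWTH_HINT_ID and len(body) >= 2:
            (sig,) = struct.unpack_from("<H", body, 0)
            if sig == GROWTH_HINT_SIG:
                has_hint = True
        pos += 4 + size
    return name.decode("cp437", errors="replace"), has_hint
```

Calling first_entry_info(open(path, "rb").read(4096)) on a candidate file would then report the name of the first entry and whether a growth hint follows it. A real signature, such as the one DROID uses, can express more than this single check.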

The DROID tool, which is driven by this information, can be viewed as more flexible than a traditional magic-number tool. It is used mainly in archive ingest workflows for tasks such as (a) checking that a file's content appears to match its file extension, (b) distinguishing between file types that use the same extension, and (c) triage (for example, deciding whether to invoke more complex characterization or validation steps).

For more on DROID, see http://www.nationalarchives.gov.uk/information-management/manage-information/preserving-digital-records/droid/

  Caroline

Caroline Arms
Library of Congress Contractor
Co-compiler of Sustainability of Digital Formats resource http://www.digitalpreservation.gov/formats/

** Views expressed are personal and not necessarily those of the institution **



More information about the sc34wg4 mailing list