Proposal 34

From DROID 5.0 Wiki

More reliable coverage of file formats

Description

More reliable coverage of file formats would allow us to state with confidence both the major type of a file format and its version.

Motivation

To reduce ambiguity and increase confidence in identification results, so that users can make stronger decisions about their collections based on precise information. Examples where DROID v4.0 with Signature File v.16 is deficient:

  • TIFF: DROID can currently return several versions for a single TIFF file.
  • Microsoft Word 97/2003 .doc files created with Word 2007 are always identified as OLE2 compound. (Excel 2003 and PowerPoint 2003 files created with the 2007 applications do not have this problem.)
  • Embedded MS Office files seem to be identified erratically. Excel 2003 with embedded PowerPoint 2003 is identified as BIFF8/8X and PowerPoint 2003, which is a good result. PowerPoint with embedded files is only identified as PowerPoint. Word files with embedded files are always OLE2.
  • Every Excel 2003 file is identified as both BIFF 8 and BIFF 8X. Perhaps this is simply how Excel 2003 files are structured (I understand they have a backwards compatibility feature, where the newer format is appended to the older format).
  • Some files positively identified as Word 97/2003 are also positively identified as Word 95. (In Windows Explorer, these files are deemed 97/2003 only).
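
The kind of ambiguity listed above can be illustrated with a minimal Python sketch. It is not DROID code and does not use DROID's output format: the `hits` records, their field layout, and the function name `ambiguous_files` are all hypothetical, chosen only to show how results with more than one version hit for the same format might be flagged for review.

```python
from collections import defaultdict

# Hypothetical hit records: (file name, format name, format version).
# These values are illustrative only; they are not real DROID output.
hits = [
    ("scan.tif", "TIFF", "3"),
    ("scan.tif", "TIFF", "6"),
    ("budget.xls", "Excel", "BIFF 8"),
    ("budget.xls", "Excel", "BIFF 8X"),
    ("memo.doc", "OLE2 Compound Document", ""),
]


def ambiguous_files(hits):
    """Map (file, format) to its sorted versions when more than one version was reported."""
    versions = defaultdict(set)
    for file_name, format_name, version in hits:
        versions[(file_name, format_name)].add(version)
    # Keep only the entries where a single format matched with several versions.
    return {key: sorted(found) for key, found in versions.items() if len(found) > 1}


for (file_name, format_name), found in ambiguous_files(hits).items():
    print(f"{file_name}: {format_name} matched versions {found}")
```

A report like this only detects the ambiguity. Resolving it would still need the signature or algorithm changes this proposal asks for.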

Proposed Solutions

~Proposed Solution~

Notes

  • Comment as for Proposal 33. Adrian Brown 13:44, August 27, 2009 (UTC)
  • This isn't so much a PRONOM requirement - we are talking about limitations with the current DROID algorithm. Some of this can only be achieved here by modifying DROID itself! There may, of course, also be signature changes that will help too. MattPalmer 13:49, August 27, 2009 (UTC)
  • We currently overcome this DROID limitation by using JHOVE's characterization result to pinpoint the exact TIFF version. We would be interested to see whether there are any other possible implementations. CChou 10:48, September 9, 2009 (UTC)
  • JHOVE2 will continue to determine version information by examining the full set of internal data structures. But if the PRONOM signature database could be made more precise, better signature-based version identification would be a general benefit. --Slabrams 23:59, September 17, 2009 (UTC)


TNA

Declare Interest

  • MattPalmer 11:41, July 29, 2009 (UTC)
  • Adrian Brown 13:44, August 27, 2009 (UTC)
  • Rspencer1 09:30, September 4, 2009 (UTC)
  • CChou 10:07, September 9, 2009 (UTC)
  • Ndelpozo 01:28, September 15, 2009 (UTC) (National Library of Australia)
  • JHOVE2 project; --Slabrams 23:59, September 17, 2009 (UTC)