(Update - 2007.08.28: I inadvertently missed out the term names in the last example of XMP as RDF/N3 with QNames and have now added these in. Also - a biggie - I said that PRISM had no XMP schema defined. This is actually wrong and, as I blogged here today, the new PRISM 2.0 spec does indeed have a mapping of PRISM terms to XMP value types. Should actually have read the spec instead of just blogging about it earlier here.)

Having previously stooped to an extremely crass hack for pulling out a document information dictionary from PDFs (for which no apologies are sufficient but it does often work) I feel I should make some kind of amends and mention the wonderful ExifTool by Phil Harvey for reading and writing metadata to media files. This is both a Perl library and command-line application (so it’s cross-platform - a Windows executable and a Mac OS X .dmg are also provided). Besides handling EXIF tags in image files this veritable Swiss Army knife of metadata inspectors can also read PDFs for the information dictionary and the document XMP packet. And moreover, intriguingly, it can dump the raw (document) XMP packet.

There’s quite a number of features to explore, but some preliminary finds are listed below.

Taking one of our standard (metadata poor) PDFs we get this dump:

    % exiftool nature05428.pdf
    File Modification Date/Time : 2007:07:26 14:01:23
    Producer                    : Acrobat Distiller 6.0.1 (Windows)
    Creation Date               : 2006:12:18 16:57:58+08:00
    Creator                     : 3B2 Total Publishing System 7.51n/W
    Creator Tool                : 3B2 Total Publishing System 7.51n/W
    Metadata Date               : 2006:12:19 15:03:23+08:00
    Document ID                 : uuid:f598740b-ad11-41c5-a49e-7caffea783f0

By way of comparison, if we take a demo (metadata rich) PDF with added descriptive DC and PRISM metadata terms, we then get this dump:

    % exiftool 445037a.pdf
    File Modification Date/Time : 2007:07:26 16:18:17
    Creator Tool                : InDesign: pictwpstops filter 1.0
    Document ID                 : uuid:4cd39128-2c8e-41c0-9cad-eea2a1fdb64f
    Copyright                   : © 2007 Nature Publishing Group
    Creator                     : InDesign: pictwpstops filter 1.0
    Producer                    : Acrobat Distiller 6.0.1 for Macintosh

Note that the DC and PRISM terms are encoded as in my earlier examples and do not take account of a) how DC is defined as an XMP schema (i.e. the XMP value types for the separate terms), or b) how PRISM might (because it isn’t yet) be defined as an XMP schema. Nor are identifier considerations fully taken into account. Nonetheless this gives more than an idea of what things could look like.

Now, with ExifTool it is also possible to list out the terms by group.

Going back to the first example we can extract the (document) XMP packet as:

    % exiftool -xmp -b nature05428.pdf

Note that this PDF also included XMP packets for illustrations but the tool extracted the main, or document, XMP packet.

And now that it’s easier to extract the metadata one can look to do something more interesting. For example, if one has cwm installed (Tim BL’s Closed World Machine for semweb dabblings - a Python application, so again cross-platform) one can pipe the XMP packet into cwm as RDF/XML, verify it as valid RDF and read it out in another format, e.g. RDF/N3.

For the above example we can do this as follows. But let me first define a pipeline to extract the XMP, pass it through a couple of filters to strip out processing instructions (this includes the open and close bracketing XMP PI’s as well as an undocumented - legacy? - Adobe PI), and then feed it into cwm as RDF/XML and read it out as RDF/N3. (Note that another tool could have been used instead of ExifTool to extract the XMP.)
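As a rough illustration of the extraction pipeline this post describes (run `exiftool -xmp -b` on the PDF, then filter out the processing instructions before handing the packet to an RDF tool), here is a minimal Python sketch. It is not the original filters: the function names `extract_xmp` and `strip_processing_instructions` are hypothetical, the regex-based PI stripping is one assumed way of doing the filtering, and the packet is assumed to be UTF-8 encoded. Only the `exiftool -xmp -b` invocation comes from the post itself.

```python
import re
import subprocess

# Matches any XML processing instruction, which covers the opening and
# closing <?xpacket ...?> wrappers as well as any other <?...?> PI.
# Assumption: blanket removal is acceptable (it also drops an XML declaration).
PI_PATTERN = re.compile(r"<\?.*?\?>", re.DOTALL)


def extract_xmp(pdf_path: str) -> str:
    """Return the document XMP packet by running `exiftool -xmp -b <pdf_path>`.

    Assumes exiftool is on the PATH and that the packet is UTF-8 encoded.
    """
    result = subprocess.run(
        ["exiftool", "-xmp", "-b", pdf_path],
        capture_output=True,
        check=True,  # raise CalledProcessError if exiftool fails
    )
    return result.stdout.decode("utf-8")


def strip_processing_instructions(xml_text: str) -> str:
    """Remove every <?...?> processing instruction and trim outer whitespace."""
    return PI_PATTERN.sub("", xml_text).strip()


# Hypothetical usage (requires exiftool and the PDF to be present):
#   rdf_xml = strip_processing_instructions(extract_xmp("nature05428.pdf"))
#   # rdf_xml can now be saved and passed to an RDF tool such as cwm.
```

Keeping the two steps as separate functions means the filter can be reused unchanged if some other tool is used to pull the packet out of the PDF.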