Introduction
Metadata is information about an image (or other media type) embedded in the file. There are several types of metadata. The most commonly used is EXIF (Exchangeable Image File Format). Digital cameras save shooting data in an EXIF segment. Browser and image management applications such as the Photoshop Elements Organizer or (my favorite) Photo Mechanic save user-entered information such as the image caption; they can also edit shooting data such as the creation time. They update the original EXIF segment, but can also save data in XMP (Extensible Metadata Platform) segments. Access to XMP metadata is the primary focus of this article.
One of the neatest features of Visual Studio 2005 and .NET 2.0 is the ability to data bind directly to the properties of an object. I will describe a class library that is used to create bindable XMP data objects. The XMP data model in the Adobe XMP Specification contains a set of XMP schema definitions. Each schema lists a group of related properties. The classes in the library map to the schemas in the XMP data model. To make the library more useful, it also contains a class that binds to EXIF metadata.
In this article, I will first describe a sample application that illustrates how the class library is used. Look at the article by Rockford Lhotka for an in-depth description of the object data binding techniques I used. Next, I will briefly discuss some of the details of the class library. For more information about the construction of bindable classes, see the Code Project article on the subject and other articles that describe the INotifyPropertyChanged interface. Finally, I will describe how I generated the library from the information in the XMP Specification. In particular, I will reveal the trick I used to extract usable source data from the PDF format specification file.
Using the code
The screen shot shows some of the features of the sample application. The Load Data button loads metadata from an image file. The Write Data button writes the XMP metadata to an XML file. The Tab Control displays the properties of several photo metadata objects in separate tab pages.
The Tab Control contains five pages which contain the photo metadata we want to display.
One of the powerful features of Visual Studio 2005 is the ability to create the property display and editing fields by simply dragging an object data source from the Data Sources window to the form. Visual Studio also creates most of the required data binding source code. At the point shown here, a tab page to display the PDF Metadata properties is being created.
The first step is to create a Data Source bound to the PdfMetadata object. The screen shot shows the Object Selection page in the Data Source Configuration Wizard. It was displayed by clicking the Add New Data Source button in the Data Sources window and choosing an Object Data Source Type. Selecting the PdfMetadata object and clicking the Finish button adds PdfMetadata to the Data Sources list.
The next screen shot shows how to configure the property fields in the Data Sources window. In the view on the left, I have selected a "Detailed" layout, which displays the properties in separate fields. The view on the right shows how to configure the control type used for each property. In this case, I have chosen not to display the PDFVersion property.
After configuring the data source, all I had to do was drag the PdfMetadata item from the Data Sources window to the PDF Metadata tab. Visual Studio 2005 did the layout and created most of the data binding code.
To complete the process I added the following code:
private void LoadData(string imageFile)
{
    xmpData = XmpMetadata.CreateNewXmpData(imageFile);
    this.photoshopMetadataBindingSource.DataSource = xmpData.XmpPhotoshopMetadata;
    this.dcMetadataBindingSource.DataSource = xmpData.XmpDcMetadata;
    this.exifMetaDataBindingSource.DataSource = xmpData.ImageExifMetadata;
    this.tiffMetadataBindingSource.DataSource = xmpData.XmpTiffMetadata;
    this.pdfMetadataBindingSource.DataSource = xmpData.XmpPdfMetadata;
}
LoadData, which is called during the Load Data button event processing, receives the selected image file. The first line creates the xmpData object, which is a container for the metadata objects. The remaining lines link the photo metadata objects to the automatically generated BindingSource objects.
The Write Data button writes the XMP properties to an XML file. Where allowed by the XMP Specification (see the discussion below), editing changes to the displayed property fields are written to the output file. Your application could use the edited output to update the XMP segment in the source image file.
This about covers the basics. Again, I recommend that you review the Rockford Lhotka article for an in-depth discussion of how to use object data binding.
The remainder of this article is background information about the library and how it was created.
XMP Metadata Class Library Overview
XMP Metadata Schema
The fifteen XMP metadata classes map to the schemas defined in the Adobe XMP Specification. This mapping is illustrated using one of the shorter schemas, the "EXIF Schema for Additional EXIF Properties". This is a screen shot of the schema taken from the specification document.
Its schema name is "http://ns.adobe.com/exif/1.0/aux/". Its schema namespace prefix is aux. The class names are prefixed by the schema namespace prefix.
The first column contains the property name. The second column shows the value type of the property as defined in the specification. The third column is the "Category", which can be either "Internal" or "External". An "External" property, such as a "Caption", can be updated by the user. An "Internal" property, such as the camera "SerialNumber", cannot be changed. In the library, "Internal" properties are read-only. The last column is the property description. In the library, the property description is included as an XML comment.
The next screen shot shows the photo metadata classes displayed in the Object Browser.
The "EXIF Schema for Additional EXIF Properties" schema is mapped to the AuxMetadata class. The class name is derived from the schema namespace prefix. The object property names are derived from the schema property names. Because the selected property, Lens, belongs to the "Internal" category, only the get accessor is present. "External" properties have both accessors and can be edited. Because the schema comment is used as an XML comment in the AuxMetadata class code, it is displayed in the Summary section in the Object Browser.
The XmpMetadata Class
The XmpMetadata object is the container for the metadata objects. When it is instantiated, it extracts the XMP segment from the specified file and loads it into an XmlDocument object. The file can be a JPEG or TIFF image file or an XML file. It parses the XmlDocument object to extract the individual schema elements, which are stored as XmlNode objects. The metadata objects, which are exposed as properties, are created from the XmlNode objects. They are instantiated when they are first referenced in the application.
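The parsing step described above can be sketched roughly as follows. This is a minimal illustration, not the library's actual code: the class and method names (XmpPacketSketch, GetDescriptionNodes, FindSchemaNode) are hypothetical, and the sketch assumes the XMP packet is already available as a string and that each schema's properties sit in an rdf:Description element, either as child elements or as attributes.

```csharp
using System;
using System.Collections.Generic;
using System.Xml;

// Hypothetical helper, not part of the article's library.
public static class XmpPacketSketch
{
    private const string RdfNamespace = "http://www.w3.org/1999/02/22-rdf-syntax-ns#";

    // Loads an XMP packet and returns every rdf:Description element in it.
    public static List<XmlNode> GetDescriptionNodes(string xmpPacket)
    {
        XmlDocument document = new XmlDocument();
        document.LoadXml(xmpPacket);

        XmlNamespaceManager namespaces = new XmlNamespaceManager(document.NameTable);
        namespaces.AddNamespace("rdf", RdfNamespace);

        List<XmlNode> result = new List<XmlNode>();
        foreach (XmlNode node in document.SelectNodes("//rdf:Description", namespaces))
        {
            result.Add(node);
        }
        return result;
    }

    // Returns the first description node that holds at least one property
    // (child element or attribute) in the given schema namespace, or null.
    public static XmlNode FindSchemaNode(IEnumerable<XmlNode> descriptions, string schemaNamespace)
    {
        foreach (XmlNode description in descriptions)
        {
            foreach (XmlNode child in description.ChildNodes)
            {
                if (child.NodeType == XmlNodeType.Element && child.NamespaceURI == schemaNamespace)
                {
                    return description;
                }
            }
            foreach (XmlAttribute attribute in description.Attributes)
            {
                if (attribute.NamespaceURI == schemaNamespace)
                {
                    return description;
                }
            }
        }
        return null;
    }
}
```

A container class could call FindSchemaNode lazily, the first time a metadata property is referenced, and cache the resulting object.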
It maintains a list of the instantiated metadata objects in the metadataList Dictionary object. The list is used in the UpdateXmpSegment method, which applies metadata property changes to the original XmlDocument object.
Its WriteXmpSegment method writes the XmlDocument object to the specified stream. The indent parameter specifies whether the output is formatted.
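One way to write an XmlDocument to a stream with optional indentation is sketched below. The names XmpWriterSketch and WriteDocument are hypothetical, and the choices to omit the XML declaration and to write UTF-8 without a byte order mark are assumptions made for the example, not something the library is known to do.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;

// Hypothetical helper, not part of the article's library.
public static class XmpWriterSketch
{
    // Writes the document to the stream; indent selects formatted output.
    public static void WriteDocument(XmlDocument document, Stream output, bool indent)
    {
        XmlWriterSettings settings = new XmlWriterSettings();
        settings.Indent = indent;
        settings.Encoding = new UTF8Encoding(false); // assumption: UTF-8, no BOM
        settings.OmitXmlDeclaration = true;          // assumption: no <?xml ...?> header

        // Disposing the writer flushes it; the stream itself stays open
        // because CloseOutput defaults to false.
        using (XmlWriter writer = XmlWriter.Create(output, settings))
        {
            document.Save(writer);
        }
    }
}
```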
There is a bunch of rather nitpicky code to extract the embedded XMP segment from JPEG and TIFF image files. One part, which may be of general interest, searches a TIFF file for a specified Tag ID and returns the tag in a C# data structure. The TIFF code supports both "Big Endian" and "Little Endian" byte orders. The files I have looked at turn out to be "Little Endian", probably because they originated on a PC. Files that originate on an older Mac may be "Big Endian".
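The TIFF tag search can be illustrated with the following sketch. It is not the library's code: TiffTagEntry and TiffTagSketch are hypothetical names, the whole file is assumed to be in memory as a byte array, and only the first Image File Directory (IFD) is searched. The layout it relies on comes from the TIFF format itself: a byte order mark of "II" (little endian) or "MM" (big endian), the magic number 42, a 4-byte offset to the first IFD, and 12-byte directory entries. The XMP packet is stored under tag 700.

```csharp
using System;

// One 12-byte directory entry from a TIFF Image File Directory (IFD).
public struct TiffTagEntry
{
    public ushort TagId;
    public ushort FieldType;
    public uint Count;
    public uint ValueOrOffset; // the value if it fits in 4 bytes, else a file offset
}

// Hypothetical helper, not part of the article's library.
public static class TiffTagSketch
{
    public const ushort XmpTagId = 700;

    // Searches the first IFD for tagId. Returns false if the data is not
    // a TIFF file or the tag is absent.
    public static bool TryFindTag(byte[] tiff, ushort tagId, out TiffTagEntry entry)
    {
        entry = new TiffTagEntry();
        if (tiff == null || tiff.Length < 8) return false;

        bool littleEndian;
        if (tiff[0] == 0x49 && tiff[1] == 0x49) littleEndian = true;        // "II"
        else if (tiff[0] == 0x4D && tiff[1] == 0x4D) littleEndian = false;  // "MM"
        else return false;

        if (ReadUInt16(tiff, 2, littleEndian) != 42) return false;

        long ifdOffset = ReadUInt32(tiff, 4, littleEndian);
        if (ifdOffset + 2 > tiff.Length) return false;

        int entryCount = ReadUInt16(tiff, (int)ifdOffset, littleEndian);
        for (int i = 0; i < entryCount; i++)
        {
            long position = ifdOffset + 2 + 12L * i;
            if (position + 12 > tiff.Length) return false;

            int p = (int)position;
            if (ReadUInt16(tiff, p, littleEndian) != tagId) continue;

            entry.TagId = tagId;
            entry.FieldType = ReadUInt16(tiff, p + 2, littleEndian);
            entry.Count = ReadUInt32(tiff, p + 4, littleEndian);
            entry.ValueOrOffset = ReadUInt32(tiff, p + 8, littleEndian);
            return true;
        }
        return false;
    }

    private static ushort ReadUInt16(byte[] data, int offset, bool littleEndian)
    {
        return littleEndian
            ? (ushort)(data[offset] | (data[offset + 1] << 8))
            : (ushort)((data[offset] << 8) | data[offset + 1]);
    }

    private static uint ReadUInt32(byte[] data, int offset, bool littleEndian)
    {
        uint b0 = data[offset], b1 = data[offset + 1];
        uint b2 = data[offset + 2], b3 = data[offset + 3];
        return littleEndian
            ? b0 | (b1 << 8) | (b2 << 16) | (b3 << 24)
            : (b0 << 24) | (b1 << 16) | (b2 << 8) | b3;
    }
}
```

For the XMP tag, the packet is normally longer than four bytes, so ValueOrOffset is the file offset of the packet and Count is its length in bytes.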
The Photo Metadata Base Class
The PhotoMetadata class is the base class for the XMP metadata classes. It implements the required INotifyPropertyChanged interface. When it is instantiated, it receives the XmlNode object that was extracted from the XMP segment. The object contains properties belonging to the class's XMP schema. It also receives the top level XmlDocument object, which it uses during property editing.
Its GetInitialValue method extracts the specified property value from the XmlNode object and returns it as a string. GetInitialValue is called during the initial data load process. It receives the property name and property value type as defined in the XMP schema definition.
Its UpdateData method is called when a metadata property is changed. It receives the new property value and the XMP property name and property value type. It updates the XmlNode object with the changes.
Its WriteXML method writes the XmlNode object to an output stream. This turned out to be an evolutionary dead end that is not actually used in the sample application, but may be of some use to other applications.
The XMP Schema Metadata Classes
Each of the XMP metadata classes, which are based on the PhotoMetadata class, exposes the properties defined in its XMP schema. They each contain a LoadInitialData method, which is called when the base class is instantiated. LoadInitialData extracts the initial property values from the XmlNode object.
The following is sample property code.
private string m_Contributor = null;
public string Contributor
{
    get
    {
        return m_Contributor;
    }
    set
    {
        string adjustedValue = AdjustedValue(value);
        if (adjustedValue != m_Contributor)
        {
            m_Contributor = adjustedValue;
            ElementUpdated("Contributor", "dc:contributor",
                "bag ProperName", adjustedValue);
        }
    }
}
This code creates the first property in the Dublin Core (DcMetadata) class. The XML documentation is taken from the property description in the XMP specification. The m_Contributor string was initialized by the LoadInitialData method. The "set" accessor is present here because the Contributor property belongs to the "External" category and can therefore be edited. The AdjustedValue method cleans up the input string. The ElementUpdated method is called to raise the required notification event and to update the property value in the XmlNode object. Its arguments are the .NET property name, the XMP schema property name and property value type, and the updated value.
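To make the notification pattern concrete, here is a minimal, self-contained sketch of a bindable base class and one derived property. It is an illustration under stated assumptions, not the library's PhotoMetadata class: BindableMetadataSketch and CitySketch are hypothetical names, AdjustedValue is assumed to do nothing more than trim whitespace, and ElementUpdated handles only simple text values stored as child elements (the value type argument is accepted but ignored, so array types such as "bag ProperName" are not written correctly).

```csharp
using System;
using System.ComponentModel;
using System.Xml;

// Hypothetical base class, loosely modeled on the description above.
public abstract class BindableMetadataSketch : INotifyPropertyChanged
{
    private readonly XmlNode schemaNode;

    public event PropertyChangedEventHandler PropertyChanged;

    protected BindableMetadataSketch(XmlNode schemaNode)
    {
        if (schemaNode == null) throw new ArgumentNullException("schemaNode");
        this.schemaNode = schemaNode;
    }

    // Assumption: "cleaning up" the input means trimming whitespace.
    protected static string AdjustedValue(string value)
    {
        return value == null ? null : value.Trim();
    }

    // Stores the new value in the XML node, then notifies bound controls.
    protected void ElementUpdated(string propertyName, string xmpName,
        string valueType, string newValue)
    {
        XmlElement target = null;
        foreach (XmlNode child in schemaNode.ChildNodes)
        {
            if (child.NodeType == XmlNodeType.Element && child.Name == xmpName)
            {
                target = (XmlElement)child;
                break;
            }
        }
        if (target == null)
        {
            // Create the element in the namespace bound to its prefix.
            int colon = xmpName.IndexOf(':');
            string prefix = colon > 0 ? xmpName.Substring(0, colon) : string.Empty;
            string ns = schemaNode.GetNamespaceOfPrefix(prefix);
            target = schemaNode.OwnerDocument.CreateElement(xmpName, ns);
            schemaNode.AppendChild(target);
        }
        target.InnerText = newValue ?? string.Empty;

        PropertyChangedEventHandler handler = PropertyChanged;
        if (handler != null)
        {
            handler(this, new PropertyChangedEventArgs(propertyName));
        }
    }
}

// Hypothetical schema class exposing one editable ("External") property.
public sealed class CitySketch : BindableMetadataSketch
{
    private string m_City = null;

    public CitySketch(XmlNode schemaNode) : base(schemaNode) { }

    public string City
    {
        get { return m_City; }
        set
        {
            string adjustedValue = AdjustedValue(value);
            if (adjustedValue != m_City)
            {
                m_City = adjustedValue;
                ElementUpdated("City", "photoshop:City", "Text", adjustedValue);
            }
        }
    }
}
```

Comparing the adjusted value with the stored value before calling ElementUpdated keeps a bound control from triggering a notification when nothing actually changed.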
The Code Generator
Since there are fifteen XMP schemas and several hundred properties, it is impractical to write the XMP metadata class code by hand. Therefore, I used the code generator contained in the CodeGenerator project, which is part of the code distribution. The source data used by the code generator is in the XML files in the "doc" subdirectory.
I extracted the source files from the PDF file containing the "Adobe XMP Specification". My copy of the specification file is also in the "doc" subdirectory. I extracted the data using Adobe Acrobat Professional. My version was 6.0.5. I loaded the specification file, then used the Document/Pages/Extract menu selection and entered the pages I wanted to extract. In the extracted document, I chose to save the document as XML. I then proceeded to hand edit the resulting XML. I first removed any lines not contained in <table> tags. Then, if necessary, I merged multiple tables into a single table.
The code generator loads the raw XML created by this process into an XmlDocument object. Then the CleanXmlTable method merges continued lines until there is one and only one property per XmlNode object.
The remaining code is straightforward. The application iterates twice through each property node in the XmlDocument object. In the first pass, it generates the property code. In the second, it generates the property initialization statements.
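The two-pass idea can be sketched as follows. This is only an illustration: PropertyGeneratorSketch is a hypothetical name, and the input format (a table element whose property children carry name, xmpName, valueType and category attributes) is an assumption made for the example. The XML that Acrobat actually produces, and the real generator's output, will differ.

```csharp
using System;
using System.Text;
using System.Xml;

// Hypothetical generator, not the CodeGenerator project's code.
public static class PropertyGeneratorSketch
{
    // Assumed input shape:
    // <table>
    //   <property name="City" xmpName="photoshop:City"
    //             valueType="Text" category="External" />
    // </table>
    public static string Generate(XmlDocument table)
    {
        StringBuilder code = new StringBuilder();
        XmlNodeList properties = table.SelectNodes("/table/property");

        // Pass 1: one backing field and one property per schema entry.
        foreach (XmlElement property in properties)
        {
            string name = property.GetAttribute("name");
            string xmpName = property.GetAttribute("xmpName");
            string valueType = property.GetAttribute("valueType");
            bool editable = property.GetAttribute("category") == "External";

            code.AppendLine("private string m_" + name + " = null;");
            code.AppendLine("public string " + name);
            code.AppendLine("{");
            code.AppendLine("    get { return m_" + name + "; }");
            if (editable)
            {
                // Only "External" properties receive a set accessor.
                code.AppendLine("    set");
                code.AppendLine("    {");
                code.AppendLine("        string adjustedValue = AdjustedValue(value);");
                code.AppendLine("        if (adjustedValue != m_" + name + ")");
                code.AppendLine("        {");
                code.AppendLine("            m_" + name + " = adjustedValue;");
                code.AppendLine("            ElementUpdated(\"" + name + "\", \"" + xmpName
                    + "\", \"" + valueType + "\", adjustedValue);");
                code.AppendLine("        }");
                code.AppendLine("    }");
            }
            code.AppendLine("}");
        }

        // Pass 2: the initialization statements, one per property.
        code.AppendLine("protected void LoadInitialData()");
        code.AppendLine("{");
        foreach (XmlElement property in properties)
        {
            code.AppendLine("    m_" + property.GetAttribute("name")
                + " = GetInitialValue(\"" + property.GetAttribute("xmpName")
                + "\", \"" + property.GetAttribute("valueType") + "\");");
        }
        code.AppendLine("}");
        return code.ToString();
    }
}
```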
Additions
In addition to the classes discussed earlier, I have included the PhotoMechanicMetadata class in the library. It accesses data in the custom XMP schema created by Photo Mechanic (see www.camerabits.com). Photo Mechanic uses this custom schema to save user-entered metadata that does not fit into any other schema. My main reason for adding it is that, in my own applications, I want to have access to the color-coded photo quality rating information featured by Photo Mechanic. I created the class by reverse engineering the data in the custom schema.
As mentioned in the Introduction, the library also contains the ImageExifMetadata class that binds to EXIF metadata. The ImageExifMetadata class uses the EXIFExtractor class described in the Code Project article by Asim Goheer. ImageExifMetadata uses EXIFExtractor to access the EXIF properties in the image file and store them in a Hashtable. It contains generated property code to create bindable, read-only properties. It uses the LoadData function to initialize the properties from data in the Hashtable.
Omissions
A number of potentially useful features have been omitted from this library. Here are some:
My own applications do not require a complete editing capability, so editing has not been fully implemented. In particular, although new properties can be added, the result may not conform completely to the XMP specification. The necessary "Property Value Type" information is supplied in the ElementUpdated function call. This could be, but is not now, used to control the format of newly inserted property values.
This library can access data in TIFF, JPEG and XML files. The XMP specification lists several other file types that can support embedded XMP metadata. A common format which I would like to access is Adobe Photoshop (*.psd). One of the other listed types is PNG, which I like because it is an open standard, is lossless, has good file compression, and is widely supported. It would be useful to be able to access metadata in PNG files and perhaps some of the other listed image file types.
This list is a basis for future improvements.
Finally, because of a lack of test data, many of the classes have not actually been tested. Because of the way they were created using an automated process, this is not likely to be a problem, but, without testing, I can't be sure.
Points of interest
Where is the metadata? If you run the sample application using a file taken directly from your camera, you will see data on the EXIF Metadata tab, but the other tabs will probably be empty. The reason is that most cameras save metadata according to the old (and complex) EXIF standard, but do not support the XML-based XMP standard. You will need to save the file from an application that can be configured to write an XMP segment to the image file.
Some applications, such as Adobe Photoshop Elements, allow you to view the contents of the XMP segment. For example, in Elements, select File Info from the File menu. In the File Info window, select the "Advanced" tab. This will display a tree structure with the XMP properties organized by Schema.
History
This article was originally submitted on January 24, 2007.