Limitations
Installation
The first two packages (expat and SCEW) are open source XML parsing libraries. The versions we used to develop the validator are included as a convenience; feel free to download more recent versions yourself from the web addresses below.
expat-1.95.5.tar.gz is James Clark's stream-oriented XML parser. No changes were made to the sources downloaded from http://www.libexpat.org. To install it, unpack it, go to its top-level directory, and run ./configure, then make, then make install; this installs it in /usr/local. If you want to install it elsewhere, read the README file.
scew-0.1.1.tar.gz is the Simple C Expat Wrapper, a library that uses expat to build an in-memory tree of an XML document, similar to a Document Object Model (DOM) structure. No changes were made to the sources downloaded from http://www.nongnu.org/scew/. It installs the same way as expat (above).
he5v.tar.gz is the source for the validator. Unpack it and edit the Makefile: change the value of SCEW_ROOT to the directory where SCEW is installed. Before building, you also have to set up the HDF-EOS 5 environment by sourcing the appropriate environment script. Once all that is done, type make to compile and link the sources and create the executable he5v.
Note that the build procedure uses the HDF portable compile script h5cc (which means the HDF binaries need to be on your PATH). This script turns the compiler warning level way up, so the compile will generate a lot of warning messages. I tried to minimize them, but getting rid of all of them would have required modifying library sources. Sorry about that...
Running the validator
he5v file-to-validate -x XML-definition
where file-to-validate is the HDF-EOS 5 file being validated, and XML-definition is a file containing the validation definition: an XML document with a top-level <HDF-EOS-Validation-Definition> element. If there are any errors, the validator prints one or more error reports to standard error and exits with a non-zero status. If everything is OK, the validator prints nothing and exits with status zero.
Validator error messages
A fatal error is any message beginning with "FATAL ERROR:". Fatal errors are caused by file system errors (inability to open/read one of the input files) or errors in the XML definition file. Fatal errors cause validation to stop immediately.
Normal error reports are of the form:
ERROR LOCATION description of the error...
Here is an example error report:
SWATH HIRDLS GEOLOCATION_FIELD Pressure ATTRIBUTE Units string "km " doesn't match <StringValue value="hPa *"/>
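The report above names a geolocation field Pressure in the swath HIRDLS whose Units attribute failed a string check. A definition along the following lines could produce such a report. This is a sketch, not an excerpt from HIRDLS-definition.xml. It assumes that the "hPa *" in the message came from a <Matches> pattern, and that the DTD lets you omit the top-level sections not shown.

```xml
<HDF-EOS-Validation-Definition>
  <Swaths>
    <Swath-Named name="HIRDLS">
      <GeolocationFields>
        <GeolocationField-Named name="Pressure">
          <Attributes>
            <Mandatory>
              <!-- Assumed: Units must match the pattern "hPa *";
                   a value of "km " would then be reported as above -->
              <Attribute name="Units">
                <StringValue>
                  <Matches><s>hPa *</s></Matches>
                </StringValue>
              </Attribute>
            </Mandatory>
          </Attributes>
        </GeolocationField-Named>
      </GeolocationFields>
    </Swath-Named>
  </Swaths>
</HDF-EOS-Validation-Definition>
```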
Writing HDF-EOS 5 validation definitions
As mentioned previously, the validator compares an HDF-EOS 5 file against a definition file, which is an XML document consisting of an <HDF-EOS-Validation-Definition> element. A Document Type Definition (DTD) for these definition files is in the validator sources as he5v.xml. Example definition files are in the validator sources as HIRDLS-definition.xml, MLS-definition.xml, OMI-definition.xml, and TES-definition.xml.
The structure of an HDF-EOS 5 definition file at a high level is as follows:
<HDF-EOS-Validation-Definition>
  <SavedValues>
    <!-- Initialization of global variables -->
    <SaveInt name="int-tag"/> <i>42</i>
    <SaveFloat name="float-tag"/> <f>2.71828</f>
    <SaveString name="string-tag"/> <s>Random string</s>
  </SavedValues>
  <Attributes>
    <!-- Definitions for global (file-level) attributes -->
    <Mandatory>
      <!-- Attribute definitions go here (see below) -->
      <!-- These attributes must be found -->
    </Mandatory>
    <Optional>
      <!-- More Attribute definitions go here -->
      <!-- These attributes must match the specification if they are found -->
    </Optional>
  </Attributes>
  <Swaths>
    <!-- Definitions for Swaths -->
    <Every-Swath>
      <!-- Every Swath must match these specifications -->
      <Attributes>
        <!-- Mandatory and Optional Attributes for all Swaths -->
      </Attributes>
      <Swath-Dimensions>
        <Mandatory-Dimensions>
          <!-- These dimensions must be defined for this Swath -->
          <!-- See below for syntax of Dimension entries -->
          <Dimension name="nTimes"/>
        </Mandatory-Dimensions>
        <Optional-Dimensions>
          <!-- Other Dimension definitions go here -->
          <!-- If these dimensions are defined, they must match these specifications -->
        </Optional-Dimensions>
      </Swath-Dimensions>
      <GeolocationFields>
        <Every-GeolocationField>
          <Attributes>
            <!-- Mandatory and Optional Attributes for all GeolocationFields -->
          </Attributes>
        </Every-GeolocationField>
        <GeolocationField-Named name="Time">
          <!-- There must be a geolocation field named Time -->
          <Attributes>
            <!-- Mandatory and Optional Attributes for Time -->
          </Attributes>
          <!-- Time must have exactly one dimension, named nTimes -->
          <Dimensions><d>nTimes</d></Dimensions>
        </GeolocationField-Named>
      </GeolocationFields>
      <DataFields>
        <Every-DataField>
          <!-- Every data field must match these specifications -->
          <Attributes>
            <!-- Mandatory and Optional Attributes for all data fields -->
          </Attributes>
        </Every-DataField>
      </DataFields>
    </Every-Swath>
    <Swath-Named name="HIRDLS">
      <!-- There must be a swath named HIRDLS -->
      <GeolocationFields>
        <GeolocationField-Named name="Latitude">
          <Dimensions><d>nTimes</d></Dimensions>
          <Attributes>
            <!-- Mandatory and Optional attributes for HIRDLS Latitude -->
          </Attributes>
          <!-- See below for syntax of DataType entries -->
          <DataType>
            <FloatValue>
              <FloatRange>
                <MinFloat><!-- minimum value --></MinFloat>
                <MaxFloat><!-- maximum value --></MaxFloat>
              </FloatRange>
            </FloatValue>
          </DataType>
        </GeolocationField-Named>
      </GeolocationFields>
      <DataFields>
        <DataField-Named name="Temperature">
          <!-- The usual Dimensions/Attributes/DataType -->
        </DataField-Named>
      </DataFields>
    </Swath-Named>
  </Swaths>
</HDF-EOS-Validation-Definition>
Value specifications
The validator can check the size, type, and range of the values of attributes and fields. It does this by examining the values and matching them against value specifications.
A value specification is an expression describing the limits placed on a value - saying that the value must be an integer between 1 and 10, a floating-point number between -90.0 and 90.0, or a string matching a pattern.
Constants and Variables
Values in value specifications can be integers, floating-point numbers, or strings. They can be constants:
<i>42</i> <!-- integer --> <f>3.14159</f> <!-- float --> <s>a string</s> <!-- string -->
or references to named variables, like this:
<GetInt name="int-tag"/> <GetFloat name="float-tag"/> <GetString name="string-tag"/>
Variable names are case-sensitive; all characters that are legal in XML attribute strings are allowed. GetInt, GetFloat, and GetString have separate name spaces - <GetInt name="a"/>, <GetFloat name="a"/>, and <GetString name="a"/> refer to different, non-conflicting values.
A variable reference is legal anywhere the corresponding constant element is legal: <GetInt> wherever <i> is, <GetFloat> wherever <f> is, and <GetString> wherever <s> is.
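To illustrate the separate namespaces, the following SavedValues block defines three unrelated variables that are all named a. It uses the SaveInt, SaveFloat, and SaveString elements described below, and the values are arbitrary.

```xml
<SavedValues>
  <!-- Three independent variables that share the name "a" -->
  <SaveInt name="a"/> <i>1</i>
  <SaveFloat name="a"/> <f>1.5</f>
  <SaveString name="a"/> <s>one</s>
</SavedValues>
```

After this block, <GetInt name="a"/> refers to 1 and <GetString name="a"/> refers to "one". None of the three definitions overwrites the others.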
Variables can be set with the SaveInt, SaveFloat, and SaveString elements. These elements can appear in a SavedValues element, like this:
<SavedValues>
  <SaveInt name="int-tag"/> <i>17</i>
  <SaveFloat name="float-tag"/> <f>2.71828</f>
  <SaveString name="string-tag"/> <s>Random string</s>
  <SaveInt name="copy-int"/> <GetInt name="int-tag"/>
</SavedValues>
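This allows you to specify magic constants once at the top of a validation document, then refer to them by name in the rest of the document. The last entry shows that a saved value can also come from another variable: copy-int gets the value of int-tag.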
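For example, a units string that several fields must share can be defined once and then referenced wherever it is needed. In this sketch, the value "degrees", the field name Longitude, and the variable name angle-units are illustrative. The sections of the definition that are not shown are assumed to be optional.

```xml
<HDF-EOS-Validation-Definition>
  <SavedValues>
    <!-- Define the expected units string once -->
    <SaveString name="angle-units"/> <s>degrees</s>
  </SavedValues>
  <Swaths>
    <Swath-Named name="HIRDLS">
      <GeolocationFields>
        <GeolocationField-Named name="Latitude">
          <Attributes>
            <Mandatory>
              <!-- Units must equal the saved constant -->
              <Attribute name="Units">
                <StringValue><GetString name="angle-units"/></StringValue>
              </Attribute>
            </Mandatory>
          </Attributes>
        </GeolocationField-Named>
        <GeolocationField-Named name="Longitude">
          <Attributes>
            <Mandatory>
              <!-- Same constant, referenced a second time -->
              <Attribute name="Units">
                <StringValue><GetString name="angle-units"/></StringValue>
              </Attribute>
            </Mandatory>
          </Attributes>
        </GeolocationField-Named>
      </GeolocationFields>
    </Swath-Named>
  </Swaths>
</HDF-EOS-Validation-Definition>
```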
The Save elements can also appear inside a value specification, where they set the named variable to the value of the HDF-EOS attribute or field being checked. This lets a specification capture a value from the HDF-EOS file and use it to validate other parts of the file.
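Here is a sketch of capturing a value from the file and reusing it. The definition records whatever value the Latitude field's MissingValue attribute holds; a FloatValue containing only SaveFloat is assumed to accept any float. It then accepts latitudes that are either between -90.0 and 90.0 or equal to that recorded value.

The attribute name MissingValue and the variable name lat-missing are hypothetical. The sketch also assumes the validator checks a field's Attributes before its DataType, so the variable has a value before it is read; otherwise the reference would be a fatal error.

```xml
<HDF-EOS-Validation-Definition>
  <Swaths>
    <Swath-Named name="HIRDLS">
      <GeolocationFields>
        <GeolocationField-Named name="Latitude">
          <Attributes>
            <Mandatory>
              <!-- Hypothetical attribute: record its value in lat-missing -->
              <Attribute name="MissingValue">
                <FloatValue><SaveFloat name="lat-missing"/></FloatValue>
              </Attribute>
            </Mandatory>
          </Attributes>
          <DataType>
            <!-- In range, or equal to the value captured above -->
            <FloatValue>
              <FloatRange>
                <MinFloat><f>-90.0</f></MinFloat>
                <MaxFloat><f>90.0</f></MaxFloat>
              </FloatRange>
              <GetFloat name="lat-missing"/>
            </FloatValue>
          </DataType>
        </GeolocationField-Named>
      </GeolocationFields>
    </Swath-Named>
  </Swaths>
</HDF-EOS-Validation-Definition>
```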
It is a fatal error to refer to a variable that has no value.
There are four different possible value specifications:
StringValue
StringValue elements may contain zero or more string value specifications. A string value specification can be a constant:
<s>Constant String</s>
Or it can specify a pattern, using the <Matches> element:
<Matches><s>foo.*bar</s></Matches>
Or it can be a variable reference, using the <GetString> element:
<GetString name="string-var"/>
Or it can put the value into a variable using the <SaveString> element:
<SaveString name="string-var"/>
Here are some possible ways to specify a string value:
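The sketches below are assembled from the elements just described. The specific strings and variable names are illustrative. As in the more complex example that follows, several entries inside one StringValue are read as alternatives. A StringValue containing only a SaveString is assumed to accept any string.

```xml
<!-- Exactly the string "hPa" -->
<StringValue><s>hPa</s></StringValue>

<!-- Any string matching the pattern "hPa *" -->
<StringValue><Matches><s>hPa *</s></Matches></StringValue>

<!-- Either of two constants -->
<StringValue><s>ascending</s><s>descending</s></StringValue>

<!-- Equal to a previously saved string -->
<StringValue><GetString name="instrument"/></StringValue>

<!-- Any string; its value is saved in the variable instrument -->
<StringValue><SaveString name="instrument"/></StringValue>
```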
Here is a more complex example:
<StringValue> <Matches><s>[A-Z][A-Z][A-Z] [0-9][0-9][0-9]</s></Matches> <s>NO LICENSE</s> <SaveString name="license-plate"/> </StringValue>
This says the string value must be either three capital letters, a space, and three digits, or the string "NO LICENSE". Whatever it is, put the string value in the variable license-plate for future reference.
IntValue
The IntValue element has one attribute, size, which can be 1, 2, 4, or 8 to specify the size of the integer in bytes. Not specifying the size attribute means you don't care about the size.
IntValue elements can contain zero or more integer value specifications. An integer value specification can be a constant:
<i>17</i>
Or it can specify a range, using the IntRange element:
<IntRange><MinInt><i>1</i></MinInt><MaxInt><i>10</i></MaxInt></IntRange>
Or it can be a variable reference:
<GetInt name="int-var"/>
Or it can put the value into a variable:
<SaveInt name="int-var"/>
Here are some possible ways to specify an integer value:
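The sketches below are built from the elements just described. The numbers and the variable names max-level and n-orbit are illustrative. As with strings, several entries inside one IntValue are read as alternatives. An IntValue containing only a SaveInt is assumed to accept any integer.

```xml
<!-- Exactly 17, integer of any size -->
<IntValue><i>17</i></IntValue>

<!-- A 2-byte integer between 1 and 10 -->
<IntValue size="2">
  <IntRange><MinInt><i>1</i></MinInt><MaxInt><i>10</i></MaxInt></IntRange>
</IntValue>

<!-- A 4-byte integer between 0 and a previously saved limit -->
<IntValue size="4">
  <IntRange><MinInt><i>0</i></MinInt><MaxInt><GetInt name="max-level"/></MaxInt></IntRange>
</IntValue>

<!-- Any integer; its value is saved in the variable n-orbit -->
<IntValue><SaveInt name="n-orbit"/></IntValue>
```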
Here is a more complex example:
<IntValue size="1"> <IntRange> <MinInt><i>0</i></MinInt> <MaxInt><i>127</i></MaxInt> </IntRange> <GetInt name="missing-value"/> </IntValue>
This says the integer value must be one byte in size, and it must be either between 0 and 127, or the same as the integer variable "missing-value" (set earlier in the specification).
FloatValue
The FloatValue element has one attribute, size, which can be 4 or 8 to specify the size of the floating-point number in bytes. Not specifying the size attribute means you don't care about the size.
FloatValue elements can contain zero or more floating-point value specifications. A floating-point value specification can be a constant:
<f>1.23456</f>
Or it can specify a range, using the FloatRange element:
<FloatRange><MinFloat><f>2.71828</f></MinFloat><MaxFloat><f>3.14159</f></MaxFloat></FloatRange>
Or it can be a variable reference:
<GetFloat name="float-var"/>
Or it can put the value into a variable:
<SaveFloat name="float-var"/>
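By analogy with the IntValue example above, a combined floating-point specification might look like the sketch below. It relies on the rule that a GetFloat may appear anywhere an <f> may, here inside MinFloat and MaxFloat. It assumes the hypothetical variables start-time and end-time were set earlier, for instance in SavedValues. The value -999.0 is illustrative.

```xml
<FloatValue size="8">
  <!-- Between two previously saved bounds... -->
  <FloatRange>
    <MinFloat><GetFloat name="start-time"/></MinFloat>
    <MaxFloat><GetFloat name="end-time"/></MaxFloat>
  </FloatRange>
  <!-- ...or exactly this fill value -->
  <f>-999.0</f>
</FloatValue>
```

This says the value must be an 8-byte floating-point number that either lies between start-time and end-time or equals -999.0.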
SameDataTypeAsField
This is useful for specifying that an attribute must have the same data type as the field it is attached to.
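Here is a possible use. It assumes SameDataTypeAsField is written as an empty element in the position where a StringValue, IntValue, or FloatValue would otherwise go. _FillValue is an illustrative attribute name.

```xml
<Every-GeolocationField>
  <Attributes>
    <Optional>
      <!-- If a geolocation field has a _FillValue attribute, the attribute
           must have the same data type as the field itself -->
      <Attribute name="_FillValue">
        <SameDataTypeAsField/>
      </Attribute>
    </Optional>
  </Attributes>
</Every-GeolocationField>
```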
Data type definitions
Dimension definitions
Yeah, yeah, the size, min, and max attributes ought to be elements that can then be filled in using GetInt.
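Beyond the note above, the Dimension element's attributes are not described here. The sketch below assumes that size requires an exact length, while min and max set lower and upper bounds. The dimension names other than nTimes, and all the numbers, are illustrative.

```xml
<Swath-Dimensions>
  <Mandatory-Dimensions>
    <!-- nTimes must exist with at least 1 entry (assumed meaning of min) -->
    <Dimension name="nTimes" min="1"/>
    <!-- nLevels must have exactly 121 entries (assumed meaning of size) -->
    <Dimension name="nLevels" size="121"/>
  </Mandatory-Dimensions>
  <Optional-Dimensions>
    <!-- If nChannels is defined, its size must be between 1 and 21 -->
    <Dimension name="nChannels" min="1" max="21"/>
  </Optional-Dimensions>
</Swath-Dimensions>
```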
Attribute definitions
<Attribute name="InstrumentName"> <!-- value specification goes here --> </Attribute>
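Here is a filled-in sketch placed in the global Attributes section. The value HIRDLS and the optional attribute OrbitNumber, with its range, are illustrative rather than taken from the example definition files.

```xml
<Attributes>
  <Mandatory>
    <!-- Must be present and must be exactly "HIRDLS" -->
    <Attribute name="InstrumentName">
      <StringValue><s>HIRDLS</s></StringValue>
    </Attribute>
  </Mandatory>
  <Optional>
    <!-- Hypothetical: checked only if present; a 4-byte integer from 0 to 99999 -->
    <Attribute name="OrbitNumber">
      <IntValue size="4">
        <IntRange><MinInt><i>0</i></MinInt><MaxInt><i>99999</i></MaxInt></IntRange>
      </IntValue>
    </Attribute>
  </Optional>
</Attributes>
```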
Performance characteristics