HDF-EOS 5 Profile Validator Documentation


Limitations


The validator in its current state does only a little bit more than what is required to validate Aura files. For instance, there is no support for Grid or Point files. Such support can be added as needed in the future.

Installation


The validator currently comes as three sets of sources. It depends on HDF-EOS version 5.1.5, and is guaranteed not to work with any earlier version.

The first and second sets of sources are open source XML parsing software. The versions we used to develop the validator are provided along with it as a convenience - feel free to download more recent versions yourself from the web addresses below.

expat-1.95.5.tar.gz is James Clark's stream-oriented XML parser. No changes were made to the sources downloaded from http://www.libexpat.org. To install it, just unpack it, go to its main directory, type ./configure, make, and make install to install it in /usr/local. If you want to install it elsewhere, read the README file.

scew-0.1.1.tar.gz is the Simple C Expat Wrapper, a library that uses expat to generate an in-memory tree of an XML document similar to a Document Object Model (DOM) structure. No changes were made to the sources downloaded from http://www.nongnu.org/scew/. It installs the same way that expat (above) does.

he5v.tar.gz is the source for the validator. Unpack it and edit the Makefile - you will have to change the value of SCEW_ROOT to the install directory for SCEW. Before building, you will also have to set up the HDF-EOS 5 environment by sourcing the appropriate environment script. Once all that is done, you can build the validator by typing make, which will compile and link all the sources to create the executable he5v.

Note that the build procedure uses the HDF portable compile script h5cc (which means the HDF binaries need to be on your PATH). This script turns the compiler warning level way up, so the compile will generate a lot of warning messages. I tried to minimize them, but getting rid of all of them would have required modifying library sources. Sorry about that...

Running the validator


The validator command-line looks like this:

he5v file-to-validate -x XML-definition

where file-to-validate is the HDF-EOS 5 file being validated, and XML-definition is a file containing the validation definition - an XML document with a top-level <HDF-EOS-Validation-Definition> element. If there are any errors, the validator prints one or more error report messages to standard error and exits with a non-zero exit code. If everything is OK, the validator prints nothing and exits with a zero exit code.
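As an illustration of the exit-code convention described above, here is a minimal Python sketch of a wrapper script. The function names (build_command, run_validator, main) are made up for this example and are not part of the validator; the only things taken from this document are the command-line shape and the meaning of a zero or non-zero exit code. It assumes he5v is on your PATH.

```python
import subprocess
import sys


def build_command(data_file, definition_file, exe="he5v"):
    """Build the argument list for: he5v file-to-validate -x XML-definition."""
    return [exe, data_file, "-x", definition_file]


def run_validator(argv):
    """Run the command and return (ok, stderr_text); ok is True on exit code 0."""
    proc = subprocess.run(
        argv,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,
    )
    return proc.returncode == 0, proc.stderr


def main(args):
    """args is [data_file, definition_file]; returns a process exit status."""
    ok, errors = run_validator(build_command(args[0], args[1]))
    if not ok:
        sys.stderr.write(errors)  # pass the validator's reports through
    return 0 if ok else 1
```

A calling script could finish with sys.exit(main(sys.argv[1:])) so that a batch job stops on the first file that fails validation.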

Validator error messages


The validator produces two major types of error messages: fatal errors and normal error reports.

A fatal error is any message beginning with "FATAL ERROR:". Fatal errors are caused by file system errors (inability to open/read one of the input files) or errors in the XML definition file. Fatal errors cause validation to stop immediately.

Normal error reports are of the form:

ERROR LOCATION
 description of the error...

Here is an example error report:

SWATH HIRDLS GEOLOCATION_FIELD Pressure ATTRIBUTE Units
       string "km                                      "
       doesn't match <StringValue value="hPa *"/>
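
If you want to post-process the validator's output, the following Python sketch splits captured standard error into fatal errors and normal error reports. It is based only on the example above and assumes that a location line starts in the first column, that description lines are indented, and that a fatal error occupies a single line; the real output may differ. The name parse_reports is hypothetical.

```python
def parse_reports(stderr_text):
    """Split validator output into fatal messages and (location, description) reports."""
    fatal = []
    reports = []
    for line in stderr_text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        if line.startswith("FATAL ERROR:"):
            fatal.append(line)
        elif line[0].isspace() and reports:
            # Assumption: indented lines describe the most recent location.
            reports[-1][1].append(line.strip())
        else:
            # Assumption: an unindented line starts a new report.
            reports.append((line.strip(), []))
    return fatal, reports
```

Each report is returned as a pair: the location string and the list of description lines that followed it.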

Writing HDF-EOS 5 validation definitions


As mentioned previously, the validator compares an HDF-EOS 5 file against a definition file, which is an XML document consisting of an <HDF-EOS-Validation-Definition> element. A Document Type Definition (DTD) for those definition files is in the validator sources as he5v.xml. Example definition files are in the validator sources as HIRDLS-definition.xml, MLS-definition.xml, OMI-definition.xml, and TES-definition.xml.

The structure of an HDF-EOS 5 definition file at a high level is as follows:

<HDF-EOS-Validation-Definition>
 <SavedValues>
   <!-- Initialization of global variables -->
   <SaveInt name="int-tag"/> <i>42</i>
   <SaveFloat name="float-tag"/> <f>2.71828</f>
   <SaveString name="string-tag"/> <s>Random string</s>
 </SavedValues>
 <Attributes>
   <!-- Definitions for global (file-level) attributes -->
   <Mandatory>
     <!-- Attribute definitions go here (see below) -->
     <!-- These attributes must be found -->
   </Mandatory>
   <Optional>
     <!-- More Attribute definitions go here -->
     <!-- These attributes must match the specification if they are found-->
   </Optional>
 </Attributes>
 <Swaths>
 <!-- Definitions for Swaths -->
   <Every-Swath>
   <!-- Every Swath must match these specifications -->
     <Attributes>
       <!-- Mandatory and Optional Attributes for all Swaths -->
     </Attributes>
     <Swath-Dimensions>
       <Mandatory-Dimensions>
         <!-- These dimensions must be defined for this Swath -->
         <!-- See below for syntax of Dimension entries -->
          <Dimension name="nTimes"/>
       </Mandatory-Dimensions>
       <Optional-Dimensions>
         <!-- Other Dimension definitions go here -->
         <!-- If these dimensions are defined, they must match these specifications -->
       </Optional-Dimensions>
     </Swath-Dimensions>
     <GeolocationFields>
       <Every-GeolocationField>
         <Attributes>
       <!-- Mandatory and Optional Attributes for all GeolocationFields -->
         </Attributes>
       </Every-GeolocationField>
       <GeolocationField-Named name="Time">
         <!-- There must be a geolocation field named Time -->
         <Attributes>
       <!-- Mandatory and Optional Attributes for Time -->
         </Attributes>
         <!-- Time must have exactly one dimension, named nTimes -->
         <Dimensions><d>nTimes</d></Dimensions>
       </GeolocationField-Named>
     </GeolocationFields>
     <DataFields>
       <Every-DataField>
         <!-- Every data field must match these specifications -->
         <Attributes>
           <!-- Mandatory and Optional Attributes for all data fields -->
         </Attributes>
       </Every-DataField>
     </DataFields>
   </Every-Swath>
 <Swath-Named name="HIRDLS">
   <!-- There must be a swath named HIRDLS -->
     <GeolocationFields>
       <GeolocationField-Named name="Latitude">
         <Dimensions><d>nTimes</d></Dimensions>
         <Attributes>
           <!-- Mandatory and Optional attributes for HIRDLS Latitude -->
         </Attributes>
         <!-- See below for syntax of DataType entries -->
         <DataType>
           <FloatValue>
             <FloatRange>
               <MinFloat><f>-90.0</f></MinFloat>
               <MaxFloat><f>90.0</f></MaxFloat>
             </FloatRange>
           </FloatValue>
         </DataType>
       </GeolocationField-Named>
     </GeolocationFields>
     <DataFields>
       <DataField-Named name="Temperature">
         <!-- The usual Dimensions/Attributes/DataType -->
       </DataField-Named>
     </DataFields>
   </Swath-Named>
 </Swaths>
</HDF-EOS-Validation-Definition>

Value specifications

The validator can check the size, type, and range of the values of attributes and fields. It does this by examining the values and matching them against value specifications.

A value specification is an expression describing the limits placed on a value - saying that the value must be an integer between 1 and 10, a floating-point number between -90.0 and 90.0, or a string matching a pattern.

Constants and Variables

Values in value specifications can be integers, floating-point numbers, or strings. They can be constants:

<i>42</i>        <!-- integer -->
<f>3.14159</f>   <!-- float   -->
<s>a string</s>  <!-- string  -->

or references to named variables, like this:

<GetInt name="int-tag"/>
<GetFloat name="float-tag"/>
<GetString name="string-tag"/>

Variable names are case-sensitive; all characters that are legal in XML attribute strings are allowed. GetInt, GetFloat, and GetString have separate name spaces - <GetInt name="a"/>, <GetFloat name="a"/>, and <GetString name="a"/> refer to different, non-conflicting values.

A variable reference is legal anywhere the corresponding constant element is legal - <i> and <GetInt>, <f> and <GetFloat>, <s> and <GetString>.

Variables can be set with the SaveInt, SaveFloat, and SaveString elements. These elements can appear in a SavedValues element, like this:

<SavedValues>
  <SaveInt name="int-tag"/> <i>17</i>
  <SaveFloat name="float-tag"/> <f>2.71828</f>
  <SaveString name="string-tag"/> <s>Random string</s>
  <SaveInt name="copy-int"/> <GetInt name="int-tag"/>
</SavedValues>

This allows you to specify magic constants once at the top of a validation document, then refer to them by name in the rest of the document.

The Save elements can also appear inside a value specification, where they will be set to the value of the HDF-EOS attribute or field being checked. This allows the specification to capture a value from the HDF-EOS file and use it to validate other parts of the file.

It is a fatal error to refer to a variable that has no value.
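
The variable rules above (one name space per type, and an error when reading a variable that was never set) can be pictured with the following Python sketch. This is only a model of the behavior described in this section, not the validator's actual data structure; the class and exception names are hypothetical.

```python
class UnsetVariableError(Exception):
    """Stands in for the validator's fatal error on an unset variable."""


class SavedValues:
    """Three independent name spaces, one per value type."""

    def __init__(self):
        self._spaces = {"int": {}, "float": {}, "string": {}}

    def save(self, kind, name, value):
        # Models SaveInt / SaveFloat / SaveString.
        self._spaces[kind][name] = value

    def get(self, kind, name):
        # Models GetInt / GetFloat / GetString.
        if name not in self._spaces[kind]:
            raise UnsetVariableError(f"{kind} variable {name!r} has no value")
        return self._spaces[kind][name]
```

Saving an integer named "a" and a string named "a" keeps both values; asking for a float named "a" that was never saved raises the error.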

There are four different possible value specifications:

StringValue

StringValue elements may contain zero or more string value specifications. A string value specification can be a constant:

<s>Constant String</s>

Or it can specify a pattern, using the <Matches> element:

<Matches><s>foo.*bar</s></Matches>

Or it can be a variable reference, using the <GetString> element:

<GetString name="string-var"/>

Or it can put the value into a variable using the <SaveString> element:

<SaveString name="string-var"/>

Here are some possible ways to specify a string value:

* <StringValue/>
A string, any value will do
* <StringValue><s>foo</s></StringValue>
The string "foo"
* <StringValue><Matches><s>foo.*bar</s></Matches></StringValue>
A string matching the regular expression "foo.*bar"
* <StringValue><s>foo</s><s>bar</s></StringValue>
Either the string "foo" or the string "bar"

Here is a more complex example:

<StringValue>
 <Matches><s>[A-Z][A-Z][A-Z] [0-9][0-9][0-9]</s></Matches>
 <s>NO LICENSE</s>
 <SaveString name="license-plate"/>
</StringValue>

This says the string value must be either three capital letters, a space, and three digits, or the string "NO LICENSE". Whatever it is, put the string value in the variable license-plate for future reference.
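
To make the "either/or, then save" reading of the example concrete, here is a small Python model of how a StringValue could be evaluated. It is a sketch under several assumptions that this document does not confirm: that a pattern must match the whole string, that Python's regular expression syntax is close enough to the validator's for this pattern, and that the value is saved whether or not it matched. The function name and the tuple encoding of the specification are made up for the example.

```python
import re


def check_string(value, alternatives, variables, save_as=None):
    """Return True if value satisfies any alternative (or there are none).

    alternatives is a list of ("const", text), ("matches", pattern),
    or ("get", variable_name) tuples.
    """
    ok = not alternatives  # an empty StringValue accepts any string
    for kind, arg in alternatives:
        if kind == "const":
            ok = ok or value == arg
        elif kind == "matches":
            # Assumption: the pattern must match the entire string.
            ok = ok or re.fullmatch(arg, value) is not None
        elif kind == "get":
            ok = ok or value == variables[arg]
    if save_as is not None:
        # Assumption: the value is saved even if the check failed.
        variables[save_as] = value
    return ok
```

With the license plate specification, "ABC 123" and "NO LICENSE" are accepted and "abc 123" is rejected; in every case the value ends up in the license-plate entry of the variables dictionary.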

IntValue

The IntValue element has one attribute, size, which can be 1, 2, 4, or 8 to specify the size of the integer in bytes. Not specifying the size attribute means you don't care about the size.

IntValue elements can contain zero or more integer value specifications. An integer value specification can be a constant:

<i>17</i>

Or it can specify a range, using the IntRange element:

<IntRange><MinInt><i>1</i></MinInt><MaxInt><i>10</i></MaxInt></IntRange>

Or it can be a variable reference:

<GetInt name="int-var"/>

Or it can put the value into a variable:

<SaveInt name="int-var"/>

Here are some possible ways to specify an integer value:

* <IntValue/>
An integer, any value will do
* <IntValue size="2"><IntRange><MinInt><i>17</i></MinInt><MaxInt><i>42</i></MaxInt></IntRange></IntValue>
A two-byte integer between 17 and 42 inclusive. Specifying MinInt only means no maximum value; specifying MaxInt only means no minimum value.

Here is a more complex example:

<IntValue size="1">
 <IntRange>
  <MinInt><i>0</i></MinInt>
  <MaxInt><i>127</i></MaxInt>
 </IntRange>
 <GetInt name="missing-value"/>
</IntValue>

This says the integer value must be one byte in size, and it must be either between 0 and 127, or the same as the integer variable "missing-value" (set earlier in the specification).
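
The combination of a size check, inclusive ranges with optional bounds, and alternative constants can be modeled in a few lines of Python. This is a sketch of the rules as described in this section, not the validator's code; the function name and parameters are hypothetical, and a variable reference such as missing-value is represented simply by passing its current value as a constant.

```python
def check_int(value, stored_size, spec_size=None, constants=(), ranges=()):
    """Return True if an integer stored in stored_size bytes satisfies the spec.

    ranges holds (minimum, maximum) pairs; None means that bound is omitted.
    """
    if spec_size is not None and stored_size != spec_size:
        return False  # a size attribute was given and does not match
    if not constants and not ranges:
        return True  # empty specification: any value will do
    if value in constants:
        return True
    return any(
        (lo is None or value >= lo) and (hi is None or value <= hi)
        for lo, hi in ranges
    )
```

For the example above, if missing-value had been set to -1, the call would look like check_int(v, 1, spec_size=1, constants=[-1], ranges=[(0, 127)]).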

FloatValue

The FloatValue element has one attribute, size, which can be 4 or 8 to specify the size of the floating-point number in bytes. Not specifying the size attribute means you don't care about the size.

FloatValue elements can contain zero or more floating-point value specifications. A floating-point value specification can be a constant:

<f>1.23456</f>

Or it can specify a range, using the FloatRange element:

<FloatRange><MinFloat><f>2.71828</f></MinFloat><MaxFloat><f>3.14159</f></MaxFloat></FloatRange>

Or it can be a variable reference:

<GetFloat name="float-var"/>

Or it can put the value into a variable:

<SaveFloat name="float-var"/>

SameDataTypeAsField

SameDataTypeAsField is useful for specifying that an attribute must have the same data type as the field it is attached to.

Data type definitions


The DataType element is used to specify the data type and range of data and geolocation fields. It is just a container for IntValue/FloatValue/StringValue elements, like this:

* <DataType><IntValue><IntRange><MinInt><i>17</i></MinInt><MaxInt><i>42</i></MaxInt></IntRange></IntValue></DataType>
the field must be integer type, with values between 17 and 42 inclusive.
* <DataType><StringValue/></DataType>
the field must be a string type.

Dimension definitions


Dimensions are specified at the Swath (and eventually Grid and Point) level by name and size, as follows:
* <Dimension name="nXtrack"/>
A dimension named "nXtrack"
* <Dimension name="nXtrack" size="42"/>
A dimension named "nXtrack" whose size must be 42.
* <Dimension name="nXtrack" min="10" max="20"/>
A dimension named "nXtrack", whose size must be between 10 and 20 inclusive. Omitting min means no minimum value; omitting max means no maximum value.

Yeah, yeah, the size, min, and max attributes ought to be elements that can then be filled in using GetInt.

Attribute definitions


Attributes are specified by name, type and value, as follows:

<Attribute name="InstrumentName">
   <!-- value specification goes here -->
</Attribute>

Performance characteristics


The validator should not be a particularly CPU-intensive application, and will only be I/O-intensive if it is validating value ranges for large data fields, since it must read the actual data from the file to check it. To keep the system simple internally, the validator will attempt to read in an entire field at a time if it needs to range-check its data, so its worst-case memory usage will be roughly equal to the size of the largest field in the file being validated.
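
As a rough way to estimate that worst case before running the validator, the size of a field is the product of its dimension sizes times the size of one element. The Python sketch below does that arithmetic; the function names and the example numbers are hypothetical, and it ignores any overhead from the HDF libraries or the validator itself.

```python
from math import prod


def field_bytes(dim_sizes, element_size):
    """Bytes needed to hold one whole field in memory."""
    return prod(dim_sizes) * element_size


def worst_case_bytes(fields):
    """Largest field among (dim_sizes, element_size) pairs, or 0 if none."""
    return max((field_bytes(dims, size) for dims, size in fields), default=0)
```

For instance, a 4-byte float field with dimensions 1000 by 50 needs 1000 * 50 * 4 = 200000 bytes. Only fields whose definitions ask for a range check need to be counted, since the others are not read in full.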