ValidatorDocumentation

Opening Remarks
This software is preliminary, for several reasons.

In its current state the validator does little more than what is required to validate Aura files. For instance, there is no support for Grid or Point files. Such support can be added as needed in the future.

The HDF-EOS API calls used to inquire about the data types of attributes and fields are inadequate for this task (or for any task that requires completely self-describing files). These APIs will likely change in the near future, so this program will also have to change. Currently the validator uses modified copies of several HDF-EOS API functions to extract the metadata it needs. For more on this subject, see ValidatorIssues on the shiraz whiteboard.

Installation


The validator currently comes as four sets of sources. The first set, he5patch.tar.gz, is a modified set of HDF-EOS 5 sources. These sources come from the HDF-EOS 5 developers, and they (or code descended from them) are likely to be the next major release of HDF-EOS 5. Before installing the rest of this program, unpack he5patch.tar.gz, replace the corresponding files in the HDF-EOS 5 src and include directories, and rebuild HDF-EOS 5. It would probably be a good idea at this point to test any other HDF-EOS 5 code you have against these new sources.

The second and third sets of sources are open source XML parsing software. expat-1.95.5.tar.gz is James Clark's stream-oriented XML parser. No changes were made to the sources downloaded from http://www.libexpat.org. To install it, unpack it, change to its main directory, and run ./configure, make, and make install in turn; this installs it in /usr/local. If you want to install it elsewhere, read the README file.

scew-0.1.1.tar.gz is the Simple C Expat Wrapper, a library that uses expat to generate an in-memory tree of an XML document similar to a Document Object Model (DOM) structure. No changes were made to the sources downloaded from http://www.nongnu.org/scew/. It installs the same way that expat (above) does.

validate-hdfeos5.tar.gz is the source for the validator. Unpack it and edit the Makefile: you will have to change the values of HDFROOT and SCEW_ROOT. Before building, set up the HDF-EOS 5 environment by sourcing the appropriate environment script. Once that is done, build the validator by typing make, which compiles and links all the sources to create the executable validate_hdfeos5.

Running the validator


The validator command line looks like this:

validate_hdfeos5 -h file-to-validate -x XML-definition

where file-to-validate is the HDF-EOS 5 file being validated, and XML-definition is a file containing an XML document with a top-level <HDF-EOS-Validation-Definition> element. If there are any errors, the validator prints one or more error reports to standard error and exits with a non-zero exit status. If everything is OK, the validator prints nothing and exits with a zero exit status.

Validator error messages


The validator produces two major types of error messages: fatal errors and normal error reports.

A fatal error is any message beginning with "FATAL ERROR:". Fatal errors are caused by file system errors (inability to open/read one of the input files) or syntactic errors in the XML definition file.

Normal error reports are of the form:

ERROR LOCATION
 description of the error...

Here is an example error report:

SWATH HIRDLS GEOLOCATION_FIELD Pressure ATTRIBUTE Units
       string "km                                      "
       doesn't match <StringValue value="hPa *"/>
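
The location line of a report follows the nesting of the definition file: swath, then field, then attribute. As a rough illustration, a definition fragment along the following lines would be expected to produce the report above. It is reconstructed from the report, not copied from HIRDLS-definition.xml; in particular, placing the attribute under <Mandatory> rather than <Optional> is an assumption.

```xml
<Swath-Named name="HIRDLS">
  <GeolocationFields>
    <GeolocationField-Named name="Pressure">
      <Attributes>
        <!-- Assumption: the Units attribute is declared as mandatory -->
        <Mandatory>
          <Attribute name="Units">
            <!-- The value specification quoted in the error report -->
            <StringValue value="hPa *"/>
          </Attribute>
        </Mandatory>
      </Attributes>
    </GeolocationField-Named>
  </GeolocationFields>
</Swath-Named>
```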

Writing HDF-EOS 5 Validation Definitions


As mentioned previously, the validator compares an HDF-EOS 5 file against a definition file, which is an XML document consisting of an <HDF-EOS-Validation-Definition> element. A Document Type Definition (DTD) for those definition files is in the validator sources as validate-hdfeos5.xml. Example definition files are in the validator sources as HIRDLS-definition.xml, MLS-definition.xml, OMNI-definition.xml, and TES-definition.xml.

The structure of an HDF-EOS 5 definition file at a high level is as follows:

<HDF-EOS-Validation-Definition>
 <Attributes>
   <!-- Definitions for global (file-level) attributes -->
   <Mandatory>
     <!-- Attribute definitions go here (see below) -->
     <!-- These attributes must be found -->
   </Mandatory>
   <Optional>
     <!-- More Attribute definitions go here -->
     <!-- These attributes must match the specification if they are found-->
   </Optional>
 </Attributes>
 <Swaths>
   <!-- Definitions for Swaths -->
   <Every-Swath>
     <!-- Every Swath must match these specifications -->
     <Attributes>
       <!-- Mandatory and Optional Attributes for all Swaths -->
     </Attributes>
     <Swath-Dimensions>
       <Mandatory-Dimensions>
         <!-- These dimensions must be defined for this Swath -->
         <Swath-Dimension name="nTimes"/>
       </Mandatory-Dimensions>
       <Optional-Dimensions>
         <!-- Other Dimension definitions go here -->
         <!-- If these dimensions are defined, they must match these specifications -->
       </Optional-Dimensions>
     </Swath-Dimensions>
     <GeolocationFields>
       <Every-GeolocationField>
         <Attributes>
           <!-- Mandatory and Optional Attributes for all GeolocationFields -->
         </Attributes>
       </Every-GeolocationField>
       <GeolocationField-Named name="Time">
         <!-- There must be a geolocation field named Time -->
         <Attributes>
           <!-- Mandatory and Optional Attributes for Time -->
         </Attributes>
         <!-- Time must have exactly one dimension, named nTimes -->
         <Dimensions><d>nTimes</d></Dimensions>
       </GeolocationField-Named>
     </GeolocationFields>
     <DataFields>
       <Every-DataField>
         <!-- Every data field must match these specifications -->
         <Attributes>
           <!-- Mandatory and Optional Attributes for all data fields -->
         </Attributes>
       </Every-DataField>
     </DataFields>
   </Every-Swath>
   <Swath-Named name="HIRDLS">
     <!-- There must be a swath named HIRDLS -->
     <GeolocationFields>
       <GeolocationField-Named name="Latitude">
         <Dimensions><d>nTimes</d></Dimensions>
         <Attributes>
           <!-- Mandatory and Optional attributes for HIRDLS Latitude -->
         </Attributes>
         <DataType type="float" min="-90" max="90"/>
       </GeolocationField-Named>
     </GeolocationFields>
     <DataFields>
       <DataField-Named name="Temperature">
         <!-- The usual Dimensions/Attributes/DataType -->
       </DataField-Named>
     </DataFields>
   </Swath-Named>
 </Swaths>
</HDF-EOS-Validation-Definition>

Attribute Definitions


Attributes are specified by name, type and value, as follows:

<Attribute name="InstrumentName">
   <!-- value specification goes here -->
</Attribute>

There are four different possible value specifications:

StringValue

Here are the possible ways to specify a string attribute value:

* <StringValue/>
A string, any value will do
* <StringValue value="foo" />
The string "foo"
* <StringValue matches="foo.*bar" />
A string matching the regular expression "foo.*bar"
* <StringValue><OneOf><s>foo</s><s>bar</s></OneOf></StringValue>
Either the string "foo" or the string "bar"
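
To show how a value specification sits inside an attribute definition, the sketch below requires a global InstrumentName attribute whose value is one of a fixed set of strings. The instrument names are example values only (borrowed from the names of the example definition files) and do not reflect what the shipped definition files actually contain.

```xml
<Attributes>
  <Mandatory>
    <!-- InstrumentName must be present and equal to one of the listed strings -->
    <Attribute name="InstrumentName">
      <StringValue><OneOf><s>HIRDLS</s><s>MLS</s><s>TES</s></OneOf></StringValue>
    </Attribute>
  </Mandatory>
</Attributes>
```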

IntValue

Here are the possible ways to specify an integer attribute value:

* <IntValue/>
An integer, any value will do
* <IntValue min="17" max="42"/>
An integer between 17 and 42 inclusive. Specifying min only means no maximum value, specifying max only means no minimum value.

FloatValue

Here are the possible ways to specify a floating-point attribute value:

* <FloatValue/>
A floating-point number, any value will do
* <FloatValue min="2.71828" max="3.14159"/>
A floating-point number between 2.71828 and 3.14159 inclusive. Specifying min only means no maximum value, specifying max only means no minimum value.
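
The one-sided forms of the numeric specifications are easy to overlook, so here is a small sketch that uses one alongside a two-sided range. The attribute names OrbitNumber and CloudFraction are hypothetical, chosen only for illustration.

```xml
<Optional>
  <!-- Hypothetical attribute: any integer of 1 or greater (min only, so no upper bound) -->
  <Attribute name="OrbitNumber">
    <IntValue min="1"/>
  </Attribute>
  <!-- Hypothetical attribute: a float between 0.0 and 1.0 inclusive -->
  <Attribute name="CloudFraction">
    <FloatValue min="0.0" max="1.0"/>
  </Attribute>
</Optional>
```

Because both attributes sit under <Optional>, they are checked only if they are present in the file.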

SameDataTypeAsField

Use this specification to require that an attribute has the same data type as the field it is attached to.
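
The exact syntax is defined by the DTD in the validator sources. Assuming it is written as an empty element, like the other value specifications, a definition requiring a MissingValue attribute on every data field to share that field's data type might look like the sketch below. The attribute name MissingValue is a hypothetical example.

```xml
<Every-DataField>
  <Attributes>
    <Mandatory>
      <!-- Hypothetical attribute; the empty-element form is an assumption -->
      <Attribute name="MissingValue">
        <SameDataTypeAsField/>
      </Attribute>
    </Mandatory>
  </Attributes>
</Every-DataField>
```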

Data type definitions


The DataType element is used to specify the data type and range of data and geolocation fields. It is similar to an attribute value specification, as follows:
* <DataType type="integer" min="17" max="42" />
The field must be an integer type, with values between 17 and 42 inclusive.
* <DataType type="float" min="2.71828" max="3.14159" />
The field must be a floating-point type, with values between 2.71828 and 3.14159 inclusive.
* <DataType type="string" />
The field must be a string type.
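
As a sketch of how DataType combines with the other parts of a field definition, the fragment below fills in the Temperature data field from the high-level structure. The second dimension name (nLevels), the Units value, and the numeric range are illustrative assumptions, not values taken from any of the shipped definition files.

```xml
<DataField-Named name="Temperature">
  <!-- Assumed dimensions: nTimes by nLevels (nLevels is hypothetical) -->
  <Dimensions><d>nTimes</d><d>nLevels</d></Dimensions>
  <Attributes>
    <Mandatory>
      <Attribute name="Units">
        <StringValue value="K"/>
      </Attribute>
    </Mandatory>
  </Attributes>
  <!-- Floating-point field; every value must lie between 0 and 500 inclusive -->
  <DataType type="float" min="0" max="500"/>
</DataField-Named>
```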

Dimension definitions


Dimensions are specified at the Swath (and eventually Grid and Point) level by name and size, as follows:
* <Dimension name="nXtrack"/>
A dimension named "nXtrack"
* <Dimension name="nXtrack" size="42"/>
A dimension named "nXtrack" whose size must be 42.
* <Dimension name="nXtrack" min="10" max="20"/>
A dimension named "nXtrack", whose size must be between 10 and 20 inclusive. Omitting min means no minimum value, omitting max means no maximum value.
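
A sketch of dimension definitions in context follows. It uses the <Swath-Dimensions> container and the <Swath-Dimension> element name from the high-level structure shown earlier, and it assumes that the size, min, and max attributes described here apply to that element; the DTD is the authority on the element name. The dimension name nLevels is hypothetical.

```xml
<Swath-Dimensions>
  <Mandatory-Dimensions>
    <!-- nTimes must exist; any size is accepted -->
    <Swath-Dimension name="nTimes"/>
    <!-- nXtrack must exist and its size must be between 10 and 20 inclusive -->
    <Swath-Dimension name="nXtrack" min="10" max="20"/>
  </Mandatory-Dimensions>
  <Optional-Dimensions>
    <!-- If nLevels (hypothetical) is defined, its size must be exactly 42 -->
    <Swath-Dimension name="nLevels" size="42"/>
  </Optional-Dimensions>
</Swath-Dimensions>
```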