The S4PM (Simple, Scalable, Script-based Science Processor for Missions) system is a fully functioning processing system that supports a variety of science processing algorithms and scenarios. It is built on top of the S4P (Simple, Scalable, Script-based Science Processor) kernel: an engine, toolkit, and graphical monitor for automating script-based, data-driven processing. S4PM can be run in a data-driven mode, a polling mode, an on-demand mode, or a combination of these. Though S4PM is written in Perl, it can accept any algorithm that can be run from the command line (sometimes through the addition of a simple wrapper script). To date, S4PM has been run with standard Earth Observing System (EOS) product algorithms from the MODIS and AIRS teams, as well as GES DISC subsetters, Direct Broadcast algorithms from the GSFC Direct Readout Lab, and algorithms in the University of Wisconsin's IMAPP package.
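To illustrate what a "simple wrapper script" around a command-line algorithm might look like, here is a minimal sketch. It is written in Python for brevity rather than in Perl, and it is not S4PM code: the function name `run_algorithm`, its parameters, and the convention of passing the input files followed by an output directory are all assumptions made for this example.

```python
import subprocess
import sys


def run_algorithm(command, input_files, output_dir):
    """Run a command-line algorithm and return True if it exited with status 0.

    Hypothetical calling convention (an assumption, not S4PM's):
    the algorithm receives its input files followed by an output directory.
    """
    argv = list(command) + list(input_files) + [output_dir]
    result = subprocess.run(argv, capture_output=True, text=True)
    if result.returncode != 0:
        # Pass the algorithm's own error output along so an operator can see it.
        sys.stderr.write(result.stderr)
    return result.returncode == 0
```

The point of such a wrapper is only to give every algorithm the same calling convention and the same success or failure signal, whatever language the algorithm itself is written in.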
S4PM supports several interfaces within EOS Data and Information System (EOSDIS), including ingest into EOSDIS Core System (ECS) archives, insertion into ECS Data Pools, distribution from ECS, and interactions with the EOSDIS Data Gateway as an External Subsetting System. These interfaces can be swapped out for user-defined interfaces by adding custom scripts.
S4PM requires Perl (ideally 5.6 or higher) and has been run successfully on IRIX, Linux (Red Hat), Solaris, and Mac OS X.
The main goal of S4PM is to automate science processing at the GES DISC to the extent that a single operator can monitor all of the processing in an "industrial-size" data processing center. A second goal is to be flexible enough that new processing strings, or new algorithms in an existing string, can be added with minimal effort.
Although the goals of this system arose from internal needs, S4PM has proven adaptable enough to new requirements that we are making it available for reuse by the science processing community.
S4PM supports automated processing in a number of different "flavors":
- Low-latency data-driven processing
- On-demand processing
- Polling-driven processing
Also, a number of different production rules are supported, including required and optional inputs, timers and temporal offsets.
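The production rules named above can be sketched as a small readiness check. This is an illustration under stated assumptions, not S4PM's implementation: the names `InputRule` and `job_is_ready` are hypothetical, and the semantics shown (a job never runs without its required inputs, waits for optional inputs until a timer expires, and locates each input at the job time plus a temporal offset) are one plausible reading of those rules.

```python
from dataclasses import dataclass


@dataclass
class InputRule:
    data_type: str           # hypothetical data type label, e.g. "L1B"
    required: bool           # True: job cannot run without this input
    offset_seconds: int = 0  # temporal offset relative to the job's data time


def job_is_ready(rules, available, job_time, waited_seconds, timer_seconds):
    """Decide whether a job may start.

    `available` is a set of (data_type, data_time) pairs already on hand.
    Assumed semantics: required inputs must be present; optional inputs
    are waited for until the timer expires, then the job runs without them.
    """
    for rule in rules:
        key = (rule.data_type, job_time + rule.offset_seconds)
        if key in available:
            continue
        if rule.required:
            return False  # a missing required input always blocks the job
        if waited_seconds < timer_seconds:
            return False  # still inside the waiting period for an optional input
    return True
```

For example, a job with a required input and an optional input from the previous hour would stay blocked until either both inputs arrive or the required one is present and the timer has run out.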
The architecture of S4PM and S4P was specifically designed to be highly modular so that it could evolve quickly and flexibly. It has already evolved from data-driven processing of MODIS instrument data to AIRS processing to on-demand subsetting based on user requests. Currently, the GES DISC is in the process of incorporating its Near-Archive Data Mining system into S4PM, allowing users to upload algorithms for execution at the GES DISC.
For the future, S4PM will evolve to:
- Support an ever-increasing variety of processing algorithms, scenarios and data interfaces
- Increase the automation of failure monitoring and recovery
- Reduce the time and expertise needed to set up and adapt S4PM to new processing algorithms
- Support additional platforms, such as Windows
We hope that some or all of these goals will be reached by collaborating with the open source community.
High usability is a key goal of S4PM, deriving from the need for more automation at less operational cost. Specific goals are:
- Allow a single operator to manage and monitor hundreds of jobs simultaneously
- Drill down to troubleshoot a problem in two mouse clicks
- Set up a new processing string in less than 30 minutes
While the first and second goals are largely satisfied, work continues on the third.