S4PM-DME

The Simple, Scalable, Script-based, Science Processor for Measurements – Data Mining Edition (S4PM-DME) allows users to run their own data mining algorithms on data at the GES DISC. Users first upload their algorithms into the S4PM-DME system and then mine the data through the GES DISC Web site. S4PM-DME allows algorithms to be tested against small amounts of data before they are run on the full data record. After the data mining is completed, the output is placed in an FTP holding area, and a daily email is sent to the user with information about the pickup directory and the number of files. S4PM-DME’s goals are to give users control over the data they want to process and to provide easy Internet access to that processing, using the server’s resources rather than the user’s local system resources. Because only the mining output needs to be transferred, the data volume sent over the Internet is minimized.

Getting Started

A Quick Tour is available to help you begin exploring S4PM-DME. We welcome your feedback on this and all our data services.

Architecture

The S4PM-DME system is based on an earlier data mining system developed for data from the Tropical Rainfall Measuring Mission (TRMM). Both systems are built on the underlying Simple, Scalable, Script-based, Science Processor for Measurements, also known as S4PM. S4PM, in turn, is built on the S4P core, whose concept resembles a factory assembly line: pieces of the product are assembled at each station until the product is complete.

S4PM consists of multiple “stations”. These stations are directories created within the UNIX or Windows environment. Each station idles while waiting for an input file (or work order) to process. A daemon called Stationmaster runs in the background to detect the input work order, locate the appropriate script for the work order type, and map the output work order to the downstream stations and other secondary tasks. After the work order arrives at the station, one or more executable scripts process it, and the output is sent to the next station. Multiple work orders can arrive at any one station at any time. In addition, an output work order can be sent to multiple stations, and these stations are built for each user. This process continues until the final work order is completed.
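The station model described above can be illustrated with a minimal sketch. This is not the actual Stationmaster code: the function names, the station names (subset, mine, archive), and the work order file name pattern `DO.<TYPE>.<ID>.wo` are all assumptions made for illustration. The sketch shows a single polling pass over one station directory, which looks up a handler by work order type and copies the output to every downstream station.

```python
import shutil
import tempfile
from pathlib import Path
from typing import Callable, Dict, List

# Hypothetical handler type: takes the work order text, returns output text.
Handler = Callable[[str], str]


def work_order_type(path: Path) -> str:
    """Return TYPE from a name like 'DO.<TYPE>.<ID>.wo' (assumed convention)."""
    return path.name.split(".")[1]


def poll_station(station: Path,
                 handlers: Dict[str, Handler],
                 downstream: Dict[str, List[Path]]) -> int:
    """Process each pending work order in `station` once; return the count."""
    processed = 0
    for wo in sorted(station.glob("DO.*.wo")):
        wo_type = work_order_type(wo)
        handler = handlers.get(wo_type)
        if handler is None:
            continue  # no script registered for this type; leave it in place
        output = handler(wo.read_text())
        # Fan the output work order out to every downstream station.
        for target in downstream.get(wo_type, []):
            (target / wo.name).write_text(output)
        wo.unlink()  # the input work order has been consumed
        processed += 1
    return processed


def demo() -> Dict[str, List[str]]:
    """Build three hypothetical stations and run one polling pass."""
    root = Path(tempfile.mkdtemp())
    try:
        stations = {name: root / name for name in ("subset", "mine", "archive")}
        for directory in stations.values():
            directory.mkdir()
        (stations["subset"] / "DO.SUBSET.001.wo").write_text("granule_a\n")
        handlers = {"SUBSET": lambda text: text.upper()}
        downstream = {"SUBSET": [stations["mine"], stations["archive"]]}
        poll_station(stations["subset"], handlers, downstream)
        return {name: sorted(p.name for p in directory.iterdir())
                for name, directory in stations.items()}
    finally:
        shutil.rmtree(root)


if __name__ == "__main__":
    print(demo())
```

A real daemon would repeat the polling pass in a loop and run external scripts rather than in-process functions; the single pass here keeps the example short and easy to test.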

S4PM-DME Typical Mining Scenario

  1. A first-time user registers with EOSDIS.
  2. The user then logs on to the S4PM-DME Web site.
  3. The user enters the requested information. A new user should expect a phone call regarding the user name and password. An established user signs in with his or her user name and password.
  4. A first-time user will need to upload an algorithm to the S4PM-DME system using the development tools.
  5. Once the algorithm is uploaded, S4PM-DME scans, compiles, and installs it. If errors or security flaws are detected, a message is displayed to the user.
  6. The user can run the algorithm interactively, or have it run automatically by setting up the algorithm for data subscription.
  7. If the algorithm has been set up for automatic execution, the user can expect a daily email notification of the previous day’s output data.

Future Directions

To make fuller use of the S4PM-DME system and to give science users more data mining capabilities, the GES DISC will collaborate with the University of Alabama in Huntsville to use their ADaM system (http://datamining.itsc.uah.edu/adam/index.html). ADaM is a mining and image processing toolkit with components that can be configured in a variety of ways to create customized mining processes. With these enhancements to S4PM-DME, scientists and data miners will be able to advance future research and effectively analyze NASA’s vast Earth science data collection.





  • Last updated: July 20, 2009 20:33:05 GMT