National Energy Research Scientific Computing Center
  A DOE Office of Science User Facility
  at Lawrence Berkeley National Laboratory
 

Analytics: Workflow Management

Scientific exploration often consists of several tasks, such as data generation, data transfer, data analysis, and data visualization. Each of these tasks may involve different tools and technologies that need to exchange information. For instance, a data analysis task can start only after a long-running simulation program has generated all of its data and that data has been successfully transferred to the data analysis farm. Workflow systems such as Kepler help orchestrate complex task dependencies by providing mechanisms for specifying sequences of steps.
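
The dependency chain in the example above can be sketched in a few lines of Python. This is only an illustration of dependency-ordered execution, not how Kepler represents or runs workflows; the task names and the helper functions `execution_order` and `run_workflow` are hypothetical, and the sketch assumes Python 3.9 or later for the standard `graphlib` module.

```python
from graphlib import TopologicalSorter

# Hypothetical task names. Each task maps to the set of tasks it depends on.
DEPENDENCIES = {
    "generate": set(),
    "transfer": {"generate"},
    "analyze": {"transfer"},
    "visualize": {"analyze"},
}


def execution_order(dependencies):
    """Return task names in an order that respects every dependency."""
    return list(TopologicalSorter(dependencies).static_order())


def run_workflow(dependencies, actions):
    """Run each task's action in dependency order and return the order used."""
    completed = []
    for task in execution_order(dependencies):
        actions[task]()  # a failing action raises and stops the workflow here
        completed.append(task)
    return completed
```

Because each task here has exactly one predecessor, only one valid order exists. With branching dependencies, `TopologicalSorter` would return one of several valid orders, and a circular dependency would raise `graphlib.CycleError`.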

Simple workflows that require only basic fault tolerance can also be built with scripts written for a shell such as ftsh (the fault tolerant shell). ftsh allows a script to specify how many times a given command should be retried. Higher-level workflow tools such as Kepler provide powerful mechanisms for building workflows visually. The visual representation makes it much easier for scientists to understand complex processes and data flows, something shell scripts do not offer. Another important feature of Kepler is that workflows are stored in XML, so they can easily be reused and are portable across platforms. Moreover, Kepler provides various mechanisms for building parallel and distributed workflows that do not suffer from deadlocks, i.e. situations in which one task waits for a resource held by another task while that task in turn waits for a resource held by the first.
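
The retry behavior described above can be illustrated with a short Python sketch. It is not ftsh syntax and does not reproduce ftsh's semantics; the function `retry` and its parameters `attempts` and `delay_seconds` are hypothetical names chosen for this example.

```python
import time


def retry(action, attempts=3, delay_seconds=0.0):
    """Call action() until it succeeds or the attempts are used up.

    Returns the result of the first successful call. If every attempt
    fails, the exception from the last attempt is re-raised.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except Exception:
            if attempt == attempts:
                raise  # out of attempts: report the last failure
            time.sleep(delay_seconds)  # wait before trying again
```

A step such as a file transfer could be wrapped as `retry(transfer_step, attempts=5, delay_seconds=30)`, where `transfer_step` stands for any callable that raises an exception on failure.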

Another commonly used workflow system is SCIRun, a problem solving environment for modeling, simulation and visualization of scientific problems. Its basic mechanisms for designing dataflows, which are based on input and output ports, are very similar to those in Kepler. The main difference is that Kepler provides more support for data management, such as remote directory listing and Grid job submission, whereas SCIRun provides more powerful visualization mechanisms.
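
The port-based dataflow idea shared by both systems can be sketched generically as follows. This is not the Kepler or SCIRun API; the `Module` class and its methods are hypothetical and model only the simplest case, in which each module has named input ports and a single output.

```python
class Module:
    """A processing step with named input ports and a single output."""

    def __init__(self, name, func, input_ports=()):
        self.name = name
        self.func = func  # called with one keyword argument per input port
        self.input_ports = tuple(input_ports)
        self.connections = {}  # input port name -> upstream Module

    def connect(self, port, upstream):
        """Wire the output of an upstream module to one of our input ports."""
        if port not in self.input_ports:
            raise KeyError(f"{self.name} has no input port {port!r}")
        self.connections[port] = upstream

    def output(self):
        """Pull data from all upstream modules, then compute our own output."""
        missing = [p for p in self.input_ports if p not in self.connections]
        if missing:
            raise RuntimeError(f"{self.name} has unconnected ports: {missing}")
        args = {p: self.connections[p].output() for p in self.input_ports}
        return self.func(**args)
```

For example, a source module with no input ports can be connected to a module with one port named `data`, and requesting the output of the downstream module then evaluates the whole chain. This sketch recomputes upstream results on every request and does not detect cycles, both of which a real dataflow system would have to handle.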

The Analytics Team is currently assessing the use of Kepler for a specific workflow from the accelerator modeling community. This page will be updated as our work with Kepler progresses.

For more information about these technologies, contact the NERSC Analytics Team at consult@nersc.gov.


Page last modified: Tue, 18 Sep 2007 23:53:18 GMT
Page URL: http://www.nersc.gov/nusers/analytics/workflow/
Web contact: webmaster@nersc.gov
Computing questions: consult@nersc.gov