National Energy Research Scientific Computing Center
  A DOE Office of Science User Facility
  at Lawrence Berkeley National Laboratory
 

The Analytics Program at NERSC

At NERSC, several key technologies contribute to the Analytics Program: data management, data analysis and data mining, visualization, workflow management, and interactive data exploration.

In general, scientific data management and workflow management are enabling technologies. Scientific data management provides tools for efficient access to large amounts of data and supports data organization and security. Workflow management describes a systematic approach to data processing pipelines, or to the pre-processing and post-processing steps involved in running simulations. Workflow management tools can be used to automate repetitive processing tasks and make processing pipelines more robust.

Data analysis and data mining tools are used to compare datasets or to find features within a dataset. Visualization is one of the primary tools for data exploration and may precede or inspire more formal data analyses. These technologies may be used individually or together to explore data; the Data Exploration topic discusses how analytics technologies can be integrated to provide a framework for discovery.

Each of the first five topic pages provides an overview of, or background information about, the topic, as well as links to software tools that NERSC currently supports or may support in the future.

Questions for the NERSC Analytics Team, and requests for its services, should be sent by email to consult@nersc.gov.

Scientific Data Management

Scientific Data Management (SDM) refers to the storage and retrieval of scientific data from various storage media, such as main memory, disk, and tape. SDM covers the integration of data formats; data description, organization, and metadata management; efficient indexing and querying; and file transfer, remote access, and distributed data management across networks.
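As a rough illustration of what "efficient indexing and querying" can mean, the sketch below builds a binned bitmap index, a technique often used for read-mostly scientific data. It is not drawn from any specific NERSC tool; the class name `BitmapIndex`, the bin edges, and the sample values are hypothetical, and values falling outside the bin edges are simply not indexed.

```python
from bisect import bisect_right


class BitmapIndex:
    """Hypothetical binned bitmap index over a read-only column of numbers.

    Each bin owns one bitmap (stored as a Python int); bit i is set
    when record i falls into that bin.
    """

    def __init__(self, values, edges):
        # edges: sorted bin boundaries; bin k covers [edges[k], edges[k+1]).
        self.values = list(values)
        self.edges = list(edges)
        self.bitmaps = [0] * (len(self.edges) - 1)
        for i, v in enumerate(self.values):
            k = bisect_right(self.edges, v) - 1
            if 0 <= k < len(self.bitmaps):  # out-of-range values are skipped
                self.bitmaps[k] |= 1 << i

    def query_range(self, lo, hi):
        """Return the indices of records with lo <= value < hi."""
        candidates = 0
        for k, bm in enumerate(self.bitmaps):
            # OR together every bin that overlaps [lo, hi).
            if self.edges[k] < hi and self.edges[k + 1] > lo:
                candidates |= bm
        # Bins that only partly overlap the range can yield false positives,
        # so confirm each candidate against its raw value.
        hits = []
        i = 0
        while candidates:
            if candidates & 1 and lo <= self.values[i] < hi:
                hits.append(i)
            candidates >>= 1
            i += 1
        return hits
```

Because whole bins are combined with a bitwise OR, only records in bins that straddle a query boundary need their raw values checked. For example, `BitmapIndex([12.0, 35.5, 7.2, 48.9, 22.1, 30.0], [0, 10, 20, 30, 40, 50]).query_range(20, 40)` returns `[1, 4, 5]`.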

    Quick Link:  NERSC Tools for Scientific Data Management

Data Analysis and Data Mining

Data analysis techniques include simple post-processing of experimental data or simulation output (e.g., aggregating data), as well as the use of mathematical methods (e.g., filtering data) and statistical tests. Data mining usually refers to the application of more advanced mathematical techniques, such as classification, clustering, and pattern recognition.
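To make the distinction more concrete, here is a minimal sketch of one technique from each category: aggregation and moving-average filtering as data analysis, and k-means clustering as data mining. The function names and the evenly spaced centroid seeding are illustrative assumptions, not the behavior of any NERSC-supported package.

```python
from statistics import mean


def aggregate_by_step(samples):
    """Post-processing: average all (step, value) samples that share a time step."""
    groups = {}
    for step, value in samples:
        groups.setdefault(step, []).append(value)
    return {step: mean(vals) for step, vals in sorted(groups.items())}


def moving_average(series, window):
    """Filtering: smooth a series with a trailing moving-average window."""
    return [mean(series[i - window + 1 : i + 1]) for i in range(window - 1, len(series))]


def kmeans_1d(points, k, iterations=20):
    """Data mining: cluster scalar values with Lloyd's k-means algorithm.

    Centroids are seeded with evenly spaced values from the sorted data
    (an assumption for simplicity; production tools often use k-means++).
    Returns the final centroids and the clusters from the last assignment.
    """
    pts = sorted(points)
    centroids = [pts[(len(pts) - 1) * j // max(k - 1, 1)] for j in range(k)]
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            # Assign each point to its nearest centroid.
            nearest = min(range(k), key=lambda j: abs(p - centroids[j]))
            clusters[nearest].append(p)
        # Move each centroid to its cluster mean; keep it in place if empty.
        centroids = [mean(c) if c else centroids[j] for j, c in enumerate(clusters)]
    return centroids, clusters
```

For instance, `moving_average([1, 2, 3, 4, 5], 3)` gives `[2, 3, 4]`, and `aggregate_by_step([(0, 1.0), (0, 3.0), (1, 5.0)])` gives `{0: 2.0, 1: 5.0}`.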

    Quick Links: NERSC Tools for Data Analysis and Data Mining  |  Case Studies

Visualization

Visualization is an invaluable tool for getting a feel for how simulation output varies over time or with changes in parameter values, and for locating interesting regions in large data sets. The term visualization is often used to describe the rendering of 3D isosurfaces and volumes, whereas the term graphics is usually applied to plots such as scatter plots and histograms. Several visualization tools supported by NERSC also allow interactive manipulation of the data display to facilitate data exploration.
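As a small example of the graphics side of this distinction, the sketch below bins scalar values into an equal-width histogram and renders it as text. The function names are hypothetical, and a real graphics library would draw the plot rather than print characters.

```python
def histogram(values, bins, lo, hi):
    """Count values into `bins` equal-width bins spanning [lo, hi]."""
    counts = [0] * bins
    width = (hi - lo) / bins
    for v in values:
        if lo <= v <= hi:
            # A value exactly at `hi` goes into the last bin instead of overflowing.
            k = min(int((v - lo) / width), bins - 1)
            counts[k] += 1
    return counts


def render_text_histogram(counts, lo, hi, bar_char="#"):
    """Return one text row per bin: its interval followed by a bar of `bar_char`."""
    width = (hi - lo) / len(counts)
    rows = []
    for k, c in enumerate(counts):
        left = lo + k * width
        # Only the last bin is closed on the right, matching histogram().
        close = "]" if k == len(counts) - 1 else ")"
        rows.append(f"[{left:.1f}, {left + width:.1f}{close} {bar_char * c}")
    return "\n".join(rows)
```

With `values = [0.5, 1.0, 2.6, 3.0, 9.9, 10.0]`, `histogram(values, 4, 0, 10)` returns `[2, 2, 0, 2]`; the value at the top edge, 10.0, is counted in the last bin.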

    Quick Links: NERSC Visualization Applications and Graphics Libraries  |  Case Studies

Workflow Management

Workflow management refers to the process of connecting various software tools based on their specific input and output parameters. Its goal is to automate sets of tasks that are repeated many times, which simplifies execution and avoids the human errors that often occur when repetitive tasks are performed by hand.
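The idea of connecting tools through their inputs and outputs can be sketched as a small dependency graph in which a task runs only after the tasks producing its inputs have run. This minimal sketch uses Python's standard `graphlib` module (Python 3.9 or later); the task dictionary layout and the `build_workflow` and `run_workflow` names are hypothetical and do not correspond to any particular workflow management tool.

```python
from graphlib import TopologicalSorter


def build_workflow(tasks):
    """Derive task dependencies from declared inputs and outputs.

    `tasks` maps a task name to a dict with 'inputs', 'outputs', and 'run'
    (a callable). A task depends on whichever task produces one of its inputs.
    """
    producer = {}
    for name, spec in tasks.items():
        for out in spec["outputs"]:
            producer[out] = name
    # Map each task to the set of tasks it must wait for.
    return {
        name: {producer[i] for i in spec["inputs"] if i in producer}
        for name, spec in tasks.items()
    }


def run_workflow(tasks, initial_data):
    """Run tasks in dependency order, passing data through a shared store.

    Raises graphlib.CycleError if the declared dependencies form a cycle.
    """
    store = dict(initial_data)
    order = list(TopologicalSorter(build_workflow(tasks)).static_order())
    for name in order:
        spec = tasks[name]
        args = [store[i] for i in spec["inputs"]]
        results = spec["run"](*args)
        # Each task returns a tuple with one value per declared output.
        store.update(zip(spec["outputs"], results))
    return store, order
```

Deriving the execution order from declared inputs and outputs, instead of ordering steps by hand, means a new task only has to state what it consumes and produces, and a circular dependency raises an error instead of running in the wrong order.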

Data Exploration

Highly interactive data exploration is a key component of scientific analytics. It often combines multiple analytics technologies, such as data mining and visualization, to find important features within data or to discover the essential results of a simulation or experiment. A relatively new term, discourse, refers not only to the integration of analytics technologies but also to the goal (knowledge discovery) and to the interactive nature of the process of reaching that goal.

Analytics Server

DaVinci is an SGI Altix 350 server with 32 64-bit Intel Itanium 2 processors running at 1.4 GHz. Its main purpose is to provide analytics capabilities to the NERSC user community, a role for which its SMP architecture, 192 GB of shared memory, and 24 TB of disk space make it well suited.



Page last modified: Thu, 07 Aug 2008 18:48:26 GMT
Page URL: http://www.nersc.gov/nusers/analytics/
Web contact: webmaster@nersc.gov
Computing questions: consult@nersc.gov
