National Energy Research Scientific Computing Center
  A DOE Office of Science User Facility
  at Lawrence Berkeley National Laboratory
 

Analytics: Data Exploration

Introduction

Many of today's important scientific breakthroughs are being made by large, interdisciplinary collaborations of scientists who work in widely distributed locations and who produce and collect vast, complex datasets. These large-scale science projects require software tools that support not only insight into complex data but also collaborative scientific discovery. Scientific analytics approaches combine statistical algorithms and advanced analysis techniques with highly interactive visual interfaces that support data exploration and collaborative work. They offer scientists the opportunity for in-depth understanding of massive, noisy, and high-dimensional data.

Case Study: Astrophysics

Astrophysics in particular lends itself to a visual analytics approach to data exploration because much astronomical data, including images and spectra, is inherently visual. One of the grand challenges in astrophysics today is the effort to comprehend the mysterious "dark energy," which accounts for roughly three-quarters of the matter/energy budget of the universe. The existence of dark energy may well require the development of new theories of physics and cosmology. Dark energy acts to accelerate the expansion of the universe, whereas gravity acts to decelerate it. Our current understanding of dark energy comes primarily from the study of supernovae.

Supernova 1994D in the outskirts of the galaxy NGC 4526. This example of a Type Ia supernova shows that, at peak brightness, such supernovae rival the cores of galaxies in luminosity (Hubble Space Telescope photo).

The Nearby Supernova Factory (SNfactory) is an international astrophysics experiment designed to discover and measure Type Ia supernovae in greater number and detail than ever before. It is the largest-data-volume supernova search currently in operation. Type Ia supernovae are stellar explosions with a consistent maximum brightness, which allows them to be used as "standard candles" to measure distances to other galaxies, to trace the expansion rate of the universe, and to study how dark energy affects the structure of the cosmos. The SNfactory receives 50-80 GB of image data per night, which teams of domain experts must process and examine within 12-24 hours to obtain the maximum scientific benefit from these rare and short-lived stellar events.
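The "standard candle" idea can be made concrete with the distance modulus relation, m - M = 5 log10(d / 10 pc), which links an object's apparent magnitude m, its absolute magnitude M, and its distance d. The sketch below is purely illustrative and is not part of the SNfactory software. The peak absolute magnitude of -19.3 is an assumed representative value for Type Ia supernovae, the apparent magnitude is made up for the example, and the calculation ignores dust extinction and cosmological corrections.

```python
import math

# Assumption: a representative peak absolute magnitude for a Type Ia
# supernova, used here only for illustration.
ASSUMED_PEAK_ABSOLUTE_MAG = -19.3


def luminosity_distance_parsecs(apparent_mag: float, absolute_mag: float) -> float:
    """Invert the distance modulus m - M = 5 * log10(d / 10 pc) for d."""
    return 10.0 ** ((apparent_mag - absolute_mag) / 5.0 + 1.0)


# Hypothetical observation: a supernova that peaks at apparent magnitude 15.7.
distance_pc = luminosity_distance_parsecs(15.7, ASSUMED_PEAK_ABSOLUTE_MAG)
print(f"Estimated distance: {distance_pc / 1e6:.1f} Mpc")
```

Because the peak brightness is taken to be the same for every event, a fainter observed peak translates directly into a larger distance: each additional 5 magnitudes corresponds to a factor of 100 in distance.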

Custom Production Software: Sunfall


Supernova Warehouse DataTaking view from Sunfall.

To facilitate the supernova search and data analysis process and to enable scientific discovery for project astrophysicists, members of the NERSC Analytics Team, together with project scientists, developed Sunfall (SuperNova Factory AssembLy Line). In line with the NERSC Analytics business model, team members provided in-depth, collaborative help to science stakeholders to implement an effective analytics solution to their scientific data challenge. Sunfall is a collaborative visual analytics and supernova data exploration system that has been in production use at the SNfactory for over a year.

Sunfall combines sophisticated astrophysics image processing algorithms, machine learning capabilities (including boosted trees and support vector machines), and astronomical data analysis with a usable, highly interactive visual interface designed to facilitate collaborative data exploration and decision making.

Sunfall components span all aspects of scientific analytics: workflow management, scientific data management, data analysis and mining, visualization, and interactive data exploration. Analytics improvements to SNfactory search software in many areas produced measurable labor savings. Improved image processing algorithms such as Fourier contour analysis of supernova images, machine learning algorithms including support vector machines and boosted decision trees, and improved control interfaces for scanning and vetting led to significant overall performance enhancements.
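This page does not describe how Sunfall's Fourier contour analysis works, so the following is only a generic sketch of the underlying idea: the outline of an object in an image is treated as a closed curve of complex numbers x + iy, and its discrete Fourier coefficients summarize the shape. The function names, the `elongation` feature, and the test ellipses are all hypothetical. The sketch assumes contour points sampled evenly and traversed counterclockwise.

```python
import cmath
import math


def fourier_descriptors(contour):
    """Normalized discrete Fourier transform of a closed contour of (x, y) points."""
    n = len(contour)
    z = [complex(x, y) for x, y in contour]
    return [
        sum(z[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)) / n
        for k in range(n)
    ]


def elongation(contour):
    """Ratio |c_-1| / |c_+1| of the first negative and positive harmonics.

    The k = 0 coefficient (the centroid) is ignored, so the value does not
    depend on where the object sits in the image.
    """
    c = fourier_descriptors(contour)
    return abs(c[-1]) / abs(c[1])  # c[-1] holds the k = -1 harmonic


def ellipse(a, b, n=64):
    """Hypothetical test contour: an ellipse with semi-axes a and b."""
    return [
        (a * math.cos(2 * math.pi * t / n), b * math.sin(2 * math.pi * t / n))
        for t in range(n)
    ]


print(round(elongation(ellipse(1.0, 1.0)), 3))  # circular outline
print(round(elongation(ellipse(3.0, 1.0)), 3))  # elongated outline
```

For an ellipse with semi-axes a and b, this ratio equals (a - b) / (a + b), so it is 0 for a circle and approaches 1 as the outline becomes more stretched. A compact shape feature of this kind could, in principle, be passed to a classifier such as a support vector machine or boosted decision trees.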

Bottom line: The development of a custom analytics solution for a NERSC user led to labor savings of up to 90% in areas of the SNfactory supernova search and follow-up workflow. Additionally, project scientists now have new data exploration and analysis capabilities that had previously been too time-consuming to attempt.

For More Information

Contact the NERSC Analytics Team (consult@nersc.gov) if you are interested in learning more about in-depth collaborations to develop custom analytics software.

For more information on Sunfall or its components, see Sunfall, Fourier contour analysis, or Supernova recognition using support vector machines.

Page last modified: Wed, 19 Sep 2007 21:49:16 GMT
Page URL: http://www.nersc.gov/nusers/analytics/exploration/
Web contact: webmaster@nersc.gov
Computing questions: consult@nersc.gov
