1998 Annual Report
Computer Science and Applied Mathematics

Storage Management for Massive Datasets

The next generation of high energy and nuclear physics (HENP) experiments at Brookhaven National Laboratory, Stanford Linear Accelerator Center, Fermilab, and CERN will present computational challenges that are new in scale and nature. For example, about 200 terabytes of data per year are expected from the Relativistic Heavy Ion Collider (RHIC) experiments at Brookhaven beginning in late 1999.

Meeting those challenges requires close collaboration between computer-knowledgeable physicists and physics-knowledgeable computer scientists. NERSC staff are working with collaborators from the experimental facilities and Berkeley Lab to advance the state of the art in physics computing and data-intensive computing in general.

The Grand Challenge Application on HENP Data aims to develop techniques and tools for efficient access to these massive datasets, and NERSC is working on the storage management component of this Grand Challenge. The goals are to achieve efficient access to experimental event data from tertiary storage, to provide multidimensional event indexing for query estimation and execution, and to support simultaneous multiple query execution.

Efficient access requires reorganizing event clusters on tertiary storage according to anticipated access patterns rather than in the order in which they were generated. Each particle collision event will need to be indexed by 100 properties, such as time, energy, momentum, and the type and number of particles, so that researchers can make queries and retrieve data on the basis of multiple property ranges.
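As a rough illustration of a query over multiple property ranges, the following Python sketch selects events whose properties fall inside every requested range and lists the files that hold them. The event records, property names, and file names are invented for illustration, and the linear scan is only a stand-in: it is not the project's multidimensional index, whose purpose is to avoid examining every event.

```python
# Hypothetical event records; property names, values, and file names are illustrative only.
EVENTS = [
    {"id": 1, "file": "f1", "energy": 12.0, "n_particles": 140},
    {"id": 2, "file": "f1", "energy": 48.5, "n_particles": 900},
    {"id": 3, "file": "f2", "energy": 51.0, "n_particles": 1200},
    {"id": 4, "file": "f3", "energy": 75.2, "n_particles": 300},
]


def select_events(events, ranges):
    """Return the events whose properties lie inside every (low, high) range, inclusive.

    This is a plain linear scan, used here only to show what a
    multi-range query means; a real index would avoid the scan.
    """
    return [
        event
        for event in events
        if all(low <= event[name] <= high for name, (low, high) in ranges.items())
    ]


def files_needed(selected):
    """Return the sorted set of files that must be read to serve the selected events."""
    return sorted({event["file"] for event in selected})


# A query constrains several properties at once.
query = {"energy": (40.0, 80.0), "n_particles": (500, 2000)}
hits = select_events(EVENTS, query)
print([event["id"] for event in hits])  # [2, 3]
print(files_needed(hits))               # ['f1', 'f2']
```

Knowing which files a query touches, before any data is read, is what makes it possible to estimate the cost of the query and to plan retrieval from tertiary storage.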

This year the three components of the Storage Manager system were developed: the Query Estimator, the Query Monitor, and the Cache Manager. The Query Estimator builds indexes of event properties, accepts multiple query requests, estimates the requirements for executing each query, and prepares the information needed for execution. The Query Monitor accepts query execution requests, manages the file queue, and schedules file caching. The Cache Manager tracks available cache space, requests caching from HPSS, and purges files when requested by the Query Monitor.
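The division of labor between the Query Monitor and the Cache Manager can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the Storage Manager's actual design: the class interfaces, the method names, the first-in-first-out queue, and the oldest-first purge order are all hypothetical, and the call to HPSS is replaced by a simple bookkeeping step.

```python
from collections import deque


class CacheManager:
    """Tracks available cache space; purges only when asked to (hypothetical interface)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.files = {}  # file name -> size currently held in the cache

    def free_space(self):
        return self.capacity - sum(self.files.values())

    def cache(self, name, size):
        # Stand-in for a caching request to the mass storage system.
        if size > self.free_space():
            return False
        self.files[name] = size
        return True

    def purge(self, name):
        self.files.pop(name, None)


class QueryMonitor:
    """Manages the file queue and schedules caching (hypothetical interface)."""

    def __init__(self, cache_manager):
        self.cache_manager = cache_manager
        self.queue = deque()  # (name, size) pairs waiting to be cached
        self.released = []    # cached files no query needs any longer, oldest first

    def submit(self, files):
        """Queue the files of one query; a file shared by several queries is queued once."""
        for name, size in files:
            queued = any(name == pending for pending, _ in self.queue)
            if name not in self.cache_manager.files and not queued:
                self.queue.append((name, size))

    def release(self, name):
        """Mark a cached file as no longer needed, making it a purge candidate."""
        self.released.append(name)

    def step(self):
        """Try to cache the next queued file, purging released files to make room."""
        if not self.queue:
            return None
        name, size = self.queue[0]
        while not self.cache_manager.cache(name, size):
            if not self.released:
                return None  # no room and nothing to purge; try again later
            self.cache_manager.purge(self.released.pop(0))
        self.queue.popleft()
        return name


cache_manager = CacheManager(capacity=100)
monitor = QueryMonitor(cache_manager)
monitor.submit([("f1", 60), ("f2", 30)])  # first query
monitor.submit([("f2", 30), ("f3", 50)])  # second query shares f2
first, second, blocked = monitor.step(), monitor.step(), monitor.step()
monitor.release("f1")
third = monitor.step()
print(first, second, blocked, third)
```

In this sketch the third request is held back until a file is released, which mirrors the report's split of responsibilities: the Cache Manager only reports space and carries out requests, while the decision about what to purge rests with the Query Monitor.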

The Storage Manager software will give physicists efficient access to massive datasets from experimental facilities now under construction. Key contributors to this project included Henrik Nordberg, Luis Bernardo, and Alex Sim.


The Storage Manager is designed to work with queries from StAF (Standard Analysis Framework), a modular, customizable, scalable, CORBA-compliant framework for the analysis of physics data, and with Objectivity, a commercial object-oriented database management system. FY99 plans include integrating the Storage Manager into the RHIC Computing Facility environment at Brookhaven, performance testing, monitoring, and enhancements.

Major collaborators on the Storage Manager project include Henrik Nordberg, Alex Sim, Luis Bernardo, and Craig Tull from NERSC; Torre Wenaus from RHIC; David Malon from Argonne National Laboratory; and Doug Olson and Jeff Porter from Berkeley Lab's Nuclear Sciences Division.

