The Challenge of Keeping Up With the Data

To speed up RHIC data analysis, RCF processors will at times be augmented by computing resources at collaborating sites around the world, linked by the latest evolution of large-scale computer networking, known as the Grid. The Grid keeps track of all the networked computers and distributes analysis jobs among them.
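
The scheduling idea is simple to illustrate. The sketch below is for illustration only: the site names, capacities, and jobs are invented, and real Grid middleware also weighs data location, queue length, and site policies. It simply assigns each analysis job to whichever site currently has the most free processors:

    # Toy illustration of Grid-style job dispatch: send each analysis job to
    # whichever collaborating site currently has the most free processors.
    # Site names and capacities are invented for this example.
    sites = {"BNL": 4, "RIKEN": 2, "CERN": 3}          # free processors per site
    jobs = [f"analysis_job_{i}" for i in range(6)]     # jobs waiting to run

    assignments = {}
    for job in jobs:
        site = max(sites, key=sites.get)               # site with most free slots
        if sites[site] == 0:
            break                                      # all sites busy; remaining jobs wait
        assignments[job] = site
        sites[site] -= 1                               # one processor is now occupied

    for job, site in assignments.items():
        print(f"{job} -> {site}")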

During the 2005 RHIC run, the PHENIX experiment used Grid-aware software tools to transfer nearly 270 terabytes of data to the RIKEN Institute in Japan at an average rate of 100 MB per second. This is equivalent to transferring the entire contents of a CD halfway around the world every seven seconds. During the 2006 RHIC run, the bulk of PHENIX’s share of the new data (250 terabytes) has again been transferred over the network to RIKEN.
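
The arithmetic behind that comparison is easy to check. Here is a brief sketch, assuming a standard 700 MB CD and decimal units (1 terabyte taken as one million megabytes):

    # Back-of-the-envelope check of the "one CD every seven seconds" comparison.
    # Assumes a 700 MB CD; the rate and volume figures come from the text above.
    cd_mb = 700.0                    # capacity of one CD, in megabytes
    rate_mb_s = 100.0                # average PHENIX-to-RIKEN transfer rate
    total_tb = 270.0                 # data moved during the 2005 run

    seconds_per_cd = cd_mb / rate_mb_s
    days_total = total_tb * 1e6 / rate_mb_s / 86400    # 86400 seconds per day

    print(f"one CD's worth of data every {seconds_per_cd:.0f} seconds")
    print(f"roughly {days_total:.0f} days of continuous transfer for 270 TB")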

This year’s transfer is making use of a newly upgraded Wide Area Network connection, which increases the possible transfer rate in and out of Brookhaven by a factor of eight, from approximately 300 MB per second to 2.4 gigabytes (GB) per second. This very high transfer capacity is required to keep RHIC data transfers from interfering with those of another large-scale high-energy physics experiment: ATLAS.
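
To put the factor-of-eight upgrade in perspective, here is a small, idealized sketch comparing how long a 250-terabyte transfer would occupy the link at the old and new rates. It assumes the full link is available, which in practice it is not, since ATLAS traffic shares it:

    # Idealized comparison of the old and new WAN capacities for a 250 TB transfer.
    # Real transfers share the link and rarely sustain the full line rate.
    old_rate_mb_s = 300.0            # approximate pre-upgrade capacity
    new_rate_mb_s = 2400.0           # post-upgrade capacity, 2.4 GB/s
    volume_mb = 250.0 * 1e6          # 250 terabytes, with 1 TB = 1e6 MB

    print(f"speed-up factor: {new_rate_mb_s / old_rate_mb_s:.0f}x")
    for label, rate in (("old link", old_rate_mb_s), ("new link", new_rate_mb_s)):
        days = volume_mb / rate / 86400
        print(f"{label}: about {days:.1f} days at full line rate")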

Shouldering ATLAS Data

ATLAS is one of the detectors located at the Large Hadron Collider (LHC), a new accelerator complex now under construction at the European Organization for Nuclear Research (CERN). It was designed to analyze the thousands of particles streaming from proton-proton or heavy-ion collisions, some with as much as 30 times the energy of the collisions occurring at RHIC. ATLAS is being built by a large international collaboration and is due to come online in 2007.

“The computing needs of ATLAS will be enormous by today’s standards,” Gibbard said. ATLAS is expected to collect five to eight petabytes of data per year — the equivalent of roughly 7.5 million CDs.
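
That comparison can be checked with the same kind of arithmetic, again assuming a 700 MB CD and decimal units; the quoted figure of roughly 7.5 million CDs corresponds to the lower end of the five-to-eight-petabyte range:

    # Rough check of the "millions of CDs per year" comparison for ATLAS.
    # Assumes 700 MB per CD and decimal units (1 petabyte = 1e9 megabytes).
    cd_mb = 700.0
    for petabytes in (5, 8):
        cds_millions = petabytes * 1e9 / cd_mb / 1e6
        print(f"{petabytes} PB/year is about {cds_millions:.1f} million CDs")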

“Individual scientific laboratories do not have the human or computing resources required by this demand,” Gibbard said. “That’s why the international physics computing community worked together to develop the new computing tools of the Grid, as well as a highly distributed, Grid-based data-analysis infrastructure.”

Hundreds of thousands of computers distributed worldwide, with hundreds of petabytes of tape and disk storage capacity and state-of-the-art networking, will be linked to meet the demand. Resources currently available to ATLAS at Brookhaven include 200 terabytes of disk storage, 700 Linux Farm processors, and a second new 2.4-petabyte robotic tape storage system dedicated to ATLAS.

Besides providing a large portion of the overall computing resources for U.S. collaborators in ATLAS, Brookhaven is also the central hub for storing and distributing ATLAS experimental data among those collaborators. A series of exercises to establish the capability to fill this data-handling role has been under way in preparation for the start-up of ATLAS. In the most recent exercise, referred to as Global Service Challenge 4, data was transferred from CERN in Switzerland to the ACF at Brookhaven at rates as high as 300 MB per second, with an average of 200 MB per second over the two-week period. The fact that this exercise ran at the same time that PHENIX data was being transferred to Japan underscores the importance of the new high-capacity Wide Area Network connection.
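
A rough, illustrative calculation (assuming the 200 MB per second average was sustained around the clock for the full two weeks) gives a sense of the volume involved:

    # Rough volume estimate for the two-week CERN-to-Brookhaven exercise,
    # assuming the 200 MB/s average was sustained around the clock.
    avg_rate_mb_s = 200.0
    seconds = 14 * 86400                       # two weeks of continuous running
    total_tb = avg_rate_mb_s * seconds / 1e6   # 1 TB = 1e6 MB (decimal units)
    print(f"about {total_tb:.0f} TB moved from CERN to Brookhaven")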

“The ACF staff will play an important role in developing the next generation ATLAS Grid production tools. They’ll also operate a support center to assist U.S. collaborators in using Grid-based resources,” Gibbard said. Together, the people and the machines they run will delve into the mysteries of matter, perhaps uncovering that needle in a petabyte that offers new insight into the world around us.