Petascale Computing on Jaguar

The National Center for Computational Sciences (NCCS), sponsored by the Department of Energy's (DOE) Office of Science, manages the 1.64-petaflop Jaguar supercomputer for use by scientists and engineers solving problems of national and global importance. The new petaflops machine will make it possible to address some of the most challenging scientific problems in areas such as climate modeling, renewable energy, materials science, fusion, and combustion. Annually, 80 percent of Jaguar's resources are allocated through DOE's Innovative and Novel Computational Impact on Theory and Experiment (INCITE) program, a competitively selected, peer-reviewed process open to researchers from universities, industry, government, and nonprofit organizations.

Through a close, four-year partnership between ORNL and Cray, Jaguar has delivered state-of-the-art computing capability to scientists and engineers from academia, national laboratories, and industry. The XT system has grown through a series of upgrades since being installed as a 25-teraflop XT3 in 2005. By early 2008 Jaguar was a 263-teraflop Cray XT4 capable of tackling problems that could not be solved on any other system. Later in 2008 Jaguar was expanded with the addition of a 1.4-petaflop Cray XT5. The resulting system has over 181,000 processing cores connected internally by Cray's SeaStar2+ network. The XT4 and XT5 segments of Jaguar are combined into a single system by an InfiniBand network that links each segment to the Spider file system.

Throughout its series of upgrades, Jaguar has maintained a consistent programming model for its users, allowing them to continue evolving their existing codes rather than writing new ones. Applications that ran on previous versions of Jaguar can be recompiled, tuned for efficiency, and then run on the new machine.

Jaguar is the most powerful computer system available for science, with world-leading performance, more than three times the memory of any other computer, and world-leading bandwidth to disks and networks. The AMD Opteron is a powerful, general-purpose processor that uses the x86 instruction set, for which a rich ecosystem of applications, compilers, and tools exists. Hundreds of applications have been ported to and run on the Cray XT system, many of which have been scaled up to run on 25,000 to 150,000 cores. Jaguar is ready to take on the world's most challenging problems.

Exploring Science Frontiers at Petascale (PDF)

Anatomy of Jaguar

The Jaguar system now consists of an 84-cabinet quad-core Cray XT4 system and 200 new Cray XT5 cabinets, also using quad-core processors. Both parts of the system have 2 gigabytes of memory per core, giving users a total of 362 terabytes of high-speed memory in the combined system. After acceptance of the new XT5, the two systems will be combined by linking each to the Scalable I/O Network (SION, described below) and to the Spider file system. The XT5 system has 214 service and I/O nodes providing up to 240 gigabytes per second of bandwidth to SION and 200 gigabits per second to external networks. The XT4 has 116 service and I/O nodes providing 44 gigabytes per second of bandwidth to SION and 100 gigabits per second to external networks.
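As a rough check (using decimal units), the memory total follows from the combined core count cited above and the 2 gigabytes of memory per core:

    181,000 cores × 2 gigabytes per core ≈ 362,000 gigabytes ≈ 362 terabytes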

Both the XT4 and XT5 boards hold four nodes. Each XT4 node has a single quad-core AMD Opteron 1354 "Budapest" processor coupled with 8 gigabytes of DDR2-800 memory. The XT5 is a double-density version of the XT4, with twice the processing power, memory, and memory bandwidth on each node. An XT5 node has two Opteron 2356 "Barcelona" processors linked by dual HyperTransport connections, and each Opteron has 8 gigabytes of directly attached DDR2-800 memory. The result is a dual-socket, eight-core node with 16 gigabytes of shared memory and a peak processing performance of 73.6 gigaflops.
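The quoted node peak follows from the Barcelona core's four floating-point operations per clock cycle, assuming the commonly cited 2.3 GHz clock rate of the Opteron 2356:

    2.3 GHz × 4 floating-point operations per cycle × 8 cores ≈ 73.6 gigaflops per node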

Each node runs Cray's version of the SuSE Linux operating system. Cray has tuned the Linux kernel to remove unnecessary services from the compute nodes, so the operating system minimizes interruptions to the application codes running on the system and delivers predictable, repeatable run times. The SuSE Linux operating system on the nodes, together with the system services, networking software, communications, I/O, and mathematical libraries, as well as compilers, debuggers, and performance tools, forms the Cray Linux Environment. Jaguar supports the MPI, OpenMP, SHMEM, and PGAS programming models. The NCCS supports compilers from PGI, PathScale, and GNU on Jaguar.
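As an illustration of the kind of code this environment supports, the minimal sketch below combines MPI across nodes with OpenMP threads within a node. It is not an NCCS-provided code, and the compile and launch commands in its comments assume the standard Cray compiler wrapper (cc) and ALPS launcher (aprun); exact flags vary by compiler.

    /* Minimal hybrid MPI + OpenMP sketch (illustrative only).
     * Assumed build and launch on the Cray XT, flags vary by compiler:
     *   compile: cc -mp hybrid.c -o hybrid     (e.g. -fopenmp for GNU)
     *   launch:  aprun -n 2 -d 8 ./hybrid      (2 MPI ranks, 8 threads each)
     */
    #include <mpi.h>
    #include <omp.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        int provided, rank, nranks;

        /* Request an MPI library that tolerates multiple threads. */
        MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &nranks);

        /* One MPI rank per socket or node, OpenMP threads across its cores. */
        #pragma omp parallel
        {
            printf("rank %d of %d, thread %d of %d\n",
                   rank, nranks, omp_get_thread_num(), omp_get_num_threads());
        }

        MPI_Finalize();
        return 0;
    }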

With a power density of about 2,000 watts per square foot, Jaguar could not be built without some form of liquid cooling to prevent hot spots. At 4,400 square feet, the XT5 segment is as large as an NBA basketball court, and it would be very difficult to provide evenly controlled temperature and air pressure to each of the 200 cabinets using traditional under-floor, forced-air cooling. Jaguar solves this problem with Cray's new ECOphlex cooling technology, which uses R-134a refrigerant, the same refrigerant used in automobile air conditioners, to remove heat as air enters and exits each cabinet. As a result, the NCCS is saving 900 kilowatts of electricity and over $500,000 per year that would otherwise be required just to power the fans in a traditional forced-air cooling system. Further savings come from the 480-volt power supplies in each cabinet: by keeping the voltage high, electrical losses in the power cords are minimized, saving $500,000 over the system's life cycle.

Spider File System

A Lustre-based file system dubbed Spider will replace multiple file systems on the NCCS network with a single scalable system. Spider provides centralized access to petascale data sets from all NCCS platforms, thereby eliminating islands of data and making file transfers among computers and other systems unnecessary. Transferring petascale data sets between Jaguar and the visualization system, for example, could take hours, tying up bandwidth on Jaguar and slowing simulations in progress. Eliminating these transfers will improve performance, convenience, and cost, and data analytics platforms will benefit from Spider's high bandwidth without requiring a large investment in dedicated storage.

To access Spider, each NCCS platform is configured with Lustre routers. These routers allow Lustre clients on the compute nodes to access Spider as if the storage were locally attached. All other Lustre components reside within the Spider infrastructure, providing ease of maintenance, accessibility during service outages on the compute platforms, and the ability to expand the file system's performance and capacity independently of those platforms.
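Because Spider appears to applications as an ordinary mounted file system, codes can write to it with standard POSIX or MPI-IO calls. The minimal sketch below has each MPI rank write its own block of a shared file; the /lustre/scratch path is a placeholder, not the actual Spider mount point.

    /* Minimal MPI-IO sketch: each rank writes one block of a shared file.
     * Illustrative only; the path below is a placeholder for the real
     * Spider mount point. */
    #include <mpi.h>
    #include <stdlib.h>
    #include <string.h>

    int main(int argc, char **argv)
    {
        const MPI_Offset BLOCK = 1 << 20;   /* 1 MiB per rank */
        int rank;
        MPI_File fh;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        char *buf = malloc(BLOCK);
        memset(buf, 'a' + (rank % 26), BLOCK);   /* fill with a rank-specific byte */

        MPI_File_open(MPI_COMM_WORLD, "/lustre/scratch/example.dat",
                      MPI_MODE_CREATE | MPI_MODE_WRONLY, MPI_INFO_NULL, &fh);

        /* Each rank writes at its own offset, so no two ranks overlap. */
        MPI_File_write_at(fh, (MPI_Offset)rank * BLOCK, buf, (int)BLOCK,
                          MPI_BYTE, MPI_STATUS_IGNORE);

        MPI_File_close(&fh);
        free(buf);
        MPI_Finalize();
        return 0;
    }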

Moving toward a centralized file system required increased redundancy and fault tolerance, so Spider was designed to eliminate single points of failure and thereby maximize availability. By using failover pairs, multiple networking paths, and the resiliency features of the Lustre file system, Spider provides a reliable centralized storage solution.

Spider File System Specs

Unlike previous storage systems, which were simply high-performance RAID arrays connected directly to the computation platform, Spider is a large-scale storage cluster. Forty-eight DDN S2A9900s provide the back-end object storage, which in aggregate delivers over 240 gigabytes per second of bandwidth and over 10 petabytes of RAID 6 capacity from 13,440 1-terabyte SATA drives. This object storage is accessed through 192 Dell dual-socket, quad-core Lustre object storage server (OSS) nodes providing over 14 teraflops of performance and 3 terabytes of RAM. Each object storage server can deliver in excess of 1.25 gigabytes per second of file-system-level performance. Metadata is stored on two LSI Engenio 3992s and served by three Dell quad-socket, quad-core systems. These systems are interconnected via the center's Scalable I/O Network (SION), which provides a high-performance backplane for Spider.
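The aggregate bandwidth figure is consistent with the per-server number:

    192 object storage servers × 1.25 gigabytes per second per server = 240 gigabytes per second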

Scalable I/O Network - SION

To provide a truly integrated computing facility, the LCF deployed a system area network (SAN) dubbed SION. SION is a multistage InfiniBand network that connects all NCCS platforms. It provides a backplane for the integration of multiple systems such as Jaguar, Spider, Lens (visualization cluster), Ewok (end-to-end productivity cluster), Smoky (application readiness cluster), and the HPSS and GridFTP servers. By providing a high-performance link between systems, SION allows communication between the two segments of Jaguar. New capabilities such as online visualization are now possible, as data from the simulation platform can stream to the visualization platform at extremely high data rates.

As new platforms are deployed at the LCF, SION will continue to scale out, providing an integrated backplane of services. Rather than replicating infrastructure services for each new deployment, SION will allow access to existing services, thereby reducing total costs, enhancing usability, and decreasing the time from initial acquisition to production readiness.

SION Specs

SION is a high-performance InfiniBand DDR network providing over 889 gigabytes per second of bisection bandwidth. The core network infrastructure is based on three 288-port Cisco 7024D InfiniBand switches. One switch provides an aggregation link, while the remaining two provide connectivity between the two Jaguar segments and the Spider file system. A fourth 7024D switch provides connectivity to all other LCF platforms and is connected to the aggregation switch. Spider is connected to the core switches via 48 24-port Flextronics InfiniBand switches, which allows the storage to be addressed directly from SION. Additional switches provide connectivity for the remaining LCF platforms.

The LCF spans over 40,000 square feet of raised floor space, with platforms spread throughout the center. To meet the distance requirements imposed by such a large-scale center, SION uses Zarlink InfiniBand optical cables in lengths of up to 60 meters. These long cables allow connectivity across the two-story facility, which would be impossible with copper cables. In total, SION has over 3,000 InfiniBand ports and over 3 miles of optical cable providing high-performance connectivity.

NCCS Networking

Networking capability at the NCCS is being expanded in parallel with its computing capability to ensure accurate, high-speed data transfer. High-throughput networks among its systems and upgraded connections to ESnet (Energy Sciences Network) and Internet2 have been installed to speed data transfers between the NCCS and other institutions.

The NCCS has a direct connection to DOE's ESnet, providing a high-bandwidth pipe that links the center with more than 40 other DOE sites, as well as fast interconnections to more than 100 additional networks.

The NCCS is also connected to the Internet2 network and NSF's TeraGrid. Internet2 provides the U.S. research and education community with a dynamic, robust, and cost-effective hybrid optical and packet network that meets its bandwidth-intensive requirements. It furnishes a high-speed network backbone, capable of handling full-motion video and 3D animations, to more than 200 U.S. educational institutions, corporations, and nonprofit and government agencies.

The NCCS core LAN consists of two Cisco 6500-series routers along with a Force10 E1200 router. The core network provides over 100 10-gigabit Ethernet (10GE) ports for switch-to-switch connections as well as for directly connected 10GE hosts. The NCCS also provides over 1,200 gigabit Ethernet ports for machines with lesser data-transfer needs.

Networking Specs

The NCCS is connected to multiple wide-area networks. The DOE UltraScience Net is accessible via two OC-192 connections, while Internet2, ESnet, and TeraGrid are each accessible via one OC-192 connection. This connectivity is made possible by ORNL's substantial investment in network infrastructure, including a Ciena CoreStream, a Juniper T640, a Ciena CoreDirector, and a Cisco 6509.

Archival Storage - HPSS

The High Performance Storage System (HPSS), the NCCS's archival data storage facility, has been significantly upgraded to ensure high-speed, reliable storage and retrieval of petascale data sets, which may comprise petabytes of data. HPSS currently stores more than 3 petabytes of data, and up to 40 terabytes are added daily. The amount stored has been doubling every year, and the addition of two petascale systems is expected to escalate that rate. To keep pace with the demands of the center's petascale simulation platforms, HPSS is expanded each year. Integration efforts will bring HPSS connectivity to SION, allowing new capabilities such as seamless integration with Spider. This integration will enable extremely high-performance data transfers into and out of HPSS directly from Spider using multiple transfer mechanisms, such as the HPSS transfer agent or the local file mover.

HPSS Specs

The HPSS infrastructure includes 28 production Dell servers used as core servers, ACSLS servers, user-interface gateways, and disk/tape movers. Tape storage comprises two STK PowderHorn robotic libraries containing 14 STK 9840 tape drives and over 11,000 tapes. Two Sun StorageTek SL8500 robotic libraries containing 16 9940, 24 T10000A, and 24 T10000B tape drives, with over 9,800 tapes, were added to increase tape capacity and throughput. Four DDN 9550 arrays with over 1,500 terabytes of storage make up the disk tier of HPSS, providing high-performance access for small and medium files while also acting as a cache for larger files destined for tape.

Science and Petascale Computing

From probing the potential of new energy sources to dissecting the dynamics of climate change to manipulating protein functions, terascale systems have been an indispensable tool in scientific investigation and problem solving. The capability offered by petascale machines to expand on these advances and address some of humankind's most pressing problems is unprecedented. ORNL provides the scientific community with the most powerful tools on the planet for addressing some of the world's toughest challenges.


Library of Flames Illuminates Design of Advanced Combustion Devices

Insight into how flames stabilize, extinguish, and reignite may spawn new predictive models that guide the design of engines that burn less fuel and generate fewer pollutants and greenhouse gases.

The Future of Climate Research: A Q&A with ORNL's James Hack

James Hack leads ORNL's Climate Change Initiative, which aims to accelerate discoveries about Earth's climate system through lab-wide engagement of scientists and engineers from diverse directorates encompassing energy, environment, computing, and national security. The NCCS is poised to help climate researchers gain insights that may guide policymakers and planners in exploring options for addressing some of the greatest challenges of our time.

Oak Ridge Delivers Several Breakthroughs

A recently released document (Breakthroughs2008.pdf) showcasing 10 scientific computing milestones includes five projects conducted at the National Center for Computational Sciences (NCCS) at Oak Ridge National Laboratory (ORNL). The document, Breakthroughs 2008, chronicles major advances in simulation over the past 18 months under the auspices of the Department of Energy's (DOE's) Office of Advanced Scientific Computing Research (ASCR).

Invisible Means of Support

A team led by astrophysicist Piero Madau of the University of California-Santa Cruz (UCSC) has given us a glimpse into the invisible world of dark matter, performing the largest computer simulation ever of dark matter evolving in a galaxy such as the Milky Way. The results of their findings appear in the August 7 issue of the journal Nature.

Journal Cover Highlights Jaguar Simulations

Fusion simulations performed on ORNL's Cray XT4 Jaguar supercomputer are featured in the cover article of July's edition of the journal Physics of Plasmas, published by the American Institute of Physics. A team led by ORNL physicist Fred Jaeger used simulations to demonstrate that radio waves will be effective in heating the multinational, multi-billion-dollar ITER fusion reactor.

Tap It and Trap It

One proposal for mitigating the effect of coal power on the earth's climate involves separating carbon dioxide from power plant emissions and pumping it deep underground, where it can remain indefinitely dissolved in the groundwater or converted into a solid form of carbonate minerals. A team of researchers led by Peter Lichtner of Los Alamos National Laboratory (LANL) is using Jaguar to simulate this process, known as carbon sequestration.

Jaguar XT5 Image Gallery

The new 1.64-petaflop Cray XT Jaguar features more than 180,000 processing cores, each with 2 gigabytes of local memory. The resources of the ORNL computing complex provide scientists with a total performance of 2.5 petaflops. Images of the NCCS petaflop Jaguar system are shown here. For the latest science visualization images, see the NCCS photo gallery.


Jaguar XT5 Video Gallery

Use the video playlist below to select a video for viewing.
