Road Map

The NCCS has aggressive plans to increase the capability of its Cray XT supercomputer, known as Jaguar. Jaguar is the largest computer in the U.S. Department of Energy’s (DOE’s) Office of Science and is the major computing resource for DOE’s Innovative and Novel Computational Impact on Theory and Experiment (INCITE) program.

These upgrades will increase Jaguar’s capacity fivefold, to 250 TF, and pave the way for the center’s upcoming petaflop computer. To minimize the disruption to our users and allow them to anticipate changes, the NCCS has outlined the schedule below.

Previous Activities

  • Late November 2006: New XT4 hardware arrived that eventually allowed Jaguar to reach 100 TF. The hardware, consisting of 68 cabinets with 5,294 dual-core processor nodes, was installed on the second floor of the NCCS building at Oak Ridge National Laboratory. During this period, the existing XT3 system continued to be available to users.
  • November 2006–February 2007: XT4 acceptance testing was conducted.
  • February 10–13, 2007: Users moved from the XT3 to the XT4; the faster memory of the XT4 was expected to benefit a number of applications.
  • February 14, 2007: The XT3 system was shut down and reassembled on the second floor next to the new XT4 system. During this period, the new XT4 system continued to be available to users.
  • February 28, 2007: The XT3 and XT4 systems were combined into a single 119 TF system. The combined system has more than 11,500 dual-core AMD Opteron processors and more than 46 TB of memory in the compute partition. Neither system was available to users during this period.
  • March 26, 2007: Acceptance testing began on the combined single 119 TF system.
  • April 5, 2007: The combined 119 TF XT system was returned to service.
  • July 30, 2007: Thirty-two XT3 cabinets were removed from the 124-cabinet XT system, Jaguar. The 32 cabinets were used to build a transition system for use during the Catamount-to-CNL transition and the quad-core upgrade. Removing the cabinets reduced Jaguar’s compute partition from 11,508 dual-core compute nodes to 8,532.
  • November 1, 2007: The operating system on the compute nodes was migrated from Catamount to Compute Node Linux (CNL), a stripped-down version of Linux. Eight additional cabinets were removed from Jaguar and added to the transition system, increasing it from 32 to 40 cabinets.
  • December 2007: Upgrades began on Jaguar to bring the machine to 250 TF. The dual-core processors were replaced with quad-core ones, and the memory was doubled.
  • May 2008: The upgraded system contains 7,832 quad-core 2.1 GHz AMD Opteron processors and 62 TB of memory (2 GB per core). Aggregate system performance is approximately 250 TF, and approximately 600 TB is available in the scratch filesystems.
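
As a rough sanity check on the May 2008 figures, the core count and total memory follow directly from the processor count and the 2 GB per core quoted above. The short Python sketch below works through that arithmetic. The peak-performance estimate is an illustration only: it assumes 4 double-precision floating-point operations per core per clock cycle, a value that is not stated in this road map, and it uses decimal units (1 TB = 1,000 GB). The function names are hypothetical.

```python
# Figures taken from the May 2008 entry in the list above.
PROCESSORS = 7_832
CORES_PER_PROCESSOR = 4       # quad-core
MEMORY_PER_CORE_GB = 2
CLOCK_GHZ = 2.1

# ASSUMPTION (not stated in the road map): each core completes
# 4 double-precision floating-point operations per clock cycle.
ASSUMED_FLOPS_PER_CYCLE = 4


def total_cores() -> int:
    """Number of compute cores in the upgraded system."""
    return PROCESSORS * CORES_PER_PROCESSOR


def total_memory_tb() -> float:
    """Aggregate memory in decimal terabytes (1 TB = 1,000 GB)."""
    return total_cores() * MEMORY_PER_CORE_GB / 1_000


def estimated_peak_tf() -> float:
    """Theoretical peak in teraflops under the assumed flops-per-cycle value."""
    gigaflops = total_cores() * CLOCK_GHZ * ASSUMED_FLOPS_PER_CYCLE
    return gigaflops / 1_000


if __name__ == "__main__":
    print(f"cores:          {total_cores():,}")
    print(f"memory (TB):    {total_memory_tb():.1f}")
    print(f"est. peak (TF): {estimated_peak_tf():.1f}")
```

The memory result rounds to the 62 TB quoted above. The peak estimate depends entirely on the assumed flops-per-cycle value and is meant only to show that the quoted figure of approximately 250 TF is of the right order for this configuration.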

Future Activities

  • 2008: A petaflop system will be installed at the NCCS.

It is expected that even with the planned downtimes, users will be able to use all their allocated time. In addition, there may be opportunities for some users to make “hero” runs during the upgrade periods, before the system is returned to production.

We realize that this path, which brings Jaguar to 250 TF and paves the way for the center’s upcoming petaflop computer, is aggressive. We are working hard to limit disruptions in system availability; however, further disruptions in service will be necessary and should be expected.

We apologize for any inconvenience caused by required outages as we move forward to a 250 TF system and beyond. If you have questions about transitioning your project-related materials, don’t hesitate to contact the NCCS User Assistance Center.