NERSC: Powering Scientific Discovery Since 1974

Cloud Computing

Cloud computing is gaining a foothold in the business world, but can clouds meet the specialized needs of scientists? That was one of the questions NERSC’s Magellan cloud computing testbed explored between 2009 and 2011.

The goal of Magellan, a project funded through the U.S. Department of Energy (DOE) Office of Advanced Scientific Computing Research (ASCR), was to investigate the potential role of cloud computing in addressing the needs of the DOE Office of Science (SC), particularly midrange computing and future data-intensive workloads. The research probed various aspects of cloud computing, especially performance, usability, and cost. A distributed testbed infrastructure was deployed at the Argonne Leadership Computing Facility (ALCF) and NERSC. The testbed was designed to be flexible and capable enough to explore a variety of computing models and hardware design points, in order to understand their impact on various scientific applications. The testbed also served as a valuable resource to application scientists: a diverse set of applications from projects such as MG-RAST (a metagenomics analysis server), the Joint Genome Institute, the STAR experiment at the Relativistic Heavy Ion Collider, and the Laser Interferometer Gravitational Wave Observatory (LIGO) was used for benchmarking. The project teams were also able to carry out important production science on the Magellan cloud resources.

Research Goals

The Magellan project was charged with answering the following research questions:

  • Are the open source cloud software stacks ready for DOE HPC science?
  • Can DOE cyber security requirements be met within a cloud?
  • Are the new cloud programming models useful for scientific computing?
  • Can DOE HPC applications run efficiently in the cloud? Which applications are suitable for clouds?
  • How usable are cloud environments for scientific applications?
  • When is it cost effective to run DOE HPC science applications in a cloud?

Findings

The complete 169-page report from the Magellan project is available on a DOE website. It includes a series of recommendations for the DOE Office of Science, DOE resource providers, application scientists, and tool developers. Here is a brief summary of some high-level findings from the study.

  • Cloud computing provides many advantages. These include customized environments that let users bring their own software stack and try out new computing environments without significant administration overhead, the ability to quickly surge resources to address larger problems, and the benefits of economies of scale. Virtualization is the primary feature that provides these capabilities. Our experience working with application scientists on a cloud testbed demonstrated the power of virtualization to enable fully customized environments and flexible resource management. It also showed the potential value of these capabilities to scientists.
  • Significant initial effort and unique skills can be required to port applications to clouds. This is particularly true for some of the emerging programming models being used in cloud computing. Scientists should include this upfront investment in any economic analysis when deciding whether to move to the cloud.
  • Significant gaps and challenges exist in managing virtual environments, workflows, data, cyber security, and other areas. Further research and development is needed to ensure that scientists can easily and effectively harness the capabilities introduced by these new computing models. This includes tools that simplify the use of cloud environments, improvements to open-source cloud software stacks, and base images that help bootstrap users while still giving them the flexibility to customize their stacks. It also includes new security techniques and enhancements to MapReduce models to better fit scientific data and workflows. There are also opportunities to enable these capabilities on traditional HPC platforms, which would combine the flexibility of cloud models with the performance of HPC systems.
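The point about upfront porting effort can be made concrete with a simple break-even calculation. The sketch below is a hypothetical illustration, not the cost model used in the Magellan report. It assumes a one-time porting cost and flat per-core-hour rates for cloud and in-house resources. All of these are placeholder parameters (`porting_cost`, `cloud_rate`, `inhouse_rate`). The model ignores data transfer, storage, and any performance slowdown from virtualization.

```python
from typing import Optional


def total_cost(core_hours: float, rate_per_core_hour: float,
               upfront_cost: float = 0.0) -> float:
    """One-time upfront effort plus metered usage at a flat rate."""
    return upfront_cost + core_hours * rate_per_core_hour


def break_even_core_hours(porting_cost: float, cloud_rate: float,
                          inhouse_rate: float) -> Optional[float]:
    """Workload size (core-hours) at which the cloud, including the one-time
    porting cost, costs the same as running in-house.

    Hypothetical model: flat per-core-hour rates only. Returns None when the
    cloud rate is not lower than the in-house rate, because the porting cost
    can then never be recovered.
    """
    if porting_cost < 0 or cloud_rate < 0 or inhouse_rate < 0:
        raise ValueError("costs and rates must be non-negative")
    if cloud_rate >= inhouse_rate:
        return None
    return porting_cost / (inhouse_rate - cloud_rate)


if __name__ == "__main__":
    # Placeholder numbers for illustration only; these are not Magellan results.
    porting, cloud, inhouse = 1000.0, 0.25, 0.5
    hours = break_even_core_hours(porting, cloud, inhouse)
    print(hours)  # 4000.0
    # Both sides cost the same at the break-even point.
    print(total_cost(hours, cloud, porting), total_cost(hours, inhouse))
```

With these placeholder inputs the break-even point is 4,000 core-hours. Below that, the porting investment outweighs the cheaper cloud rate. If the cloud rate is not lower than the in-house rate, no workload size recovers the porting cost.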

Configuration

Magellan was purpose-built for the special requirements of scientific computing using technology and tool sets unavailable in commercial clouds, including

  • High-bandwidth, low-latency node interconnects (InfiniBand),
  • High-bin processors tuned for performance,
  • Preinstalled scientific applications, compilers, debuggers, math libraries, and other tools,
  • A high-bandwidth parallel file system, and
  • A high-capacity data archive.

Magellan was subdivided into smaller resource pools based on the requirements of different cloud testbeds and types of cloud research being done.

Base Compute Nodes
  • 560 nodes
  • 2 quad-core Intel Nehalem 2.67 GHz processors per node
  • 8 cores per node (4,480 total cores)
  • 24 GB DDR3 1333 MHz memory per node
Expanded Compute Nodes
  • 160 nodes
  • 2 quad-core Intel Nehalem 2.67 GHz processors per node
  • 8 cores per node (1,280 total cores)
  • 48 GB DDR3 1066 MHz memory per node
  • 1 TB (local) SATA disk per node
Login/Network Service Nodes
  • 18 nodes
  • 2 quad-core Intel Nehalem 2.67 GHz processors per node
  • 8 cores per node (144 total cores)
  • 48 GB DDR3 1066 MHz memory per node
High Performance Interconnect
  • 4X QDR InfiniBand, fiber-optic cables
  • Local fat-trees with a global 2D mesh
Cooling
  • Liquid cooled
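The per-pool core counts above follow from nodes × cores per node. The short script below re-derives them and also totals nodes and memory across the three node pools listed. It only restates figures from the list. The aggregate node and memory totals are computed here and are not quoted from the original page.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NodePool:
    name: str
    nodes: int
    cores_per_node: int
    memory_gb_per_node: int


# Figures copied from the configuration list above.
POOLS = [
    NodePool("Base compute", 560, 8, 24),
    NodePool("Expanded compute", 160, 8, 48),
    NodePool("Login/network service", 18, 8, 48),
]


def totals(pools):
    """Return (nodes, cores, memory in GB) summed over all pools."""
    nodes = sum(p.nodes for p in pools)
    cores = sum(p.nodes * p.cores_per_node for p in pools)
    memory_gb = sum(p.nodes * p.memory_gb_per_node for p in pools)
    return nodes, cores, memory_gb


if __name__ == "__main__":
    for p in POOLS:
        print(f"{p.name}: {p.nodes * p.cores_per_node} cores, "
              f"{p.nodes * p.memory_gb_per_node} GB memory")
    print("Total (nodes, cores, GB):", totals(POOLS))  # (738, 5904, 21984)
```

The per-pool core counts (4,480, 1,280, and 144) match the list. Across the three pools this gives 738 nodes, 5,904 cores, and 21,984 GB of memory.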