Parallel Systems Software
The Parallel System Software Research capability of the Scalable Computing Systems department in CCIM supports the needs of the ASC program by providing the software foundation that enables Massively Parallel Processors (MPPs) to scale to thousands of processors. In collaboration with the University of New Mexico, Sandia personnel have developed operating systems, communications technology, and run-time system software to provide these capabilities. Sandia National Labs has also performed pioneering work on the software needed to enable large-scale cluster computers. We continue this tradition with research in communications technology, operating systems, and system software that will enable novel architectures and system scaling to unprecedented levels.
Sandia's open-source software is used as communication middleware for high-performance computing and scalable parallel file systems.
Areas of Research:
- Alternative Programming Models: Although MPI is the most commonly used programming model at Sandia, some within the broader research community believe that it could ultimately impose barriers to higher productivity and greater scalability. Researchers at Sandia are beginning to investigate alternative programming models such as UPC (Unified Parallel C). (Contact: Zhaofang Wen)
- Configurable Operating/Runtime Systems: This is a collaborative research project with the University of New Mexico and the California Institute of Technology to design and implement a framework for configuring, building, and deploying application-specific operating and runtime systems for peta-scale scientific computing environments. (Contact: Ron B. Brightwell)
- Georgia Tech Collaboration: Sandia is collaborating with Georgia Tech to enable high-performance simulation of large-scale supercomputer networks on large-scale supercomputers. The goal is to combine a variety of simulation techniques to provide both high-fidelity detailed simulations and accurate simulations based on high-level network models. (Contact: Keith D. Underwood)
- Light Weight Kernel Development: The success of Cougar on ASCI Red led to its selection as the OS for ASCI Red Storm. The Scalable Computing Systems department is contributing to the effort to implement the next version of Cougar (called Catamount) and to develop the full system software environment for ASCI Red Storm. (Contacts: John P. Vandyke and Kevin Pedretti)
- Light Weight Kernel Research: The light weight kernel (LWK) known as Cougar is a fundamental component of ASCI Red's success. Indeed, the LWK approach has prevented the "Rogue OS" effects seen on other large-scale systems. LWK research seeks to extend this technology to new architectures and to system scales of 100,000 processors or more. (Contact: Ron B. Brightwell)
- Network Simulation: In an effort to validate future generations of supercomputers before buying them, this project seeks to build a simulation infrastructure that allows ready exploration of the supercomputer architecture space. The goal is for real applications to run on high-fidelity simulations of both the processor and the network. (Contact: Rolf E. Riesen)
- Network Usage Analysis: A key aspect of application performance is the way in which the application uses the network. The network usage model is also a key parameter for optimizing the network and communications libraries. This project analyzes application network usage at the MPI level. (Contacts: Ron B. Brightwell and Keith D. Underwood)
- Portals: Sandia and the University of New Mexico continue to actively develop the Portals Message Passing API, including improvements to the specification and implementations on commodity hardware.
Program Contact: Neil D. Pundit
Maintained by: Bernadette M. Watts
Modified on: May 6, 2008