ORNL at SC2008

ORNL participation in SC2008 committees

ORNL participation in SC2008 tutorials

S06: Introduction to Scientific Workflow Management

Presenters:
Ilkay Altintas (San Diego Supercomputer Center)
Mladen Vouk (North Carolina State University)
Scott A. Klasky (Oak Ridge National Laboratory)
Norbert Podhorszki (Oak Ridge National Laboratory)

Tutorials Session 08:30AM - 05:00PM Room TBD

Abstract:

A scientific workflow is the process of combining data and processes into a configurable, structured set of steps that implement semi-automated computational solutions of a scientific problem. Scientific workflow systems provide graphical user interfaces for combining different technologies, along with efficient methods for using them, and thus increase scientists' efficiency. This tutorial provides an introduction to scientific workflow construction and management (Part I) and includes a detailed hands-on session (Part II) using the Kepler system. It is intended for an audience with a computational science background. It will cover the principles and foundations of scientific workflows, installation of the Kepler environment, workflow construction from the available Kepler library components, and workflow execution management that uses Kepler facilities to provide process and data monitoring, provenance information, and high-speed data movement solutions. This tutorial also incorporates hands-on exercises and application examples from different scientific disciplines.

 

ORNL participation in SC2008 technical papers

Wide-Area Performance Profiling of 10GigE and Infiniband Technologies

Authors:
Nageswara S. V. Rao (Oak Ridge National Laboratory)
Weikuan Yu (Oak Ridge National Laboratory)
William R. Wing (Oak Ridge National Laboratory)
Stephen W. Poole (Oak Ridge National Laboratory)
Jeffrey S. Vetter (Oak Ridge National Laboratory)

Papers Session - Networks
02:00PM - 02:30PM Room Ballroom F

Abstract:

For wide-area high-performance applications, light-paths provide 10 Gbps connectivity, and multi-core hosts with PCI-Express can drive such data rates. However, sustaining such end-to-end throughputs across connections of thousands of miles remains challenging, and current performance studies of such solutions are very limited. We present an experimental study of two solutions, based on different technologies, for achieving such throughputs: (a) 10 Gbps Ethernet with TCP/IP transport protocols, and (b) InfiniBand and its wide-area extensions. For both, we generate performance profiles over 10 Gbps connections of lengths up to 8600 miles, and discuss the components, complexity, optimizations, and limitations of sustaining such throughputs using various connections and host configurations. The IB solution is better suited for applications with a single large flow, while the 10GigE solution is better for those with multiple competing flows. Furthermore, wide-area IB enables applications to operate transparently across thousands of miles at multi-Gbps rates using a native IB-based message-passing protocol.
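
As a rough back-of-the-envelope illustration (not a figure from the paper) of why such throughputs are hard to sustain, consider the bandwidth-delay product of a 10 Gbps path spanning 8600 miles, assuming signal propagation in fiber at roughly two-thirds the speed of light:

    \begin{align*}
    \text{one-way distance} &\approx 8600~\text{mi} \approx 1.4\times10^{4}~\text{km},\\
    \text{RTT} &\approx \frac{2 \times 1.4\times10^{4}~\text{km}}{2\times10^{5}~\text{km/s}} \approx 0.14~\text{s},\\
    \text{BDP} &= 10~\text{Gbps} \times 0.14~\text{s} \approx 1.4\times10^{9}~\text{bits} \approx 170~\text{MB}.
    \end{align*}

A single TCP flow must keep roughly that much data in flight to fill the pipe, which is why buffer sizing and protocol behavior dominate wide-area performance at these distances.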

Early Evaluation of BlueGene/P

Authors:
Sadaf Alam (Oak Ridge National Laboratory)
Richard Barrett (Oak Ridge National Laboratory)
Michael Bast (Oak Ridge National Laboratory)
Mark Fahey (Oak Ridge National Laboratory)
Jeffrey Kuehn (Oak Ridge National Laboratory)
Collin McCurdy (Oak Ridge National Laboratory)
James Rogers (Oak Ridge National Laboratory)
Philip Roth (Oak Ridge National Laboratory)
Ramanan Sankaran (Oak Ridge National Laboratory)
Jeffrey Vetter (Oak Ridge National Laboratory)
Patrick Worley (Oak Ridge National Laboratory)
Weikuan Yu (Oak Ridge National Laboratory)

Papers Session - Large-Scale System Performance
04:00PM - 04:30PM Room Ballroom E

Abstract:

BlueGene/P (BGP) is the second-generation BlueGene architecture from IBM, succeeding BlueGene/L (BGL). BGP is a system-on-a-chip (SoC) design that uses four PowerPC 450 cores operating at 850 MHz, with a double-precision, dual-pipe floating-point unit per core. These chips are connected by multiple interconnection networks, including a 3-D torus, a global collective network, and a global barrier network. In this paper, we report on our examination of BGP, presented in the context of a set of important scientific applications and as it compares to other major large-scale supercomputers in use today. Our investigation confirms that BGP has good scalability, with an expected lower performance per processor when compared to the Cray XT4's Opteron. We also find that BGP uses very low power per floating-point operation for certain kernels, yet it has less of a power advantage when considering science-based metrics for mission applications.
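
For orientation, the per-node peak implied by these specifications (a standard back-of-the-envelope figure, not a result from the paper), assuming each dual-pipe FPU retires a fused multiply-add on both pipes, i.e. four flops per cycle per core, is

    \[
      4~\text{cores} \times 850~\text{MHz} \times 4~\tfrac{\text{flops}}{\text{cycle}}
        = 13.6~\text{GFlop/s per node}.
    \]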

Proactive Process-Level Live Migration in HPC Environments

Authors:
Chao Wang (North Carolina State University)
Frank Mueller (North Carolina State University)
Christian Engelmann (Oak Ridge National Laboratory)
Stephen L. Scott (Oak Ridge National Laboratory)

Papers Session - Scheduling
11:30AM - 12:00PM Room Ballroom F

Abstract:

As the number of nodes in high-performance computing environments keeps increasing, faults are becoming commonplace. Reactive fault tolerance (FT) often does not scale due to massive I/O requirements and relies on manual job resubmission. This work complements reactive FT with proactive FT at the process level. Through health monitoring, a subset of node failures can be anticipated when a node's health deteriorates. A novel process-level live migration mechanism supports continued execution of applications during much of the migration of a process. This scheme is integrated into an MPI execution environment to transparently sustain health-related node failures, which eliminates the need to restart and requeue MPI jobs. Experiments indicate that 1-6.5 seconds of prior warning are required to successfully trigger live process migration, while similar operating-system virtualization mechanisms require 13-24 seconds. This self-healing approach complements reactive FT by nearly cutting the number of checkpoints in half when 70% of the faults are handled proactively.
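
The proactive control loop described above can be sketched roughly as follows (a minimal illustration with hypothetical function and threshold names; the actual mechanism operates inside the MPI runtime and migrates live process state):

    import time

    HEALTH_THRESHOLD = 0.8   # hypothetical normalized health score below which we act

    def health_score(node):
        """Placeholder: a real monitor would fold sensor readings (temperature,
        ECC error counts, fan speed) into one normalized score, e.g. via IPMI."""
        return node.read_sensors()   # hypothetical API returning a normalized score

    def proactive_ft_loop(nodes, spare_nodes, migrate_process):
        """Poll node health; live-migrate MPI processes off nodes that look likely to fail."""
        while True:
            for node in nodes:
                if health_score(node) < HEALTH_THRESHOLD and spare_nodes:
                    target = spare_nodes.pop()
                    # Live migration: the process keeps executing while most of its
                    # state is copied, then pauses briefly for the final handoff.
                    migrate_process(node, target)
            time.sleep(1.0)   # polling interval, illustrative only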

High Performance Multivariate Visual Data Exploration for Extremely Large Data

Authors:
Oliver Rübel (Lawrence Berkeley National Laboratory)
Mr Prabhat (Lawrence Berkeley National Laboratory)
Kesheng Wu (Lawrence Berkeley National Laboratory)
Hank Childs (Lawrence Livermore National Laboratory)
Jeremy Meredith (Oak Ridge National Laboratory)
Cameron Geddes (Lawrence Berkeley National Laboratory)
Sean Ahern (Oak Ridge National Laboratory)
Gunther Weber (Lawrence Berkeley National Laboratory)
Hans Hagen (University of Kaiserslautern)
Bernd Hamann (University of California, Davis)
E. Wes Bethel (Lawrence Berkeley National Laboratory)
Estelle Cormier-Michel (Lawrence Berkeley National Laboratory)
Peter Messmer (Tech-X Corporation)


Papers Session - Visualization and Data Management
02:30PM - 03:00PM Room Ballroom G

Abstract:

One of the central challenges in modern science is the ability to quickly derive knowledge and understanding from large, complex collections of data. We present a novel approach to this problem that combines and extends techniques from high-performance visual data analysis and scientific data management. The approach is demonstrated within the context of gaining insight from complex, time-varying datasets produced by a laser wakefield accelerator simulation. Our approach leverages histogram-based parallel coordinates both for visual information display and as a vehicle for guiding a data mining operation. Data extraction and subsetting are implemented with state-of-the-art index/query technology. This approach, while applied here to accelerator science, is generally applicable to a broad set of science applications and is implemented in a production-quality visual data analysis infrastructure. We conduct an extensive performance analysis and demonstrate good scalability on a distributed-memory Cray XT4 system.
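
The core idea of coupling histogram-based display with data subsetting can be sketched generically (a NumPy illustration under assumed variable names, not the authors' index/query implementation):

    import numpy as np

    def histogram_brush(data, brushed_ranges, bins=64):
        """Select records whose variables fall inside user-brushed ranges and
        return per-variable histograms of the selected subset.

        data: dict mapping variable name -> 1-D numpy array (one entry per record)
        brushed_ranges: dict mapping variable name -> (lo, hi) interval
        """
        n = len(next(iter(data.values())))
        mask = np.ones(n, dtype=bool)
        for name, (lo, hi) in brushed_ranges.items():
            mask &= (data[name] >= lo) & (data[name] <= hi)
        # Binned summaries of the subset, suitable for histogram-based parallel coordinates.
        return {name: np.histogram(values[mask], bins=bins)
                for name, values in data.items()}

    # Example (hypothetical variable names): particles with high longitudinal momentum
    # inside a spatial window.
    # hists = histogram_brush(particles, {"px": (1e10, np.inf), "y": (-50.0, 50.0)})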

 

ORNL participation in SC2008 panels

ORNL participation in SC2008 workshops

Nuclear Energy Advanced Modeling and Simulation: Enhancing Climate and Energy Security Opportunities

Organizers:
Alex R. Larzelere (DOE)
Gil Weigand (Oak Ridge National Laboratory)

Workshops Session
08:30AM - 05:00PM Room 13A/13B

Abstract:

Achieving a secure world energy future requires utilizing a diverse portfolio of technologies that must include fission nuclear energy. This has to be done with the recognition that safety, proliferation, environmental issues related to nuclear waste, and public acceptance are of major concern. Future nuclear energy systems will require a science-based understanding of the physical processes involved, an understanding that has not yet been achieved. Stepping up to a science-based approach requires understanding and using basic physical principles and applying them to larger, more complex, integrated systems through advanced numerical modeling and simulation tools. These tools must be tightly linked to experimental results to ensure that the insights gained are an accurate reflection of reality. This workshop will explore the technical issues in building this level of capability and possible ways to overcome the significant challenges.

ORNL participation in SC2008 Education Outreach

Information on current employment opportunities and on student programs (RAMS, internships, summer semesters, etc.) will be available in the ORNL booth. ORNL is sponsoring two students for the Education Program. Contact: Debbie McCoy.

ORNL participation in SC2008 MSI Outreach

Debbie McCoy will be participating in the MSI meetings.

ORNL participation in SC2008 BOFs

The 2008 HPC Challenge Awards

Primary Session Leader:
Jeremy Kepner (MIT Lincoln Laboratory)

Secondary Session Leader:
Piotr Luszczek (University of Tennessee)

Birds-of-a-Feather Session
12:15PM - 01:15PM Room 14

Abstract:

The 2008 HPC Challenge Awards will be given in two classes. Class 1: Best Performance awards the best run submitted to the HPC Challenge website. Since there are multiple tests, "best" is a subjective term. The committee has decided that winners will be announced in four categories: HPL, Global-RandomAccess, EP-STREAM-Triad per system, and Global-FFT. Class 2: Most Elegant awards the implementation of three or more of the HPC Challenge benchmarks, with special emphasis placed on HPL, Global-RandomAccess, STREAM-Triad, and Global-FFT. This award is weighted 50% on performance and 50% on code elegance/clarity/size. Competition in Class 1 offers a rich view of contemporary supercomputers as they compete for supremacy not just in one category but in four. Class 2, on the other hand, offers a glimpse into high-end programming technologies and the effectiveness of their implementation.

Coordinated Fault Tolerance in High-end Computing Environments

Primary Session Leader:
Pete Beckman (Argonne National Laboratory)

Secondary Session Leaders:
Rinku Gupta (Argonne National Laboratory)
Al Geist (Oak Ridge National Laboratory)

Birds-of-a-Feather Session
12:15PM - 01:15PM Room Ballroom G

Abstract:

The Coordinated Infrastructure for Fault Tolerant Systems (CIFTS) initiative provides a standard framework, through the Fault Tolerance Backplane (FTB), in which any component of the software stack can report or be notified of faults through a common interface, thus enabling coordinated fault tolerance and recovery. At SC'07, the CIFTS BOF drew an enthusiastic audience from industry, academia, and research institutions. Building on that success, the objectives of the SC'08 BOF are: 1. Discuss the experiences gained and challenges faced in comprehensive fault management on petascale leadership machines, and the impact of the CIFTS framework in this environment. Teams developing FTB-enabled software such as MVAPICH2, MPICH2, Open MPI, Cobalt, and others will share their experiences. 2. Discuss the recent enhancements and planned developments for CIFTS and solicit audience feedback. 3. Bring together individuals responsible for high-end, petascale computing infrastructures who have an interest in developing fault tolerance specifically for these environments.
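
The backplane concept, in which components publish fault events and others subscribe to be notified through a common interface, can be illustrated with a toy sketch (hypothetical class and method names, not the FTB API):

    from collections import defaultdict

    class FaultBackplane:
        """Minimal publish/subscribe bus: software components report fault events,
        and any interested component registers a callback to be notified."""

        def __init__(self):
            self._subscribers = defaultdict(list)   # event name -> callbacks

        def subscribe(self, event_name, callback):
            self._subscribers[event_name].append(callback)

        def publish(self, event_name, **details):
            for callback in self._subscribers[event_name]:
                callback(event_name, details)

    # Example wiring (illustrative names): a scheduler reacts to memory faults
    # reported by an MPI library.
    bus = FaultBackplane()
    bus.subscribe("node.memory.ecc_error",
                  lambda name, d: print("scheduler: draining node", d.get("node")))
    bus.publish("node.memory.ecc_error", node="n0123", count=3)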

OSCAR Community Meeting

Primary Session Leader:
Geoffroy Vallee (Oak Ridge National Laboratory)

Secondary Session Leader:
Stephen Scott (Oak Ridge National Laboratory)

Birds-of-a-Feather Session
05:30PM - 07:00PM Room 14

Abstract:

Since the first public release in 2001, there have been well over 200,000 downloads of the Open Source Cluster Application Resources (OSCAR) software stack. OSCAR is a self-extracting cluster configuration, installation, maintenance, and operation suite consisting of "best known practices" for cluster computing. OSCAR has been used on highly ranked clusters in the TOP500 list and is available both in a freely downloadable version and in commercially supported instantiations. The OSCAR team is an international group of developers from research laboratories, universities, and industry cooperating in this open source effort. As it has for the past six years, the OSCAR BoF will be a focal point for the OSCAR community at SC08, where both developers and users may gather to discuss the "current state" as well as future directions of the OSCAR software stack. New and potential users and developers are welcome.

The Growing Need for Resilience in HPC Software

Primary Session Leader:
Gregory M. Thorson (SGI)

Secondary Session Leaders:
John T. Daly (Los Alamos National Laboratory)
Stephen L Scott (Oak Ridge National Laboratory)

Birds-of-a-Feather Session
12:15PM - 01:15PM Room Ballroom E

Abstract:

Low reliability in petascale systems and the desire for low-cost HPC hardware are increasing the need for resilience in software. Reliability can come through redundancy, but this is often expensive; hence the need for resiliency in systems and application software. Driven by the search-engine market, new techniques have been developed for automatically restarting failed processes. Although these are useful for some areas of HPC, other algorithms are so fundamentally coupled that an individual thread or process cannot simply be restarted from the beginning. This is where checkpointing has traditionally been deployed. However, with the size of the systems on the drawing board today, it appears that checkpointing, as currently implemented, is quickly reaching the end of its feasibility. A dialogue within the HPC community is needed to explore when resilience techniques are required and which ones are appropriate. We must then work together to lower the barrier to adoption of these techniques.

Open MPI State of the Union

Primary Session Leader:
Jeff Squyres (Cisco Systems)

Secondary Session Leader:
George Bosilca (University of Tennessee, Knoxville)

Birds-of-a-Feather Session
12:15PM - 01:15PM Room 14

Abstract:

With the advent of petascale computing, the Open MPI community takes the challenge of extreme scalability very seriously. Open MPI was part of the software stack that powered Los Alamos's Roadrunner machine to a petaflop several months ago, and several "extreme scalability" features were developed and added to the production code base along the way. A team effort of open-source development spanning research laboratories, academia, and industry was responsible for this historic achievement. Come hear how we achieved the petaflop, where Open MPI has been, where we're going, and how you can join us. The meeting will consist of three parts: 1. Members of the Open MPI core development team will present the current status of Open MPI. 2. Discussion of The Petaflop. 3. Discussion of the Open MPI roadmap, including possible future directions for Open MPI, with active solicitation of feedback from real-world MPI users and ISVs with MPI-based products (please bring your suggestions!).

OpenSHMEM: SHMEM for the rest of the world

Primary Session Leader:
Steve Poole (Oak Ridge National Laboratory)

Secondary Session Leader:
Lauren Smith (US Dept of Defense)

Birds-of-a-Feather Session
05:30PM - 07:00PM Room 19A/19B

Abstract:

The SHMEM parallel programming library is an easy-to-use programming model currently available on several parallel architectures. The SHMEM model uses fast one-sided communication techniques and is highly efficient on globally addressable shared- or distributed-memory systems. Several dialects of SHMEM are available, and with each new generation of systems there appears to be greater divergence among SHMEM libraries. This BOF will be led by researchers from commercial entities as well as government departments. SHMEM is trademarked by SGI. The purpose of this meeting is to gather potential users, developers, and applications people to define a new open organization and a new open-source OpenSHMEM specification building upon the existing SGI API. We expect to see potential additions and to have this API ported to a large variety of platforms. Technical components shall include a specification, validation/conformance test suites, tutorials, and a reference implementation.

MPI Forum: The New MPI 2.1 Standard, and Progress toward MPI 2.2 and 3.0

Primary Session Leader:
Rich Graham (Oak Ridge National Laboratory)

Secondary Session Leader:
George Bosilca (University of Tennessee, Knoxville)

Birds-of-a-Feather Session
05:30PM - 07:00PM Room Ballroom G

Abstract:

The Message Passing Interface (MPI) 2.0 standard has served the parallel technical and scientific applications community well over the last decade and has become the ubiquitous communications API for this community. New technical challenges, such as the emergence of high-performance RDMA network support, the need to address scalability at the petascale order of magnitude, fault tolerance at scale, and the many-core crisis, require us to rethink MPI's support for this community and others. This work will be encapsulated in MPI-3.0. The MPI Forum, chaired by Richard Graham of Oak Ridge National Laboratory, is meeting to revise the MPI standard. MPI-2.1, which consolidates previous documents, has just been ratified. The Forum is also working on near-term incremental changes for MPI-2.2 and, longer term, on bigger changes for MPI-3.0. This BoF will provide an overview of the standardization efforts, an opportunity for feedback, and a means of joining the process.

 

ORNL participation in SC2008 posters


Acceleration of Quantum Monte Carlo Applications on Emerging Computing Platforms

Author:
Akila Gothandaraman (University of Tennessee, Knoxville)

ACM Student Competition Session
05:15PM - 07:00PM Room Rotunda Lobby

Abstract:

Recent technological advances have led to a number of emerging platforms, such as multi-core processors, reconfigurable computing (RC), and graphics processing units (GPUs), which can boost the performance of scientific applications. This work explores RC- and GPU-based platforms to exploit the best features of each for a Quantum Monte Carlo (QMC) application. We have demonstrated a speedup of 25x for the FPGA-accelerated kernels over the software-only QMC application on the Cray XD1 HPRC platform. Here, we provide an outline of the computationally intensive kernels of the QMC application and a preliminary analytical performance model, which we are extending to enable us to optimize the use of the emerging computational resources to accelerate the kernels. We will also report on porting the kernels to NVIDIA Tesla GPUs, allowing us to exploit the tremendous data parallelism of these platforms for our "inherently parallel" Monte Carlo simulations.

Analyzing Failure Events on ORNL’s Cray XT4

Authors:
Byung-Hoon Park (Oak Ridge National Laboratory)
Ziming Zheng (Illinois Institute of Technology)
Zhilling Lan (Illinois Institute of Technology)
Al Geist (Oak Ridge National Laboratory)

Posters Session
05:15PM - 07:00PM Room Rotunda Lobby

Abstract:

Detection and diagnosis of failures in a supercomputer are challenging but crucial for improving reliability, availability, and serviceability (RAS). As an initial step toward this end, we seek to uncover correlated system events that have similar occurrence patterns. Such event sets constitute characteristic signatures of a machine in various states. In this study, we analyzed Cray system log data collected on the Cray XT4 at Oak Ridge National Laboratory between May and December 2007. We then applied statistical and data mining approaches to sift statistically and temporally correlated event types. From the analysis, we report (1) burst occurrence patterns found in fatal system events, (2) cumulative hazard functions that best fit the occurrences of fatal events, (3) correlated fatal events suggested by association rule mining and the lift measure, and (4) sets of both fatal and non-fatal events that are suggested to have causal relationships by cross-correlation analysis.
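
For reference, the lift measure used here is the standard association-rule statistic: the ratio of the observed co-occurrence probability of two event types to the probability expected if they were independent,

    \[
      \mathrm{lift}(A \Rightarrow B) \;=\; \frac{P(A \wedge B)}{P(A)\,P(B)},
    \]

so values well above 1 suggest the events are positively correlated.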

Multiresolution Analysis, Computational Chemistry, and Implications for High Productivity Parallel Programming

Authors:
Aniruddha G. Shet (Oak Ridge National Laboratory)
James Dinan (Ohio State University)
Robert J. Harrison (Oak Ridge National Laboratory)
P. Sadayappan (Ohio State University)

Posters Session
05:15PM - 07:00PM Room Rotunda Lobby

Abstract:

Multiresolution analysis is a technique for approximating a continuous function as a hierarchy of coefficients over a set of basis functions. This hierarchy is naturally represented as a tree data structure, which is used to perform fast computations with guaranteed precision by trading numerical accuracy for computation time. Emerging many-core clusters and existing parallel programming tools pose significant challenges to the efficient and scalable implementation of this type of application, due to the issues of data distribution, locality, and load balancing. Global-view, high-productivity programming languages, such as Chapel, provide new tools for expressing and managing irregular and distributed structures, such as those that arise in multiresolution codes. We investigate the benefits and challenges of expressing this class of applications in Chapel through MADNESS, a framework for multiresolution computational chemistry. Through MADNESS, we demonstrate the role of key Chapel language features in separating and managing parallel programming concerns.

Modeling Assertions for Petascale Applications and Systems

Author:
Heike Jagode (University of Tennessee, Knoxville)

Posters Session
05:15PM - 07:00PM Room Rotunda Lobby

Abstract:

Emerging petaflop-scale platforms at the DOE leadership computing sites, although architecturally distinct, share a number of programming and scaling challenges. By far the most obvious obstacle is scalability to thousands of processing cores (homogeneous and heterogeneous) interconnected by a regular custom network. Petascale applications are expected to exploit hierarchical parallelism available within and across processing units using MPI sub-communicator decomposition and hybrid MPI/shared-memory programming paradigms. Existing parallel workload and performance modeling strategies do not capture hierarchical parallelism and hence have limited capability for yielding high-fidelity application models and projections for petascale systems. We address these requirements by extending the Modeling Assertions (MA) framework to enable scientific code developers to generate application- and architecture-aware symbolic models efficiently. Parameterized communication workload models of DOE and INCITE applications demonstrate the new features. Moreover, MA models allow exploring the workload parameter space of applications for petascale problems and system configurations.

Enhancing the Performance of Dense Linear Algebra Solvers on GPUs

Authors:
Marc Baboulin (University of Coimbra)
James Demmel (University of California, Berkeley)
Jack Dongarra (University of Tennessee, Knoxville)
Stanimire Tomov (University of Tennessee, Knoxville)
Vasily Volkov (University of California, Berkeley)

Posters Session
05:15PM - 07:00PM Room Rotunda Lobby

Abstract:

As the peak performance of GPUs has reached one teraflop and support for double-precision arithmetic has been added, the appeal of GPUs for general-purpose HPC has become even greater. Our poster explores various ways to speed up linear algebra solvers on GPUs. The design principles are characterized by BLAS-level parallelism and hybrid CPU-GPU calculations. We discuss several approaches to minimizing the cost of pivoting for the LU factorization, including data structure optimization and randomization techniques. We also emphasize the use of a mixed-precision iterative refinement technique, which exploits the higher performance of single precision on the GPU while still obtaining a solution with the accuracy inherent to double-precision calculations. We provide experimental results using NVIDIA's next-generation 'G90' GPU for linear system solvers using LU, Cholesky, and QR factorizations, and we mention possible uses for linear least-squares problems.
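
The mixed-precision iterative refinement scheme mentioned above can be sketched generically (a textbook NumPy/SciPy illustration of the technique, not the poster's GPU code):

    import numpy as np
    from scipy.linalg import lu_factor, lu_solve

    def mixed_precision_solve(A, b, tol=1e-12, max_iter=30):
        """Solve Ax = b by factoring once in float32 (the fast precision) and
        refining the solution with float64 residuals."""
        lu_piv = lu_factor(A.astype(np.float32))           # expensive step, single precision
        x = lu_solve(lu_piv, b.astype(np.float32)).astype(np.float64)
        for _ in range(max_iter):
            r = b - A @ x                                   # residual in double precision
            if np.linalg.norm(r) <= tol * np.linalg.norm(b):
                break
            dx = lu_solve(lu_piv, r.astype(np.float32))     # cheap correction solve
            x += dx.astype(np.float64)
        return x

    # A and b are float64; refinement recovers double-precision accuracy provided
    # A is not too ill-conditioned for the single-precision factorization.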

Acceleration of Quantum Monte Carlo Applications on Emerging Computing Platforms

Author:
Akila Gothandaraman (University of Tennessee, Knoxville)

ACM Student Competition Session
11:42AM - 12:00PM Room 17A/17B

Abstract:

Recent technological advances have led to a number of emerging platforms, such as multi-core processors, reconfigurable computing (RC), and graphics processing units (GPUs), which can boost the performance of scientific applications. This work explores RC- and GPU-based platforms to exploit the best features of each for a Quantum Monte Carlo (QMC) application. We have demonstrated a speedup of 25x for the FPGA-accelerated kernels over the software-only QMC application on the Cray XD1 HPRC platform. Here, we provide an outline of the computationally intensive kernels of the QMC application and a preliminary analytical performance model, which we are extending to enable us to optimize the use of the emerging computational resources to accelerate the kernels. We will also report on porting the kernels to NVIDIA Tesla GPUs, allowing us to exploit the tremendous data parallelism of these platforms for our "inherently parallel" Monte Carlo simulations.

Gordon Bell Finalists

New algorithm to enable 400+ TFlop/s sustained performance in simulations of disorder effects in high-Tc superconductors

Authors:
Gonzalo Alvarez (Oak Ridge National Laboratory)
Michael S. Summers (Oak Ridge National Laboratory)
Don E. Maxwell (Oak Ridge National Laboratory)
Markus Eisenbach (Oak Ridge National Laboratory)
Jeremy S. Meredith (Oak Ridge National Laboratory)
Jeffrey M. Larkin (Cray Inc.)
John M. Levesque (Cray Inc.)
Thomas A. Maier (Oak Ridge National Laboratory)
Paul R. Kent (Oak Ridge National Laboratory)
Eduardo D'Azevedo (Oak Ridge National Laboratory)
Thomas C. Schulthess (Oak Ridge National Laboratory)

ACM Gordon Bell Finalists Session
04:00PM - 04:30PM Room Ballroom G

Abstract:

Staggering computational and algorithmic advances in recent years now make possible systematic Quantum Monte Carlo simulations of high-temperature superconductivity in a microscopic model, the two-dimensional Hubbard model, with parameters relevant to the cuprate materials. Here we report the algorithmic and computational advances that enable us to study the effect of disorder and nano-scale inhomogeneities on pair formation and the superconducting transition temperature. Significant algorithmic improvements have been made to make effective use of current supercomputing architectures. By implementing delayed Monte Carlo updates and a mixed single/double precision method, we are able to dramatically reduce the time to solution. On the Cray XT4 systems at Oak Ridge National Laboratory, for example, we currently reach a sustained performance of 409 TFlop/s on 49 thousand cores. We present here a study of how random disorder in the effective Coulomb interaction strength affects the superconducting transition temperature in the Hubbard model.

 

ORNL participation in SC2008 awards won

ORNL kiosk presentations

