
Accelerating Computational Science Symposium 2012 (ACSS 2012)

Event: Accelerating Computational Science Symposium 2012 (ACSS 2012)
Start: March 28, 2012
End: March 30, 2012
Organizer: Jack Wells
Phone: (865) 241-2853
Email: wellsjc@ornl.gov
Updated: February 16, 2012
Venue: Washington Marriott
Phone: (202) 872-1500
Address: 1221 22nd St NW, Washington, DC 20037, United States

Overview

Purpose:

The purpose of this Symposium is to advance our understanding of how extreme-scale hybrid-computing architectures are accelerating progress in scientific research.

Motivated by society’s great need for advances in energy technologies, and by the demonstrated achievements and tremendous potential of computational science and engineering, a consortium of major supercomputing centers (Oak Ridge Leadership Computing Facility, National Center for Supercomputing Applications, and Swiss National Supercomputing Centre) will hold a symposium on March 29–30, 2012, in Washington, D.C. Attendees will discuss how computational science on extreme-scale hybrid-computing architectures will advance research and development in this decade, increase our understanding of the natural world, accelerate innovation, and, as a result, increase economic opportunity.

Accelerating Computational Science Symposium 2012 Attendee List Final

Confirmed Speakers:

  • Jackie Chen (Sandia)
  • Bill Tang (Princeton)
  • Jeroen Tromp (Princeton)
  • Olaf Schenk (Lugano)
  • David Dean (ORNL)
  • Jack Dongarra (UTK)
  • Ray Grout (NREL)
  • Doug Kothe (ORNL)
  • Jeremy Smith (UTK)
  • Chris Mundy (PNNL)
  • Joost VandeVondele (Zürich)
  • David Ceperley (U. Illinois at Urbana-Champaign)
  • Jeongnim Kim (ORNL)
  • Tom Evans (ORNL)
  • Stephane Ethier (Princeton)
  • Erik Lindahl (Stockholm University)
  • Chris Baker (ORNL)
  • Jim Phillips (U. Illinois at Urbana-Champaign)

Hosts:

  • Jim Hack (Oak Ridge)
  • Rob Pennington (NCSA)
  • Thomas Schulthess (Zürich)

The agenda will include leading experts speaking in plenary lectures and discussion panels. In addition, the organizers invite participants to contribute posters on the symposium theme. Information on the poster session is available on the symposium website.

Sponsors:

  • Swiss National Supercomputing Centre (CSCS)
  • National Center for Supercomputing Applications (NCSA)
  • NVIDIA
  • Oak Ridge National Laboratory (ORNL)

Agenda

Accelerating Computational Science Symposium 2012 (ACSS 2012)
March 29–30, 2012

Wednesday, March 28, 2012

5:30 – 8:30 p.m.  Early Registration and Reception (Meeting Room: Dupont Foyer); Jack Wells, ORNL

Thursday, March 29, 2012

8:30 – 8:45    Symposium Welcome and Purpose (Jim Hack, ORNL; Rob Pennington, NCSA; Thomas Schulthess, CSCS); Meeting Room: Dupont F, G, & H (working breakfast provided)

SESSION 1 (Chair: Jim Hack, ORNL)

8:45 – 9:15    Direct Numerical Simulation of Turbulence-Chemistry Interactions: Fundamental Science Towards Predictive Models (Jackie Chen, SNL)
9:15 – 9:45    S3D Direct Numerical Simulation (Ray Grout, NREL)
9:45 – 10:15   Fusion Energy Sciences & Computing at the Extreme Scale (Bill Tang, PPPL)
10:15 – 10:45  Gyrokinetic PIC Simulation (Stephane Ethier, PPPL)
10:45 – 11:15  Mid-Morning Break

SESSION 2 (Chair: Rob Pennington, UIUC)

11:15 – 11:45  Toward Global Seismic Imaging Based on Spectral-Element and Adjoint Methods (Jeroen Tromp, Princeton University); simulation video (.MOV) available
11:45 – 12:15  Large-Scale Seismic Imaging on HPC Architectures: Applications, Algorithms and Software (Olaf Schenk, University of Basel)
12:15 – 13:00  Hybrid Multicore Computing in the Future (Jeff Nichols, ORNL); Working Lunch, Meeting Room: Georgetown Ballroom

SESSION 3 (Chair: Thomas Schulthess, ETH)

13:00 – 13:30  Nuclear Energy Modeling & Simulation – CASL Project (Doug Kothe*, ORNL)
13:30 – 14:00  DENOVO Radiation Transport (Tom Evans, ORNL)
14:00 – 14:30  Computing Nuclei: Present Status, Future Prospects (David Dean, ORNL)
14:30 – 15:00  Lattice QCD Experiences on TitanDev (Balint Joo, JLab)
15:00 – 15:20  Afternoon Break

SESSION 4

15:20 – 16:10  Panel on Accelerating Atmospheric, Ocean, and Sea Ice Models (Chair: Jim Hack, ORNL)
  • Challenges in Accelerating Ocean and Ice Models (Phil Jones, LANL)
  • Accelerating CAM-SE for Hybrid GPU Architectures (Matt Norman, ORNL)
  • Experience Applying GPU Compilers to Atmospheric Modeling (Tom Henderson, NOAA)
16:10 – 17:00  Math Libraries (Chair: Jack Wells, ORNL)
  • Not Your Father’s Math Library: MAGMA for Dense Matrix Problems (Jack Dongarra, University of Tennessee)
  • R&D in Trilinos for Emerging Parallel Systems (Chris Baker, ORNL)
  • CUDA Library Overview (Ujval Kapasi, NVIDIA)

SESSION 5

17:00 – 19:00  Contributed Poster Session (Meeting Room: Georgetown Ballroom)

Friday, March 30, 2012

SESSION 6 (Chair: Jim Hack)

8:30 – 9:00    Progress and Prospects in Extreme Scale Supercomputing in Biology, Bioenergy, and Medicine (Jeremy Smith, University of Tennessee/ORNL); Meeting Room: Dupont F, G, & H
9:00 – 9:30    Current & Future Exascale MD Challenges from the GROMACS Perspective (Erik Lindahl, Stockholm University)
9:30 – 10:00   Scalable Molecular Dynamics with NAMD (Jim Phillips, UIUC)
10:00 – 10:30  Mid-Morning Break

SESSION 7 (Chair: Thomas Schulthess)

10:30 – 11:00  High Performance Computing in the Chemical Sciences (Chris Mundy, PNNL)
11:00 – 11:30  Large Scale and Hybrid Computing with CP2K (Joost VandeVondele, University of Zurich)
11:30 – 12:00  Breakthrough Simulations of Condensed Matter Systems (David Ceperley, University of Illinois)
12:00 – 12:30  QMCPACK: Enabling Breakthrough QMC Simulations on Leadership Computing Facilities (Jeongnim Kim, ORNL)
12:30 – 13:15  Working Lunch (Meeting Room: Georgetown Ballroom)

SESSION 8

13:15 – 14:00  Symposium Summary (Jim Hack, ORNL; Rob Pennington, NCSA; Thomas Schulthess, CSCS)

* To be confirmed

Hotel/Directions

Hotel and Reservation Information

Hotel: Washington Marriott, 1221 22nd St. NW, Washington, DC 20037
Room Block Reference: Accelerating Computational Science Symposium
Method of Reservations: Enter the passcode (ac8ac8a) when making online reservations via the link below. Attendees may also call the following numbers and reference the room block name:

  • 800-228-9290 – Marriott worldwide
  • 202-872-1500 – Washington Marriott direct
  • 877-212-5752 – Passkey toll free

Check-in: Wednesday, March 28
Check-out: Friday, March 30
Rate: GSA rate of $224
Cut-off date for making reservations: March 7

Online Room Reservations: Click here to make reservations. (You will be redirected to the Washington Marriott website with the negotiated rate code and dates already entered in the appropriate fields.)


Poster Abstracts


Automatic Generation of FFT Libraries for GPUs, Christos Angelopoulos, Carnegie Mellon University

In this poster we present an extension of the Spiral code generation system to GPUs. We address the key problems of the GPU memory hierarchy and parallelism, and we introduce a variety of FFT algorithms that avoid shared-memory bank conflicts without wasting space on padding and that optimize global memory bandwidth with minimal register allocation, even at low occupancy. We demonstrate high-performance results against cuFFT for single-precision 1-D and 2-D DFTs. This research is still in progress, but at the moment we are able to match or beat the cuFFT library on the sizes for which we have generated optimized code.

Accelerating CAM-SE on Hybrid Multi-Core Systems, Rick Archibald, Oak Ridge National Laboratory

With the commissioning of hybrid multicore HPC systems, such as the Cray XK6 Jaguar/Titan supercomputer at the Oak Ridge Leadership Computing Facility (OLCF), we are paving the way to the exascale era of high performance computing, which will extend the resolution of regional climate predictions with realistic clouds and chemistry. In order to realize the full potential of Titan and similar hardware, we are enhancing the Community Earth System Model (CESM) to take advantage of the GPU accelerators. Here, we discuss the adaptation of the High-Order Method Modeling Environment (HOMME) within the Community Atmosphere Model (CAM). By adapting CAM-SE to exploit the GPU accelerators, we expect to simulate one model year per day with full chemistry using 106 tracers.

Molecular Dynamics with LAMMPS for Hybrid High Performance Computers, W. Michael Brown, Oak Ridge National Laboratory

We present software developments in the LAMMPS molecular dynamics package that allow for efficient utilization of accelerators on hybrid high performance computers. We show benchmark results for solid-state, biological, and mesoscopic systems on the hybrid Cray XK6 supercomputer, along with results from early science efforts and plans for future work to extend LAMMPS toward simulation capabilities that can match the size and time scales of experiment.

Accelerating DENOVO Radiation Transport Calculations for Multi-Scale Nuclear Energy Applications, Wayne Joubert, Oak Ridge National Laboratory

Denovo is a 3-D discrete ordinates radiation transport code developed at ORNL with applications to nuclear system safety and design. Due to its high computational resource requirements and strategic science goals, Denovo has been a focus of early porting efforts to Titan, the ORNL accelerator-based system. As a result of modifications to the key Denovo algorithms, substantial performance increases have been demonstrated on current accelerator hardware. Early Cray XK6 results with Denovo on NVIDIA Fermi-generation X2090 GPUs indicate good performance on current accelerator hardware, as well as good predicted performance on next-generation NVIDIA processors.

Speaker Abstracts


Thursday, March 29, 2012:

Direct Numerical Simulation of Turbulence-Chemistry Interactions: Fundamental
Insights Towards Predictive Models
Jacqueline H. Chen
Sandia National Laboratories

Recent petascale direct numerical simulations (DNS) of turbulent combustion have transformed our ability to interrogate fine-grained turbulence-chemistry interactions in canonical laboratory configurations. In particular, three-dimensional DNS at moderate Reynolds numbers and with complex chemistry is providing unprecedented levels of detail for understanding the fundamental coupling between turbulence, mixing, and reaction. This information is leading to new physical insight and is providing unique validation data for assessing model assumptions in coarse-grained engineering CFD approaches used to design modern combustors. The role of petascale DNS is illustrated through selected examples relevant to controlling ignition and combustion rates in homogeneous charge compression ignition engines and to fuel injection processes in stationary gas turbines for power generation. Petascale simulations presently generate upwards of a petabyte of complex, multi-scale, time-varying data used by combustion modelers to validate subfilter combustion and mixing models in large-eddy simulation. With the advent of 10-20 petaflop hybrid architectures with accelerators, like Titan at Oak Ridge National Laboratory, it will be possible to dramatically increase the chemical complexity of DNS. This will help accelerate the development of predictive subprocess models that engine developers will use to better understand and tailor the combustion of gasoline and new, more complex types of fuels in advanced engines. With Titan, simulations will move beyond today’s studies of simple fuels (hydrogen, syngas, and methane) to more complex, larger-molecule hydrocarbon fuels like isooctane (a surrogate for gasoline), commercially important oxygenated alcohols (for example, ethanol and butanol), and biofuel surrogates.

Fusion Energy Sciences & Computing at the Extreme Scale
William M. Tang
Princeton University, Princeton Plasma Physics Laboratory, USA

Advanced computing is generally recognized to be an increasingly vital tool for accelerating progress in scientific research in the 21st century. The imperative is to translate rapid advances in supercomputing power, together with the emergence of effective new algorithms and computational methodologies, into corresponding increases in the physics fidelity and performance of the scientific codes used to model complex physical systems. If properly validated against experimental measurements and verified with mathematical tests and computational benchmarks, these codes can provide reliable predictive capability for the behavior of high-temperature plasmas relevant to fusion energy. The fusion energy sciences community has made excellent progress in developing advanced codes for which computer run-time and problem size scale well with the number of processors on massively parallel supercomputers. A good example is the effective use of the full power of modern leadership-class computational platforms, from the terascale to the petascale and beyond, to produce nonlinear particle-in-cell simulations that have accelerated progress in understanding the nature of plasma turbulence in magnetically confined high-temperature plasmas. Illustrative results provide great encouragement that increasingly realistic dynamics can be included in extreme-scale computing campaigns, enabling predictive simulations with unprecedented physics fidelity.

S3D Direct Numerical Simulation – Preparations for the 10-100PF era
Ray Grout
National Renewable Energy Laboratory

The evolution of supercomputing into the mid-petaflop era has been typified by heterogeneous compute nodes in which the majority of the compute capability is delivered by a large number of lightweight cores. To prepare for the extension of this trend, the DNS code S3D has been retooled for a target architecture offering tens of thousands of heterogeneous nodes containing many x86 cores as well as GPU-based accelerators. Moving outer loops to the highest level in the code facilitates hybrid MPI-OpenMP performance and provides an elegant path to accelerated kernels using OpenACC. It is anticipated that relevant scientific simulations at this scale will have a per-node footprint that can be contained entirely on the accelerator, so provision is made to maintain the primary solution variables in accelerator memory, with specific regions moved to the CPU for inter-node communication and workload balancing. Based on current performance, it is estimated that the new code will make it possible to meet early science goals with the full build-out of the anticipated Titan system, as well as provide a platform for transitioning into the exascale software research space.
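
As a rough illustration of the data-placement strategy described above, the sketch below keeps a solution array resident on the accelerator using OpenACC data regions and copies only a small halo slice back to the host for the MPI exchange. This is a minimal sketch, not code from S3D: the array names, sizes, and the trivial update kernel are hypothetical placeholders.

    /* Minimal sketch of a device-resident solution array with host-side MPI
     * halo exchange. Compiled with an OpenACC-capable compiler the pragmas
     * take effect; otherwise they are ignored and the code runs as plain MPI. */
    #include <mpi.h>
    #include <stdlib.h>

    #define NLOC  (128 * 128 * 128)   /* hypothetical per-node grid size */
    #define NHALO (128 * 128)         /* hypothetical halo-face size     */

    int main(int argc, char **argv)
    {
        MPI_Init(&argc, &argv);
        int rank, nranks;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &nranks);

        double *q    = calloc(NLOC,  sizeof(double));  /* primary solution variable */
        double *send = calloc(NHALO, sizeof(double));  /* outgoing halo face        */
        double *recv = calloc(NHALO, sizeof(double));  /* incoming halo face        */
        int left  = (rank + nranks - 1) % nranks;
        int right = (rank + 1) % nranks;

        /* Keep q resident in accelerator memory for the whole time loop. */
        #pragma acc data copy(q[0:NLOC]) create(send[0:NHALO], recv[0:NHALO])
        for (int step = 0; step < 10; ++step) {

            /* Pack the halo face on the device, then move only that slice to the host. */
            #pragma acc parallel loop present(q[0:NLOC], send[0:NHALO])
            for (int i = 0; i < NHALO; ++i)
                send[i] = q[i];
            #pragma acc update host(send[0:NHALO])

            /* Inter-node communication happens on the CPU. */
            MPI_Sendrecv(send, NHALO, MPI_DOUBLE, left,  0,
                         recv, NHALO, MPI_DOUBLE, right, 0,
                         MPI_COMM_WORLD, MPI_STATUS_IGNORE);

            /* Push the received halo back to the device and advance the solution there. */
            #pragma acc update device(recv[0:NHALO])
            #pragma acc parallel loop present(q[0:NLOC], recv[0:NHALO])
            for (int i = 0; i < NLOC; ++i)
                q[i] += 1.0e-3 * recv[i % NHALO];   /* stand-in for the real RHS update */
        }

        free(q); free(send); free(recv);
        MPI_Finalize();
        return 0;
    }

The point of the pattern is that host-device traffic (the acc update directives) scales with the halo surface rather than with the full per-node solution volume, so the primary variables can stay on the GPU across time steps.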

Toward Global Seismic Imaging Based on Spectral-Element and Adjoint Methods
Jeroen Tromp
Princeton University

Precise information about the structure of the solid Earth comes from seismograms recorded at the surface of a highly heterogeneous lithosphere. Seismic imaging based on spectral-element and adjoint methods can assimilate this information into three-dimensional models of elastic and anelastic structure. These methods fully account for the physics of wave excitation, propagation, and interaction by numerically solving the inhomogeneous equations of motion for a heterogeneous anelastic solid. Such methods require the execution of complex computational procedures that challenge the most advanced high-performance computing systems. Current research is petascale; future research will require exascale capabilities. We illustrate the current state-of-the-art based on an inversion for European upper-mantle structure. Our ultimate goal is to move toward “adjoint tomography” of the entire planet.

Large-Scale Seismic Imaging on HPC Architectures: Applications, Algorithms and Software
Prof. Olaf Schenk
Institute of Computational Science, University of Lugano, Switzerland

One of the outstanding challenges of computational science is large-scale nonlinear parameter estimation in seismic imaging. These seismic inverse problems are formulated as PDE-constrained optimization problems and are significantly more difficult to solve than the corresponding PDE forward problems. The inverse medium problem consists of reconstructing the characteristics of the medium from partial (and often noisy) observations. In doing so, a nonlinear functional is minimized which involves both the misfit to the measurements and a Tikhonov-type regularization term to tackle the inherent ill-posedness. Hence inverse medium problems are naturally formulated as PDE-constrained optimization problems, where the numerical solution of the PDE itself, the forward problem, is but a fraction of the entire process. In addition, achieving scalability for the optimization process on tens of thousands of multicore processors is a task that offers many research challenges. We will address these issues and present parallel results, in both two and three space dimensions, from two earth science application codes (SPECFEM, AWP-ODC) that illustrate the usefulness of the approach.
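
As a point of reference, a generic statement of such an inverse medium problem, in textbook notation rather than the specific formulation used in SPECFEM or AWP-ODC, is

    \min_{m}\; J(m) \;=\; \frac{1}{2}\,\bigl\| S\,u(m) - d \bigr\|_2^2
    \;+\; \frac{\alpha}{2}\,\bigl\| R\,m \bigr\|_2^2
    \quad\text{subject to}\quad A(m)\,u(m) = f,

where A(m) u = f is the discretized wave equation (the forward problem), S samples the wavefield u at the receiver locations, d denotes the observed seismograms, R is a Tikhonov-type regularization operator, and the weight \alpha balances data misfit against regularization. The symbols are illustrative assumptions, not quantities defined in the talk.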

Modeling and Simulation Challenges and Benefits for Nuclear Power: Impacting Key Phenomena in Pressurized Water Reactor Cores
Doug Kothe
Oak Ridge National Laboratory

The Consortium for Advanced Simulation of Light Water Reactors (CASL) is the first U.S. Department of Energy (DOE) Energy Innovation Hub, established in July 2010 for the modeling and simulation (M&S) of nuclear reactors. CASL applies existing M&S capabilities and develops advanced capabilities to create a usable environment for predictive simulation of light water reactors (LWRs). This environment, designated the Virtual Environment for Reactor Applications (VERA), incorporates science-based models, state-of-the-art numerical methods, modern computational science and engineering practices, and uncertainty quantification (UQ) and validation against data from operating pressurized water reactors, single-effect experiments, and integral tests. With VERA as its vehicle, CASL develops and applies models, methods, data, and understanding while addressing three critical areas of performance for nuclear power plants (NPPs): reducing capital and operating costs per unit of energy by enabling power uprates and lifetime extension for existing NPPs and by increasing the rated powers and lifetimes of new Generation III+ NPPs; reducing the volume of nuclear waste generated by enabling higher fuel burnup; and enhancing nuclear safety by enabling high-fidelity predictive capability for component performance through the onset of failure. The CASL vision – its future – is to confidently predict the safe, reliable, and economically competitive performance of nuclear reactors through comprehensive, science-based modeling and simulation technology that is deployed and applied broadly throughout the nuclear energy enterprise. To achieve this vision, CASL’s mission – its raison d’être – is to provide forefront, usable modeling and simulation capabilities needed to address LWR operational and safety performance-limiting phenomena.

Computing Nuclei: Present Status, Future Prospects
David Dean
Oak Ridge National Laboratory

Nuclei comprise 99.9% of all baryonic matter in the universe and are the fuel that burns in stars. The rather complex nature of the nuclear forces among protons and neutrons generates a broad range and diversity in the nuclear phenomena that we observe. As we have seen during the last decade, developing a comprehensive description of all nuclei requires theoretical and experimental investigations of rare isotopes with unusual neutron-to-proton ratios, very different from 1 in light nuclei or 1.5 in heavier nuclei. We call these nuclei exotic, or rare, because they are not typically found on earth. They are difficult to produce experimentally since they can have extremely short lifetimes. The goal of a comprehensive description of nuclei and their reactions represents one of the great intellectual opportunities for physics today and requires extensive computational capabilities. In this talk I will describe applications of nuclear coupled-cluster techniques to descriptions of isotopic chains, and applications of nuclear density functional theory to determining the limits of existence of nuclei. In both cases we are benefiting from petascale computing and will benefit from hybrid architectures in the future. I will also discuss a five-year computational development path for the area.
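
For readers unfamiliar with the coupled-cluster techniques mentioned above, the standard ansatz, in generic textbook notation rather than anything specific to the nuclear codes discussed in the talk, is

    |\Psi\rangle \;=\; e^{T}\,|\Phi_0\rangle,
    \qquad T \;=\; T_1 + T_2 + \cdots,

where |\Phi_0\rangle is a reference Slater determinant and the cluster operator T is truncated in practice (for example at singles and doubles, CCSD), which keeps the computational cost polynomial in system size and makes large-scale parallel implementations feasible.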

Lattice QCD Experiences on TitanDev
Balint Joo
Jefferson Laboratory

We present our recent experiences with lattice QCD on TitanDev, showing strong scaling results to 768 GPUs. We compare the performance of the GPU based solver with the current CPU based Chroma code. We summarize prospects for production running on a large accelerated system in the 2013-2014 time frame.

Friday, March 30, 2012:

Progress and Prospects in Extreme Scale Supercomputing in Biology, Bioenergy and Medicine
Jeremy C. Smith
Oak Ridge National Laboratory.

The ramping up of the National Leadership Computing Facility at ORNL brings new opportunities for advances in biology, bioenergy, and drug discovery. Examples will be given of scaling and applications in multimillion-atom molecular dynamics simulation of cellulosic ethanol systems, in the framework of the DOE Bioenergy Science Center, and in the virtual docking of millions of compounds to potential drug targets. We describe strategies for optimal utilization of exascale supercomputing in biomolecular research and discuss the types of problems that might then be within reach.

Current Status and Future Roadmap to Efficient Exascale Simulation of Small Biological Molecules
Erik Lindahl
Stockholm University

Over the last decade, molecular dynamics simulation has evolved from a severely limited esoteric method into a cornerstone of many fields, in particular structural biology, where it is now just as established as NMR or X-ray crystallography. Here, I will discuss recent developments, successes, and challenges in GROMACS when it comes to efficient parallel molecular simulation, in particular scaling and execution on heterogeneous accelerator architectures.

A central challenge for the entire community is that we frequently achieve scaling by going to increasingly larger model systems, reaching millions or even hundreds of millions of particles. While this has resulted in some impressive showcases, the problem for structural biology is that most molecules of biological interest (e.g., membrane proteins) only require perhaps 250,000 atoms. Since we would still like to reach simulation timescales many orders of magnitude longer than today's, this would lead to unreasonable requirements for strong scalability. I will discuss our current approach to addressing this through automated parallel adaptive molecular dynamics in our new open Copernicus framework, which has been designed to be independent of GROMACS. It combines efficient strong scaling with ensemble parallelism, and even makes it possible to achieve sampling of slow dynamics that is frequently more efficient than special-purpose hardware.

Scalable Molecular Dynamics with NAMD
Jim Phillips
NCSA (UIUC)

This talk will present a status update on NAMD and Charm++ scaling and acceleration on the Cray XE6/XK6, a review of NAMD 2.9 including a new replica-exchange implementation, and an overview of petascale simulation opportunities in large biomolecular aggregates such as the HIV capsid model now running on Blue Waters at NCSA.

High Performance Computing in the Chemical Sciences
Chris Mundy
PNNL

I will discuss the science that is currently supported by established BES programs at PNNL, along with the challenges of performing research using first-principles interaction potentials in conjunction with statistical mechanics. I will highlight successes in applying leadership-class computing, through the INCITE program, to the aforementioned scientific problems, and outline future research directions to be addressed with high performance computing.

Large Scale and Hybrid Computing with CP2K
Joost VandeVondele
Nanoscale Simulations, ETH Zurich

Novel materials, precisely tuned chemicals, and a deep understanding of processes at the (sub-)nanoscale level are crucial for dealing with challenges put forward by society, including energy, the environment, and health. CP2K is a package that aims to describe, at an atomic level, the chemical and electronic processes in complex systems, and it can hence contribute to addressing these challenges by providing insight from computer simulation. Next-generation hardware will accelerate this research. However, this will require that domain scientists, in collaboration with system experts, design novel and powerful algorithms that are carefully implemented to exploit the strengths of the new hardware. Here, I will present two recent efforts by the CP2K team, which aim at (1) simulating larger systems (10,000s to 1,000,000s of atoms) using GGA DFT, and (2) increasing the accuracy for condensed-phase systems beyond GGA DFT using functionals that include Hartree-Fock-like exchange and MP2-like correlation. Early results on using accelerators in a massively parallel context will be discussed.

Breakthrough Simulations of Condensed Matter Systems
David Ceperley
University of Illinois Urbana-Champaign

Recently, we have developed methods to couple the simulation of the quantum electronic degrees of freedom of a microscopic system with the ionic degrees of freedom, without invoking the approximations involved in density functional theory or semi-empirical potentials. This allows an unprecedented description of all condensed matter systems. As an example, we describe work underway on the various phases of dense hydrogen, important for understanding the giant planets and for experiments at the National Ignition Facility (NIF) and elsewhere. We will review what could be done on the next generation of high performance computers with the next generation of algorithms. An outstanding goal of our future work is a complete and accurate description of water, important because water is the key to understanding many biophysical processes.

QMCPACK: Enabling Breakthrough QMC Simulations on Leadership Computing Facilities
Jeongnim Kim
Oak Ridge National Laboratory

Recent innovations in QMC, including new forms of correlated wave functions, efficient and robust optimization methods, and numerical techniques, have allowed QMC to reach a high degree of accuracy at a manageable cost and complexity of solution. The resources at DOE leadership computing facilities have allowed QMC to seek fundamental understanding of materials properties, from atoms to liquids, at unprecedented accuracy and time-to-solution. We have developed QMCPACK to enable QMC studies of the electronic structure of realistic systems while taking full advantage of these theoretical developments and ever-growing computing capability and capacity. We will review our recent progress in QMC algorithms and their implementations in QMCPACK, including acceleration using GPUs, and lay out our plans for emerging architectures.
