Normand Modine

BES Requirements Worksheet

1.1. Project Information - Center for Integrated Nanotechnology (CINT) - Theory and Simulation Thrust

Document Prepared By	Normand Modine
Project Title	Center for Integrated Nanotechnology (CINT) - Theory and Simulation Thrust
Principal Investigator	Normand Modine
Participating Organizations	Center for Integrated Nanotechnologies, Sandia National Laboratories, Los Alamos National Laboratory
Funding Agencies	DOE SC DOE NNSA NSF NOAA NIH Other:

2. Project Summary & Scientific Objectives for the Next 5 Years

Please give a brief description of your project - highlighting its computational aspect - and outline its scientific objectives for the next 3-5 years. Please list one or two specific goals you hope to reach in 5 years.

The Center for Integrated Nanotechnologies (CINT) is a Department of Energy/Office of Science Nanoscale Science Research Center (NSRC), operating as a national user facility devoted to establishing the scientific principles that govern nanoscale integration. Nanoscale integration is defined as assembling diverse nanoscale materials across length scales to design and achieve new properties and functionality. The CINT Theory and Simulation of Nanoscale Phenomena thrust is the component of CINT dedicated to developing and applying theory to enable nanoscale integration. Two examples of specific goals that we hope to achieve over the next five years are developing an understanding of the factors that control the organization and properties of nanoparticles in lipid membranes and developing the ability to simulate coupled electrical and thermal transport in nanostructured systems.

3. Current HPC Usage and Methods

3a. Please list your current primary codes and their main mathematical methods and/or algorithms. Include quantities that characterize the size or scale of your simulations or numerical experiments; e.g., size of grid, number of particles, basis sets, etc. Also indicate how parallelism is expressed (e.g., MPI, OpenMP, MPI/OpenMP hybrid)

LAMMPS and Socorro are our main codes. BLAS, LAPACK, and FFTW are our main computational kernels. Parallelism is typically expressed using MPI.

3b. Please list known limitations, obstacles, and/or bottlenecks that currently limit your ability to perform simulations you would like to run. Is there anything specific to NERSC?

Long waits in the queue and limited run times for jobs once they start are serious headaches, if not actual bottlenecks.

3c. Please fill out the following table to the best of your ability. This table provides baseline data to help extrapolate to requirements for future years. If you are uncertain about any item, please use your best estimate to use as a starting point for discussions.

Facilities Used or Using	NERSC OLCF ALCF NSF Centers Other: Sandia and Los Alamos institutional resources
Architectures Used	Cray XT IBM Power BlueGene Linux Cluster Other:
Total Computational Hours Used per Year	100,000 Core-Hours per CINT scientist
NERSC Hours Used in 2009	0 Core-Hours
Number of Cores Used in Typical Production Run	100 to 1000
Wallclock Hours of Single Typical Production Run	100 to 1000 (if allowed by limit)
Total Memory Used per Run	100 to 1000 GB
Minimum Memory Required per Core	1 GB
Total Data Read & Written per Run	GB
Size of Checkpoint File(s)	GB
Amount of Data Moved In/Out of NERSC	GB per
On-Line File Storage Required (For I/O from a Running Job)	GB and Files
Off-Line Archival Storage Required	GB and Files
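
The baseline figures in the table above can be cross-checked with simple arithmetic. The sketch below is only an illustration: it assumes that core-hours per run equal cores multiplied by wallclock hours and that memory is spread evenly over cores. The function names are hypothetical, and the inputs are the ranges quoted in the table, not measurements.

```python
# Back-of-the-envelope check of the baseline ranges quoted in the table.
# Assumption: cost of a run = cores * wallclock hours; memory is spread evenly.

def core_hours(cores, wallclock_hours):
    """Core-hours consumed by one run."""
    return cores * wallclock_hours

def memory_per_core_gb(total_memory_gb, cores):
    """Average memory per core if the total is divided evenly."""
    return total_memory_gb / cores

small_run = core_hours(100, 100)      # low end of both ranges
large_run = core_hours(1000, 1000)    # high end of both ranges
yearly_budget = 100_000               # core-hours per CINT scientist per year

print("low-end run :", small_run, "core-hours")
print("high-end run:", large_run, "core-hours")
print("low-end runs per yearly budget:", yearly_budget / small_run)
print("memory per core (GB):",
      memory_per_core_gb(100, 100), "to", memory_per_core_gb(1000, 1000))
```

Under these assumptions a low-end run costs 10,000 core-hours, so the quoted yearly figure covers roughly ten such runs, and the memory figures are consistent with the stated 1 GB per core. A run at the top of both ranges would cost 1,000,000 core-hours, which suggests that the upper ends of the core-count and wallclock ranges are not normally reached in the same run.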

Please list any required or important software, services, or infrastructure (beyond supercomputing and standard storage infrastructure) provided by HPC centers or system vendors.

Compilers (including Fortran 90) and efficient, correct BLAS, LAPACK, FFTW, and MPI libraries.

4. HPC Requirements in 5 Years

4a. We are formulating the requirements for NERSC that will enable you to meet the goals you outlined in Section 2 above. Please fill out the following table to the best of your ability. If you are uncertain about any item, please use your best estimate to use as a starting point for discussions at the workshop.

Computational Hours Required per Year
Anticipated Number of Cores to be Used in a Typical Production Run
Anticipated Wallclock to be Used in a Typical Production Run Using the Number of Cores Given Above
Anticipated Total Memory Used per Run	GB
Anticipated Minimum Memory Required per Core	GB
Anticipated total data read & written per run	GB

Anticipated size of checkpoint file(s)	GB
Anticipated On-Line File Storage Required (For I/O from a Running Job)	GB and Files
Anticipated Amount of Data Moved In/Out of NERSC	GB per
Anticipated Off-Line Archival Storage Required	GB and Files

4b. What changes to codes, mathematical methods and/or algorithms do you anticipate will be needed to achieve this project's scientific objectives over the next 5 years?

Modifications to allow efficient use of GPUs should be possible and are highly desirable. More efficient implementations of exact exchange will likely be needed. Use of parallel FFTs at the finest-grained level of parallelization would allow scaling to 10,000 cores.
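
To make the parallel-FFT idea concrete, the following is a minimal serial sketch of a slab-decomposed 2D transform, the pattern a distributed FFT typically follows: each rank transforms the rows it owns, the data are transposed (the all-to-all exchange on a real machine), and each rank transforms its new rows. This is an illustration only, not the scheme used in Socorro or FFTW. The "ranks" are emulated by a loop, the naive DFT stands in for a library FFT, and all function names are hypothetical.

```python
import cmath

def dft(values):
    """Naive O(n^2) discrete Fourier transform of a 1D sequence."""
    n = len(values)
    return [
        sum(values[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
        for k in range(n)
    ]

def transpose(grid):
    """Swap rows and columns; stands in for the all-to-all exchange."""
    return [list(col) for col in zip(*grid)]

def transform_local_rows(grid, n_ranks):
    """Each emulated rank applies 1D transforms to the slab of rows it owns."""
    n_rows = len(grid)
    assert n_rows % n_ranks == 0, "rows must divide evenly among ranks"
    rows_per_rank = n_rows // n_ranks
    out = []
    for rank in range(n_ranks):
        slab = grid[rank * rows_per_rank:(rank + 1) * rows_per_rank]
        out.extend(dft(row) for row in slab)
    return out

def slab_fft2(grid, n_ranks):
    """2D DFT organized as local 1D transforms plus a global transpose."""
    stage1 = transform_local_rows(grid, n_ranks)       # transform along rows
    exchanged = transpose(stage1)                      # global data exchange
    stage2 = transform_local_rows(exchanged, n_ranks)  # transform along columns
    return transpose(stage2)                           # restore original layout

def direct_dft2(grid):
    """Reference 2D DFT evaluated directly from the double sum."""
    n_rows, n_cols = len(grid), len(grid[0])
    return [
        [
            sum(
                grid[m][n]
                * cmath.exp(-2j * cmath.pi * (k * m / n_rows + l * n / n_cols))
                for m in range(n_rows)
                for n in range(n_cols)
            )
            for l in range(n_cols)
        ]
        for k in range(n_rows)
    ]

# Small example: a 4x4 complex grid split across 2 emulated ranks.
grid = [[complex(r * 4 + c, r - c) for c in range(4)] for r in range(4)]
result = slab_fft2(grid, n_ranks=2)
reference = direct_dft2(grid)
max_error = max(
    abs(result[k][l] - reference[k][l]) for k in range(4) for l in range(4)
)
print("max deviation from direct 2D DFT:", max_error)
```

The sketch shows why the transpose step matters for scaling: the 1D transforms are entirely local to a rank, so the cost of distributing the FFT is concentrated in the single global exchange between the two stages.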

4c. Please list any known or anticipated architectural requirements (e.g., 2 GB memory/core, interconnect latency < 3 µs).

1 GB minimum memory per core. Bandwidth to memory is very important for DFT performance.
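
One common way to reason about the bandwidth sensitivity noted above is a roofline-style bound, in which attainable performance is the smaller of the peak floating-point rate and the memory bandwidth multiplied by the arithmetic intensity of the kernel. The sketch below assumes that simple model. The peak rate, the bandwidth, and the two intensities are hypothetical placeholders chosen for illustration, not measurements of any CINT code or any machine.

```python
# Roofline-style estimate: performance is capped by either compute or memory.
# All numbers below are hypothetical placeholders, not measured values.

def attainable_gflops(peak_gflops, bandwidth_gb_per_s, flops_per_byte):
    """Upper bound on sustained GFLOP/s for a kernel of given intensity."""
    return min(peak_gflops, bandwidth_gb_per_s * flops_per_byte)

PEAK_GFLOPS = 10.0       # assumed per-core peak
BANDWIDTH_GB_S = 5.0     # assumed per-core memory bandwidth

# Assumed intensities: a streaming, FFT-like kernel does little work per byte
# moved, while a blocked matrix multiply reuses data held in cache.
low_intensity = 0.5      # flops per byte (illustrative)
high_intensity = 8.0     # flops per byte (illustrative)

fft_like = attainable_gflops(PEAK_GFLOPS, BANDWIDTH_GB_S, low_intensity)
gemm_like = attainable_gflops(PEAK_GFLOPS, BANDWIDTH_GB_S, high_intensity)

print("low-intensity kernel bound :", fft_like, "GFLOP/s")
print("high-intensity kernel bound:", gemm_like, "GFLOP/s")
```

With these placeholder values the low-intensity kernel is limited by memory bandwidth and the high-intensity kernel by peak compute, so in this model adding floating-point capability without adding bandwidth would help only the second kind of kernel.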

4d. Please list any new software, services, or infrastructure support you will need over the next 5 years.

4e. It is believed that the dominant HPC architecture in the next 3-5 years will incorporate processing elements composed of 10s-1,000s of individual cores, perhaps GPUs or other accelerators. It is unlikely that a programming model based solely on MPI will be effective, or even supported, on these machines. Do you have a strategy for computing in such an environment? If so, please briefly describe it.

Performance of our codes depends largely on performance of the BLAS and FFTW libraries. If these run efficiently, our codes should run efficiently.

5. New Science With New Resources

To help us get a better understanding of the quantitative requirements we've asked for above, please tell us: What significant scientific progress could you achieve over the next 5 years with access to 50X the HPC resources you currently have access to at NERSC? What would be the benefits to your research field if you were given access to these kinds of resources?

Please explain what aspects of "expanded HPC resources" are important for your project (e.g., more CPU hours, more memory, more storage, more throughput for small jobs, ability to handle very large jobs).