Yun (Helen) He
Biographical Sketch
Helen investigates how large-scale scientific applications can be run effectively and efficiently on massively parallel supercomputers: designing parallel algorithms and developing and implementing computing technologies for science applications. Her experience includes climate models, distributed component-coupling libraries, parallel programming paradigms, and the porting and benchmarking of scientific applications. Helen was a staff member in the Scientific Computing Group of the Computational Research Division at LBNL before joining USG. Helen has a Ph.D. in Marine Studies and an M.S. in Computer Information Science, both from the University of Delaware.
Journal Articles
J. Levesque, J. Larkin, M. Foster, J. Glenski, G. Geissler, S. Whalen, B. Waldecker, J. Carter, D. Skinner, H. He, et al., “Understanding and Mitigating Multicore Performance Issues on the AMD Opteron Architecture”, January 1, 2007,
Conference Papers
Zhengji Zhao, Yun (Helen) He and Katie Antypas, “Cray Cluster Compatibility Mode on Hopper”, paper presented at the Cray User Group meeting, April 29–May 3, 2012, Stuttgart, Germany, May 1, 2012,
Yun (Helen) He and Katie Antypas, “Running Large Jobs on a Cray XE6 System”, Cray User Group 2012 Meeting, Stuttgart, Germany, April 30, 2012,
P. M. Stewart, Y. He, “Benchmark Performance of Different Compilers on a Cray XE6”, CUG Proceedings, Fairbanks, AK, May 23, 2011,
- Download File: CUG2011CompilerPaper.pdf (pdf: 518 KB)
There are four different supported compilers on NERSC's recently acquired XE6, Hopper. Our users often request guidance from us in determining which compiler is best for a particular application. In this paper, we describe the comparative performance of the different compilers on several MPI benchmarks with different characteristics. For each compiler and benchmark, we establish the best set of optimization arguments to the compiler.
K. Antypas, Y. He, “Transitioning Users from the Franklin XT4 System to the Hopper XE6 System”, CUG Proceedings, Fairbanks, AK, May 23, 2011,
- Download File: CUG2011Hopperpaper.pdf (pdf: 1.5 MB)
The Hopper XE6 system, NERSC’s first petaflop system with over 153,000 cores, has increased the computing hours available to the Department of Energy’s Office of Science users by more than a factor of 4. As NERSC users transition from the Franklin XT4 system with 4 cores per node to the Hopper XE6 system with 24 cores per node, they have had to adapt to less memory per core and to on-node I/O performance that does not scale linearly with the number of cores per node. This paper discusses Hopper’s usage during the “early user period” and examines the practical implications of running on a system with 24 cores per node, exploring advanced aprun and memory affinity options for typical NERSC applications as well as strategies to improve I/O performance.
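As a concrete illustration of the aprun and memory-affinity options the abstract refers to, a batch-script fragment for a 24-core-per-node XE6 node might look like the following. This is a hedged sketch: the task counts and application name (`my_app.x`) are illustrative assumptions, not values taken from the paper.

```shell
#!/bin/bash
# Illustrative XE6 launch line (Cray ALPS aprun), assuming 24-core Hopper nodes
# with 4 NUMA nodes of 6 cores each:
#   -n   total number of MPI tasks
#   -N   MPI tasks per node (here 12, i.e. half-packed nodes for more memory per task)
#   -S   MPI tasks per NUMA node (12 tasks spread over 4 NUMA nodes = 3 each)
#   -ss  restrict each task's memory allocation to its local NUMA node
#   -cc cpu  bind each task to a single core
aprun -n 1536 -N 12 -S 3 -ss -cc cpu ./my_app.x
```

Under-populating nodes (`-N` below 24) trades allocated cores for more memory per task, while `-S`, `-ss`, and `-cc` keep tasks and their memory local to a NUMA node, which is the kind of tuning the paper explores for memory-bandwidth-sensitive applications.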