Contact
Christian Engelmann
Research and Development Staff Member
System Research Team, Computer Science Research Group
Computer Science and Mathematics Division, Oak Ridge National Laboratory
P.O. Box 2008, Oak Ridge, TN 37831-6173, USA
+1 (865) 574-3132 | +1 (865) 576-5491 | engelmannc@ornl.gov | www.csm.ornl.gov/~engelman
Abstract
Christian Engelmann's work focuses on software research and development for next-generation extreme-scale high-performance computing (HPC) systems. As part of the System Research Team at Oak Ridge National Laboratory (ORNL), and in collaboration with other laboratories and universities, his research aims to provide high-level reliability, availability, and serviceability (RAS) for next-generation supercomputers, improving their resiliency (and ultimately their efficiency) through novel high availability and fault tolerance system software solutions. A second focus area is research and development in core system software technologies that enable "plug-and-play" supercomputing, which offers transparent portability of software and eliminates most of the software modifications caused by diverse supercomputing platforms and system upgrades.
Past research by Christian Engelmann includes work on a pluggable, lightweight, heterogeneous Distributed Virtual Machine (DVM) environment, in which clusters of personal computers, workstations, and supercomputers can be aggregated to form one giant DVM (in the spirit of its widely used predecessor, Parallel Virtual Machine (PVM)). Further past work, part of a Cooperative Research and Development Agreement (CRADA) with IBM, focused on a new generation of scientific algorithms (super-scalable algorithms) that address the challenges of scalability and fault tolerance on extreme-scale supercomputers, such as the IBM Blue Gene/L system.
News
Upcoming Presentations
Conference Deadlines
- September 30: 3rd International Conference on Availability, Reliability and Security (ARES) 2009, Fukuoka, Japan, March 16-19, 2009.
- February 9: 4th IEEE International Conference on Networking, Architecture, and Storage (NAS) 2009, Zhang Jia Jie, China, July 9-11, 2009.
Select Publications
Journal Publications (Abstract, Publication, Citation, DOI)
- Xubin (Ben) He, Li Ou, Martha J. Kosa, Stephen L. Scott, and Christian Engelmann. A Unified Multiple-Level Cache for High Performance Cluster Storage Systems. International Journal of High Performance Computing and Networking (IJHPCN), volume 5, number 1-2, pages 97-109, 2007. Inderscience Publishers, Geneve, Switzerland. ISSN 1740-0562.
- Christian Engelmann, Stephen L. Scott, Chokchai (Box) Leangsuksun, and Xubin (Ben) He. Symmetric Active/Active High Availability for High-Performance Computing System Services. Journal of Computers (JCP), volume 1, number 8, pages 43-54, 2006. Academy Publisher, Oulu, Finland. ISSN 1796-203X.
- Christian Engelmann, Stephen L. Scott, David E. Bernholdt, Narasimha R. Gottumukkala, Chokchai (Box) Leangsuksun, Jyothish Varma, Chao Wang, Frank Mueller, Aniruddha G. Shet, and Ponnuswamy (Saday) Sadayappan. MOLAR: Adaptive Runtime Support for High-End Computing Operating and Runtime Systems. ACM SIGOPS Operating Systems Review (OSR), volume 40, number 2, pages 63-72, 2006. ACM Press, New York, NY, USA. ISSN 0163-5980.
Conference Publications (Abstract, Publication, Presentation, Citation, DOI)
- Chao Wang, Frank Mueller, Christian Engelmann, and Stephen L. Scott. Proactive Process-Level Live Migration in HPC Environments. In Proceedings of the IEEE/ACM International Conference on High Performance Computing, Networking, Storage and Analysis (SC) 2008, Austin, TX, USA, November 15-21, 2008. ACM Press, New York, NY, USA. To appear.
- Arun B. Nagarajan, Frank Mueller, Christian Engelmann, and Stephen L. Scott. Proactive Fault Tolerance for HPC with Xen Virtualization. In Proceedings of the 21st ACM International Conference on Supercomputing (ICS) 2007, pages 23-32, Seattle, WA, USA, June 16-20, 2007. ACM Press, New York, NY, USA. ISBN 978-1-59593-768-1.
- Chao Wang, Frank Mueller, Christian Engelmann, and Stephen L. Scott. A Job Pause Service under LAM/MPI+BLCR for Transparent Fault Tolerance. In Proceedings of the 21st IEEE International Parallel and Distributed Processing Symposium (IPDPS) 2007, pages 1-10, Long Beach, CA, USA, March 26-30, 2007. ACM Press, New York, NY, USA. ISBN 978-1-59593-768-1.
- Jyothish Varma, Chao Wang, Frank Mueller, Christian Engelmann, and Stephen L. Scott. Scalable, Fault-Tolerant Membership for MPI Tasks on HPC Systems. In Proceedings of the 20th ACM International Conference on Supercomputing (ICS) 2006, pages 219-228, Cairns, Australia, June 28-30, 2006. ACM Press, New York, NY, USA. ISBN 1-59593-282-8.
- Christian Engelmann and George A. (Al) Geist. Super-Scalable Algorithms for Computing on 100,000 Processors. In Lecture Notes in Computer Science: Proceedings of the 5th International Conference on Computational Science (ICCS) 2005, Part I, pages 313-320, Atlanta, GA, USA, May 22-25, 2005. Springer Verlag, Berlin, Germany. ISBN 978-3-540-26032-5. ISSN 0302-9743.