NERSC: Powering Scientific Discovery Since 1974

Shane Canon

Shane Richard Canon
Group Leader, Technology Integration Group, National Energy Research Scientific Computing Center
Phone: (510) 486-7024, Fax: (510) 486-4316
Lawrence Berkeley National Laboratory
1 Cyclotron Road
Mail Stop 943-256
Berkeley, CA 94720 US

Biographical Sketch

Shane Canon joined NERSC in 2000 as a system administrator for the PDSF cluster. While working on PDSF, he gained experience in cluster administration, batch systems, parallel file systems, and the Linux kernel. In 2005, Shane left LBNL to take a position as Group Leader at Oak Ridge National Laboratory. One of his most significant accomplishments at ORNL was architecting the 10-petabyte Spider File System. In 2008, Shane returned to NERSC to lead the Data Systems Group. In 2009, he moved to lead the newly created Technology Integration Group in order to focus on the Magellan Project and other strategic areas. Shane holds a Ph.D. in Physics from Duke University and a B.S. in Physics from Auburn University.

Conference Papers

Devarshi Ghoshal, Richard Shane Canon, Lavanya Ramakrishnan, “Understanding I/O Performance of Virtualized Cloud Environments”, The Second International Workshop on Data Intensive Computing in the Clouds (DataCloud-SC11), 2011.

We use IOR benchmarks to compare I/O performance on two cloud computing platforms: Amazon and the Magellan cloud testbed.

Lavanya Ramakrishnan, Richard Shane Canon, Krishna Muriki, Iwona Sakrejda, Nicholas J. Wright, “Evaluating Interconnect and Virtualization Performance for High Performance Computing”, Proceedings of the 2nd International Workshop on Performance Modeling, Benchmarking and Simulation of High Performance Computing Systems (PMBS11), 2011.

In this paper we detail benchmarking results that characterize the virtualization overhead and its impact on performance. We also examine the performance of various interconnect technologies to understand how the choice of interconnect affects performance. Our results show that virtualization can have a significant impact on performance, with at least a 60% performance penalty. We also show that less capable interconnect technologies can significantly degrade the performance of typical HPC applications. Finally, we evaluate the performance of the Amazon Cluster Compute instance and show that it performs approximately as well as a 10G Ethernet cluster at low core counts.

Lavanya Ramakrishnan, Piotr T. Zbiegiel, Scott Campbell, Rick Bradshaw, Richard Shane Canon, Susan Coghlan, Iwona Sakrejda, Narayan Desai, Tina Declerck, Anping Liu, “Magellan: Experiences from a Science Cloud”, Proceedings of the 2nd International Workshop on Scientific Cloud Computing, ACM ScienceCloud '11, Boulder, Colorado, and New York, NY, 2011, 49-58.

Neal Master, Matthew Andrews, Jason Hick, Shane Canon, Nicholas J. Wright, “Performance Analysis of Commodity and Enterprise Class Flash Devices”, Petascale Data Storage Workshop (PDSW), November 2010.

Keith R. Jackson, Lavanya Ramakrishnan, Krishna Muriki, Shane Canon, Shreyas Cholia, John Shalf, Harvey J. Wasserman, Nicholas J. Wright, “Performance Analysis of High Performance Computing Applications on the Amazon Web Services Cloud”, CloudCom, Bloomington, Indiana, January 1, 2010, 159-168.

Lavanya Ramakrishnan, Keith R. Jackson, Shane Canon, Shreyas Cholia, John Shalf, “Defining future platform requirements for e-Science clouds”, SoCC, New York, NY, USA, 2010, 101-106.

Kesheng Wu, Kamesh Madduri, Shane Canon, “Multi-Level Bitmap Indexes for Flash Memory Storage”, IDEAS '10: Proceedings of the Fourteenth International Database Engineering and Applications Symposium, Montreal, QC, Canada, 2010.

Presentations/Talks

Richard Shane Canon, Magellan Project: Clouds for Science?, Coalition for Academic Scientific Computation, February 29, 2012.

This presentation gives a brief overview of the Magellan Project and some of its findings.

Richard Shane Canon, Exploiting HPC Platforms for Metagenomics: Challenges and Opportunities, Metagenomics Informatics Challenges Workshop, October 12, 2011.

Lavanya Ramakrishnan & Shane Canon, NERSC, Hadoop and Pig Overview, October 2011.

The MapReduce programming model and its open-source implementation, Hadoop, are gaining traction in the scientific community for addressing the needs of data-focused scientific applications. The requirements of these scientific applications are significantly different from those of the Web 2.0 applications that have traditionally used Hadoop. The tutorial will provide an overview of Hadoop technologies, discuss some use cases of Hadoop for science, and present the programming challenges of using Hadoop for legacy applications. Participants will access the Hadoop system at NERSC for the hands-on component of the tutorial.

Shane Canon, Debunking Some Common Misconceptions of Science in the Cloud, ScienceCloud 2011, June 29, 2011.

This presentation addressed five common misconceptions about cloud computing: clouds are simple to use and don’t require system administrators; my job will run immediately in the cloud; clouds are more efficient; clouds allow you to ride Moore’s Law without additional investment; and commercial clouds are much cheaper than operating your own system.

Richard Shane Canon, Cosmic Computing: Supporting the Science of the Planck Space Based Telescope, LISA 2009, November 5, 2009.

The scientific community is creating data at an ever-increasing rate. Large-scale experimental devices such as high-energy collider facilities and advanced telescopes generate petabytes of data a year. These immense data streams stretch the limits of the storage systems and of their administrators. The Planck project, a space-based telescope designed to study the Cosmic Microwave Background, is a case in point. Launched in May 2009, the Planck satellite will generate a data stream requiring a network of storage and computational resources to store and analyze the data. This talk will present an overview of the Planck project, including the motivation and mission, the collaboration, and the terrestrial resources supporting it. It will describe the data flow and network of computer resources in detail and will discuss how the various systems are managed. Finally, it will highlight some of the present and future challenges in managing a large-scale data system.

Reports

Katherine Yelick, Susan Coghlan, Brent Draney, Richard Shane Canon, Lavanya Ramakrishnan, Adam Scovel, Iwona Sakrejda, Anping Liu, Scott Campbell, Piotr T. Zbiegiel, Tina Declerck, Paul Rich, “The Magellan Report on Cloud Computing for Science”, U.S. Department of Energy Office of Science, Office of Advanced Scientific Computing Research (ASCR), December 2011.