Center Projects

The National Center for Computational Sciences (NCCS) supports the open-source software process and is pleased to contribute to this effort.

Applications, Software, Tools, & Support

IOTA

IOTA is an input/output (I/O) tuning and analysis tool for profiling applications. This work is funded through the National Leadership Computing Facility of the Department of Energy.

Lustre User Toolkit

The Lustre User Toolkit covers two areas. The first is application programming interfaces: to this end, we are pleased to release libLUT, which aims to provide a simplified interface to the critical operations an application needs when communicating with the Lustre file system. The second is utility applications that bring this capability directly to users. The first offering in this area, spdcp, may be run inside batch jobs or used from an interactive session to stage batch jobs, employing the compute capability of the cluster to copy large datasets. The spdcp utility exploits multiple levels of parallelism in a dataset to complete the copy in far less wall-clock time than the Linux cp utility requires. This work is funded through the National Leadership Computing Facility of the Department of Energy.
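To make the parallel-copy idea concrete, the sketch below distributes the files of a dataset across MPI ranks so that many copies proceed at the same time. It is only an illustration of the general technique, not the spdcp implementation (spdcp also parallelizes within large files and is aware of Lustre striping), and all names in it (pcopy, filelist.txt) are hypothetical.

    /*
     * Sketch of file-level parallel copy, illustrating the general technique
     * a tool such as spdcp uses to beat serial cp: distribute the files of a
     * dataset across MPI ranks so many copies proceed concurrently.
     * This is NOT the spdcp implementation; it is a minimal illustration.
     *
     * Build (assumes an MPI compiler wrapper):  mpicc -o pcopy pcopy.c
     * Run (example):  mpirun -np 8 ./pcopy filelist.txt /dest/dir
     */
    #include <mpi.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <libgen.h>

    /* Copy a single file with plain buffered I/O. */
    static int copy_file(const char *src, const char *dstdir)
    {
        char dst[4096], srccopy[4096], buf[1 << 20];
        size_t n;

        snprintf(srccopy, sizeof(srccopy), "%s", src);   /* basename() may modify its argument */
        snprintf(dst, sizeof(dst), "%s/%s", dstdir, basename(srccopy));

        FILE *in = fopen(src, "rb");
        if (!in) { perror(src); return -1; }
        FILE *out = fopen(dst, "wb");
        if (!out) { perror(dst); fclose(in); return -1; }

        while ((n = fread(buf, 1, sizeof(buf), in)) > 0)
            fwrite(buf, 1, n, out);

        fclose(in);
        fclose(out);
        return 0;
    }

    int main(int argc, char **argv)
    {
        int rank, size;
        char line[4096];
        long idx = 0;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        if (argc != 3) {
            if (rank == 0)
                fprintf(stderr, "usage: %s <file-list> <dest-dir>\n", argv[0]);
            MPI_Finalize();
            return 1;
        }

        /* Every rank reads the same list; each takes every size-th entry. */
        FILE *list = fopen(argv[1], "r");
        if (!list) { perror(argv[1]); MPI_Abort(MPI_COMM_WORLD, 1); }

        while (fgets(line, sizeof(line), list)) {
            line[strcspn(line, "\n")] = '\0';
            if (idx++ % size == rank && line[0] != '\0')
                copy_file(line, argv[2]);
        }
        fclose(list);

        MPI_Barrier(MPI_COMM_WORLD);   /* wait for all ranks before reporting */
        if (rank == 0)
            printf("processed %ld list entries across %d ranks\n", idx, size);

        MPI_Finalize();
        return 0;
    }

In practice the number of ranks would be matched to the compute nodes assigned to the batch job, with all ranks reading from and writing to the shared Lustre file system, which is where the wall-clock advantage over a single serial cp comes from.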

Other Resources

NCCS Spider Project Page

Introduction

With the increasing computing capability and growing number of platforms at the NCCS, a clear need has emerged for a centralized, unified file system available from all platforms. The Spider project was initiated in late 2005 to investigate this centerwide file-system approach.

Scope

Early on, Lustre was selected as the file system for the Spider project. Because Lustre was already being used on the Jaguar system (Cray XT3 and XT4), it was a natural choice. Expansions and upgrades to the Spider project are already planned to satisfy the increasing needs for bandwidth and capacity driven by the NCCS road map.

Current Status

As of early 2006, the Spider project was in a late proof-of-concept phase. A 20+1 node cluster had been deployed and was running Lustre 1.4.7.3. Back-end disk storage was soon to be upgraded to ten DataDirect Networks (DDN) 8500 units with approximately 100 TB of storage capacity. The Lustre service nodes (1 metadata server and 20 object storage servers) were to be connected to the DDN 8500s via direct 2 Gb Fibre Channel links, for an aggregate block I/O bandwidth of around 10 GB/s. The host-side network to the Lustre service nodes was then 10 Gigabit Ethernet, although the use of InfiniBand as a centerwide fabric was also being evaluated. The end of this phase was scheduled for mid-March 2007; by then, the Spider project was in quasi-production mode, serving select NCCS resources and users.

Future Plans

Late 2007/Early 2008
The goal for the late 2007/early 2008 time frame is to deliver around 50 GB/s of aggregate I/O bandwidth to the NCCS production systems. To reach this goal, procurements of storage devices and I/O servers will be required.

Late 2009
The goal for the 2009 time frame is to deliver more than 200 GB/s of aggregate I/O bandwidth to the NCCS production systems. By this time frame, the NCCS anticipates having a 1 PF Cray supercomputer on the floor and in production.

People

Shane Canon, Project Leader (canonrs@ornl.gov)
Sarp Oral, Testing and Evaluation (oralhs@ornl.gov)
David Vasil, System Administrator (dmvasil@ornl.gov)
Makia Minich, Testing and Networking (minich@ornl.gov)

News

  • Lustre Router Nodes (January 2006): The NCCS commissioned CFS to plan and implement the Lustre Router Node concept by mid-2005. This work has been completed and rolled into standard Lustre releases (LNET) after extensive and successful field testing. With this additional capability, the NCCS Spider project can now serve an external Lustre file system to its Cray machines. This step was a key milestone on the Spider project road map.
  • Disk Procurement (late 2006): The NCCS has canceled its plans to procure disk storage units providing approximately 12 GB/s of bandwidth and 150 TB of capacity. Instead, it has decided to use the existing DDN 8500 units on the Spider cluster; this storage was originally connected to the Cray XT3 system.
  • Disk Procurement (late 2007): The NCCS plans to acquire block storage units providing approximately 60 GB/s of bandwidth and 1 PB of capacity. The Request for Proposal for this procurement has not yet been finalized and released.