Demonstration of High-Performance Infiniband Connections across the United States
As the scale of DOE science projects moves beyond the terascale and approaches the petascale, data locality is rapidly becoming a central impediment to scientific progress: as data files approach the petascale, they become almost immobile. One of the biggest challenges facing supercomputer users is moving the data files they produce from a supercomputer to a remote location. Building networks with sufficient bandwidth (tens or hundreds of gigabits per second) is only the start. The next hurdle is to design a system of networks that can support dedicated connections, as DOE’s UltraScience Net (USN) has demonstrated. The last step is actually moving data at rates that match network capacities. This is difficult because transport protocols have to be tuned to each connection length. If a supercomputer needs to talk to many different remote sites, tuning a protocol separately for each distance is itself a significant obstacle.
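To make the tuning problem concrete, the short sketch below estimates how much data a window-based protocol must keep in flight to fill a link, the so-called bandwidth-delay product, at several path lengths. The 10 Gb/s link rate, the signal speed in fiber, and the sample distances are illustrative assumptions, not figures from the USN tests.

```python
# Illustrative sketch only. All constants below are assumptions,
# not measurements from the article.
LINK_RATE_BPS = 10e9          # hypothetical 10 Gb/s dedicated circuit
FIBER_SPEED_M_PER_S = 2.0e8   # approximate signal speed in optical fiber
METERS_PER_MILE = 1609.344


def round_trip_time_s(distance_miles):
    """Propagation-only round-trip time over a fiber path of this length."""
    return 2 * distance_miles * METERS_PER_MILE / FIBER_SPEED_M_PER_S


def required_window_bytes(distance_miles, link_rate_bps=LINK_RATE_BPS):
    """Bandwidth-delay product: bytes that must be in flight to fill the link."""
    return link_rate_bps * round_trip_time_s(distance_miles) / 8


if __name__ == "__main__":
    # Hypothetical path lengths chosen only to show the scaling.
    for miles in (1, 100, 1000, 5000):
        window_mb = required_window_bytes(miles) / 1e6
        print(f"{miles:>5} miles: window of about {window_mb:.3f} MB")
```

Under these assumptions the required window grows in direct proportion to distance, which is why a single setting cannot suit both a nearby site and one across the continent.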
DOE, working in conjunction with Obsidian Research Inc., has demonstrated an elegant solution to this problem based on a new transport system that combines a protocol and an interface standard. The system, known as Infiniband, was originally developed as an ultra-high-speed communications standard for use in computer rooms. Because its flow control is credit-based rather than acknowledgement-based, it can be operated in a mode that requires no tuning, although it does require stable, low-loss connections. In theory, this should allow it to deliver uniform performance independent of distance over suitably provisioned connections.
Because USN is based on switches, it can be used to establish loop-around connections that allow two computers sitting side by side to appear to each other as though they are thousands of miles apart. Using a series of such connections, ORNL has demonstrated that a pair of Infiniband-equipped computers can hold throughput variation below five percent as their separation distance is scaled from 0.2 miles to 8,600 miles. This is in dramatic contrast to the normal situation, in which performance would fall by a factor of a hundred or more.
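The size of that contrast can be checked with a rough back-of-the-envelope model. The sketch below assumes a window-based protocol left at one fixed window size, and caps its throughput at the smaller of the link rate and one window per round trip. The 10 Gb/s link rate, the 1 MiB window, and the fiber signal speed are assumptions chosen for illustration; only the two distances come from the demonstration described above.

```python
# Illustrative model only. Link rate, window size, and fiber speed are
# assumptions; the two distances are the end points quoted in the text.
LINK_RATE_BPS = 10e9              # hypothetical 10 Gb/s circuit
WINDOW_BYTES = 1 * 1024 * 1024    # hypothetical fixed 1 MiB window
FIBER_SPEED_M_PER_S = 2.0e8       # approximate signal speed in fiber
METERS_PER_MILE = 1609.344


def fixed_window_throughput_bps(distance_miles,
                                window_bytes=WINDOW_BYTES,
                                link_rate_bps=LINK_RATE_BPS):
    """Throughput when at most one window can be sent per round trip."""
    rtt_s = 2 * distance_miles * METERS_PER_MILE / FIBER_SPEED_M_PER_S
    return min(link_rate_bps, window_bytes * 8 / rtt_s)


near = fixed_window_throughput_bps(0.2)    # shortest loop in the test
far = fixed_window_throughput_bps(8600)    # longest loop in the test

if __name__ == "__main__":
    print(f"0.2 miles:   {near / 1e9:.2f} Gb/s")
    print(f"8,600 miles: {far / 1e6:.1f} Mb/s")
    print(f"slowdown:    about {near / far:.0f}x")
```

With these assumed values the untuned connection runs at the full link rate over the short loop but slows by more than a factor of a hundred over the longest one, consistent with the contrast described above.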
For more information, please contact:
William Wing