LambdaStream gets HPC wired
Participants: Alan Verlo, Jason Leigh, Lance Long, Maxine Brown, Thomas A. DeFanti, Venkatram Vishwanath, Linda Winkler, Thomas Hutton
Institutions: Argonne National Laboratory, San Diego Supercomputer Center, Calit2, TeraGrid
Chicago - IL, San Diego - CA
LambdaStream Application-level Transport Protocol Sustains Nearly 10Gbps Over Routed and Non-Routed Infrastructures
As a number of large-scale, multinational experiments prepare to go online in the next 2-3 years, a new generation of data retrieval and transmission techniques and tools will be required. The data yielded by these experiments will be prolific, and a diverse, globally distributed community of scientists will be eager to acquire and explore this data.
Computer scientists at the University of Illinois at Chicago’s (UIC) Electronic Visualization Laboratory (EVL) are working to give application scientists “user-friendly” options for transferring large datasets over wide-area networks; one such research effort focuses on the development of specialized application-level transport protocols.
Today, most research and education networks are shared and operate as “best effort” services, with multiple routers, data paths and dynamic packet reassignment decisions that share bandwidth equitably among many users but handle cluster-computer-initiated data flows in the multi-hundred-megabit and multi-gigabit range less than optimally. The current cost of routers makes it prohibitive to permanently optimize and dedicate a routed network infrastructure among specific sites that have large bursts of traffic but do not use the full bandwidth continuously. An alternative networking model, which relies on less costly switches, can be dedicated and can therefore provide applications with known and knowable characteristics, such as guaranteed bandwidth (for data movement), guaranteed latency (for visualization, collaboration and data analysis) and guaranteed scheduling (for remote instruments).
EVL is conducting application-level protocol research so applications can maximally exploit the available bandwidth, whether a network is routed, switched or both (hybrid). A “fairness factor” is being designed into this new generation of protocols so that data is not forced through at the expense of existing traffic. “For the past five years, we have been designing specialized transport protocols for data-intensive interactive visualization and streaming media applications, where low latency and high reliability are of utmost importance,” said EVL co-director Jason Leigh.
This January, EVL researchers began testing whether data-intensive science can use hybrid networks to move large files effectively with EVL’s specialized, application-level, UDP-based transport protocol, LambdaStream.
EVL ran cluster-to-cluster file-transfer tests to determine whether the same network performance obtained over Layer-2 (switched) networks could be obtained with dedicated 10Gbps paths on Layer-3 (routed) networks using LambdaStream. The tests specifically compared throughput over a TeraGrid 10Gbps SONET-routed network using MultiProtocol Label Switching (MPLS) between Chicago and San Diego, and CAVEwave, a 10Gbps switched LAN PHY network between the same locations.
The file-transfer programs invoked LambdaStream to send multiple 1Gbps data streams over each of these networks to saturate the 10Gbps links for 30-60 minute intervals. The dataset used was a 1-foot-resolution map of 5,000 square miles of the city of Chicago provided by the U.S. Geological Survey (USGS) National Center for Earth Resources Observation and Science (EROS). The map consists of 3,000 files of tiled images that are 75MBytes each, for a total of 220GBytes of information.
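As a rough check on the dataset figures above, the following Python sketch works out the total size and an idealized time for one pass over the data. It assumes 1 GByte = 1,024 MBytes for the size total and decimal units (1 MByte = 10^6 bytes, 1 Gbps = 10^9 bits per second) for the timing; the function name and these unit conventions are illustrative assumptions, not details of the EVL test programs.

```python
# Back-of-the-envelope arithmetic for the USGS Chicago dataset.
# Unit conventions are assumptions; the article does not state them.

FILE_COUNT = 3000        # tiled image files (from the article)
FILE_SIZE_MBYTES = 75    # size of each file in MBytes (from the article)


def transfer_seconds(total_bytes: float, rate_gbps: float) -> float:
    """Idealized time to move total_bytes at a constant rate, ignoring overhead."""
    return total_bytes * 8 / (rate_gbps * 1e9)


total_mbytes = FILE_COUNT * FILE_SIZE_MBYTES   # 225,000 MBytes
total_gbytes = total_mbytes / 1024             # about 220 GBytes if 1 GB = 1,024 MB

# One pass over the dataset at a hypothetical, perfectly sustained 10 Gbps,
# treating 1 MByte as 10**6 bytes.
one_pass_seconds = transfer_seconds(total_mbytes * 1e6, 10.0)

print(f"total size: {total_gbytes:.1f} GBytes")
print(f"one pass at 10 Gbps: {one_pass_seconds:.0f} s")
```

Under these assumptions a single pass takes about three minutes, so a 30-60 minute saturation run corresponds to many passes' worth of data.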
EVL performed three different LambdaStream tests over the TeraGrid and CAVEwave networks: moving data from memory to memory, from disk to memory, and from disk to disk, the three ways scientists typically use networks to access data. The measured performance across CAVEwave and TeraGrid was comparable, showing that routed and non-routed infrastructures can both sustain nearly 10Gbps and behave the same when a specialized protocol is used.
The January tests also affirmed that a switched, end-to-end dedicated wavelength is a reliable alternative to shared networks when transferring high-throughput scientific application data.
As partners in the National Science Foundation’s OptIPuter project, EVL and the University of California, San Diego (UCSD) are proving it is economical to have point-to-point connections once you have the right endpoint technologies in place. “We are propagating a scalable, economical networking solution that puts high-performance networking resources into the control of individual scientists,” said Tom DeFanti, EVL co-director and OptIPuter program co-PI. “Lambda services offer scientists a networking means to solve problems that are cost prohibitive to do any other way.”
EVL is one of a few laboratories in the United States today that has access to its own persistent transcontinental 10GE optical connection, called the CAVEwave, on the National LambdaRail (NLR). CAVEwave extends from the StarLight optical Internet exchange in downtown Chicago, to the Pacific Northwest GigaPoP (PNWGP) in Seattle, to the UCSD campus. This link allows scientists at both ends to test experimental protocols and tools, and supports high-end networked application demonstrations.
LAMBDASTREAM EXPERIMENT DETAILS
A 30-node cluster at EVL connects to an in-house switch that aggregates traffic and sends it to a Force10 optical switch at StarLight. At UCSD, campus fibers connect the CAVEwave to a 28-node cluster in the Calit2 building. This configuration enables research among Chicago, San Diego, the University of Washington, and international colleagues who connect to the PNWGP via Pacific Wave on the U.S. west coast, or to StarLight in Chicago.
The StarLight Force10 switch that connects EVL to the CAVEwave also connects to the TeraGrid Juniper T640 router located at StarLight. The TeraGrid connects Chicago to Los Angeles to the San Diego Supercomputer Center at UCSD where, for this test, it was directly connected to the same OptIPuter cluster at Calit2. Since their physical fiber routes are not the same, the CAVEwave, going through Seattle, has an average round-trip-time (RTT) of 78.3ms, and the TeraGrid, a more direct path, has an average RTT of 56.2ms.
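The RTT figures above determine how much data is in flight on each path at any instant. The sketch below computes the bandwidth-delay product, a standard networking quantity that this article does not itself discuss. The function name is hypothetical, and the nominal line rates (10Gbps for the 10GE CAVEwave, 9.6Gbps for the OC-192 TeraGrid link) are the ones quoted elsewhere in this article.

```python
def bdp_mbytes(rate_gbps: float, rtt_ms: float) -> float:
    """Bandwidth-delay product in MBytes (1 MByte = 10**6 bytes)."""
    bits_in_flight = rate_gbps * 1e9 * (rtt_ms / 1e3)
    return bits_in_flight / 8 / 1e6


# (nominal line rate in Gbps, average RTT in ms) as reported in the article
paths = {
    "CAVEwave": (10.0, 78.3),
    "TeraGrid": (9.6, 56.2),
}

for name, (rate_gbps, rtt_ms) in paths.items():
    print(f"{name}: about {bdp_mbytes(rate_gbps, rtt_ms):.1f} MBytes in flight")
```

A reliable protocol has to keep roughly this much unacknowledged data buffered to keep the path full, which is one reason long, fast paths are demanding for transport protocols.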
MRTG (Multi Router Traffic Grapher) was used to monitor traffic through StarLight’s Force10 switch, and a similar traffic utilization tool called Cricket was used to monitor traffic through the TeraGrid’s Juniper T640 router at StarLight. To determine the maximum performance, researchers first used the iperf bandwidth measurement tool to send the same data using the unreliable UDP protocol.
Using iperf, the maximum throughput achieved over the switched 10GE (10Gbps) CAVEwave network was 9.75Gbps, and the maximum throughput over the routed OC-192 (9.6Gbps) TeraGrid was 9.45Gbps. Using LambdaStream, CAVEwave achieved speeds of 9.23Gbps for reliable memory-to-memory, 9.21Gbps for reliable disk-to-memory, and 9.30Gbps for reliable disk-to-disk transfers; TeraGrid achieved speeds of 9.16Gbps for reliable memory-to-memory, 9.15Gbps for reliable disk-to-memory, and 9.22Gbps for reliable disk-to-disk transfers. Bidirectionally, CAVEwave achieved 18.19Gbps and TeraGrid achieved 18.06Gbps doing reliable memory-to-memory transfers.
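To put the reliable LambdaStream rates in proportion to the unreliable iperf UDP maxima, the Python sketch below divides each reported rate by the iperf figure for the same network. The dictionary layout and function name are illustrative assumptions; the throughput values are the ones reported above.

```python
# iperf UDP maxima and LambdaStream reliable-transfer rates, in Gbps,
# as reported in the article.
IPERF_GBPS = {"CAVEwave": 9.75, "TeraGrid": 9.45}

LAMBDASTREAM_GBPS = {
    "CAVEwave": {"memory-to-memory": 9.23, "disk-to-memory": 9.21, "disk-to-disk": 9.30},
    "TeraGrid": {"memory-to-memory": 9.16, "disk-to-memory": 9.15, "disk-to-disk": 9.22},
}


def fraction_of_baseline(measured_gbps: float, baseline_gbps: float) -> float:
    """Share of the iperf UDP maximum reached by a reliable transfer."""
    return measured_gbps / baseline_gbps


efficiency = {
    network: {
        mode: fraction_of_baseline(gbps, IPERF_GBPS[network])
        for mode, gbps in modes.items()
    }
    for network, modes in LAMBDASTREAM_GBPS.items()
}

for network, modes in efficiency.items():
    for mode, share in modes.items():
        print(f"{network} {mode}: {share:.1%} of iperf maximum")
```

By this measure every reliable LambdaStream transfer reached between 94 and 98 percent of the iperf UDP maximum on its network.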
The MRTG-measured bandwidth speeds were verified by LambdaStream, which is instrumented to monitor the amount of information it sends and receives. Moreover, whereas MRTG measurements are five-minute averages, LambdaStream measurements are continuous, and therefore more accurate. For several years now, EVL has been designing application-level protocols and instrumenting them for the user’s awareness so scientists can have easy access to measurement of their own traffic. These protocols have been used over routed and switched networks but are particularly necessary when using Layer-1 optical switches, as bandwidth utilization cannot be measured any other way.
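The difference between a five-minute average and continuous measurement can be illustrated with a small Python sketch. The per-second samples below are invented for illustration only and are not measurements from these tests; the sketch simply shows how a single averaged value can hide a short drop in throughput that finer-grained sampling would reveal.

```python
from statistics import mean

# Hypothetical per-second throughput samples (Gbps) for one 300-second window.
# Synthetic values: 270 seconds at 9.2 Gbps and a 30-second dip to 2.0 Gbps.
samples_gbps = [9.2] * 270 + [2.0] * 30

five_minute_average = mean(samples_gbps)   # what a five-minute poller would report
lowest_sample = min(samples_gbps)          # visible only with finer-grained sampling

print(f"five-minute average: {five_minute_average:.2f} Gbps")
print(f"lowest one-second sample: {lowest_sample:.2f} Gbps")
```

In this made-up window the average stays near 8.5Gbps even though throughput fell to 2Gbps for 30 seconds.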
It should be noted that these tests relied on the availability of TeraGrid routers with OC-192 ports, which are an order of magnitude more expensive than 10GE ports on switches. This is not an issue if the routers are already in place, as in this case, but it is important to consider for future network implementations that serve high-performance data transfer. These tests were performed on transcontinental networks; EVL researchers believe that similar results would be achieved using hybrid transoceanic networks as well.
The HPCwire article can be read at: www.hpcwire.com/hpc/600190.html
Date: March 22, 2006 - March 23, 2006