LambdaRAM: A scalable, high-performance, multi-dimensional, distributed cache for data-intensive applications over ultra-high-speed networks
Participants: Venkatram Vishwanath
Electronic Visualization Laboratory
842 W. Taylor Street
Friday, October 26, 2007, 11:00 a.m.
LambdaRAM, as proposed, is a scalable, high-performance, multi-dimensional, distributed cache that harnesses the memory of multiple clusters interconnected by ultra-high-speed networks and employs efficient latency-mitigation heuristics to provide data-intensive applications with low-latency, seamless access to local and remote data. It also enables time-critical data collaboration between applications running on geographically distributed clusters.
Interactive, real-time exploration and correlation of multi-terabyte and petabyte datasets from multiple sources has been identified as a critical enabler for scientists to glean new insights in a variety of disciplines vital to national security, including climate modeling and prediction, biomedical imaging, geosciences, and high-energy physics.
In practice, these large-scale datasets must flow among a Grid of instruments, physical storage devices, visualization displays, and computational clusters. Such applications are now realized by interconnecting Grid resources with dedicated networks created dynamically by concatenating optical lightpaths (lambdas); this infrastructure is called a LambdaGrid.
Critical requirements for these data-intensive applications to achieve high performance include low-latency access to local and remote data, and time-critical data sharing between applications running on geographically distributed clusters.
A typical LambdaGrid application is NASA’s Modeling, Prediction and Analysis (MAP) project, which analyzes and predicts tropical cyclones in the Atlantic basin. Its global forecast models incur substantial I/O latency in the analysis segment, causing the forecast computation to idle 25-50% of the time.
Additionally, these forecast models are currently unable to share their analyses in a time-critical manner with other regional models to enable more informed decisions. LambdaRAM mitigates the latency bottlenecks associated with storage systems and remote data access over both local and wide-area networks by harnessing the memory of multiple clusters interconnected by ultra-high-speed optical networks. It also employs latency-mitigation heuristics, including prefetching, presending, and hybrid heuristics, based on an application's access patterns.
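To make the prefetching idea concrete, the following is a minimal sketch of a pull-based heuristic: a local block cache that observes the direction of an application's sequential accesses and fetches the next few blocks ahead of time, so later reads hit local memory instead of waiting on remote storage. All names here (PrefetchingCache, fetch_block) are illustrative assumptions, not LambdaRAM's actual API.

```python
from collections import OrderedDict

class PrefetchingCache:
    """Illustrative pull-based (prefetching) block cache: on each access it
    infers the access direction from the previous request and fetches a small
    window of upcoming blocks into local memory ahead of the application."""

    def __init__(self, fetch_block, capacity=64, window=4):
        self.fetch_block = fetch_block  # callable: block_id -> data (simulates a remote fetch)
        self.capacity = capacity        # maximum number of blocks held locally
        self.window = window            # how many blocks to prefetch ahead
        self.cache = OrderedDict()      # block_id -> data, in LRU order
        self.last_id = None             # previously accessed block, for direction inference

    def _insert(self, block_id, data):
        self.cache[block_id] = data
        self.cache.move_to_end(block_id)
        while len(self.cache) > self.capacity:
            self.cache.popitem(last=False)  # evict the least recently used block

    def get(self, block_id):
        if block_id in self.cache:
            self.cache.move_to_end(block_id)
            data = self.cache[block_id]
        else:
            data = self.fetch_block(block_id)   # cache miss: pay the remote latency
            self._insert(block_id, data)
        # Infer the access direction and prefetch the next `window` blocks.
        if self.last_id is not None:
            step = block_id - self.last_id
            if step in (1, -1):                 # simple sequential-access pattern
                for i in range(1, self.window + 1):
                    nxt = block_id + i * step
                    if nxt >= 0 and nxt not in self.cache:
                        self._insert(nxt, self.fetch_block(nxt))
        self.last_id = block_id
        return data
```

A push-based (presending) heuristic would invert the roles: the data source observes the consumer's pattern and sends blocks unsolicited, which suits high-bandwidth, high-latency wide-area paths where round trips are expensive.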
LambdaRAM enables time-critical data collaboration between applications over geographically distributed clusters by providing a shared cache over the LambdaGrid clusters.
The proposed contributions include:
- A novel memory-harnessing framework encompassing the memory of multiple clusters and spanning a gamut of ultra-high-speed networks, including Local Area Networks (LANs), Metropolitan Area Networks (MANs), and Wide Area Networks (WANs). This framework addresses the latency and data-collaboration requirements of data-intensive applications based on a number of parameters, including network topology, bandwidth, network latency, available memory, and an application's data access patterns.
- A latency-mitigation framework incorporating pull-based, push-based, and hybrid heuristics to address the latency needs of data-intensive applications over ultra-high-speed LANs, MANs, and WANs.
- Scalable memory-management heuristics and multi-dimensional distributed data structures to handle the multi-terabyte datasets of data-intensive applications.
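As a sketch of how a multi-dimensional distributed data structure might locate data, the function below maps a multi-dimensional block index (e.g., a latitude/longitude tile of a forecast grid) to the node whose memory holds it, using row-major linearization and a round-robin distribution across node memories. This is a hypothetical illustration under simple assumptions, not LambdaRAM's actual placement scheme.

```python
def block_owner(block_coords, grid_shape, num_nodes):
    """Map a multi-dimensional block index to an owning node.

    block_coords: tuple of block indices, one per dimension
    grid_shape:   tuple giving the number of blocks along each dimension
    num_nodes:    number of nodes contributing memory to the cache

    The block grid is linearized in row-major order and blocks are assigned
    round-robin across nodes; names and policy here are illustrative only.
    """
    linear = 0
    for coord, dim in zip(block_coords, grid_shape):
        if not 0 <= coord < dim:
            raise IndexError(f"block index {coord} out of range for dimension {dim}")
        linear = linear * dim + coord
    return linear % num_nodes
```

For example, partitioning a 4x6 grid of blocks over 3 nodes sends block (0, 0) to node 0, (0, 1) to node 1, and so on; a real system would also weigh available memory and network topology, as the contributions above note.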
LambdaRAM mitigates I/O latency and enables time-critical, high-performance data collaboration over both local and wide-area networks for data-intensive applications. In NASA's MAP project, LambdaRAM would yield faster weather prediction and improve forecast accuracy by enabling models of higher complexity. Computational chemistry, genomics, and biomedical imaging suffer from similar bottlenecks and would benefit significantly from LambdaRAM.
LambdaRAM will help in the design of efficient I/O systems for petascale applications and in the design of efficient LambdaGrids for data-intensive applications.
Advisor: Jason Leigh
Committee Members: Robert Grossman, Andrew Johnson, Michael Seablom, Larry Smarr
Date: October 26, 2007