Ian Foster, Michael E. Papka, and Rick Stevens
Mathematics and Computer Science Division
Argonne National Laboratory
Argonne, IL 60439 USA
{foster, papka, stevens}@mcs.anl.gov
Tim Kuhfuss
Electronics and Computing Technologies
Argonne National Laboratory
Argonne, IL 60439 USA
kuhfuss@anl.gov
This paper discusses the I-WAY project and provides an overview of the papers in this issue of IJSA. The I-WAY is an experimental environment for building distributed virtual reality applications and for exploring issues of distributed wide area resource management and scheduling. The goal of the I-WAY project is to enable researchers to use multiple internetworked supercomputers and advanced visualization systems to conduct very large-scale computations. By connecting a dozen ATM testbeds, seventeen supercomputer centers, five virtual reality research sites, and over sixty applications groups, the I-WAY project has created an extremely diverse wide area environment for exploring advanced applications. This environment has provided a glimpse of the future for advanced scientific and engineering computing.
A major goal of the I-WAY project was to continue to work with the applications community to push the state of the art for distributed supercomputing. A primary thrust was applications that would use more than one supercomputer and one or more VR devices and that would begin exploring collaborative technologies to build shared virtual spaces in which to conduct computational science.
A related goal for the I-WAY project was to uncover the problems that are preventing widespread use of distributed supercomputing over ATM networks. Areas that were investigated included security, uniform computing environments, wide area scheduling and resource reservation, and distributed collaborative VR.
Future I-WAY research is focused on (1) making the I-WAY persistent, (2) understanding the performance constraints and requirements of applications, and (3) improving networking performance as seen by user application codes.
Highlights of the I-WAY project include the following:
Packets were sent across the I-WAY by using ATM Adaptation Layer 5 (AAL5), one of the ATM adaptation layers (Partridge, 1994). AAL5 enabled researchers to use standard TCP/IP protocols and tools. The I-WAY also supported direct ATM-oriented protocols for its users.
Much of the I-WAY's physical networking made use of existing smaller ATM research networks. These networks included the following:
During the Supercomputing '95 conference, a network operations center was set up in San Diego to interconnect the testbeds. Several OC-3 lines were brought into the conference center, including one from vBNS, one from AT&T, and two from Sprint.
The I-Soft system was designed to run on dedicated I-WAY point of presence (I-POP) machines.

Each I-POP is accessible via the Internet, but operates inside a site's firewall. An ATM interface allows it to monitor and, in principle, manage the site's ATM switch; it also allows the I-POP to use the ATM network for system management traffic. Site-specific implementations of a simple remote resource management interface allow I-POP systems to communicate with other machines at the site to allocate resources to users, start processes, and so forth. The Andrew distributed file system (AFS) is used as a repository for system software and status information.
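To make the role of this interface concrete, the following sketch shows one form such a site-specific resource management layer could take. It is a minimal illustration in Python, not the I-Soft implementation; the class and method names (SiteResourceManager, allocate, start_process, deallocate) are assumptions introduced here for exposition.

    # Hypothetical sketch of a site-specific resource management interface
    # that an I-POP might invoke; names are illustrative, not from I-Soft.
    from abc import ABC, abstractmethod
    import subprocess

    class SiteResourceManager(ABC):
        """Uniform operations an I-POP needs from one site's local machines."""

        @abstractmethod
        def allocate(self, user: str, nodes: int, minutes: int) -> str:
            """Reserve resources for a user; return a site-local reservation id."""

        @abstractmethod
        def start_process(self, reservation_id: str, command: list) -> int:
            """Launch a process under an existing reservation; return its pid."""

        @abstractmethod
        def deallocate(self, reservation_id: str) -> None:
            """Release the reservation and any processes it still holds."""

    class SimpleSite(SiteResourceManager):
        """One possible implementation for a site whose resources are ordinary
        hosts reachable from the I-POP (placeholder logic only)."""

        def allocate(self, user, nodes, minutes):
            return f"resv-{user}-{nodes}x{minutes}"

        def start_process(self, reservation_id, command):
            return subprocess.Popen(command).pid

        def deallocate(self, reservation_id):
            pass  # a real site would cancel jobs and reclaim nodes here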
A critical component of the software environment was the distributed scheduler. Political and technical constraints made it infeasible to provide a single ``I-WAY scheduler'' to replace the schedulers already in place at the various sites. Instead, we implemented a two-part strategy that allowed administrators to configure dedicated resources into virtual machines and allowed users to request time on particular virtual machines. The strategy involved (1) a central scheduler daemon that managed and allocated time on the different virtual machines on a first-come, first-served basis, and (2) a local scheduler daemon at each site that communicated directly with the local site scheduler. Local schedulers performed site-dependent actions in response to requests from the central scheduler to allocate resources, create processes, and deallocate resources (Foster et al., 1996b).
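The division of labor between the central and local schedulers can be sketched as follows. The Python below is only an illustration of the first-come, first-served policy on administrator-configured virtual machines described above; the data structures, field names, and the LocalScheduler stub are assumptions made for the example, not the actual I-WAY scheduler interfaces.

    # Illustrative sketch of the two-part scheduling strategy: a central
    # daemon allocates time on virtual machines first-come, first-served
    # and forwards site-dependent work to per-site local scheduler daemons.
    from collections import deque
    from dataclasses import dataclass, field

    @dataclass
    class Request:
        user: str
        minutes: int

    @dataclass
    class VirtualMachine:
        name: str
        sites: list                          # site resources configured into this VM
        queue: deque = field(default_factory=deque)
        busy: bool = False

    class LocalScheduler:
        """Stand-in for the per-site daemon that talks to the local site
        scheduler and performs the site-dependent allocation actions."""
        def __init__(self, site):
            self.site = site

        def allocate(self, user, minutes):
            print(f"[{self.site}] allocate {minutes} min for {user}")

    class CentralScheduler:
        def __init__(self, vms, local_schedulers):
            self.vms = {vm.name: vm for vm in vms}
            self.locals = local_schedulers   # site name -> LocalScheduler

        def request_time(self, vm_name, req):
            self.vms[vm_name].queue.append(req)   # FCFS: append, never reorder

        def dispatch(self, vm_name):
            vm = self.vms[vm_name]
            if vm.busy or not vm.queue:
                return
            req = vm.queue.popleft()
            vm.busy = True
            for site in vm.sites:
                self.locals[site].allocate(req.user, req.minutes)

Keeping all site-dependent logic in the local daemons is what allowed the existing site schedulers to remain in place.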
Security is a major and multifaceted issue in I-WAY-like systems. Authentication to I-POPs was handled by a telnet client modified to use Kerberos authentication and encryption, an approach that avoided passing passwords in clear text over the network. The scheduler software kept track of which user id was to be used at each site for a particular I-WAY user and served as an ``authentication proxy,'' performing subsequent authentication to other I-WAY resources on the user's behalf. This proxy service was invoked each time a user used the scheduler's command language to allocate computational resources or to create processes.
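The proxy mechanism amounts to a per-site identity map that the scheduler consults whenever it acts on a user's behalf. The sketch below is a hypothetical rendering of that bookkeeping only; the Kerberos exchange and the modified telnet client are not modeled, and all names are invented for illustration.

    # Hypothetical sketch of the authentication-proxy bookkeeping: which local
    # user id the scheduler should assume at each site for a given I-WAY user.
    class AuthProxy:
        def __init__(self):
            # (iway_user, site) -> local user id assigned at that site
            self.identity_map = {}

        def register(self, iway_user, site, local_id):
            self.identity_map[(iway_user, site)] = local_id

        def local_identity(self, iway_user, site):
            """Return the local id to use when allocating resources or
            creating processes at `site` on behalf of `iway_user`."""
            try:
                return self.identity_map[(iway_user, site)]
            except KeyError:
                raise PermissionError(f"{iway_user} has no identity at {site}")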
A variety of parallel programming tools were provided on the I-WAY system. The challenge here was to enable users to exploit the features of the various computers, and yet to hide the unnecessary details of networks and systems. To this end, we adapted the Nexus multithreaded communication library (Foster et al., 1994, 1996a) to execute in the I-WAY environment. Nexus supports automatic configuration mechanisms that allow it to use information contained in resource databases to determine which startup mechanisms, network interfaces, and protocols to use in different situations. Several other libraries, notably the CAVEcomm virtual reality library (Disz et al., 1995a, 1995b) and the MPICH (Gropp et al., 1996) implementation of MPI, were extended to use Nexus mechanisms.
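The flavor of this automatic configuration can be conveyed with a small sketch: a lookup in a resource database chooses the protocol and network interface to use between a pair of sites, with a plain TCP default. The table contents, field names, and site names below are assumptions for illustration and do not reflect Nexus's actual database format.

    # Minimal sketch of resource-database-driven transport selection; the
    # schema and entries are illustrative assumptions, not Nexus's own.
    RESOURCE_DB = {
        # (local_site, remote_site) -> preferred protocol and interface
        ("argonne", "sdsc"): {"protocol": "atm-aal5", "interface": "atm0"},
        ("argonne", "ncsa"): {"protocol": "tcp", "interface": "eth0"},
    }
    DEFAULT = {"protocol": "tcp", "interface": "eth0"}

    def select_transport(local_site, remote_site):
        """Pick a protocol and network interface for a communication link,
        falling back to TCP over the default interface."""
        return RESOURCE_DB.get((local_site, remote_site), DEFAULT)

    # A library layered on top (for example, an MPI implementation) need not
    # know which transport was selected:
    link = select_transport("argonne", "sdsc")
    print("using", link["protocol"], "on", link["interface"])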
The linked sites provided access to a wide variety of supercomputers, including the Thinking Machines CM-5, SGI Power Challenge Array, IBM SP2, Cray T3D and C90, and Intel Paragon. Collectively, these machines represented the largest ensemble of distributed computing resources ever interconnected on a high-performance wide area network.
The article ``Collaborative Virtual Environments Used in the Design of Pollution Control Systems'' by Diachin et al. describes the development of a virtual reality environment for the interactive analysis and design of injective pollution control systems. Several mechanisms for the real-time computation and visualization of particle dynamics in fluid flow and spray simulations are described. The required computations are performed on an IBM SP2 and communicated to one or more CAVE Automatic Virtual Environments using the CAVEcomm message-passing library. The paper analyzes both the computational and communication costs for typical interactions with the model and examines the network bandwidth required for real-time response.
The article ``Performance Modeling of Interactive, Immersive Virtual Environments for Finite Element Simulations'' by Taylor et al. analyzes the components of the lag time observed when a finite element simulation executing on the IBM SP2 supercomputer at Argonne is displayed in an interactive virtual environment at Supercomputing '95 in San Diego. The paper presents results comparing the performance of the I-WAY network with that of the traditional Internet. In addition, the paper discusses the components of end-to-end lag.
The article ``The Chesapeake Bay Virtual Environment (CBVE): Initial Results from the Prototypical System'' by Wheless et al. discusses the use of virtual environments to visualize ecological datasets. The datasets are generated by numerical supercomputer simulations that provide insight into the ecological impact of physical and biological interactions.
The article ``Implementing a Collaboratory for Microscopic Digital Anatomy'' by Young et al. introduces a system that will allow biologists around the world to acquire, over the network, data taken from an electron microscope. Using remote high-performance computers, biologists can process and enhance three-dimensional datasets and then view the results on their local display device.
The article ``Interactive Scientific Exploration of Gyrofluid Tokamak Turbulence'' by Kerbel et al. examines the use of an interactive visual environment coupled to high-performance computations. The paper discusses previous experience with such coupling efforts and discusses the demonstration at Supercomputing '95. The system connects a Cray T3D running the physics simulations and postprocessing visualization code to an interactive virtual environment.
The article ``Exploring Coupled Atmosphere-Ocean Models Using Vis5D'' by Hibbard et al. discusses the use of a distributed client/server system for viewing large datasets. Using an IBM SP2 at Argonne National Laboratory as a server, researchers fed data across the I-WAY to a pair of SGI Onyx machines at Supercomputing '95. The SGI machines processed the graphics for display in the CAVE, where viewers could interact with the data.
The article ``Radio Synthesis Imaging -- A Grand Challenge HPCC Project'' by Crutcher et al. discusses the issues of real-time transfer and archiving of data from remote telescopes, the processing of data on supercomputers for image improvement, and the final archiving of images into a digital library. In addition, the paper discusses these issues as they applied to the demonstration at Supercomputing '95.
The article ``Early Experiences with Distributed Supercomputing on I-WAY: First Principles Materials Science and Parallel Acoustic Wave Propagation'' by Geist et al. introduces the details and results of their materials science simulation and seismic simulation. The paper also describes the library used to steer and visualize both applications.
The article ``Galaxies Collide on the I-WAY: An Example of Heterogeneous Wide-Area Collaborative Supercomputing'' by Norman et al. describes the attempt to use the resources of the NSF supercomputing centers as a metacomputer. The paper discusses the development of a simulation that is independent of the host architecture and that takes advantage of resources on the local compute servers.
Building the I-WAY also required meeting an extremely tight deadline. The project was initiated in April 1995; by May, the concept had been demonstrated with a transcontinental virtual underwater experiment; by fall, parallel languages, scaleable Unix tools, graphics libraries, and communications software had been integrated into a cohesive environment. And by December 1995, more than sixty high-performance applications were successfully demonstrated at the Supercomputing '95 conference.
We are now working to address some of the critical research issues identified in the I-WAY project. The Globus project is addressing issues of resource location (computational resource brokers), automatic configuration, scaleable trust management, and high-performance distributed file systems. In addition, we and others are defining and constructing future I-WAY-like systems that will provide further opportunities to evaluate management and application programming systems.