Thomas A. DeFanti, Daniel J. Sandin, Gary Lindahl, and Maxine D. Brown Electronic Visualization Laboratory University of Illinois at Chicago.
The Electronic Visualization Laboratory (EVL) at the University of Illinois at Chicago (UIC) specializes in virtual reality (VR) and scientific visualization research. EVL is a major potential beneficiary of the guaranteed latency and bandwidth promised by cell-switched networking technology, since the current shared Internet lines are inadequate for doing VR-to-VR, VR-to-supercomputer, and VR-to-supercomputer-to-VR research. EVL's computer scientists are working with their colleagues at Argonne National Laboratory (ANL) and the National Center for Supercomputing Applications (NCSA) to develop an infrastructure that enables computational scientists to apply VR, networking, and scalable computing to problem solving. ATM and other optical networking schemes usher in a new era of sustainable high-speed networking capable of supporting telecollaboration among computational scientists and computer scientists in the immersive visual/multi-sensory domain.
Why does VR require high bandwidth and low latency?
Can you imagine driving a car if the response to steering were delayed by 10 seconds? Worse, what if you didn't know how long the delay was, or what if it varied from one second to the next? It would be tough to get insurance. Known and guaranteed latency is important for interactive problem solving. The response time of the Internet is not currently dependable; ATM networks, however, promise to provide guaranteed latency.
VR technology has the potential of enabling users to interact with their data, view the results of their simulation codes, and collaborate with other scientists at geographically disparate locations. In order for users to immerse themselves in interactive feedback situations that rely upon networking, latency and bandwidth must be guaranteed.
VR terminology
The following terminology is specific to EVL's development efforts, which focus on projection-based VR technology. There are other VR devices, notably the head-mounted display (HMD) and the binocular omni-oriented monitor (BOOM); much of the discussion below applies to HMD and BOOM technologies as well. The CAVE, ImmersaDesk, and I-Wall are EVL research projects. EVL continues to advance visualization and VR research and development, and collaborates with NCSA and the Mathematics and Computer Science Division of ANL on VR application and toolkit development, with emphasis on supercomputing and networking research, to further establish VR as a scientific discovery and communications tool within the High Performance Computing and Communications (HPCC) community. EVL is developing the I-Wall in collaboration with the University of Minnesota and NCSA.
CAVE
The CAVE is a multi-person, room-sized, high-resolution, immersive 3D video
and audio environment. The CAVE is a room 10x10x9 feet inside, made up
of three rear-projection screens for walls and a down-projection screen
for the floor. Electrohome Marquis 8000 projectors throw full-color workstation
fields (1024 x 768 stereo resolution) at 96Hz onto the screens, giving
greater than a 2,000x2,000 pixel resolution to the surrounding composite
image. Computer-controlled audio provides a directional sonification capability.
The user's head and hand are tracked with Ascension tethered electromagnetic
sensors. Stereographics' LCD stereo shutter glasses are used to separate
the alternate fields, thus providing stereo vision. A Silicon Graphics
Inc. (SGI) Onyx with three Reality Engine2s is used to create the imagery
that is projected onto three of the four walls; a second Onyx is used for
the fourth wall in the current configuration. Both ScramNet and HIPPI networking
are used locally between the Onyxes to synchronize screen updates and data
distribution.
ImmersaDesk
The ImmersaDesk is a drafting-table format virtual reality device. Using
stereo glasses and tethered electromagnetic head and hand tracking similar
to the CAVE, this projection-based system offers a type of virtual reality
that is semi-immersive. The ImmersaDesk features a 4x5-foot rear-projected
screen at a 45-degree angle. The size and position of the screen give a
sufficiently wide-angle view and the ability to look down as well as forward.
The resolution is 1024x768 at 96Hz. The ImmersaDesk's chief feature is
simplicity - it uses only one Reality Engine2 with a deskside Onyx. In
addition, it was carefully engineered to fold up, fit through a standard
door, and roll on its own wheels for rapid deployment.
I-Wall
The I-Wall is a large-screen, high-resolution (2560x2048) stereo projection
display. The I-Wall uses four Reality Engine2s spread across two Onyx racks.
The I-Wall has four rear-projected screens in a 2x2 arrangement, each driven
by a Reality Engine2 displaying field sequential stereo images. The same
active stereo glasses used in the CAVE and ImmersaDesk systems are worn
by viewers, and the same trackers are currently used. The I-Wall achieves its immersion through wide-screen projection but unfortunately offers no way to look down, a limitation of any normal audience seating arrangement. (Omnimax/Imax theaters address this problem with steeply pitched seating.) Research is ongoing into suitable rear-screen projection materials
which would preserve passive polarization for stereo image separation so
that inexpensive cardboard glasses (like those used in 3D movies) may be
used for larger audiences.
Collaborative terminology
EVL's research efforts are not limited to developing VR technologies, but also include the development of tools and techniques to make VR an intuitive user interface to the emerging National Information Infrastructure. Specifically, we are interested in distributed computing, telecollaboration, and telepresence.
Distributed computing
Scientific simulation codes are typically large and complex. They require
HPCC resources - scalable computers, vector processors, massive datastores,
large memories, or high-speed networks. Depending on the data and type
of analysis scientists want to do, they set up their simulation codes to
calculate greater detail, different time steps, or different states defined
by new parameters. With CAVE technology, we can run applications in one of two modes: locally on the Onyx/CAVE, or distributed between a backend computer and the Onyx/CAVE. In local mode, CAVE participants either "interactively
steer" computationally-modest simulations or explore precomputed datasets.
("Computationally modest" is a changing concept, however - it
is possible to load the Onyx up with R8000 processors, giving it the 64-bit computing power of a half-dozen Cray Y-MPs!)
In distributed computing mode, CAVE participants can steer simulation codes. That is, they can explore/experience visualizations of datasets, identify areas to enhance, and then invoke simulation codes on the networked computers to compute new datasets. The backend machine generates new data, which is then transferred to the Onyx for rendering and display in the CAVE. To date, the CAVE has been locally networked to several backend machines: Thinking Machines Corporation's CM-5, Silicon Graphics' Challenge Array, and an IBM SP-2. In the descriptions that follow, we refer to distributed computing as "VR-to-supercomputer" and "VR-to-supercomputer-to-VR." Note that while we use the term "supercomputer," we can backend our VR technologies to anything: a supercomputer, a scalable computer architecture, a massive datastore, or remote instrumentation.
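To make the exchange concrete, the following is a minimal sketch (in C++, using ordinary POSIX sockets rather than the actual CAVE or backend software) of the steering pattern described above: the VR side sends a few bytes describing a new parameter, then reads back a potentially large dataset for rendering. The port, address, and message layout are illustrative assumptions.

    // steering_client.cpp -- illustrative sketch of a VR-to-supercomputer steering
    // exchange; host, port, and message layout are hypothetical, not the actual
    // CAVE/backend protocol.
    #include <arpa/inet.h>
    #include <netinet/in.h>
    #include <sys/socket.h>
    #include <unistd.h>
    #include <cstdio>
    #include <vector>

    struct SteerRequest {            // small upstream message: a new parameter
        float reynolds_number;       // hypothetical simulation parameter
        int   timestep;              // which timestep to recompute
    };

    int main() {
        int fd = socket(AF_INET, SOCK_STREAM, 0);
        sockaddr_in addr{};
        addr.sin_family = AF_INET;
        addr.sin_port = htons(9000);                     // hypothetical port
        inet_pton(AF_INET, "127.0.0.1", &addr.sin_addr); // hypothetical backend
        if (connect(fd, (sockaddr*)&addr, sizeof addr) < 0) { perror("connect"); return 1; }

        SteerRequest req{1500.0f, 42};                   // a few bytes upstream...
        write(fd, &req, sizeof req);

        uint32_t nfloats = 0;                            // ...a large volume downstream
        read(fd, &nfloats, sizeof nfloats);
        std::vector<float> volume(nfloats);
        size_t got = 0;
        while (got < nfloats * sizeof(float)) {
            ssize_t n = read(fd, (char*)volume.data() + got, nfloats * sizeof(float) - got);
            if (n <= 0) break;
            got += n;
        }
        std::printf("received %zu bytes of new dataset\n", got);
        close(fd);
        return 0;
    }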
Telecollaboration
Scientists and engineers often work in teams, and these teams are often spread across geographically disparate locations. We are interested in inter-VR
communication, where two or more people at different sites can use VR to
communicate with one another and can interact with the same data and simulation
codes. Several projects exhibited as part of the GII Testbed activities
at Supercomputing '95 in San Diego (December 4-8, 1995) explored using
the networks to achieve telecollaboration.
Minimally, we ship head and hand tracker information to give one viewer a sense of where the other viewer is looking and pointing. In the descriptions that follow, we refer to telecollaboration as "VR-to-VR" or "VR-to-supercomputer-to-VR," as there may or may not be backend computers/storage arrays/instrumentation that participants act upon.
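As an illustration of how little data this minimally requires, here is a sketch of a tracker update sent as a single small UDP datagram; the packet layout, address, and port are hypothetical, not an EVL wire format. At 96 updates per second, a 40-byte packet amounts to only about 4 KB/second.

    // tracker_send.cpp -- hypothetical sketch of shipping head/hand tracker state
    // to a remote VR site as one small UDP packet per update.
    #include <arpa/inet.h>
    #include <netinet/in.h>
    #include <sys/socket.h>
    #include <unistd.h>
    #include <cstdint>

    struct TrackerPacket {
        float head_pos[3];     // head position in CAVE coordinates
        float head_dir[3];     // view direction
        float wand_pos[3];     // hand/wand position
        uint32_t buttons;      // wand button state
    };                         // 40 bytes; even at 96 updates/s this is only ~4 KB/s

    int main() {
        int fd = socket(AF_INET, SOCK_DGRAM, 0);
        sockaddr_in peer{};
        peer.sin_family = AF_INET;
        peer.sin_port = htons(7000);                       // hypothetical peer VR site
        inet_pton(AF_INET, "127.0.0.1", &peer.sin_addr);

        TrackerPacket p{{0, 1.6f, 0}, {0, 0, -1}, {0.3f, 1.2f, -0.2f}, 0};
        sendto(fd, &p, sizeof p, 0, (sockaddr*)&peer, sizeof peer);
        close(fd);
        return 0;
    }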
Telepresence
Telepresence involves the use of VR to project the viewer into another
environment. While all of VR can be defined as providing this type of experience
to some extent, telepresence is usually associated with taking viewers
into other physical worlds. For example, cameras on satellites allow scientists
on Earth to feel like they are in outer space, and scientists can interact
with robotic arms on the satellites to manipulate objects in space. Another
example of telepresence is teleconferencing, where a participant is projected
into the room of an associate in order to interact. A first-order goal is to carry digital video and audio transmissions, in addition to computer-generated imagery, over VR technologies and high-speed networks. Soon, this research will interface VR technologies to an ATM-based 155-Mbps network, experimenting with two-way video and audio. The idea is to enable users to conduct
remote teleconferencing and distributed virtual prototyping, a keen interest
of industrial partners as well as scientists. Latency of long-distance
networks will be a particularly interesting artifact to study.
Latency
VR-to-VR
Latency needs to be low so that interactions between human participants
at geographically disparate sites can be as natural as possible. Emphasis
should be on transmitting tracker position information and 3D input device/voice/gesture-derived
information. Low latency in audio is required for reasonable voice communication
between researchers doing collaborative design. Similarly, low latency
in video is desirable when facial and/or gesture data is critical. In situations
where latency cannot be guaranteed or is too high, we can send bitmapped
images and map them onto avatars (computer-generated representations of
people). In situations where low latency can be guaranteed, we can send
2-1/2 dimensional image-processed video images of a person, enabling the
viewer to see gestural and facial information of the person communicating.
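One way to picture the bitmapped-image case is as texture-mapping each received video frame onto a flat avatar surface. The sketch below uses OpenGL 1.x with GLUT (not the CAVE library) and substitutes a generated checkerboard for a transmitted frame; everything here is illustrative.

    // avatar_texture.cpp -- minimal OpenGL 1.x + GLUT sketch of mapping a video
    // bitmap onto a flat "avatar" quad. The checkerboard stands in for a received
    // video frame; in practice the pixels would arrive over the network.
    #include <GL/glut.h>
    #include <vector>

    const int W = 64, H = 64;
    std::vector<unsigned char> frame(W * H * 3);

    void makeFakeFrame() {                       // stand-in for a decoded video frame
        for (int y = 0; y < H; ++y)
            for (int x = 0; x < W; ++x) {
                unsigned char v = ((x / 8 + y / 8) % 2) ? 255 : 64;
                unsigned char* p = &frame[(y * W + x) * 3];
                p[0] = v; p[1] = v; p[2] = 255;  // bluish checkerboard
            }
    }

    void display() {
        glClear(GL_COLOR_BUFFER_BIT);
        glEnable(GL_TEXTURE_2D);
        // Upload the latest frame; with streaming video this would happen per update.
        glTexImage2D(GL_TEXTURE_2D, 0, GL_RGB, W, H, 0, GL_RGB, GL_UNSIGNED_BYTE, frame.data());
        glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_LINEAR);
        glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_LINEAR);
        glBegin(GL_QUADS);                       // the avatar's "face" quad
        glTexCoord2f(0, 0); glVertex3f(-0.5f, -0.5f, 0);
        glTexCoord2f(1, 0); glVertex3f( 0.5f, -0.5f, 0);
        glTexCoord2f(1, 1); glVertex3f( 0.5f,  0.5f, 0);
        glTexCoord2f(0, 1); glVertex3f(-0.5f,  0.5f, 0);
        glEnd();
        glutSwapBuffers();
    }

    int main(int argc, char** argv) {
        glutInit(&argc, argv);
        glutInitDisplayMode(GLUT_DOUBLE | GLUT_RGB);
        glutCreateWindow("avatar sketch");
        makeFakeFrame();
        glutDisplayFunc(display);
        glutMainLoop();
        return 0;
    }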
VR-to-supercomputer(s)
Latency between the human steering the simulation from the VR device and
the supercomputer's response is dependent more on the supercomputer than
the networks in most applications to date. This is because small amounts
of data are sent by the person steering the computation - they are sending
an instruction, a new parameter, or a menu choice. Information sent back
from the supercomputer for display can be considerable; for instance, the
supercomputer can be computing enormous volumetric information. Naturally,
one can still be immersed in the previous dataset while waiting for the
new data to finish computing/transmitting.
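A simple way to stay immersed in the previous dataset is to double-buffer it: the rendering loop keeps drawing the current data while a separate thread receives or computes the next set and swaps it in once complete. The sketch below shows the idea with modern C++ threads; the names, timings, and data shapes are placeholders, not the actual CAVE software.

    // double_buffer.cpp -- sketch of rendering the old dataset while the new one
    // is computed/transmitted, then swapping under a mutex.
    #include <algorithm>
    #include <chrono>
    #include <cstdio>
    #include <memory>
    #include <mutex>
    #include <thread>
    #include <vector>

    using Volume = std::vector<float>;

    std::mutex swap_mutex;
    std::shared_ptr<Volume> current = std::make_shared<Volume>(64 * 64 * 64, 0.0f);

    void backendWorker() {                       // stands in for supercomputer + network
        auto next = std::make_shared<Volume>(64 * 64 * 64);
        std::this_thread::sleep_for(std::chrono::seconds(2));   // "computation + transfer"
        std::fill(next->begin(), next->end(), 1.0f);
        std::lock_guard<std::mutex> lock(swap_mutex);
        current = next;                          // swap is atomic from the renderer's view
    }

    int main() {
        std::thread worker(backendWorker);
        for (int frame = 0; frame < 300; ++frame) {              // ~3 s of "rendering"
            std::shared_ptr<Volume> view;
            { std::lock_guard<std::mutex> lock(swap_mutex); view = current; }
            // render *view here; the old dataset stays valid until the swap
            std::this_thread::sleep_for(std::chrono::milliseconds(10));
            if (frame % 100 == 0)
                std::printf("frame %d, dataset value %.1f\n", frame, (*view)[0]);
        }
        worker.join();
        return 0;
    }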
VR-to-supercomputer(s)-to-VR
Latency issues are the same as those for VR-to-supercomputer(s) for computation
phases and the same as VR-to-VR for interactive viewing of the computed
model, once downloaded.
Bandwidth
VR-to-VR
Coupled with low latency, the emphasis should be on transmitting tracker position
information and 3D input device/voice/gesture-derived information in real
time. Fortunately, this is low-bandwidth information in the hundreds to
thousands of bytes per second. Models and precomputed data can be sent
beforehand and only changes transmitted in real time. This is reasonable
as long as latency is controlled.
VR activities can, however, also be very bandwidth-consuming. If the computer model is being significantly modified by user interaction (either locally by one of the users or remotely on a backend supercomputer), then the model needs to be updated on all of the local machines. The machine on which the modifications occurred has to broadcast the changes to the model, and it may need to transmit quite a bit of information to synchronize the models.
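A common way to limit this synchronization traffic is to transmit only the model elements that actually changed, as in the sketch below; the delta format shown is hypothetical, not a defined EVL protocol.

    // model_delta.cpp -- sketch of broadcasting only changed vertices instead of
    // the whole model after a local edit.
    #include <cstdint>
    #include <cstdio>
    #include <vector>

    struct Vertex { float x, y, z; };
    struct VertexDelta { uint32_t index; Vertex v; };   // one changed vertex

    std::vector<VertexDelta> makeDelta(const std::vector<Vertex>& last,
                                       const std::vector<Vertex>& now) {
        std::vector<VertexDelta> delta;
        for (uint32_t i = 0; i < now.size(); ++i)
            if (i >= last.size() || last[i].x != now[i].x ||
                last[i].y != now[i].y || last[i].z != now[i].z)
                delta.push_back({i, now[i]});
        return delta;
    }

    int main() {
        std::vector<Vertex> last(100000, {0, 0, 0});    // model as previously broadcast
        std::vector<Vertex> now = last;
        for (uint32_t i = 500; i < 600; ++i) now[i].z = 1.0f;   // a local edit

        auto delta = makeDelta(last, now);
        std::printf("full model: %zu bytes, delta: %zu bytes\n",
                    now.size() * sizeof(Vertex), delta.size() * sizeof(VertexDelta));
        return 0;
    }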
Audio and video information can also consume considerable bandwidth. For example, a minimum voice-quality channel requires 10 KB/second; even at this reduced rate, we have observed noticeable breakup and delayed communication between two sites over the Internet. NTSC-quality video that has been JPEG-compressed into bitmaps requires a 500 KB/second transmission rate; this gives decent quality for a 640x480 image.
VR-to-supercomputer(s)
VR visual information is normally efficiently compressed by sending polygon
meshes and lists instead of bitmaps - this preserves the 3D data as well.
The raw bandwidth of the CAVE if sent as full-screen bitmaps is around
8 Gbps, which far exceeds the networking capability of all system components
(except memory to screen!). The Onyx can display something like 12 MB of
mesh coordinates per second, which is therefore an upper bound for the
1994-5 generation of VR (a fivefold improvement is anticipated in mid-1996).
There are cases where we need to send bitmaps because no polygonal representation
of the object exists. This could require bandwidth well beyond current OC-3 (155 Mbps) rates. Instead, we use techniques such as successive refinement or
compression to handle transmission. Texture maps, used in volume visualization
and medical applications, also require significant bandwidth.
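The figures above can be checked with simple arithmetic; the short program below works them out, assuming 24-bit color for the raw bitmap case (the color depth is our assumption here).

    // cave_bandwidth.cpp -- back-of-the-envelope check of the raw bitmap bandwidth
    // of the four CAVE screens versus the polygon-mesh rate the Onyx can draw.
    #include <cstdio>

    int main() {
        const double screens      = 4;             // three walls + floor
        const double pixels       = 1024.0 * 768;  // per screen
        const double fields_per_s = 96;            // field-sequential stereo
        const double bits_per_pix = 24;            // assumed color depth

        double raw_bps = screens * pixels * fields_per_s * bits_per_pix;
        std::printf("raw bitmap rate : %.1f Gbps\n", raw_bps / 1e9);   // ~7.2, i.e. "around 8"

        double mesh_Bps = 12e6;                    // ~12 MB/s of mesh coordinates (1994-5 Onyx)
        std::printf("mesh coordinate rate: %.3f Gbps\n", mesh_Bps * 8 / 1e9);  // ~0.1 Gbps
        return 0;
    }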
Sometimes the supercomputer is dependent upon information from real-time instrumentation. Typically, these instruments operate at modest bandwidths unless they are imaging devices.
VR-to-supercomputer(s)-to-VR
This is a combination of two or more sites interacting in a VR-to-VR mode
when navigating a model and in a VR-to-supercomputer mode when requesting
an updated simulation.
Quality of Service
Before VR can become an essential problem-solving tool, networking quality of service must ensure that users do not lose data.
Ensuring data integrity
There are two types of data in VR: content data and control data. Scientific
or engineering content data typically does not have redundancy, so, in
VR-to-supercomputer and VR-to-supercomputer-to-VR, the integrity of data
must be ensured. Models, stored paths, and so on, can be downloaded (in
advance, perhaps) with full verification. Quality of service must be high.
Navigational information, which includes participants' head and hand tracker positions, also needs high quality of service, in this case to avoid lurching caused by discontinuities in motion data and by missed events such as button pushes and menu choices.
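One plausible way to handle the navigational stream is sketched below: each tracker sample carries a sequence number so that late or out-of-order samples are discarded rather than causing the viewpoint to lurch backward, while button transitions are treated as events that must not be silently lost. The packet layout and policy are illustrative, not an EVL specification.

    // tracker_receive.cpp -- sketch of keeping the motion stream continuous when
    // samples arrive late or out of order, while still noticing button events.
    #include <cstdint>
    #include <cstdio>

    struct TrackerSample {
        uint32_t seq;                 // sender's sequence number
        float head[3];                // head position
        float wand[3];                // wand position
        uint32_t buttons;             // button state bits
    };

    struct TrackerReceiver {
        uint32_t last_seq = 0;
        TrackerSample latest{};

        void onPacket(const TrackerSample& s) {
            if (s.seq <= last_seq) return;        // stale: would cause a lurch backward
            // Button transitions are events and must not be lost even if motion is.
            uint32_t pressed = s.buttons & ~latest.buttons;
            if (pressed) std::printf("button(s) pressed: 0x%x\n", pressed);
            latest = s;
            last_seq = s.seq;
        }
    };

    int main() {
        TrackerReceiver rx;
        rx.onPacket({1, {0, 1.6f, 0}, {0.2f, 1.2f, 0}, 0});
        rx.onPacket({3, {0, 1.6f, 0.1f}, {0.2f, 1.2f, 0}, 1});   // arrives early
        rx.onPacket({2, {0, 1.6f, 0.05f}, {0.2f, 1.2f, 0}, 0});  // late: ignored
        std::printf("latest head z = %.2f (seq %u)\n", rx.latest.head[2], rx.last_seq);
        return 0;
    }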
Guaranteed latency and bandwidth
The successor to the Internet for this kind of use needs guaranteed latency
and bandwidth to send images and audio in real time. Experience in the
CAVE with the tracker indicates that latency less than 10 ms is not needed.
We currently tolerate tracker latencies of 50 ms. The theoretical minimum
transmission across the country is 16 ms (3,000 miles at the speed of light). We currently get on the order of 100 ms with the Internet, which is a clearly perceptible delay to a human being.
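These figures are easy to verify; the short program below works out the light-speed minimum and lists it next to the latencies mentioned above.

    // latency_check.cpp -- speed-of-light time across ~3,000 miles versus the
    // tracker latency we tolerate and typical Internet latency.
    #include <cstdio>

    int main() {
        const double miles  = 3000.0;
        const double meters = miles * 1609.34;
        const double c      = 2.998e8;               // speed of light, m/s
        std::printf("light-speed minimum        : %.1f ms\n", meters / c * 1000.0);  // ~16 ms
        std::printf("tolerated tracker latency  : 50 ms\n");
        std::printf("typical Internet latency   : ~100 ms\n");
        return 0;
    }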
Other considerations
Audio/video digital transmission and reception
Devices are available for purchase that enable video and audio to be encoded
and decoded for ATM transmission and reception. The ImmersaDesk is more
amenable to video capture than the CAVE, since the placement of a video
camera at the top of the ImmersaDesk screen is fairly easy to do. Very
small unobtrusive cameras are currently being tested in the CAVE and I-Wall
configurations to get around placement issues. (Note that video cannot
meaningfully be integrated with HMDs or BOOMs, since seeing someone wearing an HMD or glued to a BOOM is probably uninteresting.)
It is clear that distributed virtual prototyping needs all the advanced capabilities of desktop multimedia (e.g., video and audio, note taking, pointing and selecting, etc.) extrapolated to the VR domain. One of EVL's historic strengths has been the ability to combine video and computer graphics; in fact, deep knowledge of video has been a key factor in the development of the CAVE ahead of commercial, military, and other investigators. It must be noted that audio is even more sensitive to latency and bandwidth restrictions than video. People are more accustomed to video frames freezing than they are to audio stuttering. While there are some compression/decompression technologies that can help, networking factors play a critical role.
Recording and playback of virtual-prototyping sessions
Recording and playback of virtual-prototyping sessions are important when science and engineering are the application domain. VR is currently produced by real-time
display of 3D rendered polygons with the added enhancements of lighting,
texture, and transparency. Video, by contrast, is produced as 60 fields
(30 frames) of 2D images per second. The raw bandwidth of the ImmersaDesk,
given its resolution of 1024x768 at 96 Hz, is about 2 billion bits/second,
or about 70 times NTSC video, were one to attempt to save it as a series of 2D bitmaps. However, saving 2D screens eliminates the possibility of
head tracking during subsequent playback, so it is clear that VR experiences
must be recorded as they are produced; that is, as some form of lists of
geometrical descriptions, rendering parameters, and position information.
VR experiences are currently preserved as C or C++ programs, an unacceptably
complex recording mechanism for, say, design engineers. If CAD systems
had no way to record the results of a CAD session, drafting tables would
not be holding up laser printers as their primary function in architectural
studios today. It is fair to say that one cannot do science and engineering
in VR if the results are irreproducible in practice.
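To suggest what a data-level recording might look like, the sketch below writes each frame's time, tracker state, and scene parameters to a file and reads them back for playback; the record layout is hypothetical and is not the Virtual Director's actual format.

    // session_record.cpp -- sketch of recording a VR session as data (not as a
    // C/C++ program): append one small record per frame, replay by re-reading.
    #include <cstdint>
    #include <cstdio>
    #include <vector>

    struct FrameRecord {
        double   time_s;        // elapsed time
        float    head[3];       // head position
        float    wand[3];       // wand position
        uint32_t buttons;       // wand buttons
        float    sim_param;     // whatever scene/simulation parameter applies
    };

    int main() {
        // --- record ---
        std::vector<FrameRecord> session;
        for (int i = 0; i < 10; ++i)
            session.push_back({i / 96.0, {0, 1.6f, -0.01f * i}, {0.3f, 1.2f, 0}, 0, 1500.0f});

        FILE* f = std::fopen("session.rec", "wb");
        std::fwrite(session.data(), sizeof(FrameRecord), session.size(), f);
        std::fclose(f);

        // --- play back ---
        f = std::fopen("session.rec", "rb");
        FrameRecord r;
        while (std::fread(&r, sizeof r, 1, f) == 1)
            std::printf("t=%.3f head z=%.3f\n", r.time_s, r.head[2]);
        std::fclose(f);
        return 0;
    }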
Naturally, recording time-based paths through space for scenes whose geometries do not change (like typical flight simulator runs) is fairly straightforward. Replaying such a fly-through would likely provide speed controls and some navigation capability. Editing can be accomplished by stringing these paths together, with additional scene initialization information added whenever necessary. A good model for this kind of editing is the Macromind Director program for the Macintosh, often used to produce multimedia presentations. Director is based on concepts tested at EVL in the mid-1970s and is similar to a multi-track audio tape recorder in function and editing capability. A joint effort of EVL and NCSA has produced a software capability called the Virtual Director, which allows recording, editing, and playback of CAVE experiences, and the creation of complex camera moves for eventual transfer to film or video. The Virtual Director can be completely controlled with voice commands and tracker/wand button information. It has been used to produce videotapes and, most recently, a segment of an IMAX movie for the Smithsonian Institution Air and Space Museum.
Current visual resolution (1024x768 at 96Hz per screen) falls far short of human visual acuity, of course. To achieve 20/20 vision over a 90-degree field of view, something close to 5,000 pixels across is necessary, since 20/20 acuity corresponds to roughly one arc-minute per pixel and 90 degrees spans 5,400 arc-minutes. No current display provides this resolution. Tracking the user's eyes (or, minimally, the head) would allow a variable-resolution display to concentrate the highest resolution in the roughly 5-degree area we see with most acuity. Perhaps VR will motivate the development of variable-resolution options in commercial graphics engines and software.
Conclusions
Virtual reality is a demanding testbed activity for emerging high-speed networks. Much research needs to be done into novel compression techniques and predictive behavior to reduce the initial demands of high bandwidth, low latency and high quality of service. As a monitoring technique for these parameters, VR is excellent because the human senses are so engaged in the immersive experience - small gaps in continuity are easily perceived and large gaps are sufficiently unacceptable to inspire research. Partnerships between academia, industry, and government will quickly grow in this area, providing a driver for even higher-bandwidth networks.
Related Publications
[1] T.A. DeFanti, D.J. Sandin and C. Cruz-Neira, "A 'Room' with a 'View'," IEEE Spectrum, October 1993, pp. 30-33.
[2] C. Cruz-Neira, J. Leigh, C. Barnes, S.M. Cohen, S. Das, R. Engelmann, R. Hudson, M. Papka, T. Roy, L. Siegel, C. Vasilakis, T.A. DeFanti and D.J. Sandin, "Scientists in Wonderland: A Report on Visualization Applications in the CAVE Virtual Reality Environment," IEEE 1993 Symposium on Research Frontiers in Virtual Reality, October 1993, pp. 59-66.
[3] C. Cruz-Neira, D.J. Sandin, T.A. DeFanti, "Surround-Screen Projection-Based Virtual Reality: The Design and Implementation of the CAVE," Computer Graphics (Proceedings of SIGGRAPH '93), ACM SIGGRAPH, August 1993, pp. 135-142.
[4] C. Cruz-Neira, D.J. Sandin, T.A. DeFanti, R.V. Kenyon and J.C. Hart, "The CAVE: Audio Visual Experience Automatic Virtual Environment," Communications of the ACM, Vol. 35, No. 6, June 1992, pp. 65-72.