
Animation in the CAVE

By Josephine Anstey and Dave Pape for Animation World Magazine
http://www.awn.com/mag/index.phtml

Introduction

You walk into a ten by ten foot room, put on a special pair of glasses, and take hold of a 3D mouse. As the program starts, the walls disappear. Now you are surrounded by the cobbles, stonework and red tiles of an Italian Renaissance city. Down the street you can see a cathedral. Pushing a button on the mouse moves you forward. Then you see an anomaly - a sleek, electric blue column. As you get closer, the column whirs and splits - a disjointed little figure unfolds. The figure gestures to you, then takes off. It looks back and beckons, "Come on!"

You follow this strange guide to a dark stairway, winding up inside the cathedral. At the top a low doorway takes you outside. The city stretches beneath you - transparent pathways curl around the dome and lead back to the ground. You see the little guide far below. He is dancing wildly as a stream of flying letters and books sail past him and whip inside a building. What is in there...?

This is a description of an experience in the CAVE(TM) virtual reality theatre. Virtual Reality is the art and science of using computers to create three dimensional worlds that users can be immersed in, explore as they please, and interact with in real time.

The CAVE (a recursive acronym for CAVE Automatic Virtual Environment) is a virtual reality display device, but not the kind of head-mounted display normally associated with VR. Instead, it's more like a prototype for Star Trek's Holodeck - a room that people can enter, with stereoscopic computer images projected on the walls and floor. The computer continually updates and redraws the display as users move through the environment.

One of the potentials of the CAVE is the creation of animated 3D worlds and characters that a user can interact with: in effect making the user part of a story.

History of the CAVE

The CAVE was created at the University of Illinois at Chicago's Electronic Visualization Laboratory. EVL is a state-of-the-art research lab for interactive computer graphics, which brings together students from UIC's Schools of Engineering and of Art and Design. It was founded in the early 1970s by Dan Sandin, art professor and creator of the Sandin Image Processor, a device well known in the video art community, and Tom DeFanti, engineering professor and author of GRASS, an early computer animation system. The Lab's work has always been a mixture of art, entertainment, engineering, and science. In the 70s, EVL staged public performances of interactive electronic art, and provided the computer hardware and software used to create the original computer graphics in Star Wars. Later work included developing graphics hardware that formed one of the first home computer systems, investigating 3D fractal imagery, and using visualization for scientific research with the National Center for Supercomputing Applications.

In 1991, DeFanti and Sandin decided to use their experience with video and interactive computer graphics to create a new approach to the growing field of virtual reality. Traditional VR systems were head-mounted - a pair of small video displays attached to a helmet or mechanical boom. Most such displays were low-resolution, encumbering, and isolated the user. EVL's new device - the CAVE - used video projection screens to create a VR display that users entered, rather than wore. The CAVE display was high-resolution, only required the users to wear lightweight shutter glasses, and could be shared by whole groups of people at once. It was implemented by EVL students - Carolina Cruz-Neira, Greg Dawe, Sumit Das, and others - and was first shown at the 1992 SIGGRAPH conference in Chicago. The full system was completed barely in time for the conference, and many of the applications demonstrated weren't seen in the CAVE itself until showtime.

The CAVE is a 10 foot by 10 foot cube; three walls are rear-projection screens, and the floor image is projected from above. High-end Silicon Graphics computers, such as an Onyx2 Infinite Reality, generate the 3D images and simulate the dynamics of the virtual world. Another SGI machine, connected to loudspeakers in the four corners of the CAVE, creates the sounds of the environment. The ImmersaDesk(TM) is a newer, smaller-scale version of the CAVE - a drafting-table-style display, rather than an entire room. Since 1992, over 50 CAVEs, ImmersaDesks, and similar devices have been installed in universities, corporate labs, and a few museums.

Applications

The difference between virtual reality and normal computer graphics is that the user is immersed in the computer-generated environment. She is surrounded by images and sound; the images are in stereoscopic 3D, rather than flat on the screen; and the world is displayed in a first-person perspective, from her viewpoint, rather than the third-person viewpoint common to most other forms of image creation. To complete the immersion in the virtual world, VR is interactive - the user can have (some) control over what happens.

VR can be applied to any problem that can benefit from an immersive, three-dimensional, interactive solution - whether in molecular biology, cosmology, architecture and design, education, entertainment or the arts. General Motors has started using CAVEs to evaluate the design of new car interiors before having to build physical prototypes. Old Dominion University is using an ImmersaDesk to view computer simulations of the Chesapeake Bay ecosystem. At NCSA, Donna Cox used the CAVE program Virtual Director to create animation for the IMAX film Cosmic Voyage. A group at EVL has built a virtual island where children can tend a virtual garden and learn about environmental concepts. EVL also participates fully in the world of electronic art. Dan Sandin organised the opening show for the first CAVE installed in a museum of Art and Technology, the Ars Electronica Center in Linz, Austria, which featured projects by EVL faculty and students.

The Thing

We are currently working on a project, The Thing Growing, whose focus is the construction of the "Thing," a virtual, interactive, animated character in the CAVE. The goal of the project is to create a story in which the user takes a leading role and is engaged at an emotional level with the Thing.

The Thing looks translucent. The triangular shapes forming its head, appendages and body do not seem to join up. It changes colors as it speaks and according to its moods. It is alternately bullying and loving. It has no specific gender. Its goal is to make you dance with it, which it takes as a sign of love and obedience.

To animate in the CAVE we use tools familiar to any computer animator. For example, the models for The Thing Growing are being made in Softimage and the textures are being made in Photoshop. These models are then imported into the CAVE. In VR, the computer has to redraw the scene in about one-sixtieth of a second in order to keep the frame rate at 30 frames per second (remember, it has to draw a different view for each eye). Therefore, even with an Onyx2, these models must be far simpler than those of computer animation for film and video, where you can spend minutes or hours rendering a single frame. The gain, and we think it's an exciting one, is being able to interact, in real time, with a virtual character and world.

Virtual reality applications can use a wide variety of methods for animating things: flipbooks, keyframing, motion capture, and procedural (computer-programmed) animation. The Multi Mega Book in the CAVE (described at the beginning of this article) uses a flipbook of 3D models to walk a wire-framed Judas out of Leonardo's painting of the Last Supper. In The Thing Growing, rocks come alive and chase the user. When a rock gets close enough, it rears up and swallows her. In this case there are only four simple models, and the CAVE morphs between them to produce the rock's growing and grabbing action.
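
To give a concrete (if simplified) picture of what that morph involves: if the key rock models share the same vertex layout, the program can blend corresponding vertices between one key shape and the next every frame. The C++ sketch below is purely illustrative - the structures and names are our own inventions for this article, not the project's actual code.

    // Flipbook-style morphing between a few rock shapes.
    // Assumes the key models have identical vertex counts and ordering.
    #include <cstddef>
    #include <vector>

    struct Vec3 { float x, y, z; };

    // Blend two key shapes: t = 0 gives shapeA, t = 1 gives shapeB.
    std::vector<Vec3> morph(const std::vector<Vec3>& shapeA,
                            const std::vector<Vec3>& shapeB,
                            float t)
    {
        std::vector<Vec3> result(shapeA.size());
        for (std::size_t i = 0; i < shapeA.size(); ++i) {
            result[i].x = shapeA[i].x + t * (shapeB[i].x - shapeA[i].x);
            result[i].y = shapeA[i].y + t * (shapeB[i].y - shapeA[i].y);
            result[i].z = shapeA[i].z + t * (shapeB[i].z - shapeA[i].z);
        }
        return result;
    }

    // Each frame, pick the pair of key shapes that brackets the current
    // animation time and morph between them.
    std::vector<Vec3> rockShapeAt(const std::vector<std::vector<Vec3> >& keys,
                                  float time, float secondsPerKey)
    {
        std::size_t segment = static_cast<std::size_t>(time / secondsPerKey);
        if (segment >= keys.size() - 1) return keys.back();
        float t = (time - segment * secondsPerKey) / secondsPerKey;
        return morph(keys[segment], keys[segment + 1], t);
    }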

We commonly use keyframe animation to move objects in a CAVE application. For example, to animate the flying letters referred to in the description of the Multi Mega Book, keyframes were set to determine the path of the letters through the city. As the letters move along this path, a simple behavior routine also makes them spin and orbit each other.
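
In code, this layering of keyframes and behavior might look roughly like the following sketch, which interpolates a position along the keyframed path and then adds an orbiting offset for each letter. The structures and constants here are assumptions made for the example, not the Multi Mega Book's source.

    // Keyframed path plus a simple per-letter orbiting behavior.
    #include <cmath>
    #include <cstddef>
    #include <vector>

    struct Vec3 { float x, y, z; };

    struct Keyframe {
        float time;   // seconds
        Vec3  pos;    // position along the path through the city
    };

    // Linear interpolation between the two keyframes that bracket 'time'.
    Vec3 positionOnPath(const std::vector<Keyframe>& keys, float time)
    {
        if (time <= keys.front().time) return keys.front().pos;
        if (time >= keys.back().time)  return keys.back().pos;
        std::size_t i = 1;
        while (keys[i].time < time) ++i;
        const Keyframe& a = keys[i - 1];
        const Keyframe& b = keys[i];
        float t = (time - a.time) / (b.time - a.time);
        return Vec3{ a.pos.x + t * (b.pos.x - a.pos.x),
                     a.pos.y + t * (b.pos.y - a.pos.y),
                     a.pos.z + t * (b.pos.z - a.pos.z) };
    }

    // Behavior layered on top of the keyframes: each letter orbits the
    // shared path position, offset by its index so the flock spreads out.
    Vec3 letterPosition(const std::vector<Keyframe>& keys, float time,
                        int letterIndex, float orbitRadius, float orbitSpeed)
    {
        Vec3 center = positionOnPath(keys, time);
        float angle = orbitSpeed * time + letterIndex * 0.7f;
        return Vec3{ center.x + orbitRadius * std::cos(angle),
                     center.y,
                     center.z + orbitRadius * std::sin(angle) };
    }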

The CAVE uses a tracking system to get information on the user's head and hand positions, so that the computer can draw the scene from a user-centered perspective. Tracking is done with electromagnetic systems such as the Ascension Flock of Birds; sensors are attached to the stereo glasses and to the 3D mouse. We use this same system to record motion-tracked animation.
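
A rough idea of how that tracker data flows through an application each frame is sketched below. The Tracker class and its methods are hypothetical stand-ins written for this article, not the actual CAVE library interface.

    // Per-frame use of head and hand tracking (hypothetical interface).
    #include <cstdio>

    struct TrackedPose {
        float pos[3];     // x, y, z in CAVE coordinates
        float orient[3];  // azimuth, elevation, roll
    };

    // Stand-in for an electromagnetic tracking system.
    class Tracker {
    public:
        void poll() { /* read the sensors on the glasses and 3D mouse */ }
        TrackedPose head() const { return headPose_; }
        TrackedPose wand() const { return wandPose_; }
    private:
        TrackedPose headPose_ {};
        TrackedPose wandPose_ {};
    };

    void frameUpdate(Tracker& tracker)
    {
        tracker.poll();
        TrackedPose head = tracker.head();   // drives the user-centered perspective
        TrackedPose hand = tracker.wand();   // drives interaction and gestures

        // The same data can be logged each frame to record motion-tracked animation.
        std::printf("head %f %f %f  hand %f %f %f\n",
                    head.pos[0], head.pos[1], head.pos[2],
                    hand.pos[0], hand.pos[1], hand.pos[2]);
    }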

Our first experiments with this were for the Multi Mega Book. Our collaborator, Franz Fischnaller, had already determined a shape for the guide character - a simple collection of geometric shapes. We hooked Franz up to four tracking sensors, one for the head, one for each arm, and one for the body. At the same time we ran a CAVE program that took the position and orientation information from the tracker and fed it to the parts of the character's body which were then displayed. So as he moved, Franz could immediately see how the character would move. Its head moved as he moved his head. Its arm waved as he moved his arm. In this way we could build up animation for the individual body parts. Later, we could define keyframes to move the body as a whole along a path through the virtual world.
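
The heart of that live preview is very simple: once per frame, copy each sensor's position and orientation straight into the transform of the body part it drives. Something like the sketch below, in which Sensor and BodyPart are hypothetical types invented for the example:

    // Live motion capture: one tracker sensor drives one body part.
    struct Transform {
        float pos[3];
        float orient[3];
    };

    struct BodyPart {
        Transform xform;   // transform applied to this part's geometry
    };

    struct Sensor {
        // Stub: a real version would return the latest tracker reading.
        Transform read() const { return Transform(); }
    };

    // Sensors 0..3 correspond to head, left arm, right arm, body.
    void updateCharacter(BodyPart parts[4], const Sensor sensors[4])
    {
        for (int i = 0; i < 4; ++i) {
            parts[i].xform = sensors[i].read();   // copy tracker data straight in
        }
        // The updated transforms are drawn on the next frame, giving the
        // performer immediate feedback; saving them each frame builds up
        // an animation clip for later playback.
    }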

Although motion tracking and keyframing are useful in creating movements of characters and objects, they can't be used alone in VR. Because the virtual world is interactive, the user is an integral part of it, and the full progression of the storyline isn't known in advance. We can pre-animate elements of the action - walk cycles, dances, gestures - but these elements have to be combined dynamically in response to the user. Creating this level of interaction is easily the most challenging aspect of CAVE animation.

Building interaction for the CAVE means making objects intelligent and able to react to people. A simple example comes from The Thing Growing. At one point in the narrative the Thing becomes so angry with the user that it hides under one of the rocks on the vast plain where the action takes place. This is the point when the other rocks come alive and start to stalk and herd the user. Intelligence has to be programmed into the rocks - they have to know where the user is, they have to avoid each other, and they have to sneak up on the user and try to trap her. Instead of movement based on keyframes, the rocks are given a set of rules on how to move until one grabs the user. When that happens, all the other rocks scatter and the user's ability to navigate is taken away. She is trapped with a rock slobbering on her.
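
A minimal version of those rules could look like the sketch below, where each rock takes a small step toward the user and a step away from any rock that gets too close. The structure and tuning constants are illustrative assumptions rather than the program's actual logic.

    // Rule-based stalking: move toward the user, keep clear of other rocks.
    #include <cmath>
    #include <cstddef>
    #include <vector>

    struct Vec2 { float x, z; };   // movement on the plain is essentially 2D

    Vec2 towards(const Vec2& from, const Vec2& to, float step)
    {
        float dx = to.x - from.x, dz = to.z - from.z;
        float len = std::sqrt(dx * dx + dz * dz);
        if (len < 1e-6f) return Vec2{ 0.0f, 0.0f };
        return Vec2{ step * dx / len, step * dz / len };
    }

    void updateRocks(std::vector<Vec2>& rocks, const Vec2& user,
                     float speed, float personalSpace)
    {
        for (std::size_t i = 0; i < rocks.size(); ++i) {
            // Rule 1: sneak up on the user.
            Vec2 move = towards(rocks[i], user, speed);

            // Rule 2: avoid the other rocks.
            for (std::size_t j = 0; j < rocks.size(); ++j) {
                if (j == i) continue;
                float dx = rocks[i].x - rocks[j].x;
                float dz = rocks[i].z - rocks[j].z;
                if (dx * dx + dz * dz < personalSpace * personalSpace) {
                    Vec2 away = towards(rocks[j], rocks[i], speed);
                    move.x += away.x;
                    move.z += away.z;
                }
            }
            rocks[i].x += move.x;
            rocks[i].z += move.z;
        }
    }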

For the Thing itself we are using motion tracking to build up a library of actions; in this case there are 8 body parts - head, two arms, body, four tail sections. Each action lasts a few seconds and has a corresponding sound bite. As the program runs, the Thing's intelligence unit selects an appropriate action and sound according to the point in the narrative, the user's actions, and the Thing's own emotional state. The computer will interpolate between the end of one action and the beginning of the next, so that the movement is smooth.
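
The playback side of this can be sketched as follows: each recorded action is a list of poses for the eight body parts, and when a new action starts, its first few frames are blended from the last pose of the old one. The data layout and blend length here are assumptions made for the example, not the project's code.

    // Playing back library actions with a short blend between them.
    #include <cstddef>
    #include <vector>

    struct ThingPose { float joints[8][6]; };  // 8 body parts, position + orientation

    struct Action {
        std::vector<ThingPose> frames;         // a few seconds of recorded motion
        int soundId;                           // matching sound bite
    };

    ThingPose blend(const ThingPose& a, const ThingPose& b, float t)
    {
        ThingPose out;
        for (int p = 0; p < 8; ++p)
            for (int c = 0; c < 6; ++c)
                out.joints[p][c] = a.joints[p][c] + t * (b.joints[p][c] - a.joints[p][c]);
        return out;
    }

    // Pose to display 'frame' frames after the new action started,
    // interpolating from the previous action's final pose at first.
    ThingPose currentPose(const Action& previous, const Action& next,
                          int frame, int blendFrames)
    {
        int last = static_cast<int>(next.frames.size()) - 1;
        int f = (frame < last) ? frame : last;
        const ThingPose& target = next.frames[f];
        if (f >= blendFrames) return target;
        float t = static_cast<float>(f) / blendFrames;
        return blend(previous.frames.back(), target, t);
    }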

The Thing's intelligence also decides how the body as a whole moves. This movement may be relative to the user, as the Thing stays close or swoops in on the user. Or it may decide to move to a particular spot in the environment. All the while it has to avoid other objects. The computations for these movements are done on the fly using a set of rules and information about the position of the user, the Thing, and other objects.

The Thing has four basic moods: happy, depressed, manic, angry. Its emotional state is established in part from information about the user. Essentially all that the computer, and therefore the Thing, can know about a user is the tracked position and orientation of the user's head and one or both hands. So, for example, we keep track of the user's head position relative to the Thing - if the user looks at the Thing most of the time, it interprets that as attentiveness and that makes it happy. We also monitor the general activity of the user. A user who moves around a lot registers as fast, and this will tend to make a happy Thing manic. If the Thing is not so happy, fast user movement will make it angry, and slow user movement will make it depressed. The Thing's emotional state will also fluctuate according to an internal set of rules, so that the emotions are not simply a reflection of the user - too much of any one emotion will flip it over into a different one.
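
Boiled down to a sketch, the mood update takes two numbers - how attentive and how active the user has been - plus how long the current mood has lasted, and returns the next mood. The thresholds and the particular flip rule below are illustrative guesses, not the Thing's actual parameters.

    // Mood update driven by user attention and activity.
    enum Mood { HAPPY, DEPRESSED, MANIC, ANGRY };

    Mood updateMood(Mood current, float attention, float activity,
                    float secondsInMood)
    {
        // Too long in any one mood flips it, so the Thing isn't simply
        // a mirror of the user (illustrative rule).
        if (secondsInMood > 30.0f)
            return (current == HAPPY) ? MANIC : HAPPY;

        bool userIsFast = activity > 0.5f;   // user moving around a lot
        bool attentive  = attention > 0.6f;  // user mostly looking at the Thing

        if (attentive) {
            // Attention makes it happy; lots of movement tips happy into manic.
            return userIsFast ? MANIC : HAPPY;
        }
        // Inattentive user: fast movement angers it, slow movement depresses it.
        return userIsFast ? ANGRY : DEPRESSED;
    }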

In addition to monitoring the user in general, we check specific user movements. The Thing is attempting to teach the user a dance. It will demonstrate each part of the dance, then observe or join the user as she copies the movement. Each of the parts of the dance will be one of the Thing's actions. And each of these particular actions will have a test associated with it to verify whether the user is dancing correctly. Knowing whether the user is dancing correctly will feed back both to the Thing's emotional component and to its decision-making process. It may decide to repeat a part of the dance that the user is doing incorrectly. It will admonish, encourage or praise the user according to the user's behavior and its own mood.
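
Each test is specific to its action, but they all reduce to simple geometric checks on the tracked samples. Purely as an illustration (this is not one of the project's real tests), a hand-waving step might be verified like this:

    // Hypothetical per-action test: did the tracked hand rise above the
    // head and sweep far enough from side to side during the action?
    #include <cstddef>
    #include <vector>

    struct Vec3 { float x, y, z; };

    bool armWaveDoneCorrectly(const std::vector<Vec3>& handSamples,
                              const std::vector<Vec3>& headSamples,
                              float sweepRequired)
    {
        // One sample per frame over the duration of the action.
        std::size_t n = handSamples.size() < headSamples.size()
                            ? handSamples.size() : headSamples.size();
        bool raised = false;
        float minX = 1e9f, maxX = -1e9f;
        for (std::size_t i = 0; i < n; ++i) {
            if (handSamples[i].y > headSamples[i].y) raised = true;  // hand above head
            if (handSamples[i].x < minX) minX = handSamples[i].x;
            if (handSamples[i].x > maxX) maxX = handSamples[i].x;
        }
        return raised && (maxX - minX) > sweepRequired;
    }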

Conclusion

This process of developing an animated character focuses on maximizing the advantages of CAVE VR - immersion in an interactive 3D world - and minimizing the disadvantages - the relative simplicity of images. It was immediately obvious that any degree of photo-realism, and even the visual complexity of most computer animation, was impossible. This resulted in a decision to make a very clean, uncomplicated environment, and to create a virtual character who is simply a collection of pyramids. Capturing the motion for the Thing's head and limbs is the most efficient and effective way of giving this collection a life.

Thereafter, the most daunting task is to give that life intelligence. The starting point is assessing its sensory inputs - the tracking information. In building the intelligence component we interpret the data the Thing receives, but we also try to construct its character so that the paucity of the information it is working with is not apparent. The Thing is high-handed and willful - in part because of the exigencies of the story line and in part to hide its stupidity. It is inconsistent, arbitrarily praising or abusing the user for the same behavior - in part to mimic the inconsistency of many people, in part to hide its ignorance.

The Thing Growing is still under development. At this stage we can't judge how effective or real an experience it will give a user. But we hope that after testing and refining the program, it will be possible for the Thing to build a relationship with a user - however virtual that relationship may be.