
The Thing Growing: Technical Description

The Thing Growing is a virtual reality application. It has been shown in a CAVE, on an ImmersaDesk, on a Barco Baron and on a Panoram display system - the last two with added tracking.

The user has between two and four trackers attached to head, arms and body, and holds a 3D mouse with three buttons and a joystick for interaction and navigation. Information from the trackers, joystick and buttons is used by the application to figure out where the user is and what they are doing.

This application is programmed using the CAVE library, Performer and XP. XP is a VR authoring system based on Performer, designed by Dave Pape. Other VR projects programmed in XP include the Multi-Mega Book in the CAVE and Mitologies.

Among other things, the basic XP system takes care of loading and transforming models; collision detection; navigation; triggering events; integrating sound; and switching individual objects or whole scenes on and off. In addition, messages can be sent from any object in the scene graph to any other object.

To create the virtual scenes, XP reads a hierarchical text file. Objects in the text file can be models or behaviors. Messages and event triggers (based on time, distance or other parameters) can all be entered into the text file. By editing this text file, non-programmers can swiftly construct complex, interactive environments and alter them easily.

The XP system is also easily extensible for programmers. Classes with specific behaviors can be added to the central core. Each new class inherits from the XP node class, so that its attributes are parsed by the XP system. The messages it understands and the events that it triggers are added into the system. The classes do not have to know about each other to affect each other: in the scenefile we can link any object's event trigger to any message for any object.
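
The decoupling described above can be illustrated with a minimal sketch in Python. This is not XP code (XP is built on Performer and the CAVE library); the names Node, Scene, link, trigger and receive, and the object and event names in the example, are all hypothetical. They are chosen only to show how a scene-level table can connect one object's event to another object's message without either class knowing about the other.

```python
from collections import defaultdict


class Node:
    """Hypothetical base class: every scene object has a name and can receive messages."""

    def __init__(self, name):
        self.name = name
        self.received = []  # log of messages, kept only for illustration

    def receive(self, message):
        self.received.append(message)


class Scene:
    """Holds the objects and the event-to-message links a scene file would declare."""

    def __init__(self):
        self.nodes = {}
        self.links = defaultdict(list)  # (sender, event) -> [(target, message), ...]

    def add(self, node):
        self.nodes[node.name] = node
        return node

    def link(self, sender, event, target, message):
        self.links[(sender, event)].append((target, message))

    def trigger(self, sender, event):
        # The sender only announces its event; the scene decides who hears about it.
        for target, message in self.links[(sender, event)]:
            self.nodes[target].receive(message)


scene = Scene()
sensor = scene.add(Node("doorSensor"))
door = scene.add(Node("door"))
scene.link("doorSensor", "userNear", "door", "open")
scene.trigger("doorSensor", "userNear")
print(door.received)  # ['open']
```

Because the link lives in the scene rather than in either class, rewiring the story means editing the table, not the objects.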

Examples of Use and Extension of XP for The Thing Growing

The XP message system was invaluable for constructing the narrative backbone of The Thing Growing. In the first scene, a distance sensor triggers a message to open a door; a push of the button sends a message to activate a key; delayed messages blow up a box and shed. At a later point in the story the Thing becomes so angry with the user that it hides under one of the rocks that dot the plain on which the initial action takes place. At this point the Thing object sends out a message to all the other rocks that it's time to catch the user. This message triggers a set of stalking and herding behaviors in all the rock objects.
Intelligence has to be programmed into the rocks - they have to know where the user is, they have to avoid each other, and they have to sneak up on the user and try to trap her. Dave extended the basic XP transform class and made a rock-object class which kept a list of all other rock-objects and programmed them with a set of rules on how to move until one grabs the user. When that happens, all the other rocks scatter and a message is sent to the navigator to disable navigation. The user is trapped with a rock slobbering on her.
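
The actual rules in the rock-object class are not given here, so the following Python sketch only illustrates the general idea under stated assumptions: each rock steps toward the user, is pushed away from any rock closer than a minimum gap, and the chase ends when one rock comes within grabbing distance. The function names step_rock and herd, the 2D coordinates and all the numeric values are hypothetical.

```python
import math


def step_rock(rock, others, user, speed=0.1, min_gap=1.0):
    """Return the rock's next (x, y) position: approach the user, keep clear of other rocks."""
    # Attraction: unit vector pointing from the rock to the user.
    dx, dy = user[0] - rock[0], user[1] - rock[1]
    dist = math.hypot(dx, dy) or 1e-9
    vx, vy = dx / dist, dy / dist
    # Repulsion: push away from any rock closer than min_gap.
    for other in others:
        ox, oy = rock[0] - other[0], rock[1] - other[1]
        gap = math.hypot(ox, oy)
        if 0 < gap < min_gap:
            vx += ox / gap
            vy += oy / gap
    norm = math.hypot(vx, vy) or 1e-9
    return (rock[0] + speed * vx / norm, rock[1] + speed * vy / norm)


def herd(rocks, user, grab_dist=0.5, max_steps=1000):
    """Advance all rocks until one is within grab_dist of the user; return its index."""
    for _ in range(max_steps):
        for i, rock in enumerate(rocks):
            if math.hypot(user[0] - rock[0], user[1] - rock[1]) <= grab_dist:
                return i  # the real system would now scatter the others and disable navigation
        rocks[:] = [
            step_rock(r, rocks[:i] + rocks[i + 1:], user) for i, r in enumerate(rocks)
        ]
    return None


rocks = [(5.0, 0.0), (-4.0, 3.0), (0.0, -6.0)]
catcher = herd(rocks, user=(0.0, 0.0))
```
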
Much of the extension of XP for this application lay in creating classes with autonomous behavior for moving objects in the VR environment, most specifically for the creation of the virtual character, the Thing.

The Thing: Moving the Thing's Body

The Thing is a collection of detached translucent triangles, one for the head, two for arms, one for a body and four or five for a tail. It is animated using motion tracking. The life-like movement that results causes the user to sketch in the lines of a dragon-like creature.

The Thing is a speaking creature. Based on the initial story board, I recorded a library of soundbites for its voice. Sometimes the things it says are scripted and do not vary - for example when it is freed from the box it is trapped in. But mostly it has to speak in response to the user - for example when it is teaching the user to dance. For this section I recorded different versions of words of praise, encouragement, criticism, explanation. I also recorded these different types of utterance in different moods - happy, manic, sad and angry. Each sound bite lasts a few seconds. The animation was done in time to these soundbites.

Dave built an XP recording tool for use in the CAVE. To use it I make a text file loaded with soundfiles and the Thing's body parts. I enter the CAVE and attach trackers to my head, arms and body. In front of me I see the model of the Thing, around me boxes represent the sound bites. My motions are directly relayed to the model so that it moves in the same way I do. I click on a sound bite box. That particular sound plays back while I move and watch the Thing moving. I try to create a motion that captures the sense and mood of the words. Buttons on the wand allow me to record my movement, then play it back. During this play back I can both hear the soundbite and walk around the Thing watching it make the movement I just recorded. If necessary I can re-record the movement.

Since we only have four trackers to work with, I record the tail in a separate step. The tail is a spring - Dave wrote an XP spring class - attached to the body at one end. I manipulate a tracker that feeds information to the other end of the tail. I play back the body motion and then run around with the tail-end, trying to make an appropriate tail motion. The central pieces of the tail move according to the spring calculations between the two fixed points.
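
The XP spring class itself is not shown, so this is only a rough Python sketch of one way a chain of points can settle between two fixed ends. It assumes unit masses, zero-rest-length springs between neighbours and simple damping; step_tail and every constant in it are hypothetical.

```python
def step_tail(points, velocities, stiffness=20.0, damping=4.0, dt=0.02):
    """Advance the tail one time step and return the new list of points.

    points[0] and points[-1] are the fixed ends (body and tracker); only the
    points between them are moved. velocities is updated in place.
    """
    new_points = [list(p) for p in points]
    for i in range(1, len(points) - 1):
        for axis in range(3):
            # Unit-mass point pulled toward both neighbours by zero-rest-length springs.
            pull = points[i - 1][axis] + points[i + 1][axis] - 2.0 * points[i][axis]
            accel = stiffness * pull - damping * velocities[i][axis]
            velocities[i][axis] += accel * dt
            new_points[i][axis] = points[i][axis] + velocities[i][axis] * dt
    return new_points


# Body end at the origin, tracker end held 4 units away; middle points start bunched up.
tail = [[0.0, 0.0, 0.0] for _ in range(4)] + [[4.0, 0.0, 0.0]]
speeds = [[0.0, 0.0, 0.0] for _ in tail]
for _ in range(2000):
    tail = step_tail(tail, speeds)
```

With both ends held still, the middle pieces settle evenly along the line between them; when the tracker end moves, they lag and swing behind it.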

The position and orientation information obtained in this process is stored in text files. When the application is running, the computer interpolates between the end of one movement and the beginning of the next, so that the motion is smooth. Dave wrote all the code for making the body parts move correctly.
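
How the interpolation is done in the application is not described, so the following Python sketch simply assumes linear blending. For brevity a frame is reduced to a position and a single heading angle in degrees, which is a simplification of full orientation data; lerp, lerp_angle and blend_frames are hypothetical names.

```python
def lerp(a, b, t):
    """Linear interpolation between two numbers for t in [0, 1]."""
    return a + (b - a) * t


def lerp_angle(a, b, t):
    """Interpolate between two angles in degrees along the shorter arc."""
    delta = (b - a + 180.0) % 360.0 - 180.0
    return (a + delta * t) % 360.0


def blend_frames(end_frame, start_frame, steps):
    """Yield in-between frames linking the end of one clip to the start of the next.

    A frame here is (position, heading): an (x, y, z) tuple and one angle in degrees.
    """
    (p0, h0), (p1, h1) = end_frame, start_frame
    for i in range(1, steps + 1):
        t = i / (steps + 1)
        yield tuple(lerp(a, b, t) for a, b in zip(p0, p1)), lerp_angle(h0, h1, t)


frames = list(blend_frames(((0.0, 1.0, 0.0), 350.0), ((2.0, 1.0, 4.0), 10.0), steps=3))
```

Taking the shorter arc matters for angles: blending from 350 to 10 degrees should pass through 0, not swing back through 180.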

As well as this animated movement for its body parts, the Thing needs to move its body as a whole with reference to the environment and the user. At times it needs to stay close to, swoop in on, or run after the user. At other times it needs to move to a particular spot in the environment. I wrote another class extending the basic XP transform class to execute this global body behavior.

The Thing: Intelligence

Essentially programming for the Thing has two major categories - body and brain.

The brain has access to stores of all the actions that the Thing knows. Each action consists of three parts: a text file with the motion capture information; the name of a sound bite that goes with this motion; and a message to tell the global body how to move the body as a whole. The brain's job is to select an appropriate action according to the point in the narrative, the user's actions, and the Thing's own emotional state, and to pass the information on to the Thing's body parts, to its voice and to its global body so that they will all execute this action.
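
As a rough illustration of the three-part action and the way one action fans out to three receivers, here is a small Python sketch. It is not the XP implementation: Action, Recorder, Brain and the file, sound and message names in the example are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Action:
    motion_file: str   # text file of motion-capture data for the body parts
    sound_bite: str    # name of the sound bite that goes with the motion
    body_message: str  # message telling the global body how to move as a whole


class Recorder:
    """Stand-in for the body parts, the voice or the global body; it just logs what it is told."""

    def __init__(self):
        self.log = []

    def handle(self, item):
        self.log.append(item)


class Brain:
    def __init__(self, body_parts, voice, global_body):
        self.body_parts = body_parts
        self.voice = voice
        self.global_body = global_body

    def execute(self, action):
        # One action is split three ways so motion, sound and travel start together.
        self.body_parts.handle(action.motion_file)
        self.voice.handle(action.sound_bite)
        self.global_body.handle(action.body_message)


parts, voice, body = Recorder(), Recorder(), Recorder()
brain = Brain(parts, voice, body)
brain.execute(Action("praise_happy.txt", "praise_happy", "stay_close"))
```
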

The brain is the parent of the global body, which is the parent of the body parts. In the XP scenefile this is written like this:

brain(name = brain)
{
    globalBody(name = globalbody)
    {
        bodyPart(name = head)
        bodyPart(name = arm)
        bodyPart(name = body)
    }
}

The brain puts its grandchildren, the body parts, into a list, so that when it feeds them the information from a text file of motion-captured movement, each part gets the right position and orientation information.

The brain has pointers to a sound class for the voice, and to action stores. Each action store can hold any number of actions.

Narrative Constraints

The narrative constrains the number of actions and action stores we need to create for the Thing. For example when the Thing is attempting to teach the user to dance, it has a basic routine to follow:

demonstrate a dance step
observe user dancing or dance with user
check whether user is dancing correctly
praise or criticise user
repeat dance step or teach new step

This routine is interrupted if the user tries to run away. Then a different routine is triggered to make the Thing run after the user and plead with her or scold her to continue the dance.
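
The routine and its interruption can be sketched as a small state machine. The Python below is only an illustration: the class DanceLesson, the phase names and the dance step names are hypothetical, and it assumes that praise leads to a new step while criticism or a chase leads to a repeat.

```python
class DanceLesson:
    """Tracks which phase of the teaching routine the Thing is in."""

    ORDER = {"demonstrate": "observe", "observe": "check"}

    def __init__(self, dance_steps):
        self.dance_steps = dance_steps
        self.index = 0
        self.phase = "demonstrate"

    def advance(self, running_away=False, danced_well=False):
        """Move to the next phase once the current action has finished."""
        if running_away:
            self.phase = "chase"  # interrupt: go after the user
        elif self.phase == "check":
            self.phase = "praise" if danced_well else "criticise"
        elif self.phase in self.ORDER:
            self.phase = self.ORDER[self.phase]
        else:
            # After praise move on to a new step; after criticism or a chase, repeat it.
            if self.phase == "praise" and self.index < len(self.dance_steps) - 1:
                self.index += 1
            self.phase = "demonstrate"
        return self.phase, self.dance_steps[self.index]


lesson = DanceLesson(["arm wave", "spin"])
```

Each call to advance corresponds to one finished action, which matches the idea that the checks are made between actions rather than continuously.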

Essentially each type of response - "demonstrate a step", "dance with user", "run after user", "praise user" - has an action store filled with possible actions. When one action is finished the intelligence performs some checks and decides on the next action. Then the action is pulled out from the appropriate bin and sent to the global body, body parts and voice for execution. The action can be pulled out sequentially, at random or by mood. Remember that the Thing can respond in one of four moods: angry, sad, happy or manic.
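
The three ways of pulling an action out of a bin can be sketched as follows. This Python example is an illustration only: ActionStore and its method names are hypothetical, actions are reduced to (name, mood) pairs, and the fallback used when no action matches the mood is an assumption.

```python
import random


class ActionStore:
    """A bin of actions for one type of response, such as "praise user"."""

    def __init__(self, actions, rng=None):
        # Each action is a (name, mood) pair here; real actions would carry more.
        self.actions = list(actions)
        self.rng = rng or random.Random()
        self.cursor = 0

    def next_sequential(self):
        """Return actions in order, wrapping around at the end of the bin."""
        action = self.actions[self.cursor % len(self.actions)]
        self.cursor += 1
        return action

    def pick_random(self):
        return self.rng.choice(self.actions)

    def pick_by_mood(self, mood):
        matching = [a for a in self.actions if a[1] == mood]
        # Assumed fallback: use the whole bin if nothing was recorded in this mood.
        return self.rng.choice(matching or self.actions)


praise = ActionStore([("praise1", "happy"), ("praise2", "angry"), ("praise3", "manic")])
```
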

This explains the mechanics of picking and executing an action. The problem left over for the intelligence is deciding on the next action.

Checks

Depending on the narrative needs, the Thing can either perform a scripted sequence of actions, or perform actions that depend on the user's reactions. Movement from one kind of behavior to the other can be triggered by a time interval or by the user's behavior.

If the Thing's actions depend on the user, a series of checks must be made before the next action is chosen, to see what the user has done. For example, the routine for teaching the user to dance is listed above. One of the steps listed is to check whether the user is dancing correctly. Since information comes in from the trackers about the user's movements, it should be possible to check whether the movements of the user's arms, body and head are similar to the movements of the Thing's arms, body and head.

The story demanded that the Thing and user share an activity that suggested a certain level of intimacy, and one in which the Thing could slowly demonstrate that it was demanding and dominating. Originally I had planned for them to play at mimicking each other. However, I chose the activity of dancing because it would be easier to fix in time the beginning and ending of the user's motion. The Thing sings a repetitive 8-bar refrain as it dances and as the user dances. Therefore the user's attempts to dance should start as the Thing begins to sing and end as it finishes singing. This solves the problem of when to start checking the user's movements. However, there were other complexities.

I discussed the problem at EVL. Andy Johnson suggested that to really check on the accuracy of the user's movement I would have to divide the CAVE space into a grid and continually check which box in the grid the user's arms were in. I would have to build up a series of check points for each of the dance actions that would correspond to the user moving their arms in and out of the boxes of the grid. This solution seemed complicated and a lot of work - and would need changing if we ever changed the dance steps.
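
To make the grid idea concrete, here is a minimal Python sketch of how such a check might look. It was never built for the project, so everything in it is an assumption: the function names, the one-unit cell size, and the rule that a dance step counts as matched when the arm passes through its checkpoint cells in order.

```python
def cell_of(position, cell_size=1.0):
    """Map an (x, y, z) position to the integer grid cell that contains it."""
    return tuple(int(c // cell_size) for c in position)


def visited_cells(samples, cell_size=1.0):
    """Collapse tracker samples into the ordered list of cells entered."""
    cells = []
    for sample in samples:
        cell = cell_of(sample, cell_size)
        if not cells or cells[-1] != cell:
            cells.append(cell)
    return cells


def matches_checkpoints(samples, checkpoints, cell_size=1.0):
    """True if the arm passed through every checkpoint cell in the given order."""
    remaining = iter(visited_cells(samples, cell_size))
    # Each checkpoint consumes the iterator up to its first match, which enforces order.
    return all(any(cell == target for cell in remaining) for target in checkpoints)


arm_samples = [(0.2, 1.1, 0.0), (0.4, 1.6, 0.0), (0.5, 2.3, 0.0), (1.2, 2.4, 0.0)]
```

Even in this small form the drawback noted above is visible: every dance step needs its own hand-built checkpoint list.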

I was also unsure how to proceed with changing the Thing's moods during the interactive activity of the dance. I had in mind a scheme like this:

[Figure: Animation in the CAVE]

This was also a lot of work, and work without any real knowledge of how the users were going to react and what was actually needed to make them co-operate with the virtual character.

Our intention has always been to make the Thing entirely autonomous. However, although we had built the Thing's body and the basic routine to teach the dance, I still had only a vague idea of how to build this checker. My main thought was that it would be complicated.

Networked Thing

Full autonomy remained the goal, but SIGGRAPH 98 was approaching and we wanted to show the first part of the project. EVL was proposing to bring a networked system of two I-Desks. Therefore, as an interim step, Dave created a networked XP and we built a networked version of the project. This effectively gave us a Wizard of Oz brain: a networked user who was an invisible voyeur on the scene between the Thing and an avatar of the participant.
The networked user had a menu on his screen and used the controls to tell the Thing if the participant was dancing well or not, and also to control its moods. In this scenario, although the Thing had its inbuilt routines of behavior, it was also getting help from a much more flexible intelligence system - with a wealth of fine-tuned interactive experience!
A side benefit of creating a networked version of the application was that we could videotape an avatar of a user interacting with the Thing.

User Testing and Autonomous Thing

We ran the networked Thing at both SIGGRAPH 98 and the Ars Electronica Festival 98 with just the first half of the story - up to the point where the user is trapped under a rock. Over the winter we developed the second half of the story, in which the Thing and user are trapped by the Thing's cousins and the user shoots them out of trouble.

I made the second half autonomous from the beginning. In this half the user is required to pick up a gun, free herself and the Thing from a cage, and gun down the Thing's cousins. It was easy to check if the user had jumped through the appropriate hoops, and move the story and the Thing's responses on appropriately. A first version of the completed story was shown to various groups at the Walker Art Center in April 99.

As we observed the users, and our own responses as the Wizard of Oz, two things became apparent.

First, we fell into a fairly standard way of altering the Thing's moods. The dancing interaction lasts for 2-3 minutes. The finale is the Thing running off in a huff. Essentially the mood changes from good to bad over time; if the user is uncooperative (refuses to dance, runs away a lot), the mood becomes whiny or angry more quickly.
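
This pattern can be written down as a very small rule. The Python sketch below is an assumption throughout, not the project's code: it orders the four moods from good to bad as happy, manic, sad, angry, takes 150 seconds as the length of the dance, and treats each refusal or escape as if 15 extra seconds had passed.

```python
MOODS = ["happy", "manic", "sad", "angry"]  # assumed ordering from good to bad


def current_mood(elapsed, refusals, duration=150.0, penalty=15.0):
    """Pick a mood from elapsed seconds plus a time penalty for each refusal or escape."""
    effective = elapsed + penalty * refusals
    stage = int(len(MOODS) * effective / duration)
    return MOODS[min(stage, len(MOODS) - 1)]
```

For example, one minute in, a co-operative user would still get a manic Thing, while a user who has already run away twice would get a sad one.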

Second, users had a fairly standard way of reacting to the Thing. They either tried to obey it - or refused to dance and tried to get away from it. Those that tried to dance varied widely between people who would copy exactly and those too shy to move very freely - as the Wizard of Oz we tended to treat these alike to encourage the timid. My sense is that copying and mimicry are very basic human habits - that's how we learn to be human. When testing the application myself I find myself tending to copy the Thing. So I decided that we could take advantage of what we had observed and make some short cuts in the portion of the Thing's intelligence dedicated to checking.

Based on the experience of making the Thing autonomous in the second half of the story and the two observations above, I built an autonomous dancing Thing. I assumed that any arm movement that travels more than an arbitrary minimum distance at times when the user is meant to dance is an attempt to dance. We do not bother to check each dance movement separately and precisely to make sure that the user is doing a specific move. I only check one arm and so can run the application with just two trackers, one on the head and one attached to the wand. Over time the Thing becomes randomly pickier, criticising and rejecting movements that earlier it would have OK'd. The Thing's mood changes are also based on time and the level of user co-operation.
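
A check of this kind is short enough to sketch in full. The Python below is a minimal illustration under stated assumptions, not the code used in the application: the function names, the minimum distance of one unit, the 150-second duration and the ceiling of 0.8 on pickiness are all hypothetical values.

```python
import math
import random


def path_length(samples):
    """Total distance travelled by one tracker over a list of (x, y, z) samples."""
    return sum(math.dist(a, b) for a, b in zip(samples, samples[1:]))


def judge_attempt(samples, elapsed, min_distance=1.0, total_time=150.0, rng=random):
    """Return "praise" or "criticise" for one refrain's worth of arm samples.

    elapsed is the time in seconds since the dance began.
    """
    if path_length(samples) < min_distance:
        return "criticise"  # the arm barely moved: not an attempt to dance
    # Pickiness grows from 0 to 0.8 over the interaction, so later attempts that
    # would once have passed are sometimes rejected at random.
    pickiness = min(0.8, 0.8 * elapsed / total_time)
    return "criticise" if rng.random() < pickiness else "praise"


still_arm = [(0.0, 1.0, 0.0), (0.0, 1.1, 0.0), (0.0, 1.0, 0.0)]
waving_arm = [(0.0, 1.0, 0.0), (0.5, 1.8, 0.0), (-0.5, 1.0, 0.0)]
```

Note that nothing here compares the user's movement with the Thing's; the check only measures how far the tracked arm travelled during the refrain.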

This autonomous dancing Thing has been tested at the Virtuality and Interactivity show in Milan, Italy, May 99, SIGGRAPH Aug 99, and at Life Science - Ars Electronica Festival 99. Some modifications are needed but essentially it works. As the Thing gets pickier people do seem to pay more attention and copy more accurately. One modification may be to add fake sensors, for those experts who are trying to figure out how the Thing can know what they are doing!

The period of user testing with "human-augmented" intelligence was very revealing and led to the building of a Thing that is much less intelligent than the one planned. However, it also means that we did not spend a long time building more intelligence than we needed, and the process as a whole was significantly simplified - I very much recommend it!