Agency

"It is a terrible thing to see and have no vision." – Helen Keller

The computer may have vision, but it has no idea what to do with what it sees.

agency: how a result is obtained or an end is achieved

Making an Agent of the Computer

What does the computer see?

As the tracked position of our head changes, the proper response is for the computer to render the world from that new point of view.

The mathematical equations that dictate the rendering of the world require only the position and orientation of the viewer.
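The point above can be sketched in code: a world point can be brought into the viewer's frame using nothing but the eye position and two orientation angles. This is a minimal sketch; the axis conventions, the yaw/pitch parameterization, and the omission of a roll angle are all simplifying assumptions, and a real renderer would use full 4x4 view and projection matrices.

```python
import math

def view_transform(eye, yaw, pitch):
    """Return a function mapping world points into eye coordinates,
    given only the viewer's position and orientation.
    yaw/pitch are in radians; conventions here are illustrative."""
    cy, sy = math.cos(yaw), math.sin(yaw)
    cp, sp = math.cos(pitch), math.sin(pitch)

    def to_eye(p):
        # translate so the eye sits at the origin
        x, y, z = (p[0] - eye[0], p[1] - eye[1], p[2] - eye[2])
        # undo the viewer's yaw (rotation about the vertical axis)
        x, z = cy * x - sy * z, sy * x + cy * z
        # undo the viewer's pitch (rotation about the sideways axis)
        y, z = cp * y - sp * z, sp * y + cp * z
        return (x, y, z)

    return to_eye
```

With zero yaw and pitch, the transform reduces to a pure translation by the eye position, which is an easy sanity check.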

However, when we want the computer to understand what we are looking at, we have a problem:

• are we looking at the object along the direct line of sight?
• are we looking at an object behind the first object in the line of sight?
• should we consider some area around the line of sight?
• should we choose the closest object in that area?
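One common answer to all four questions is a ray test with an optional acceptance cone: cast a ray along the line of sight, accept objects within some angular tolerance of it, and choose the closest one. The sketch below assumes objects are simplified to bounding spheres; the sphere representation and the cone test are illustrative choices, not the only way to do this.

```python
import math

def pick(origin, direction, spheres, cone_half_angle=0.0):
    """Choose the object the user is 'looking at'.
    spheres: list of (name, center, radius) bounding spheres.
    With cone_half_angle=0 this is a strict line-of-sight test; a positive
    angle (radians) also accepts objects near the line of sight.
    Among qualifying objects, the closest one along the gaze wins."""
    dlen = math.sqrt(sum(c * c for c in direction))
    d = tuple(c / dlen for c in direction)  # normalized gaze direction
    best = None
    for name, center, radius in spheres:
        v = tuple(center[i] - origin[i] for i in range(3))
        along = sum(v[i] * d[i] for i in range(3))  # depth along the gaze
        if along <= 0:
            continue  # behind the viewer
        # perpendicular distance from the gaze line to the sphere center
        perp2 = sum(v[i] * v[i] for i in range(3)) - along * along
        perp = math.sqrt(max(0.0, perp2))
        # widen the acceptance radius by the cone's spread at this depth
        if perp <= radius + along * math.tan(cone_half_angle):
            if best is None or along < best[1]:
                best = (name, along)
    return best[0] if best else None
```

Choosing the closest hit resolves the "object behind the first object" question; the cone half-angle is the knob for "some area around the line of sight."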

What does it feel?

How should we indicate to the computer that our hand is “inside” of an object?

This also poses questions:

• must the whole of the hand be inside?
• must only the “center” of the hand be inside?
• what is inside if the object is an irregular shape?

As a result, the situation must be simplified:

• some point relative to the tracked hand position is chosen as the center of the hand
• a volume, often using simple shapes, is used to define the extent of the object
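The two simplifications above reduce the "inside" question to a cheap containment test. A minimal sketch, assuming the object's extent is an axis-aligned box (a sphere or other simple shape would work the same way):

```python
def hand_inside(hand_center, box_min, box_max):
    """Is the hand 'inside' the object?
    hand_center: the single point chosen to represent the hand.
    box_min/box_max: opposite corners of the box standing in for the
    object's true, possibly irregular, shape."""
    return all(box_min[i] <= hand_center[i] <= box_max[i] for i in range(3))
```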

How should we indicate to the computer that our hand is “touching” an object?

Again, this also poses questions:

• are we referring to the virtual hand or the physical hand?
• how will the user know when they are touching something?
• what if both the hand and the object being touched are complicated models?

As a result, the following is true:

• real-time computer graphics often do not have the luxury of testing each polygon in the scene against each polygon on the hand or other objects
• a simplified model of the hand, called the collision object, is usually used instead
• testing a short line segment against a number of polygons is significantly less intensive
• several line segments can be used to approximate the volume of the hand
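The segment-based approach above can be sketched as follows. The sphere collision object, the closest-point test, and the specific segments are illustrative assumptions; the point is that each test is a handful of arithmetic operations rather than a polygon-against-polygon search.

```python
import math

def segment_hits_sphere(a, b, center, radius):
    """Test one hand segment (endpoints a, b) against a sphere collision
    object: find the point on the segment closest to the sphere center,
    then compare distances."""
    ab = tuple(b[i] - a[i] for i in range(3))
    ac = tuple(center[i] - a[i] for i in range(3))
    ab2 = sum(c * c for c in ab)
    # parameter of the closest point, clamped to the segment
    t = 0.0 if ab2 == 0 else max(0.0, min(1.0, sum(ab[i] * ac[i] for i in range(3)) / ab2))
    closest = tuple(a[i] + t * ab[i] for i in range(3))
    d2 = sum((closest[i] - center[i]) ** 2 for i in range(3))
    return d2 <= radius * radius

def hand_touches(segments, center, radius):
    """Approximate the hand's volume with several segments;
    the hand 'touches' the object if any segment does."""
    return any(segment_hits_sphere(a, b, center, radius) for a, b in segments)
```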

What does it hear?

If our hand is reduced to a center point or a few short line segments, then our ability to communicate with the environment is significantly less than in real life.

We are accustomed not only to moving our hand to objects, but to interacting with them in complicated ways: grabbing them, pushing buttons, sliding them along a surface.

Unless we are using a tracking system that can monitor the tips of the fingers and a real-time system that can make sense of those inputs, some other means of indicating our intentions is necessary.

As a result, the following is true:

• most computer systems work best with discrete events such as on/off, inside/outside, yes/no
• most interfaces, including the familiar desktop interface, work best when the input is reduced to discrete events (left click, right click) and simple analog inputs (i.e. mouse X and Y position)
• even systems that track the position of the fingers incorporate the ability to send discrete button signals by touching the fingers together – data gloves
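The reduction to discrete events described above can be sketched for the data-glove case: a continuous finger-to-thumb distance is turned into press/release events by thresholding. Using two thresholds (hysteresis) rather than one avoids a stream of spurious events when the distance hovers near a single cutoff. The threshold values here are purely illustrative.

```python
class PinchButton:
    """Convert a continuous finger-to-thumb distance into the discrete
    on/off events that most interfaces expect."""

    def __init__(self, press_at=0.015, release_at=0.025):
        self.press_at = press_at      # closer than this -> button pressed
        self.release_at = release_at  # farther than this -> button released
        self.pressed = False

    def update(self, distance):
        """Feed one tracker sample; return 'press', 'release', or None."""
        if not self.pressed and distance < self.press_at:
            self.pressed = True
            return "press"
        if self.pressed and distance > self.release_at:
            self.pressed = False
            return "release"
        return None  # no state change: distance is in the dead band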

People often ask why we do not use voice recognition in the cave.

If voice recognition is so useful for getting things done then why don’t 99.99% of the desktop computer users in the world use it?

People will be doing productive work in virtual environments long before they abandon the visual and motor oriented relationship they now enjoy with the computer.