Week 7 Homework - due 9pm Chicago time on Friday 10/13/17
In project 1 you used
VR to create your dream office - a new trend in AR apps is evaluating
virtual furniture placed within the room where you are thinking about
putting it. For this homework you should try out one of these apps -
IKEA has a couple, IStaging is another. Take a room you know, place a
piece of appropriate furniture in it and take a screen capture. Then do
the same thing for a place you normally wouldn't see a piece of
furniture like that - like a bed in the middle of Halsted St. or a stove
in the middle of campus. Add these two snapshots to another homework
page and write down your thoughts on how AR applications like this could
be used in the future.
The primary purpose of tracking is to update the visual display based on the viewer's head position and orientation.
Ideally we would track the user's eyes directly, and while that is possible, it is cumbersome and not really necessary. Instead, we track the position and orientation of the user's head. From this we determine the position and orientation of the user's two eyes.
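As a rough illustration of going from a tracked head to two eye positions, here is a minimal sketch. It assumes a y-up coordinate frame, a viewer facing -z at zero yaw, only yaw rotation, and a typical interpupillary distance of 64 mm. The function name and parameters are made up for this example, and a real system would apply the full head orientation (for example as a rotation matrix or quaternion).

```python
import math

def eye_positions(head_pos, yaw_deg, ipd=0.064):
    """Return (left_eye, right_eye) world positions for a tracked head.

    Assumptions: y is up, the viewer faces -z at yaw 0, positive yaw turns
    the head to the left, and the eyes sit ipd/2 to either side of head_pos.
    """
    yaw = math.radians(yaw_deg)
    # Unit vector pointing to the viewer's right after applying the yaw.
    right = (math.cos(yaw), 0.0, -math.sin(yaw))
    half = ipd / 2.0
    left_eye = tuple(p - half * r for p, r in zip(head_pos, right))
    right_eye = tuple(p + half * r for p, r in zip(head_pos, right))
    return left_eye, right_eye

# Head 1.7 m above the floor, looking straight ahead.
left, right = eye_positions((0.0, 1.7, 0.0), yaw_deg=0.0)
```

Each eye position then becomes the camera position for that eye's image.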
We may also be tracking the user's hand(s), fingers, legs, or other interface devices.
Want tracking to be as 'invisible' as possible to the user.
Want the user to be able to move freely with few encumbrances
Want to be able to have multiple 'guests' nearby
Want to track as many objects as necessary
Want to have minimal delay between movement of an object (including the head and hand) and the detection of the object's new position / orientation (< 50 msec total)
Want tracking to be accurate (1mm and
In order to interact with the virtual
world beyond moving through it, we probably need to track at least one
of the user's hands as well, and preferably both of them. Tracking the
position and orientation of the hand allows the user to interact with
the virtual world or other users as though the user is wearing mittens
with no fine control of the fingers. When thinking about how the user
interacts with the worlds that you are building, think about the kinds
of actions a person can do while wearing mittens.
large transmitter and one or more small sensors.
transmitter emits an electromagnetic field.
sensors report the strength of that field at their location to a computer
sensors can be polled specifically by the computer or transmit continuously.
rigid structures with multiple joints
one end is fixed, the other is the object being tracked
could be tracking the user's head, or their hand
physically measure the rotation about joints in the armature to compute position and orientation
structure is counter-weighted - movements are slow and smooth
Knowing the length of each joint and the rotation at each joint, location and orientation of the end point is easy to compute.
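To show why this is easy to compute, here is a minimal forward kinematics sketch for an armature flattened into a 2D plane. The function name is made up for this example, and a real armature would do the same accumulation in 3D with a rotation matrix per joint.

```python
import math

def end_point(lengths, joint_angles_deg):
    """Planar forward kinematics: return (x, y, heading_deg) of the arm tip.

    The fixed end sits at the origin. Each joint angle is measured relative
    to the direction of the previous segment.
    """
    x = y = 0.0
    heading = 0.0
    for length, angle in zip(lengths, joint_angles_deg):
        heading += math.radians(angle)      # accumulate rotation joint by joint
        x += length * math.cos(heading)     # step along the current segment
        y += length * math.sin(heading)
    return x, y, math.degrees(heading)

# Two 1 m segments: the first points straight up, the second bends back to horizontal.
tip = end_point([1.0, 1.0], [90.0, -90.0])
```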
small transmitter and one medium sized sensor
each transmitter emits ultrasonic pulses which are received by microphones on the sensor (usually arranged in a triangle)
as the pulses will reach the different microphones at slightly different times, the position and orientation of the transmitter can be determined
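The timing idea can be sketched as a small trilateration problem. This sketch assumes the time each pulse was emitted is known (so each arrival time gives a distance), that the three microphones lie in the z = 0 plane at the positions described in the docstring, and that the transmitter is in front of that plane. It recovers position only. Getting orientation as well would take several transmitters on the tracked object. All names here are made up for this example.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s in air at roughly 20 C (assumed constant)

def locate_transmitter(times, mic2_x, mic3_x, mic3_y):
    """Return (x, y, z) of a transmitter from three pulse travel times.

    Microphone 1 is at (0, 0, 0), microphone 2 at (mic2_x, 0, 0) and
    microphone 3 at (mic3_x, mic3_y, 0). times are travel times in seconds.
    """
    r1, r2, r3 = (SPEED_OF_SOUND * t for t in times)
    x = (r1**2 - r2**2 + mic2_x**2) / (2.0 * mic2_x)
    y = (r1**2 - r3**2 + mic3_x**2 + mic3_y**2) / (2.0 * mic3_y) \
        - (mic3_x / mic3_y) * x
    # Take the solution in front of the microphone plane (z >= 0).
    z = math.sqrt(max(r1**2 - x**2 - y**2, 0.0))
    return x, y, z
```

Because sound travels about 0.34 m per millisecond, small timing errors turn directly into position errors, which is one reason these systems care about temperature and echoes.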
LEDs or reflective materials are placed on the object to be tracked
video cameras at fixed locations
capture the scene (usually in IR)
image processing techniques are used to locate the object
With fast enough processing you can
also use computer vision techniques to isolate a head in the image and
then use the head to find the position of the eyes
In CAVE2 we use a Vicon optical
camera tracking system, in the new classroom space it will be an
One can also use much less expensive
camera based systems like the Xbox Kinect to track multiple people in a
small area. Staying within the field of view and focal area is very
important here since you only have a single camera, and users are
usually limited to facing a single direction
gyroscopes / accelerometers used
knowing where the object was and its change in position / orientation the device can 'know' where it now is
tend to work for limited periods of
time then drift as errors accumulate.
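A minimal sketch of why the errors accumulate: position comes from integrating acceleration twice, so even a tiny constant sensor bias grows with the square of the elapsed time. The bias value and sample rate below are made up for illustration.

```python
def dead_reckon(accel_samples, dt, bias=0.0):
    """Integrate 1D acceleration twice to estimate how far the object moved."""
    velocity = 0.0
    position = 0.0
    for a in accel_samples:
        velocity += (a + bias) * dt   # first integration: acceleration -> velocity
        position += velocity * dt     # second integration: velocity -> position
    return position

# A sensor sitting perfectly still for 10 seconds, sampled at 100 Hz.
samples = [0.0] * 1000
ideal = dead_reckon(samples, dt=0.01)               # no bias: stays at 0
drift = dead_reckon(samples, dt=0.01, bias=0.01)    # 0.01 m/s^2 bias: about 0.5 m off
```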
For outdoor work GPS can give the
general location of the user (3 meter accuracy horizontally in an open
field, much worse as you get near buildings). Vertical accuracy is more
like 10 meters, so that is not very useful right now. Newer
constellations of satellites, as well as ground based reference stations
can substantially improve on that accuracy.
For better vertical accuracy devices
are now including barometers. These work pretty well when calibrated to
the local air pressure, which may be constantly changing as the weather
changes.
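A minimal sketch of turning a pressure reading into a height, using the standard-atmosphere approximation commonly quoted in barometric sensor documentation. The function name is made up for this example. The second call shows why calibration matters: near sea level a 1 hPa error in the reference pressure shifts the answer by roughly 8 meters.

```python
def altitude_m(pressure_hpa, reference_hpa=1013.25):
    """Estimate height in meters above the level where pressure is reference_hpa.

    Uses the standard-atmosphere approximation; real weather moves the
    reference pressure, so it has to be recalibrated locally.
    """
    return 44330.0 * (1.0 - (pressure_hpa / reference_hpa) ** (1.0 / 5.255))

at_reference = altitude_m(1013.25)               # 0 m by definition
one_hpa_lower = altitude_m(1012.25)              # a bit over 8 m
```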
A common way for camera based AR
systems to orient themselves is by using fiducial markers. These could
be pieces of paper held in front of a camera where a 3D object suddenly
appears on the paper (when looking at the camera feed). They can also be
placed on walls, floors, ceilings so moving users with cameras can
locate where they are.
Combining multiple forms of tracking
is a very good way to improve tracking in complex situations, just as
our phone's GPS based information is improved if we also have the WiFi
InterSense uses a combination of acoustic and inertial tracking. Inertial can deal with fast movements and acoustic keeps the inertial from drifting
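One simple way to combine a fast-but-drifting source with a slow-but-absolute one is a complementary filter. The sketch below is only an illustration of that general idea in 1D, not InterSense's actual algorithm, and the blend weight and drift numbers are made up.

```python
def fuse(prev_estimate, inertial_delta, acoustic_position, alpha=0.98):
    """Blend a fast inertial update with a slower absolute acoustic fix.

    alpha close to 1 trusts the inertial prediction for quick motion, while
    the small (1 - alpha) share of the acoustic reading pulls out the drift.
    """
    predicted = prev_estimate + inertial_delta
    return alpha * predicted + (1.0 - alpha) * acoustic_position

# Object sitting still at 1.0 m; the inertial sensor wrongly reports 1 mm of motion per step.
inertial_only = 1.0
fused = 1.0
for _ in range(1000):
    inertial_only += 0.001
    fused = fuse(fused, inertial_delta=0.001, acoustic_position=1.0)
# inertial_only has wandered about a meter away, fused stays within 5 cm of 1.0
```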
Outdoor AR devices can use GPS and
orientation / accelerometer information to get a general idea where the
user is, and then use the on board camera to refine that information
given what should be in sight from that location at that orientation.
a current popular version of this is
Inside Out Tracking
The HoloLens and
future derived headsets don't want to rely on external markers or
emitters or cameras, they want to be able to track using just what the
user is wearing with cameras and sensors looking outward. This requires
a combination of sensors including inertial (for orientation tracking),
and visible light camera(s) and depth camera(s) for position tracking,
and all of them together are used for space mapping.
With the HoloLens you
first have to help the headset map the space by looking all around the
room you are in, and then the HoloLens remembers that room, and as you
move between multiple rooms it adds to its internal map and combines
independent spaces together into larger contiguous spaces.
Google's Project Tango and others use similar combinations of sensors on headsets and smartphones
OptiTrack - http://v110.wiki.optitrack.com/index.php?title=Quick_Start_Guide:_Getting_Started
Rather than looking for a generic solution, specialized VR applications are usually better served using specialized tracking hardware. These pieces of specialized hardware generally replace tracking of the user with an input device that handles navigation
For Caterpillar's testing of their cab designs they place the actual cab hardware into the CAVE so the driver controls the virtual loader in the same way the actual loader would be controlled. The position of the gear shift, the pedals, and the steering wheel determine the location of the user in the virtual space.
A treadmill can be used to allow walking and running within a confined space. More sophisticated multilayer treadmills or spheres allow motion in a plane.
A bicycle with handlebars allows the user to pedal and turn, driving through a virtual environment
a bit more about latency
Accuracy needs to come from the tracker manufacturers. Latency is partly our fault.
Latency is the sum of:
Another important point about latency is the importance of consistent latency. If the latency isn't too bad, people will adapt to it, but it's very annoying if the latency isn't consistent - people can't adapt to jitter.
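A small sketch of the two numbers worth watching: the total latency (checked against the 50 msec target mentioned earlier) and the jitter, measured here as the standard deviation of end-to-end latency samples. The stage names and all of the numbers are hypothetical, chosen only for illustration.

```python
import statistics

BUDGET_MS = 50.0  # the "< 50 msec total" target from the tracking goals

def total_latency(stage_ms):
    """Sum the per-stage delays (in milliseconds) along the pipeline."""
    return sum(stage_ms.values())

def jitter(latency_samples_ms):
    """Population standard deviation of measured end-to-end latencies."""
    return statistics.pstdev(latency_samples_ms)

# Hypothetical stages and delays, for illustration only.
stages = {"tracker": 10.0, "transfer": 5.0, "simulation": 8.0, "render": 16.7}
within_budget = total_latency(stages) <= BUDGET_MS

steady = [40.0, 40.0, 40.0, 40.0]   # same average latency...
jumpy = [20.0, 60.0, 20.0, 60.0]    # ...but much harder to adapt to
```

Both sample sets average 40 msec, but only the second one has jitter, which is the case users cannot adapt to.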
How many sensors is enough?
Tracking the head and hand is often
enough for working with remote people as avatars.
A user putting on sensors and another user dancing with 'the thing growing' at SIGGRAPH 98 in Orlando. This application tracked the head, both hands and the lower back.
In the simplest Virtual Reality world
there is an object floating in front of you in 3D which you can look at.
Moving your head and / or body allows you to see the object from
different points of view. This is also the default in an Augmented
In Fish Tank VR
setups, or HMDs like the original Rift, the user is typically sitting
with a limited space to move in. With more modern HMDs like the newer
Rift and VIVE, or room scale systems like the CAVE and CAVE2 the user
has a larger space to walk in, jump up, kneel down, lie on the
floor etc, and arcade level systems give you larger rooms to move around
in using just your body. Unseen Diplomacy pushes this navigation
in a limited space to the extreme - https://www.youtube.com/watch?v=KirQtdsG5yE
But often you want to
move further, or in effect move a different part of the virtual world
into the area that you can easily move through.
Common ways of doing
this involve using a joystick or directional pad on the wand to
'drive' through the space as though you were in a first person video
game, which gives you a better sense of continuity in the virtual world,
though this can risk simulator sickness, which is why most current
consumer HMD games don't do it.
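A minimal sketch of this kind of 'driving', updating the user's position on the ground plane once per frame from the joystick. The conventions (heading 0 faces -z, pushing the stick right turns right) and the default speed and turn rate are assumptions made for this example.

```python
import math

def drive(position, heading_deg, stick_x, stick_y, dt,
          speed=1.5, turn_rate_deg=60.0):
    """Advance the user's (x, z) position and heading by one frame.

    stick_y in [-1, 1] moves forward / back along the current heading and
    stick_x in [-1, 1] turns right / left. Heading 0 faces -z and positive
    heading means the user has turned to the left.
    """
    heading_deg -= stick_x * turn_rate_deg * dt
    h = math.radians(heading_deg)
    step = stick_y * speed * dt
    x, z = position
    return (x - math.sin(h) * step, z - math.cos(h) * step), heading_deg

# Stick pushed fully forward for half a second at 2 m/s: one meter straight ahead.
pos, heading = drive((0.0, 0.0), 0.0, stick_x=0.0, stick_y=1.0, dt=0.5, speed=2.0)
```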
Another simple option
is using a wand to point to where you want to go and teleporting from
one place to another within the virtual world, which is what most
current consumer HMD games use.
Another option is to
use large gestures such as swinging both your arms (holding two wands)
up and down as though you were jogging to tell the system you want to
walk, or pointing in the direction you want to go if you have hand and
finger tracking. VR Dungeon Knight uses the jogging metaphor - https://www.youtube.com/watch?v=TTolJoKUcks
options as discussed above include real devices like bicycles,
treadmills, car interiors, fighter plane interiors, train interiors.
In Augmented Reality
you are limited to your actual physical movements (or the movements of a
real car or a real bike) as the Augmented Reality world is anchored to
the real world.
Here are several controllers that we
used in the first 10 years of the CAVE. The common elements on these
included a joystick and three buttons (same as the 3 buttons on unix /
IRIX computer mice.)
original 'hand made' CAVE / ImmersaDesk wand based on Flock of Birds tracker from 1992-1998. It would have been really handy to have had rapid prototyping machines back then to make these.
New CAVE/ ImmersaDesk wanda with a
similar joystick plus three buttons, based on Flock of Birds tracker
InterSense controller, joystick plus
4 buttons, used in the early 2000s
When we designed and
built CAVE2 from 2009-2012 we wanted to go with controllers that were
easier to replace if they were broken, so we shifted to PlayStation
controllers with marker balls mounted on the front, again giving us
multiple buttons, a d-pad and joystick up top and a trigger below. This
was the first CAVE controller to be cable free, so it needs to be
charged up like any wireless game controller.
The d-pad added a lot
of advantages for interacting with menus. Using the lower trigger was a big
advantage over the joystick for navigation.
The initial VIVE controllers followed
a similar pattern.
The Oculus Touch has similar features
in a more compact arrangement
More buttons give
you more options that can be directly controlled by the user, but may
also make it harder to remember what all those buttons do. More buttons
also make it harder to instruct a novice user on what he/she can do.
Asking a new user to press the 'left button' or the 'right button' is
pretty easy but when you get to 'left on the d-pad' or 'press the right
shoulder button' then you have a more limited audience that can
Most VR software
today automatically brings up context sensitive overlays about what the
various controller buttons do to help users get familiar with the
Game controllers (and
things that look like game controllers) have a big advantage in terms of
familiarity for people who play games, and have often gone through
pretty substantial user testing, and are often relatively inexpensive to
More interesting is the ability to manipulate objects in the virtual world, or manipulate the virtual world itself.
The user is given a
very 'human' interface to VR ... the person can move their head or hand,
and move their body, but this also limits the user's interaction with
the space to what you carry around with you. There is also the obvious
problem that you are in a virtual space made out of light, so it's not
easy to touch, smell, or taste the virtual world, though all of the
senses have been used in various projects.
Even if you want to just 'grab' an object there are several issues involved.
A 'natural' way to
grab a virtual object is to move your hand holding a controller so that
it touches the virtual object you want to manipulate. At this point the
virtual environment could vibrate the controller, or add a halo to the
object, or make the object glow, or play a sound to help you know that
you have 'touched' the object. You could then press a button to 'grab'
the object, or have the object 'jump' into your hand.
While this kind of motion is very natural, navigating to the object may not be as easy, or the type of display may not encourage you to 'touch' the virtual objects. It can also be impractical to pick up very large objects because they can obscure your field of view. In that case the user's hand may cast a ray (raycasting), which allows the user to interact at a distance. One fun thing to try in VR is to act like a superhero and pick up a large building or train and throw it around - it turns out that when you pick them up you can't see anything else - 'church chuck'.
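A minimal sketch of raycasting for selection at a distance. It assumes each selectable object is approximated by a bounding sphere, and all of the names are made up for this example. A real engine would normally provide its own ray query against the actual geometry.

```python
import math

def ray_hits_sphere(origin, direction, center, radius):
    """Return the distance along the ray to the sphere, or None if it misses."""
    length = math.sqrt(sum(c * c for c in direction))
    d = [c / length for c in direction]                 # normalized ray direction
    oc = [c - o for o, c in zip(origin, center)]        # ray origin -> sphere center
    t_closest = sum(a * b for a, b in zip(oc, d))       # closest approach along the ray
    dist_sq = sum(a * a for a in oc) - t_closest ** 2   # squared miss distance
    if dist_sq > radius ** 2:
        return None
    half_chord = math.sqrt(radius ** 2 - dist_sq)
    t = t_closest - half_chord
    if t < 0:
        t = t_closest + half_chord                      # ray starts inside the sphere
    return t if t >= 0 else None

def pick(origin, direction, objects):
    """Return the name of the nearest object hit by the ray, or None.

    objects maps a name to a (center, radius) bounding sphere.
    """
    hits = []
    for name, (center, radius) in objects.items():
        t = ray_hits_sphere(origin, direction, center, radius)
        if t is not None:
            hits.append((t, name))
    return min(hits)[1] if hits else None
```

The ray origin and direction would come from the tracked position and orientation of the hand or wand.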
One common way of interacting with the virtual world is to take the concept of 2D menus from the desktop into the 3D space of VR. These menus exist as mostly 2D objects in the 3D space.
This can be extended from simple buttons to various forms of 2D sliders.
These menus may be
fixed to the user, appearing near the user's head, hand, or waist, so as
the user moves through the space, the menus stay in a fixed position
relative to the user. Alternatively the menus may stay at a fixed
location in the real space, or a fixed location in the virtual space.
e.g. from the current crop of HMD experiences:
Google Earth VR maps
the controls to buttons on the controller with tooltips floating nearby
Vanishing Realms has a nice UI at the user's waist where you store keys, food, and weapons. To interact, the user intersects that menu with one of the controllers, as though reaching down to grab something off a belt.
Tilt Brush has a nice
2-handed 3D UI where the multi-faceted menu appears in one hand and you
select from it with the other hand
Bridge Crew has a nice UI with (lots of) buttons that you have to 'press' with your virtual hands - including virtual overlay text to remind you which is which
Job Simulator has a
nice UI built into the 3D environment itself based on object
In either case these
menus may collide with other objects in the scene. One way to avoid this
is to turn z-buffering off for these menus so they are always visible
even when they are 'behind' another object.
When you get into
more complicated virtual worlds for design or visualization the number
of menus multiplies dramatically as does the need for textually naming
them so more traditional menus are more common in these domains. There
are several ways to activate these kinds of menus - using the wand as a
pointer to select menu items, using a d-pad to move through a menu,
intersecting the wand itself with the menu items. Using a d-pad tends to
work better than a pointer if you have a single menu to move through as
it can be hard to hold your hand steady when pointing at complex menus.
Another option is to
use a head-up display for the menu system where you look at the menu
item you want to select and choose it with a controller. Here is a
version of that we did back in 1995 with the additional option for
selecting the menu options by voice. The HoloLens uses a similar menu
In Augmented Reality
this can be trickier since you also have the real world involved, both
in terms of the graphics, and in terms of the people you share the world
with. Google Glass's physical control on the side of Glass worked OK for
small menu systems, augmented by voice. Microsoft's HoloLens pinch
gesture for selecting within the field of view of the camera didn't work
quite so well, but the physical button did, as long as you keep the
physical button with you.
One way to get around
the complexity of the menus is to talk to the computer via a voice
recognition system. This is a very natural way for people to
communicate. These systems are quite robust, even for multiple speakers
given a small fixed vocabulary, or a single speaker and a large
vocabulary, and they are not very expensive.
However, voice commands can be hard to learn and remember.
In the case of VR applications like the Virtual Director from the 90s, voice control was the only convenient way to get around a very complicated menuing system
The HoloLens makes effective use of voice to rapidly move through the menus without needing to look and pinch.
do not add any extra encumbrance to the user in dedicated rooms, and
small wireless microphones are a small encumbrance. HMDs or the
controllers can include microphones which work pretty well.
Problems can occur in
projection-based systems since there are multiple users in the same
place and they are frequently talking to each other. This can make it
difficult for the computer to know when you are talking to your friends
and when you are talking to the computer. There is a need for a way to
turn the system on and off, and often the need for a personal
Voice is becoming
much more common now for our smartphones and our homes, and soon our
cars, as the processing and the learning can be offloaded into the
This also seems like
a very natural interface. Gloves can be used to accurately track the
position of the user's hand and fingers. Some simple gloves track
contacts (e.g. thumb touching middle finger), others track the extension
of the fingers. The former are fairly robust, the latter are still
somewhat fragile. Camera tracking as in the Kinect and AR systems can do
a fairly good job with simple gestures, and are rapidly improving.
with tracking hands improve if you have two of them. Multigen's
SmartScene is a good example using two Fakespace Pinchgloves for
manipulation. The two handed interaction of Tilt Brush seems like a
modern version of the SmartScene interface.
Full body tracking
involving a body suit or gives you more opportunities for gesture
recognition, and simple camera tracking does a pretty good job with
gross positions and gestures.
One issue here, as
with voice, is how does the computer decide that you are gesturing to it
and expect something to happen, as opposed to gesturing to yourself or
Several different models of PHANToMs
a PHANToM in use as part of a cranial implant modelling application with a video here - https://www.youtube.com/watch?v=cr4u69r4kn8
The PHANToM gives 6 degrees of freedom as input (translation in XYZ and roll, pitch, yaw) 3 degrees of freedom in output (translation in XYZ)
You can use the
PHANToM by holding a stylus at the end of its arm as a pen, or by
putting your finger into a thimble at the end of its arm.
The 3D workspace ranges from 5x7x10 inches to 16x23x33 inches
and a nice introductory video here -
There is also work
today using air pressure and sound to create a kind of sense of touch,
though not as strong as a PHANToM, they operate over a wider area than
We often compensate for the lack of one sense in VR by using another. For example we can use a sound to replace the sense of touch, or a visual effect to replace the lack of audio.
In projection based
VR systems you can carry things with you, for example in the 90s we
could carry PDAs giving an additional display, handwriting recognition,
or a hand-held physical menu system. Today smart phones or tablets
provide the same functionality with far more capabilities.
In smaller fish tank
VR systems, or in hybrid systems like CAVE2 you have access to
everything on your desk which can be very important when VR is only part
of the material you need to work with.
In Augmented Reality
you have access to everything in the real world so interacting with the
real world is pretty much the same as before, especially with a head
mounted AR system.