Week 7 Homework - due
9pm Chicago time on Friday 10/13/17
In project 1 you used VR to create your dream office - a new trend
in AR apps is evaluating virtual furniture placed within the
room where you are thinking about putting it. For this homework
you should try out one of these apps - IKEA has a couple,
iStaging is another. Take a room you know, place a piece of
appropriate furniture in it and take a screen capture. Then do
the same thing for a place you normally wouldn't see a piece of
furniture like that - like a bed in the middle of Halsted St. or
a stove in the middle of campus. Add these two snapshots to
another homework page and write down your thoughts on how AR
applications like this could be used in the future.
The primary purpose of tracking is to update the visual display based on the viewer's head position and orientation.
Ideally we would track the user's eyes directly, and while that is possible, it is cumbersome and not strictly necessary. Instead we track the position and orientation of the user's head, and from this we determine the position and orientation of the user's two eyes.
We may also be tracking the user's hand(s), fingers, legs, or other interface devices.
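As a concrete sketch of the head-to-eyes step, assuming the head pose arrives as a position plus a 3x3 rotation matrix and using an average 63 mm interpupillary distance (both assumptions for illustration), the two eye positions are just fixed offsets in the head's coordinate frame:

```python
import numpy as np

def eye_positions(head_pos, head_rot, ipd=0.063):
    """Derive left / right eye positions from a tracked head pose.

    head_pos: (3,) world-space head position in meters
    head_rot: (3, 3) rotation matrix whose first column is the
              head's local 'right' axis
    ipd:      interpupillary distance in meters (~63 mm average)
    """
    right_axis = head_rot[:, 0]
    left_eye = head_pos - right_axis * (ipd / 2)
    right_eye = head_pos + right_axis * (ipd / 2)
    return left_eye, right_eye

# Head at standing height, turned 90 degrees to the left
yaw = np.radians(90)
rot = np.array([[np.cos(yaw), 0, np.sin(yaw)],
                [0, 1, 0],
                [-np.sin(yaw), 0, np.cos(yaw)]])
print(eye_positions(np.array([0.0, 1.7, 0.0]), rot))
```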
Want tracking to be as 'invisible' as possible to the user.
Want the user to be able to move freely with few encumbrances.
Want to be able to have multiple 'guests' nearby.
Want to track as many objects as necessary.
Want to have minimal delay between the movement of an object (including the head and hand) and the detection of the object's new position / orientation (< 50 msec total).
Want tracking to be accurate (1 mm and 1 degree).
In order to interact
with the virtual world beyond moving through it, we probably
need to track at least one of the user's hands as well, and
preferably both of them. Tracking the position and orientation
of the hand allows the user to interact with the virtual world
or other users as though the user is wearing mittens with no
fine control of the fingers. When thinking about how the user
interacts with the worlds that you are building, think about the
kinds of actions a person can do while wearing mittens.
There are a number of tracking technologies:
Electromagnetic tracking: a large transmitter and one or more small sensors.
The transmitter emits an electromagnetic field.
The sensors report the strength of that field at their location to a computer.
The sensors can be polled specifically by the computer or can transmit continuously.
Mechanical tracking: rigid structures with multiple joints.
One end is fixed; the other is attached to the object being tracked, which could be the user's head or their hand.
The system physically measures the rotation about each joint in the armature to compute position and orientation.
The structure is counter-weighted, so movements are slow and smooth.
Knowing the length of each link and the rotation at each joint, the location and orientation of the end point are easy to compute.
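A minimal sketch of that computation for a planar armature (the link lengths and joint angles below are made up; a real armature does this in 3D with full rotation matrices):

```python
import math

def forward_kinematics(link_lengths, joint_angles):
    """End-point position and orientation of a planar jointed armature.

    Joint i rotates by joint_angles[i] (radians) relative to the
    previous link; link_lengths[i] is the length of the link after it.
    The base is fixed at the origin pointing along +X.
    """
    x = y = 0.0
    heading = 0.0
    for length, angle in zip(link_lengths, joint_angles):
        heading += angle                 # rotations accumulate down the chain
        x += length * math.cos(heading)
        y += length * math.sin(heading)
    return (x, y), heading               # position and orientation of the tip

# Two half-meter links, each joint bent 30 degrees
print(forward_kinematics([0.5, 0.5], [math.radians(30), math.radians(30)]))
```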
Acoustic tracking: a small transmitter and one medium-sized sensor.
Each transmitter emits ultrasonic pulses which are received by microphones on the sensor (usually arranged in a triangle).
As the pulses reach the different microphones at slightly different times, the position and orientation of the transmitter can be determined.
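Here is a rough sketch of the position half of that computation (orientation needs multiple transmitters). The microphone layout, speed of sound, and use of scipy's least-squares solver are all illustrative assumptions:

```python
import numpy as np
from scipy.optimize import least_squares

SPEED_OF_SOUND = 343.0  # m/s at room temperature

# Hypothetical microphone triangle on the sensor (meters)
mics = np.array([[0.00, 0.0000, 0.0],
                 [0.10, 0.0000, 0.0],
                 [0.05, 0.0866, 0.0]])

def locate(arrival_times):
    """Solve for the transmitter position from pulse times-of-flight."""
    dists = SPEED_OF_SOUND * np.asarray(arrival_times)
    residual = lambda p: np.linalg.norm(mics - p, axis=1) - dists
    # Three microphones give two mirror-image solutions; starting the
    # solver in front of the microphone plane (z > 0) picks the right one.
    return least_squares(residual, x0=[0.05, 0.03, 0.5]).x

# Simulate a transmitter half a meter in front of the triangle
true_pos = np.array([0.05, 0.03, 0.5])
times = np.linalg.norm(mics - true_pos, axis=1) / SPEED_OF_SOUND
print(locate(times))  # ~ [0.05, 0.03, 0.5]
```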
Optical tracking: LEDs or reflective materials are placed on the object to be tracked.
Video cameras at fixed locations capture the scene (usually in IR).
Image processing techniques are used to locate the object.
With fast enough processing you can also use computer vision techniques to isolate a head in the image and then use the head to find the position of the eyes.
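A minimal version of that head-then-eyes pipeline using OpenCV's stock Haar cascades (the cascade files ship with the opencv-python package; the detection thresholds here are illustrative):

```python
import cv2

face_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
eye_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_eye.xml")

def find_eyes(frame_bgr):
    """Find a head in the image, then search inside it for the eyes."""
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
    eyes = []
    for (x, y, w, h) in face_cascade.detectMultiScale(gray, 1.3, 5):
        face = gray[y:y + h, x:x + w]          # restrict the eye search
        for (ex, ey, ew, eh) in eye_cascade.detectMultiScale(face):
            eyes.append((x + ex + ew // 2, y + ey + eh // 2))
    return eyes                                 # eye centers in pixels
```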
In CAVE2 we use a Vicon optical camera tracking system; in the new classroom space it will be an OptiTrack system.
One can also use much less expensive camera-based systems like the Xbox Kinect to track multiple people in a small area. Staying within the field of view and focal area is very important here since you only have a single camera, and users are usually limited to facing a single direction (towards the camera).
Inertial tracking: gyroscopes / accelerometers are used.
Knowing where the object was and its change in position / orientation, the device can 'know' where it now is.
These tend to work for limited periods of time and then drift as errors accumulate.
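A toy sketch of why that drift happens: double-integrating acceleration turns even a tiny constant sensor bias (a hypothetical 0.01 m/s^2 here) into position error that grows quadratically with time:

```python
import numpy as np

def dead_reckon(accel_samples, dt, bias=0.01):
    """Integrate accelerometer readings (plus a small bias) into position."""
    vel = pos = 0.0
    for a in accel_samples:
        vel += (a + bias) * dt   # first integration: velocity
        pos += vel * dt          # second integration: position
    return pos

# A perfectly stationary sensor still appears to move:
drift = dead_reckon(np.zeros(6000), dt=1 / 100)  # 60 s at 100 Hz
print(f"apparent drift after 60 s: {drift:.1f} m")  # ~18 m
```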
For outdoor work GPS can give the general location of the user (3 meter accuracy horizontally in an open field, much worse near buildings). Vertical accuracy is more like 10 meters, so that is not very useful right now. Newer constellations of satellites, as well as ground-based reference stations, can substantially improve on that accuracy.
For better vertical accuracy devices now include barometers. These work pretty well when calibrated to the local air pressure, which shifts constantly as the weather changes.
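The conversion itself is the standard international barometric formula; the catch is the sea-level reference pressure p0, which has to track the weather:

```python
def pressure_to_altitude(p_hpa, p0_hpa=1013.25):
    """Altitude (m) from pressure via the international barometric formula."""
    return 44330.0 * (1.0 - (p_hpa / p0_hpa) ** 0.1903)

# The same sensor reading gives very different altitudes as p0 changes:
print(pressure_to_altitude(1000.0))                  # ~111 m
print(pressure_to_altitude(1000.0, p0_hpa=1020.0))   # ~167 m
```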
A common way for camera-based AR systems to orient themselves is by using fiducial markers. These could be pieces of paper held in front of a camera where a 3D object suddenly appears on the paper (when looking at the camera feed). They can also be placed on walls, floors, and ceilings so moving users with cameras can locate where they are. Combining multiple forms of tracking is a very good way to improve tracking in complex situations, just as our phone's GPS-based location is improved if we also have the WiFi antennas working.
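As a sketch, OpenCV's ArUco module (in opencv-contrib-python; the API shown follows the 4.7+ style, which has changed between versions) detects this kind of marker in a camera frame:

```python
import cv2

dictionary = cv2.aruco.getPredefinedDictionary(cv2.aruco.DICT_4X4_50)
detector = cv2.aruco.ArucoDetector(dictionary, cv2.aruco.DetectorParameters())

def find_markers(frame_bgr):
    """Locate fiducial markers in a camera frame.

    Returns marker ids and corner pixel coordinates; with a calibrated
    camera, cv2.solvePnP on the corners of a marker of known size gives
    the camera's pose relative to that marker.
    """
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
    corners, ids, _rejected = detector.detectMarkers(gray)
    return ids, corners
```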
InterSense uses a combination of acoustic and inertial tracking. Inertial can deal with fast movements, and acoustic keeps the inertial from drifting.
Outdoor AR devices can
use GPS and orientation / accelerometer information to get a
general idea where the user is, and then use the on board camera
to refine that information given what should be in sight from
that location at that orientation.
A currently popular version of this is inside-out tracking.
The HoloLens and future derived headsets don't want to rely on external markers or emitters or cameras; they want to be able to track using just what the user is wearing, with cameras and sensors looking outward. This requires a combination of sensors including inertial (for orientation tracking) and visible light camera(s) and depth camera(s) for position tracking, and all of these together are used for space mapping.
With the HoloLens you first have to help the headset map the space by
looking all around the room you are in, and then the HoloLens
remembers that room, and as you move between multiple rooms it
adds to its internal map and combines independent spaces
together into larger contiguous spaces.
Google's Project Tango and others use similar combinations of sensors on headsets and smartphones.
OptiTrack - http://v110.wiki.optitrack.com/index.php?title=Quick_Start_Guide:_Getting_Started
Rather than looking for a generic solution, specialized VR applications are usually better served by specialized tracking hardware. These pieces of specialized hardware generally replace tracking of the user with an input device that handles navigation.
For Caterpillar's testing of their cab designs they place the actual cab hardware into the CAVE so the driver controls the virtual loader in the same way the actual loader would be controlled. The position of the gear shift, the pedals, and the steering wheel determine the location of the user in the virtual space.
A treadmill can be used to allow walking and running within a confined space. More sophisticated omnidirectional treadmills or spheres allow motion in a plane.
A bicycle with handlebars allows the user to pedal and turn, driving through a virtual environment.
A bit more about latency:
Accuracy needs to come from the tracker manufacturers; latency is partly our fault.
Latency is the sum of: the time for the tracker to measure a change, the time to transmit that measurement to the computer, the time for the application to update the virtual world, the time to render the new frame, and the time for the display to show it.
Another important point about latency is the importance of consistent latency. If the latency isn't too bad, people will adapt to it, but it's very annoying if the latency isn't consistent - people can't adapt to jitter.
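A back-of-the-envelope sketch of the budget and of jitter (all the per-stage numbers below are made up for illustration):

```python
import statistics

# Hypothetical per-stage delays (ms) for one frame of the pipeline
stages = {"tracker": 8, "transmission": 4, "simulation": 6,
          "rendering": 11, "display scanout": 8}
print(f"end-to-end latency: {sum(stages.values())} ms (budget: 50 ms)")

# Jitter is the frame-to-frame variation of that total, which users
# cannot adapt to even when the mean is acceptable.
frame_latencies = [37, 39, 36, 58, 38, 37, 61, 38]  # made-up samples
print(f"mean {statistics.mean(frame_latencies):.1f} ms, "
      f"jitter (stdev) {statistics.stdev(frame_latencies):.1f} ms")
```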
How many sensors are enough?
Tracking the head and
hand is often enough for working with remote people as avatars.
A user putting on sensors and another user dancing with 'The Thing Growing' at SIGGRAPH 98 in Orlando. This application tracked the head, both hands, and the lower back.
In the simplest Virtual
Reality world there is an object floating in front of you in 3D
which you can look at. Moving your head and / or body allows you
to see the object from different points of view. This is also
the default in an Augmented Reality world.
In Fish Tank VR setups, or
HMDs like the original Rift, the user is typically sitting
with a limited space to move in. With more modern HMDs like the newer Rift and VIVE, or room-scale systems like the CAVE and CAVE2, the user has a larger space to move and walk in, jump up, kneel down, lie on the floor, etc., and arcade-level systems give you larger rooms to move around in using just your body. Unseen Diplomacy pushes this navigation in a limited space to the extreme - https://www.youtube.com/watch?v=KirQtdsG5yE
But often you want to move
further, or in effect move a different part of the virtual
world into the area that you can easily move through.
Common ways of doing this
involve using a joystick or directional pad on the wand to 'drive' through the space as though you were in a first
person video game, which gives you a better sense of
continuity in the virtual world, though this can risk
simulator sickness, which is why most current consumer HMD
games don't do it.
Another simple option is
using a wand to point to where you want to go and teleporting
from one place to another within the virtual world, which is
what most current consumer HMD games use.
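The core of pointer-based teleporting is just a ray / floor-plane intersection; a minimal sketch (assuming a flat floor at y = 0 and y-up coordinates):

```python
import numpy as np

def teleport_target(wand_pos, wand_dir, floor_y=0.0):
    """Where the wand's pointing ray hits the floor plane y = floor_y.

    Returns None when the user points at or above the horizon.
    """
    wand_dir = wand_dir / np.linalg.norm(wand_dir)
    if wand_dir[1] >= 0:
        return None                              # not pointing downward
    t = (floor_y - wand_pos[1]) / wand_dir[1]    # distance along the ray
    return wand_pos + t * wand_dir

# Pointing down-and-forward from roughly shoulder height
print(teleport_target(np.array([0.0, 1.4, 0.0]),
                      np.array([0.0, -0.5, -1.0])))  # -> [0, 0, -2.8]
```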
Another option is to use
large gestures such as swinging both your arms (holding two
wands) up and down as though you were jogging to tell the
system you want to walk, or pointing in the direction you want
to go if you have hand and finger tracking. VR Dungeon
Knight uses the jogging metaphor - https://www.youtube.com/watch?v=TTolJoKUcks
Other specialized options as discussed above include real devices like bicycles, treadmills, car interiors, fighter plane interiors, or train cabs. In Augmented Reality you are limited to your actual physical movements (or the movements of a real car or a real bike) as the Augmented Reality world is anchored to the real world.
Here are several
controllers that we used in the first 10 years of the CAVE. The
common elements on these included a joystick and three buttons
(the same as the 3 buttons on Unix / IRIX computer mice).
The original 'hand-made' CAVE / ImmersaDesk wand, based on the Flock of Birds tracker, used from 1992-1998. It would have been really handy to have had rapid prototyping machines back then to make these.
The newer CAVE / ImmersaDesk Wanda, with a similar joystick plus three buttons, based on the Flock of Birds tracker, used from 1998-2001.
A controller with a joystick plus 4 buttons, used in the early 2000s.
When we designed and built CAVE2 from 2009-2012 we wanted to go with controllers that were easier to replace if they were broken, so we shifted to PlayStation controllers with marker balls mounted on the front, again giving us multiple buttons, a d-pad and joystick up top, and a trigger below. This was the first CAVE controller to be cable-free, so it needs to be charged up like any wireless game controller.
The d-pad added a lot of advantages for interacting with menus, and using the lower trigger was a big advantage over the joystick for many interactions.
The initial VIVE
controllers followed a similar pattern.
The Oculus Touch has similar features in a more compact arrangement.
More buttons give you more options that can be directly controlled by the user, but may also make it harder to remember what all those buttons do. More buttons also make it harder to instruct a novice user in what he/she can do. Asking a new user to press the 'left button' or the 'right button' is pretty easy, but when you get to 'left on the d-pad' or 'press the right shoulder button' then you have a more limited audience that can understand you.
Much software today automatically brings up context-sensitive overlays showing what the various controller buttons do to help users get familiar with the controls.
Game controllers (and things that look like game controllers) have a big advantage in terms of familiarity for people who play games, have often gone through pretty substantial user testing, and are often relatively inexpensive to replace.
More interesting is the ability to manipulate objects in the virtual world, or manipulate the virtual world itself.
The user is given a very 'human' interface to VR ... the person can move their head or hand, and move their body, but this also limits the user's interaction with the space to what you carry around with you. There is also the obvious problem that you are in a virtual space made out of light, so it's not easy to touch, smell, or taste the virtual world, though all of the senses have been used in various projects.
Even if you want to just 'grab' an object there are several issues involved.
The 'natural' way to grab a virtual object is to move your hand holding a controller so that it touches the virtual object you want to manipulate. At this point the virtual environment could vibrate the controller, add a halo to the object, make the object glow, or play a sound to help you know that you have 'touched' the object. You could then press a button to 'grab' the object, or have the object 'jump' into your hand.
While this kind of motion is very natural, navigating to the object may not be as easy, or the type of display may not encourage you to 'touch' the virtual objects. It can also be impractical to pick up very large objects because they can obscure your field of view. In that case the user's hand may cast a ray (raycasting), which allows the user to interact at a distance. One fun thing to try in VR is to act like a superhero and pick up a large building or train and throw it around - it turns out that when you pick them up you can't see anything else - 'church chuck'.
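Both touch-to-grab and selection at a distance reduce to an intersection test; here is a standard ray / bounding-sphere test as a sketch (objects approximated by bounding spheres, which is usually good enough for selection):

```python
import numpy as np

def ray_hits_sphere(ray_origin, ray_dir, center, radius):
    """Distance along the hand's ray to a sphere-bounded object, or None."""
    ray_dir = ray_dir / np.linalg.norm(ray_dir)
    oc = ray_origin - center
    b = np.dot(oc, ray_dir)
    c = np.dot(oc, oc) - radius * radius
    disc = b * b - c
    if disc < 0:
        return None                    # the ray misses the object
    t = -b - np.sqrt(disc)             # nearest intersection along the ray
    return t if t >= 0 else None

# Hand at the origin pointing down -Z; object 3 m away, 0.25 m bound
print(ray_hits_sphere(np.zeros(3), np.array([0.0, 0.0, -1.0]),
                      np.array([0.0, 0.0, -3.0]), 0.25))  # -> 2.75
```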
One common way of interacting with the virtual world is to take the concept of 2D menus from the desktop into the 3D space of VR. These menus exist as mostly 2D objects in the 3D space.
This can be extended from simple buttons to various forms of 2D sliders.
The menus may be fixed to the user, appearing near the user's head, hand, or waist, so as the user moves through the space the menus stay in a fixed position relative to the user. Alternatively the menus may stay at a fixed location in the real space, or a fixed location in the virtual space.
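The difference is just which coordinate frame the menu's transform lives in; a sketch (the 40 cm down / 60 cm forward offset is an arbitrary choice):

```python
import numpy as np

def menu_position(head_pos, head_yaw, user_fixed=True,
                  offset=np.array([0.0, -0.4, -0.6])):
    """Re-anchor a user-fixed menu each frame, or leave it in the world.

    A user-fixed menu sits at a constant offset in the head's frame
    (here 40 cm down, 60 cm ahead) so it follows the user around; a
    world-fixed menu simply keeps its world coordinates.
    """
    if not user_fixed:
        return offset                    # interpreted as world coordinates
    c, s = np.cos(head_yaw), np.sin(head_yaw)
    yaw_rot = np.array([[c, 0, s], [0, 1, 0], [-s, 0, c]])
    return head_pos + yaw_rot @ offset
```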
e.g. from the current crop of HMD experiences:
Vanishing Realms has a nice UI at the user's waist where you store keys and food and weapons; to interact, the user intersects that menu with one of the controllers as though reaching down to grab something off their belt.
Tilt Brush has a nice 2-handed 3D UI where the multi-faceted menu appears in one hand and you select from it with the other hand.
Bridge Crew has a nice UI with (lots of) buttons that you have to 'press' with your virtual hands - including virtual overlay text to remind you which is which.
Job Simulator has a nice UI built into the 3D environment itself, based on object manipulation.
In either case these menus may collide with other objects in the scene. One way to avoid this is to turn z-buffering off for these menus so they are always visible even when they are 'behind' another object.
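In OpenGL terms (sketched here with PyOpenGL) that means drawing the menu last with the depth test disabled, and usually with depth writes off so the menu does not corrupt the depth buffer:

```python
from OpenGL.GL import (glDisable, glEnable, glDepthMask,
                       GL_DEPTH_TEST, GL_FALSE, GL_TRUE)

def draw_menu_on_top(draw_menu):
    """Draw the menu so no scene geometry can occlude it."""
    glDisable(GL_DEPTH_TEST)   # ignore scene depth: menu always visible
    glDepthMask(GL_FALSE)      # don't write menu depth into the buffer
    draw_menu()                # caller's menu drawing routine
    glDepthMask(GL_TRUE)
    glEnable(GL_DEPTH_TEST)
```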
As you get into more complicated virtual worlds for design or visualization, the number of menus multiplies dramatically, as does the need for textually naming them, so more traditional menus are more common in these domains. There are several ways to activate these kinds of menus - using the wand as a pointer to select menu items, using a d-pad to move through a menu, or intersecting the wand itself with the menu items. Using a d-pad tends to work better than a pointer if you have a single menu to move through, as it can be hard to hold your hand steady when pointing at complex menus.
Another option is to use a heads-up display for the menu system where you look at the menu item you want to select and choose it with a controller. Here is a version of that we did back in 1995, with the additional option of selecting the menu options by voice. The HoloLens uses a similar menu system.
In Augmented Reality this can be trickier since you also have the real world involved, both in terms of the graphics and in terms of the people you share the world with. Google Glass's physical control on the side of the glasses worked OK for small menu systems, augmented by voice. Microsoft's HoloLens pinch gesture for selecting within the field of view of the camera didn't work quite so well, but the physical button did, as long as you keep the physical button with you.
One way to get around the complexity of the menus is to talk to the computer via a voice recognition system. This is a very natural way for people to communicate. These systems are quite robust, even for multiple speakers given a small fixed vocabulary, or a single speaker and a large vocabulary, and they are not very expensive. However, voice commands can be hard to learn and remember.
In the case of VR applications like the Virtual Director from the 90s, voice control was the only convenient way to get around a very complicated menuing system.
The HoloLens makes effective use of voice to rapidly move through the menus without needing to look and pinch.
Ambient microphones do not add any extra encumbrance to the user in dedicated rooms, and small wireless microphones are a small encumbrance. HMDs or the controllers can include microphones, which work pretty well.
Problems can occur in projection-based systems since there are multiple users in the same place and they are frequently talking to each other. This can make it difficult for the computer to know when you are talking to your friends and when you are talking to the computer. There is a need for a way to turn the system on and off, and often the need for a personal microphone.
Voice recognition is becoming much more common now for our smartphones and our homes, and soon our cars, as the processing and the learning can be offloaded into the cloud.
Gesture recognition also seems like a very natural interface. Gloves can be used to accurately track the position of the user's hand and fingers. Some simple gloves track contacts (e.g. thumb touching middle finger); others track the extension of the fingers. The former are fairly robust; the latter are still somewhat fragile. Camera tracking, as in the Kinect and AR systems, can do a fairly good job with simple gestures, and is rapidly improving.
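A contact-style gesture like the ones the simple gloves report can be recovered from tracked fingertip positions with nothing more than a distance test (the 2 cm threshold is a made-up value):

```python
import numpy as np

def detect_pinch(thumb_tip, index_tip, threshold=0.02):
    """Fire a 'contact' gesture when two tracked fingertips touch."""
    gap = np.linalg.norm(np.asarray(thumb_tip) - np.asarray(index_tip))
    return gap < threshold

print(detect_pinch([0.010, 1.20, -0.300], [0.015, 1.21, -0.305]))  # True
```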
The possibilities with tracking hands improve if you have two of them. Multigen's SmartScene is a good example, using two Fakespace PinchGloves for manipulation. The two-handed interaction of Tilt Brush seems like a modern version of this approach. Full body tracking involving a body suit gives you more opportunities for gesture recognition, and simple camera tracking does a pretty good job with gross positions and movements.
The issue here, as with voice, is how the computer decides that you are gesturing to it and expecting something to happen, as opposed to gesturing to yourself or another person.
Various models of PHANToMs.
A PHANToM in use as part of a cranial implant modelling application, with a video here - https://www.youtube.com/watch?v=cr4u69r4kn8
The PHANToM gives 6 degrees of freedom as input (translation in XYZ and roll, pitch, yaw) and 3 degrees of freedom as output (force along XYZ).
You use the PHANToM by holding a stylus at the end of its arm like a pen, or by putting your finger into a thimble at the end of its arm.
The 3D workspace ranges from 5x7x10 inches to 16x23x33 inches, and there is a nice introductory video here - https://www.youtube.com/watch?v=0_NB38m86aw
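The classic first program on a device like this is the 'virtual wall': each cycle of the (roughly 1 kHz) haptic loop pushes the stylus back out of the surface with a spring force. The stiffness and force cap below are hypothetical, device-like numbers:

```python
import numpy as np

def wall_force(stylus_pos, wall_y=0.0, stiffness=800.0, max_force=3.3):
    """Spring force (N) pushing the stylus out of a horizontal wall.

    F = k * penetration, clamped to a device-like force limit; zero
    force while the stylus stays above the surface.
    """
    penetration = wall_y - stylus_pos[1]
    if penetration <= 0:
        return np.zeros(3)                  # stylus is above the wall
    force = np.array([0.0, stiffness * penetration, 0.0])
    return np.clip(force, -max_force, max_force)

print(wall_force(np.array([0.0, -0.002, 0.0])))  # 2 mm deep -> 1.6 N upward
```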
There is also work today using air pressure and sound to create a kind of sense of touch; though not as strong as a PHANToM, these systems operate over a wider area.
We often compensate for the lack of one sense in VR by using another. For example we can use a sound to replace the sense of touch, or a visual effect to replace the lack of audio.
In projection-based VR systems you can carry things with you; for example, in the 90s we could carry PDAs giving an additional display, handwriting recognition, or a hand-held physical menu system. Today smartphones or tablets provide the same functionality with infinitely more capabilities.
In smaller fish tank VR systems, or in hybrid systems like CAVE2, you have access to everything on your desk, which can be very important when VR is only part of the material you need to work with. In Augmented Reality you have access to everything in the real world, so interacting with the real world is pretty much the same as before, especially with a head-mounted AR system.