Week 4

Vision / Visuals and Audio

sources include: James Helman's Siggraph '93 Applied VR course notes, Dennis Proffitt's '94 Developing Advanced VR Applications course notes, Lou Harrison's Siggraph '97 Stereo Computer Graphics for Virtual Reality notes, and the IRIS Performer guide.

but first - homework for this week - due Friday 9/21 at 9pm Chicago time

One nice set of AR applications for smartphones has been those that let you point your phone up at a part of the sky (usually at night, but of course they work anytime) and have the app tell you what stars, planets, etc. are visible from where you are. If you want to know if that bright light on the horizon is a UFO or the planet Venus, then this might help. For this homework you should try out one of these apps - Pocket Universe is a nice one for iOS, or Star Tracker or SkyView on Android. Knowing where you are on Earth and the orientation of your phone, it's pretty easy to tell what celestial objects you are looking at since they don't move very fast. Take one screenshot when you are looking at the moon and another looking at Saturn. Please do not point your camera towards the sun while doing this - it's bad for the camera's optics. As with the Google Translate homework, add these two images to your homework page and then imagine a future where this app is built into typical AR glasses as a way to get more information about the world around you and quickly answer the question 'what is that?' In this case let's just focus on the sky. Write about what other data sources you might want to integrate into an app like this. What different layers of information might you want to combine in this view?

The eye has a bandwidth of approx 1 gigabyte per second - much greater than the other senses

Temporal Resolution

The real world doesn't flicker. Theatrical films flicker and displays flicker because the image is constantly being refreshed. If the image isn't refreshed fast enough, we perceive flickering. Some people can perceive flickering even at 60Hz (the image being refreshed 60 times per second) for a bright display with a large field of view but most people stop perceiving the flicker between 15Hz (for dark images) and 50Hz (for bright images). LCD panels tend to have 60Hz or 120Hz refresh rates these days (though manufacturers often claim to have higher rates). Direct view LED tiles have similar rates.

Note that this is the rate that the hardware refreshes, which is (mostly) independent from the rate the graphics are being refreshed.


The eye can detect a range of 10^14 in luminance, but we can only see a range of 100:1 at the same time. The size of our pupil automatically adjusts to the amount of light available, allowing us to shift our window of much narrower range of luminance detection across a much larger range. If we spend 30 minutes dark adapting we can dramatically increase our vision at the low end.

The human eye is sensitive to ratios of intensities, not absolute magnitude. Brightness = Luminance^0.33. To make something appear n times brighter the luminance must be increased by a factor of about n^3.
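As a rough illustration of the power law above, here is a minimal Python sketch. The function names are made up for this example, and the exponent 0.33 is simply the value quoted in these notes; real perception varies with viewing conditions.

```python
EXPONENT = 0.33  # the exponent quoted in the notes


def perceived_brightness(luminance):
    """Relative brightness for a given relative luminance (power law)."""
    return luminance ** EXPONENT


def luminance_factor(times_brighter):
    """How much luminance must be multiplied to look n times brighter."""
    return times_brighter ** (1.0 / EXPONENT)


# Doubling the apparent brightness takes roughly 2^3 = 8 times the luminance.
print(round(luminance_factor(2.0), 2))
```

With an exponent of exactly 1/3 the factor would be exactly n^3; with 0.33 it comes out slightly higher.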

Displays still can not reproduce the full range of luminance that human beings can see, but the range increases every year, and with the new focus on OLED displays, high contrast displays, and direct view LED displays this is getting much better.


Most perceptual processes are driven by intensity not colour. The human motion system is colour blind, depth perception is colour blind, object recognition is colour blind.

but uniquely coloured objects are easy to find

Displays still can not reproduce the full colour spectrum that human beings can see, but the range increases every year, and with the new focus on OLED displays, high contrast displays, and direct view LED displays this is getting much better.

Field of View

Each eye has approximately 145 degrees horizontal (45 degrees towards the nose and 100 degrees to the side) and 100 degrees vertically (30 degrees up and 70 degrees down)

Current VR headsets like the VIVE and the Rift have 110 degree stereo FoV

Google Glass display has FoV of 15 degrees monoscopic in the corner of your vision
Microsoft HoloLens has a 20 degree stereo FoV in the center of your vision

With human scale systems like the CAVE and CAVE2, or even a planetarium dome, the field of view depends on the size and location of the physical displays and the glasses used - 6 sided CAVEs surrounded the users with screens, but the glasses typically don't cover the complete field of view (roughly 120 degrees horizontal and 90 degrees vertical).

Visual Acuity

We often encounter visual acuity as measured with a Snellen chart at the doctor's office where correct vision is described as being 20/20 (20/X means this viewer sees at 20 feet the detail that the average person can see at X feet; 20/200 is legally blind). If you are metrically minded then the fraction is 6/6. https://en.wikipedia.org/wiki/Snellen_chart

Back in the 90s our monitors at normal distance were 20/40, HMDs were 20/425, BOOMs were 20/85, and the original CAVE was 20/110.

Our current consumer HMDs are around 20/60 with a wide variety of lens distortion effects to give us a wider field of view, while most of our laptops and phones have so many pixels that they give us full 20/20 across a small range of our vision.
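One way to get a ballpark figure like the ones above is to work out how many arcminutes each pixel covers. The sketch below assumes that 20/20 vision corresponds to resolving detail of about 1 arcminute, and that pixels are spread evenly across the field of view (which ignores lens distortion). The function name and the example display numbers are hypothetical, not measurements of any particular headset.

```python
def snellen_denominator(fov_degrees, pixels_across):
    """Estimate X in 20/X for a display, assuming 1 arcminute per pixel is 20/20."""
    arcminutes_per_pixel = fov_degrees * 60.0 / pixels_across
    return 20.0 * arcminutes_per_pixel


# Hypothetical display: 2000 pixels spread across a 100 degree field of view.
x = snellen_denominator(100, 2000)
print("roughly 20/%d" % round(x))
```

The same pixel count looks much sharper on a laptop or phone because it is packed into a much smaller part of your field of view.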

How the Eye Works

bill sherman's diagram

Human eye has 2 types of photosensitive receptors: cones and rods



The cones are highly concentrated at the fovea and quickly taper off around the retina. For colour vision we have the greatest acuity at the fovea, or approximately at the center of our field of vision. Visual acuity drops off as we move away from the center of the field of view. However, we are very sensitive to motion on the periphery of our vision, so we can see movement even if we can't see what is moving.

The rods are highly concentrated 10-20 degrees around the fovea, but almost none are at the fovea itself - which is why, if you are stargazing and want to see something dim, you should look slightly to one side of it rather than directly at it.

What happens when we walk from a bright area into a dark area, say into a movie theatre? When we are outside the rods are saturated from the brightness. The cones which operate better at high illumination levels provide all the stimulus. When we walk into the darkened theatre the cones don't have enough illumination to do much good, and the rods take time to desaturate before they can be useful in the new lower illumination environment.

It takes about 20 minutes for the rods to become very sensitive, so dark adjust for about 20 minutes before going stargazing.

Since the cones do not operate well at low light intensities we can not see colour in dim light as only the rods are capable of giving us information. The rods are also more sensitive to the blue end of the spectrum so it is especially hard to see red in the dark (it appears black).

There is also the optic nerve which is 10-20 degrees away from the fovea which connects your eye to your brain. This is the blind spot where there are no cones and no rods. We can not see anything at this point and our brain compensates by filling in that part of our vision with surrounding colours and simple patterns.

Here is a nice page on colour blindness: http://www.toledo-bend.com/colorblind/Ishihara.asp

8 percent of men
1 percent of women

Here is a nice web-based desktop simulator that uses your webcam to show what the world looks like to a color blind person - https://sciencedemos.org.uk/colorblind.php and another good phone based tool is the Chromatic Vision Simulator at http://asada.tukusi.ne.jp/cvsimulator/e/

One of the major takeaways from this is that as our display technology gets better and better we will increasingly be running into the limitations of the human eye - just as we are on our smartphone or laptop screens.

One of the most important things in VR, and increasingly in AR, is creating an artificial world with a sense of depth.

Real World Depth Cues

  1. Occlusion (hidden surfaces)
  2. Perspective Projection (parallel lines meet at infinity)
  3. Binocular Disparity
  4. Motion Parallax (due to head motion)
  5. Convergence (rotation of the eyes to view a close object)
  6. Accommodation (change of shape of eye to view a close object - focus)
  7. Atmospheric (fog)
  8. Lighting and Shadows

Computer graphics give 1,2,7,8
VR gives 3,4,5
AR gives 4
some specialized multi-plane displays can give 6.

In VR, as in stereo movies or stereo photographs, the brain is getting two conflicting sets of cues about the virtual world. Some of these cues indicate this world is 3D (convergence and stereopsis). Some of these cues indicate that the world is flat (accommodation). The eyes are focusing on the flat screen but they are converging depending on the position of the virtual objects. This can lead to headaches, or more serious forms of simulator sickness, but in general it's not a serious problem, especially if you are doing it for only a couple hours at a time.

If we start thinking about wearing AR glasses all the time then we may run into more issues.

Note there are some potential issues of having young children seeing this kind of mediated 3D and research is ongoing there. Most of the current headsets are for 12 or 13 year olds or older to be safe.

How to Generate and View Stereo Images

We have stereo vision because each eye sees a slightly different image. Well, almost all of us (90-95%) do.

Presenting stereo imagery through technology is over 150 years old. Here is a nice history of the stereopticon http://www.bitwise.net/~ken-bill/stereo.htm

The device is in charge of getting the separate views to your eyes

As a kid you may have had a View-Master(tm) and seen the cool 3D pictures. Those worked exactly the same way as the stereopticon. Your left eye was shown one image on the disc, while your right eye was shown a different image.

Head Mounted Displays work in a very similar way, with the addition of some special lenses to give you a wider field of view than a stereopticon or View-Master typically provides.

But you don't have to separate the images.

As a kid you might have read some 3D comics which came with red/blue (anaglyphic) glasses, or you might have even seen a movie with the red/blue glasses (maybe down at U of Chicago's Documentary Film Group with films like 'House of Wax' or 'Creature from the Black Lagoon' or 'It Came from Outer Space').

If you have a pair of red/cyan glasses and a correctly calibrated display then this image will become 3D. The coloured lenses make one image more visible to one of your eyes and less visible to your other eye, even though both eyes are viewing both images.

and here is another one from an Ocean Going Core Drilling ship. The red/blue or red/cyan trick works ok for greyscale images (like old movies or black and white comics) but has a hard time with color.

and here is a nice movie on youtube https://www.youtube.com/watch?v=MQEkFppWaRI

Another inexpensive way to show stereo imagery is to draw two slightly different images onto the screen (or onto a piece of paper), place them next to each other and tell the person to fuse the stereo pair into a single image without any additional hardware. This is easy for some people, very hard for other people, and impossible for a few people.

Some of these images require your left eye to look at the left image, others require your left eye to look at the right image (cross-eye stereo).

To see the pictures below as a single stereo image look at the left image with your right eye and the right image with your left eye. If you aren't used to doing this then try this: Hold a finger up in front of your eyes between the two images on the screen and look at the finger. You should notice that the two images on the screen are now 4. As you move the finger towards or away from your head the two innermost images will move towards or away from each other. When you get them to merge together (that is you only see 3 images on the screen) then all you have to do is re-focus your eyes on the screen rather than on your finger. Remove your finger, and you should be seeing a 3D image in the middle.

Passively polarized stereo - https://en.wikipedia.org/wiki/Polarized_3D_system

In the 1950s when stereo films like 'Creature from the Black Lagoon' were shown theatrically, the theatres did not use red/cyan. They used a system pretty much identical to what 3D movies used in the 80s and again today - passive polarization. In this case one projector shows the left eye image with a polarizing filter. A second synced projector shows the right eye image with a different polarizing filter. The screen preserves polarization and the viewers wear lightweight polarized glasses that allow the correct image through and not the wrong one.

This same technique can be used on flat panel displays where a layer is added on top of the display with lines of alternating polarization matching up with the rows of pixels below. The same glasses are used but now one eye only sees the even lines of the display while the other eye only sees the odd lines. This has the disadvantage of cutting your vertical resolution in half.

<image from https://en.wikipedia.org/wiki/Polarized_3D_system>

An alternative approach for both projectors and flat panels is to use Active Stereo - https://en.wikipedia.org/wiki/Active_shutter_3D_system

In active stereo the projector or the panel shows the image for the left eye, then black, then the image for the right eye, then black, and then repeats. The viewer wears glasses with LCD shutters that turn either clear or opaque, and the shutter for the appropriate eye is open when the display shows the image for that eye. This requires the display to also talk to the glasses (usually via IR) and for the glasses to be powered.

<image from https://en.wikipedia.org/wiki/Active_shutter_3D_system>

A very important thing to keep in mind is what kind of display you will be using, and the space that it will be used in.

Projection-based systems can give much larger fields of view, but typically need to be in dimly lit or dark rooms. If the system is front-projection then how close will the user be able to get to the display before they begin to cast shadows on the screen? If it's rear-projection from behind the screen then how much space are you prepared to waste behind the screen?

Large flat panel displays have a different problem - they are designed for on-axis viewing, and the further you are off-axis (horizontally or vertically) the greater the chance that you will see degraded colour / contrast / stereo vision. They also tend to have borders which can be distracting.

Head mounted VR displays avoid issues of dark rooms and off-axis viewing but you still need enough space for the user to move around in and you need to make sure users don't hurt themselves while moving around without seeing the real world.

Head mounted AR displays allow the user to see the real world, making them safer than VR displays, but as these displays may be used in a dim room or in bright sunlight the graphics need to be visible across that range of use cases, similar to how you want to be able to use your phone or smart watch in a variety of settings, so the contrast and brightness ranges need to be greater than for displays that are only used indoors.

Some Terminology

Horizontal Parallax - when the retinal images of an object fall on disparate points on the two retinas, these points differ only in their horizontal position (since our eyes are at the same vertical position). Its value is given by R - L, where R and L are the horizontal positions of the point in the right eye and left eye views.

Stereo Window (Plane) - the point at which there is no difference in parallax between the two eye views - usually at the same depth as the monitor screen or the projection surface. HMD optics create their own projection plane, commonly about 1-2 meters away from the user.

Homologous Points - points which correspond to each other in the separate eye views

Interocular Distance - the distance between the viewer's left and right eyes, usually about 2.5 inches - in order for virtual objects to appear to have the exact correct size the VR system needs to know your specific interocular distance. https://en.wikipedia.org/wiki/Interpupillary_distance

Positive Parallax - the point lies behind the stereo window

Zero Parallax - the point is at the same depth as the stereo window

Negative Parallax - the point lies in front of the stereo window

Vertical Displacement - vertical parallax between homologous points relative to the line that the two eyes form - this should be zero in a correctly calibrated setup

Interocular Crosstalk (Ghosting) - when one eye can see part of the other eye's view as well - this should also be zero in a correctly calibrated setup, but much harder to achieve, especially in scenes with very high contrast
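To make the sign conventions in the terminology above concrete, here is a small Python sketch that computes the horizontal parallax (R - L) on the stereo window for a point straight ahead of the viewer. It assumes a simple similar-triangles model with the eyes parallel to the screen; the function name and the example distances are illustrative assumptions.

```python
def screen_parallax(eye_separation, screen_distance, point_distance):
    """Horizontal parallax R - L on the screen, in the same units as the inputs.

    Assumes the point is straight ahead and the eyes are parallel to the screen.
    """
    return eye_separation * (point_distance - screen_distance) / point_distance


EYES = 0.0635   # about 2.5 inches, expressed in meters
SCREEN = 2.0    # hypothetical stereo window 2 meters away

print(screen_parallax(EYES, SCREEN, 4.0))  # behind the window: positive
print(screen_parallax(EYES, SCREEN, 2.0))  # at the window: zero
print(screen_parallax(EYES, SCREEN, 1.0))  # in front of the window: negative
```

In this model, as a point moves off towards infinity its parallax approaches the interocular distance and never exceeds it.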

<image from https://jivp-eurasipjournals.springeropen.com/articles/10.1186/s13640-017-0210-5>

off-axis projection

In an HMD or the BOOM, the screens in front of each user's eye move with the user - that is, the screens are always perpendicular to the eye's line of sight (assuming the eyes are looking straight ahead). This allows the traditional computer graphics 'camera paradigm' to be extended so that there are 2 cameras in use - one for each eye.

In the CAVE or Fish Tank VR, this is not the case. The projection planes stay at a fixed position as the user moves around. The user may not be looking perpendicular to the screen and certainly can't be looking perpendicular to all of the screens in the CAVE simultaneously - in this case 'off-axis projection' is used, and the math/geometry is a bit more complex to be more general. One of our former students, Robert Kooima, has a nice page on the geometry and math involved: http://csc.lsu.edu/~kooima/articles/genperspective/
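As a rough sketch of the idea, the Python below computes the left, right, bottom and top extents of an asymmetric view frustum at the near plane, given three corners of a fixed screen and the tracked eye position. It follows the general approach of projecting the screen corners onto the screen's own axes; the function and variable names are chosen for this example, and a full implementation would also need to rotate and translate the view to match the screen, which is not shown here.

```python
import math


def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def normalize(v):
    length = math.sqrt(dot(v, v))
    return tuple(x / length for x in v)


def off_axis_frustum(lower_left, lower_right, upper_left, eye, near):
    """Return (left, right, bottom, top) frustum extents at the near plane."""
    right_axis = normalize(sub(lower_right, lower_left))
    up_axis = normalize(sub(upper_left, lower_left))
    normal = normalize(cross(right_axis, up_axis))  # points toward the viewer

    to_ll = sub(lower_left, eye)
    to_lr = sub(lower_right, eye)
    to_ul = sub(upper_left, eye)

    distance = -dot(to_ll, normal)   # eye-to-screen distance along the normal
    scale = near / distance          # shrink screen extents to the near plane

    return (dot(right_axis, to_ll) * scale,
            dot(right_axis, to_lr) * scale,
            dot(up_axis, to_ll) * scale,
            dot(up_axis, to_ul) * scale)


# A 2 x 2 screen in the z = 0 plane, viewed from 1 unit away.
corners = ((-1.0, -1.0, 0.0), (1.0, -1.0, 0.0), (-1.0, 1.0, 0.0))
print(off_axis_frustum(*corners, eye=(0.0, 0.0, 1.0), near=0.1))  # symmetric
print(off_axis_frustum(*corners, eye=(0.5, 0.0, 1.0), near=0.1))  # skewed
```

When the eye is centered the frustum is symmetric, as with an ordinary camera. As the eye moves to the right, the left extent grows and the right extent shrinks, which is what keeps the image locked to the fixed screen.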

Updating Visuals based on Head Tracking

Naively we would like to update the graphics every frame in order to use the most recent head (eye) positions.

Since there will be jitter in the tracker values and latencies in getting information from the tracking hardware to deal with, this may result in the image jittering.

One way to avoid this is to only update the image when the head has moved (or rotated) a certain amount so the software knows that the motion was intentional.

Another option is to interpolate between the previous and current position and rotation values in order to smooth out this motion. This results in smoother transitions but will also increase the lag slightly.

Another option is to extrapolate into the future by predicting how the user is going to move in the next couple seconds and proactively render for the position you believe the user will be in.
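The three options above can be sketched in a few lines of Python. This is only an illustration for head position (rotation would need similar treatment, for example with quaternions), and the function names, thresholds and blend factors are assumptions chosen for the example.

```python
import math


def moved_enough(previous, current, threshold):
    """Option 1: only accept the new position if the head moved far enough."""
    return math.dist(previous, current) >= threshold


def smooth(previous, current, alpha):
    """Option 2: blend toward the new position (alpha=1 means no smoothing)."""
    return tuple(p + alpha * (c - p) for p, c in zip(previous, current))


def predict(previous, current, dt, lookahead):
    """Option 3: extrapolate along the current velocity into the future."""
    return tuple(c + (c - p) / dt * lookahead
                 for p, c in zip(previous, current))


last = (0.0, 1.7, 0.0)
now = (0.02, 1.7, 0.0)                  # head moved 2 cm between samples
print(moved_enough(last, now, 0.005))   # True: more than the 5 mm threshold
print(smooth(last, now, 0.5))           # halfway between the two samples
print(predict(last, now, 1 / 60, 1 / 60))  # one sample interval ahead
```

Each option trades off differently: the threshold hides small intentional motions, smoothing adds lag, and prediction overshoots when the user changes direction.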

How to generate Graphics Quickly

Naive Approach

  1. poll head sensor for location and orientation
  2. poll hand sensor(s) for location and orientation
  3. get any button presses or other state change information
  4. update virtual world
  5. draw world for left eye
  6. draw world for right eye
  7. display images to the user(s)
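The seven steps above can be written as a single loop body. The sketch below uses stub classes with made-up method names in place of real tracking and rendering code, so it only shows the ordering: everything happens one step after another, and nothing is drawn until all of the sensors have been polled.

```python
class StubTracker:
    """Hypothetical stand-in for real tracking hardware."""

    def __init__(self, log):
        self.log = log

    def poll_head(self):
        self.log.append("poll head")
        return {"position": (0.0, 1.7, 0.0), "orientation": (0.0, 0.0, 0.0)}

    def poll_hands(self):
        self.log.append("poll hands")
        return []

    def poll_buttons(self):
        self.log.append("poll buttons")
        return []


class StubWorld:
    """Hypothetical stand-in for the simulation state."""

    def __init__(self, log):
        self.log = log

    def update(self, head, hands, buttons):
        self.log.append("update world")


class StubRenderer:
    """Hypothetical stand-in for the graphics system."""

    def __init__(self, log):
        self.log = log

    def draw(self, world, head, eye):
        self.log.append("draw " + eye)
        return eye + " image"

    def present(self, left_image, right_image):
        self.log.append("display")


def run_frame(tracker, world, renderer):
    """One pass through the naive, strictly sequential frame loop."""
    head = tracker.poll_head()                          # step 1
    hands = tracker.poll_hands()                        # step 2
    buttons = tracker.poll_buttons()                    # step 3
    world.update(head, hands, buttons)                  # step 4
    left_image = renderer.draw(world, head, "left")     # step 5
    right_image = renderer.draw(world, head, "right")   # step 6
    renderer.present(left_image, right_image)           # step 7


log = []
run_frame(StubTracker(log), StubWorld(log), StubRenderer(log))
print(log)
```

Because each step waits for the one before it, the time for a frame is the sum of all seven steps, and the head position used for drawing is already slightly out of date by the time the images are displayed.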

Pipelined approach (from SGI Performer manual)

Maintaining a steady frame rate is good practice in videogames, but in VR it is much more important.

Current game engines are pretty good at optimizing what is drawn based on where the user is in the scene. If you create your own software from scratch you will need to take care of those things.

Models can be replaced by models with less detail

3D models of far away objects can be replaced by texture mapped billboards

The horizon can be moved in - moving in Z-far and perhaps covering this with fog

A less complex lighting model can be used
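If you are writing your own software, the first three of these techniques often come down to a choice based on distance from the viewer. Here is a minimal Python sketch; the function name, the distance thresholds and the returned labels are all hypothetical, and a real engine would typically also consider the object's size on screen.

```python
def choose_representation(distance, lod_distances, billboard_distance, z_far):
    """Pick how to draw an object based on its distance from the viewer.

    lod_distances lists the maximum distance for each detailed model,
    most detailed first; one extra, coarsest model covers the rest of
    the range up to billboard_distance.
    """
    if distance > z_far:
        return None                      # beyond the horizon: not drawn
    if distance > billboard_distance:
        return "billboard"               # flat textured stand-in
    for level, limit in enumerate(lod_distances):
        if distance <= limit:
            return "lod%d" % level       # lod0 is the most detailed model
    return "lod%d" % len(lod_distances)  # coarsest full 3D model


for d in (5.0, 20.0, 50.0, 200.0, 600.0):
    print(d, choose_representation(d, [10.0, 30.0], 100.0, 500.0))
```

Pulling z_far in simply makes more objects fall into the "not drawn" case, which is why it is usually paired with fog to hide the cutoff.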

Simulator Sickness

2 things are needed: a functioning vestibular system (canals in the inner ear) and a sense of motion

Symptoms: Nausea, eyestrain, blurred vision, difficulty concentrating, headache, drowsiness, fatigue

These symptoms can persist after the VR experience is finished.

Causes: still unknown but one common hypothesis is a mismatch between visual motion (what your eyes tell you) and the vestibular system (what your ears tell you)

Why would this cause us to become sick? Possibly an inherited trait - a mismatch between the eyes and ears might be caused by ingesting a poisonous substance so vomiting would be helpful in that case.

Another hypothesis deals with the lack of a rest frame. When a user views images on a screen with an obvious border that border locates the user in the real world. Without that border the user loses their link to the real world and the effects of motion in the virtual world are more pronounced.

Current HMD environments default to allowing the user to walk around a physical space but then use teleporting rather than controller based 'flying' common in first person shooters to move larger distances. This is primarily to reduce the chances of people getting sick.

Fighter pilots have 20 to 40 percent sickness rates in flight simulators - but experienced pilots get sick more often than novice pilots.

In a rotating field when walking forward, people tilt their heads and feel like they are rotating in the opposite direction.

If a person is walking on a treadmill holding onto a stationary bar and you change the rate at which the visuals are passing by, it will feel to the person like the bar is pushing or pulling on their hands.

Open fields are less likely to cause problems than walking through tight tunnels; tunnels are very aggressive in terms of peripheral motion. This doesn't mean that you should not have any tunnels, but you should be careful how much time the users spend there.

This all affects the kinds of worlds you create and how long a person can safely stay in that world.

It's easy (and fun) to induce vertigo. Most people really seem to enjoy jumping off of high places and falling in VR.

Pokemon Incident

Clip available on YouTube:

December 16 1997

685 schoolchildren taken to hospitals- feeling sick while watching Pokemon

12 Hz red - blue flicker scene lasting about 5s roughly 20mins into the program

Show aired in several major cities (Tokyo, Osaka, etc) and then excerpts were shown on the nightly news after reports came in - causing more cases. Broadcast of the show was cancelled in 30 other cities.

Pokemon incident was the first occurrence on a mass scale

New type of trigger, not just rapid light/dark - this is now known as "chromatic sensitive epilepsy."

With VR, you very likely have visuals covering the user's field of view, like a child up close to the TV, and you have a full array of special effects to choose from. Choose carefully.

Pokemon on the brain


Audio in virtual environments can be used for several purposes:

- audio can help the virtual environment give the user feedback that they have 'touched' something, that they have activated a menu item, that a new user has entered the space

- ambient audio can help set the mood of the virtual environment, and make the environment seem more real

- directional audio can be good to tell a user where something is occurring that might be out of their current field of view

- audio can also be used to send speech between multiple participants, but we will talk more about that in a future lecture

In a virtual world you have full control of all audio sources. In an augmented reality world you will need to blend your audio with the audio of the real world

Simple audio can be monaural with a single speaker but there are many advantages to having multiple speakers as in a surround sound system, or using headphones to give directional audio where sounds occur from a particular location. As you get close to these sounds they get louder and are more localized. This can tell you where things have happened, or lead you towards something.
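A very simplified version of directional audio can be sketched as two gains: one that falls off with distance and one that splits the sound between the left and right channels. The Python below is only an illustration; the function names are made up, the inverse-distance rolloff and constant-power pan are common simple choices rather than what any particular audio library does, and real spatial audio on headphones involves much more than panning.

```python
import math


def distance_gain(distance, reference_distance=1.0):
    """Volume falls off with distance; full volume inside the reference distance."""
    return reference_distance / max(distance, reference_distance)


def stereo_gains(azimuth):
    """Constant-power pan. azimuth is in radians: -pi/2 is hard left, +pi/2 hard right."""
    pan = max(-1.0, min(1.0, azimuth / (math.pi / 2)))
    angle = (pan + 1.0) * math.pi / 4
    return math.cos(angle), math.sin(angle)   # (left gain, right gain)


# A sound 4 meters away, 45 degrees to the listener's right.
left, right = stereo_gains(math.radians(45))
volume = distance_gain(4.0)
print(round(left * volume, 3), round(right * volume, 3))
```

The constant-power pan keeps the overall loudness about the same as the sound moves from one side to the other, so only the direction appears to change.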

Subwoofers can be good for rumbling. Connecting them to a vibrating floor can add additional feedback.

The sounds themselves can be prerecorded clips that are played back or looped, or sounds can be synthesized. Synthesized sounds can be useful in scientific environments to give feedback on the current state of the world

It's important to have high-quality prerecorded clips - background hiss can be very noticeable.

You will usually want to play back multiple sounds at the same time - some of these will be looping environmental sounds and others will be sounds played when certain events occur. There are several free audio libraries out there that can handle these things in a pretty straightforward way.

It's important to balance all of these sounds so that some sounds do not unintentionally hide others. You also need to be careful that you do not overload the speakers.

You also need to be careful that you are not playing too many sounds at the same time, or playing the same sound too many times. For example when there are multiple people running, or multiple water droplets hitting a pool, it is probably a bad idea to play a sound for each of those events or you will just hear noise. One method is to only play an event sound if it hasn't been played in the previous n seconds.
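The "hasn't been played in the previous n seconds" rule is easy to sketch in Python. The class and method names here are made up for the example, and the caller passes in the current time so the logic stays independent of any particular audio library.

```python
class SoundThrottle:
    """Allow each named event sound at most once per cooldown period."""

    def __init__(self, cooldown_seconds):
        self.cooldown = cooldown_seconds
        self.last_played = {}   # sound name -> time it was last allowed

    def should_play(self, name, now):
        last = self.last_played.get(name)
        if last is not None and now - last < self.cooldown:
            return False        # played too recently: skip this one
        self.last_played[name] = now
        return True


throttle = SoundThrottle(cooldown_seconds=0.25)
for t in (0.0, 0.1, 0.2, 0.3, 0.35):
    if throttle.should_play("water_drop", t):
        print("play water_drop at", t)
```

Each sound name has its own timer, so throttling water droplets does not stop a footstep from playing at the same moment.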

It's usually a good idea to load all of your sounds in at the beginning of the program and store them in memory. Any repeating sound can be set to repeat and play at zero volume, and then faded up when needed and faded down when not needed.

As with visuals, it's important to audition your audio in the environment to make sure it works.

Coming Next Time

Project 1 presentations

9/19/18 - added links to color blindness simulators