Project 3: Depth of Field Effect with an Automatically Adjusting Focal Distance


For this project, I decided to implement, and improve upon where possible, the Real-Time Depth of Field Simulation developed at ATI Research (Riguer et al.).  One thing common to all depth of field demonstrations is that the focal distance must be modified by the user with keystrokes or similar input.  I wanted to do something on top of all the prior work by having the focal distance (i.e. the distance at which things appear sharp) adjust automatically.  Humus had already done something like this by performing a ray-trace into the scene and rendering multiple views.  Of course, this affects how you organize your objects so that you can ray-trace through your scene, and also adds the cost of rendering your scene many times.  I believe this can be done differently using image processing, rendering the scene at most one extra time prior to performing the depth of field effect.  The ultimate goal of this project was to have the focal distance automatically adjust based on what you were looking at on the screen with your own eyes.

In the following text, I will discuss the overall work done in this project: what I had to do to make the assets, what outside code I chose to incorporate, where things worked, where they didn't, what didn't get done, and finally the limitations of my implementation.

Scene Assets

In this project, I created some simple assets.  The first is a rectangular box for the room: the floor has a metal grid texture, and the walls and ceiling have brick-like textures.  Tied to the user's camera is a gun model that I made; its use is to help see how much close objects are affected.  Inside the enclosed room is a set of interlinked tori.  The center torus has a greenish-blue texture and the outer ones have a gray camouflage texture.  Although simple, this structure is also complex because of the holes in the tori, which will be used to illustrate both the strengths and weaknesses of my implementation.

Pass One: Determination of Focal Distance

This is the pass that ultimately determines the focal distance.  I set up a Framebuffer Object with a 32 x 32 texture as the color attachment and drew the scene to it from the user's point of view.  The trick is to use a small field of view: my normal field of view is about 60 degrees, and the small field of view is 5% of that.  When rendering to the 32 x 32 color attachment, I enable a simple shader.  The vertex shader just passes along the transformed vertex.  The fragment shader takes the incoming vertex's z-coordinate in eye space, subtracts the near-plane distance, and divides by the distance between the near and far planes.  This gives a linear depth value in [0,1] for the fragment.  To understand what I mean, consider the following picture.

In the upper left corner we see the linear depth values of the objects in the center of the main rendering.  After the rendering is done, we pull the pixels from the FBO's color attachment texture and process the image.  I use a weighted average that is parabolic from the center out to the edges, normalized so the sum of the weights in each row or column is one.  Because the weights are symmetric, the filter is separable, so I apply a 32 x 1 filter horizontally and then vertically.  After all that, I get a depth value in [0,1], which I then scale by the visible range to obtain an object-space focal distance!

Pass Two: Color Map and Depth/Blur Map Creation

The focus of the next pass is to create a color map and a depth/blur map.  The color map will be used for sampling the final pixel values in the same way Riguer et al. did.  My shader code for determining the depth/blur values differs a little from that of Riguer et al.  I decided that this pass would use Multiple Render Targets (MRTs), but it can also be done with two passes (one for the color map and one for the depth/blur map).

The depth value in the depth/blur map is a linear eye-space depth mapped to [0,1].  In their paper, Riguer et al. calculate the blur factor using the lens equation they describe.

I decided to simplify this.  The blur value is calculated by taking the distance between the eye-space depth and the focal distance (sent in as a uniform), dividing by the focal range of the lens, and clamping the result to [0,1].  The focal range is the range over which sharpness attenuates until a point is 100% blurry; 100% blurry means that the circle of confusion for that point is the same size as the maximum circle of confusion.  Below is a picture showing the results of passes one and two on the left portion of the image.

Here the focal distance is somewhere just past the tori, but before the far wall.  This can be seen from the black bands along the pixels where the ceiling and floor would be.

At this point, I determined that I can't work on my PowerBook laptop or on any NVIDIA GeForce 5 series graphics card on Linux.  The MRT support in Apple's drivers is broken, or just hasn't been implemented correctly.  As for the GeForce 5 cards, I don't know what's wrong, but I get the same results, as can be seen below.

What we have here is, in effect, nothing being written to the second render target.  In the first picture, gl_FragData[1] is assigned a value last.  In the second, gl_FragData[0] is assigned last.  If I comment out one assignment, the other writes to the first color attachment.  Always!  I have submitted a detailed bug report to Apple about this.  So I decided to work only on Linux, and on an NVIDIA card from around the GeForce 6 or 7 series.

At this point, we need to do one more pass.

Pass Three: Sampling The Color Map and Applying the Blur

Here I go back to doing the same kind of thing Riguer et al. did with the first technique described in their paper.  Using a set of randomly generated vectors passed in as uniforms, I scale each vector by the circle of confusion calculated for that fragment.  The circle of confusion is simply the blur coefficient from pass two, in [0,1], multiplied by the maximum circle of confusion.  I run through the array of vectors and sample the color map at the resulting pixels.

I deal with color bleeding by modifying each sample's contribution: I compare the sampled point's distance from the focal plane with that of the center sample (i.e. the texel that would be used if we weren't doing any blurring).  If the sample's distance is greater, I add 1 to the total weight; otherwise I add the sample's blur coefficient to the total weight.

As a result, well... just look at the picture! :)

Here we can see that part of the tori is in focus, with the farther parts a little more blurry, as are the floor and ceiling.  We can also see that the back wall is nice and blurry, and we don't have any major color leaking between the closest torus and the far wall.

The Extras!!

Head Tracking, Varrier Mode & GeoWall Mode!

I got a hold of Bob's tracker code, which accesses shared memory to retrieve head tracking data sent by TrackD, which I believe is fed data from Javier's neural-network-based head tracking system.  That, in combination with Bob's new Varrier combiner code, allowed me to port my project as a personal Varrier application!  I had to figure out how the Varrier combiner worked, and determined that I had to do a few things differently from Bob's example code.  To create the left-eye and right-eye views, I added a function that performs passes two and three for both eyes before calling vc_combine to combine the views.  I then found that I could easily reuse the outputs of this new function to render in GeoWall mode to a single large framebuffer.

Eye Tracking

After messing around with the eye tracker in the lab, I was eventually able to get the data it sends out into an outside program.  After calibration, the eye tracker system can output the Point of Reference (POR) over a serial port.  The POR is basically the screen coordinate the user is looking at.  I wrote code to read from the serial port and extract the values.  The big problem is that the eye tracker uses an infrared LED (IR LED) to capture the image it needs to process and return the desired information.  When using the eye tracker alone this isn't much of a problem, but the Personal Varrier system tracks the user's head using a large number of infrared LEDs, which surround the perimeter of its large 30" Apple display.  When you look at the screen, the head tracker's LEDs overload the image, especially if you wear glasses that reflect incoming infrared light, as mine do. :)  This basically makes eye tracking unreliable when the two are put together, at least at this point.

The other problem with the eye tracker is that if you need to scratch your head and the eye tracking unit moves, you lose calibration and your POR values can be very wrong.  The unit is worn on the head and has a piece of material that reflects the light coming off its single IR LED.  If you move the unit, the reflecting material could end up pointing at your eyebrow or your cheek instead of your eye.  This is bound to happen, as the unit seems to slide down my face over time when I wear it.

Had this setup been practical, we could have done some interesting matrix transformations to get the vector from the eye into the virtual world, and modified pass one to create an off-axis projection that renders the area the user is looking at.  With that information we could modify the focal distance based on what the user is looking at with their own eyes, rather than what is in the center of the screen.  I believe this is still doable, and it may be something I look into over the summer.

Downloading, Building & Running

You can download the source code here: dof.tar.gz

To build you need SDL set up, as well as libjpeg and libpng.  Modify src/makefile as needed.  The application will compile on the Mac but, as stated before, will not run properly.  After making the necessary adjustments, just type 'make'.

Here are the usages of the application:

Run with default parameters:          ./dof
Run with specified screen dimensions: ./dof <width> <height>
Run in GeoWall mode:                  ./dof <width> <height> -geowall
    NOTE: width is the width of one eye's viewport
Run in Personal Varrier mode:         ./dof <width> <height> -varrier
    NOTE: for the Personal Varrier in the lab, use 2560 x 1600

To move around, the standard W,A,S,D control scheme is used, with the mouse to look around.  To increase or decrease the focal range, use the 'j' and 'l' keys respectively.  To adjust the maximum circle of confusion, use the '[' and ']' keys.  One important note: if you want everything to be sharp, reduce the maximum circle of confusion to about 0.5; then the same texel is sampled repeatedly for each fragment.  To toggle the views of passes one and two, press the '1' and '2' keys.

Questions/Comments send mail to Arun Rao