Computing the CAVE Projection Transformation

Dave Pape, pape@evl.uic.edu

This report describes the steps involved in computing a geometrically accurate projection transformation for a CAVE, ImmersaDesk, or IWall VR display (hereafter referred to simply as a CAVE). The CAVE projection assumes a fixed rectangular display screen that can be at any arbitrary position in space, with the viewer's eye able to move anywhere in front of the screen; see figure 1. To compute the projection, we are given the positions of the corners of the screen - LL (lower left), UL (upper left), and LR (lower right) - and the current tracked position of the viewer's eye. (Note: since the screen is known to be a rectangle, only three corner positions are needed.)

Figure 1. CAVE screen and eye-point in CAVE space

As the eye-point can be anywhere relative to the screen, the viewing volume is, in general, an off-axis frustum. In OpenGL, we create the projection matrix using glFrustum(). The glFrustum() matrix assumes an eye-point at the origin, looking down the negative Z axis, with the projection plane parallel to the X-Y plane (figure 2); hence, to complete the projection, we must also compute a matrix which transforms the screen and eye-point from the situation of figure 1 to that of figure 2. This second transformation is loaded as the view matrix in OpenGL (the viewing part of the modelview matrix).

Figure 2. glFrustum() off-axis viewing volume (screen space)

The "real-world" coordinate system of figure 1 will be referred to as CAVE space; the projection coordinate system of figure 2 will be referred to as screen space.

Screen-space axes and transformation

First, given the positions of the screen corners, we compute the coordinate axes of screen-space (X_s, Y_s, Z_s), in CAVE coordinates. The X axis corresponds to the horizontal edge of the screen, pointing to the right. The Y axis is the vertical edge of the screen, pointing up. The Z axis is then the cross product of X and Y; in a right-handed coordinate system this is the screen normal, pointing out of the screen toward the viewer's side. We also compute the width and height of the screen at this time, as these values will be needed in determining the frustum.

	right = LR - LL

	width = || right ||

	X_s = right / width

	up = UL - LL

	height = || up ||

	Y_s = up / height

	Z_s = X_s x Y_s
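The axis computation above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper names (sub, norm, cross, screen_axes) and the example wall coordinates are hypothetical and do not come from the CAVE library.

```python
import math

def sub(a, b):
    """Component-wise difference a - b of two 3-vectors."""
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def norm(v):
    """Euclidean length of a 3-vector."""
    return math.sqrt(v[0] ** 2 + v[1] ** 2 + v[2] ** 2)

def cross(a, b):
    """Cross product a x b."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def screen_axes(ll, ul, lr):
    """Return (x_s, y_s, z_s, width, height) for a rectangular screen.

    ll, ul, lr are the lower-left, upper-left and lower-right corners
    in CAVE coordinates.
    """
    right = sub(lr, ll)
    width = norm(right)
    x_s = tuple(c / width for c in right)
    up = sub(ul, ll)
    height = norm(up)
    y_s = tuple(c / height for c in up)
    z_s = cross(x_s, y_s)
    return x_s, y_s, z_s, width, height

# Hypothetical left wall of a 10 x 10 cube centred on the origin
# (an assumed layout for illustration, not taken from the report).
ll = (-5.0, -5.0, 5.0)
ul = (-5.0, 5.0, 5.0)
lr = (-5.0, -5.0, -5.0)
x_s, y_s, z_s, width, height = screen_axes(ll, ul, lr)
print(x_s, y_s, z_s, width, height)
```

For this assumed wall, Z_s comes out along the positive CAVE X axis, that is, pointing from the left wall into the room where the viewer stands.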

Given these axes, we can compute the rotation portion of the view matrix for the CAVE-to-screen-space transformation. Since (X_s,Y_s,Z_s) are the screen-space coordinate axes in CAVE-space, they define a transformation from screen- to CAVE-space:

	| X_s[0]  Y_s[0]  Z_s[0] |
	| X_s[1]  Y_s[1]  Z_s[1] |
	| X_s[2]  Y_s[2]  Z_s[2] |

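The desired transformation is then just the inverse of this matrix. Because the three axes are orthonormal, the inverse is simply the transpose, i.e. the matrix whose rows are X_s, Y_s, and Z_s.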

		| X_s[0]  Y_s[0]  Z_s[0] | ^-1
	RotMat=	| X_s[1]  Y_s[1]  Z_s[1] |
		| X_s[2]  Y_s[2]  Z_s[2] |
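As a small check of this step, the sketch below builds the rotation with the screen axes as rows (the transpose of the matrix shown above, which equals its inverse when the axes are orthonormal) and applies it to the axes themselves. It assumes column vectors; the function names and the example axes are hypothetical.

```python
def rotation_from_axes(x_s, y_s, z_s):
    """CAVE-to-screen rotation as a tuple of rows.

    The screen-to-CAVE matrix has the axes as columns; for orthonormal
    axes its inverse is its transpose, so the axes become the rows.
    """
    return (tuple(x_s), tuple(y_s), tuple(z_s))

def apply3(m, v):
    """Multiply a 3x3 matrix (tuple of rows) by a column vector."""
    return tuple(row[0] * v[0] + row[1] * v[1] + row[2] * v[2] for row in m)

# Axes of a hypothetical left wall (assumed example values).
x_s = (0.0, 0.0, -1.0)
y_s = (0.0, 1.0, 0.0)
z_s = (1.0, 0.0, 0.0)
rot = rotation_from_axes(x_s, y_s, z_s)

# Each screen axis, expressed in CAVE space, should map onto the
# corresponding unit axis of screen space.
mapped_x = apply3(rot, x_s)
mapped_y = apply3(rot, y_s)
mapped_z = apply3(rot, z_s)
print(mapped_x, mapped_y, mapped_z)
```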

Off-axis frustum

To create the projection matrix with glFrustum(), we must determine the distances to the six clipping planes - left, right, bottom, top, near, and far. In the CAVE system, the near and far clipping distances are defined by the application program; the remaining values are calculated from the tracked eye-point and the previously computed screen coordinate axes. We first calculate the values as absolute distances on the plane of the screen itself (L, R, B, T); they will then be scaled appropriately for glFrustum().

As shown in figure 2, the value of L is the distance from the eye-point to the left edge of the screen, along the X_s axis. Similarly, B is the distance from the eye-point to the bottom edge of the screen, along the Y_s axis. Therefore, these values can be computed by taking the dot product of the screen axes with the eye position relative to the lower-left screen corner. R and T can then be computed using the width and height of the screen.

	eye_s = eye - LL

	L = eye_s . X_s

	R = width - L

	B = eye_s . Y_s

	T = height - B
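A worked example may help here. The sketch below evaluates L, R, B, and T for a hypothetical front wall and eye position; the function name screen_extents and all numeric values are assumptions chosen for illustration.

```python
def sub(a, b):
    """Component-wise difference a - b of two 3-vectors."""
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def dot(a, b):
    """Dot product of two 3-vectors."""
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def screen_extents(eye, ll, x_s, y_s, width, height):
    """Distances from the eye to the screen edges, measured on the screen plane."""
    eye_s = sub(eye, ll)      # eye relative to the lower-left corner
    L = dot(eye_s, x_s)       # distance to the left edge along X_s
    R = width - L             # distance to the right edge
    B = dot(eye_s, y_s)       # distance to the bottom edge along Y_s
    T = height - B            # distance to the top edge
    return L, R, B, T

# Hypothetical 10 x 10 front wall at z = -5 and an off-centre eye
# (assumed values, not taken from the report).
ll = (-5.0, -5.0, -5.0)
x_s = (1.0, 0.0, 0.0)
y_s = (0.0, 1.0, 0.0)
eye = (1.0, 0.5, 2.0)
L, R, B, T = screen_extents(eye, ll, x_s, y_s, 10.0, 10.0)
print(L, R, B, T)
```

With the eye one unit right of centre and half a unit above it, L is larger than R and B is larger than T, as expected for an off-axis view.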

The left/right/bottom/top arguments for glFrustum() must define the corners of the near clipping plane. Using similar triangles, we compute these values from L/R/B/T, scaling them by the ratio of the near clipping distance to the distance between the eye-point and the screen:

	distance = eye_s . Z_s

	left =   -L * near / distance

	right =   R * near / distance

	bottom = -B * near / distance

	top =     T * near / distance
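The similar-triangles scaling can be sketched as follows. The function name frustum_bounds, the example wall, and the near value are assumptions; the check that the eye lies in front of the screen is an addition of this sketch, since the report simply assumes it.

```python
def sub(a, b):
    """Component-wise difference a - b of two 3-vectors."""
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def dot(a, b):
    """Dot product of two 3-vectors."""
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def frustum_bounds(eye, ll, x_s, y_s, z_s, width, height, near):
    """Return (left, right, bottom, top) on the near clipping plane."""
    eye_s = sub(eye, ll)
    L = dot(eye_s, x_s)
    R = width - L
    B = dot(eye_s, y_s)
    T = height - B
    distance = dot(eye_s, z_s)   # eye-to-screen distance along Z_s
    if distance <= 0.0:
        # Guard added for this sketch: the derivation assumes the eye
        # is strictly in front of the screen.
        raise ValueError("eye must be in front of the screen")
    scale = near / distance      # similar-triangles ratio
    return (-L * scale, R * scale, -B * scale, T * scale)

# Hypothetical 10 x 10 front wall at z = -5 (assumed values).
ll = (-5.0, -5.0, -5.0)
x_s, y_s, z_s = (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)
eye = (1.0, 0.5, 2.0)
near = 0.1
left, right, bottom, top = frustum_bounds(eye, ll, x_s, y_s, z_s,
                                          10.0, 10.0, near)
print(left, right, bottom, top)
```

These four values, together with the application's near and far distances, are what would be passed as glFrustum(left, right, bottom, top, near, far).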

View transformation

The view matrix transformation, which transforms from the CAVE-space of figure 1 to the screen-space of figure 2, consists of two parts - a rotation to orient the projection plane parallel to the X/Y plane, and a translation to put the eye-point at the origin. The rotation part was computed above, from the screen-space coordinate axes. The translation is merely the negation of the eye position (in CAVE space):

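	ViewMat = translate(-eye[0],-eye[1],-eye[2]) * RotMat

Here the product is written in the order in which the transformations are applied to a point: the translation first (it is expressed in CAVE space), then the rotation, with RotMat extended to a 4x4 matrix. In column-vector notation, as used in the OpenGL documentation, the same matrix is written RotMat * translate(-eye[0],-eye[1],-eye[2]).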
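To illustrate the combined transformation, the sketch below builds a 4x4 view matrix that translates by the negated eye position and then rotates with the screen axes as rows, and checks where the eye and the lower-left corner end up. It assumes column vectors and row-major tuples; the function names and example values are hypothetical.

```python
def dot(a, b):
    """Dot product of two 3-vectors."""
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def view_matrix(eye, x_s, y_s, z_s):
    """4x4 view matrix as a tuple of rows (column-vector convention).

    Equivalent to translating by -eye and then rotating with the
    screen axes as rows: p_screen = RotMat * (p_cave - eye).
    """
    return (
        (x_s[0], x_s[1], x_s[2], -dot(x_s, eye)),
        (y_s[0], y_s[1], y_s[2], -dot(y_s, eye)),
        (z_s[0], z_s[1], z_s[2], -dot(z_s, eye)),
        (0.0, 0.0, 0.0, 1.0),
    )

def transform_point(m, p):
    """Apply a 4x4 matrix to a 3D point (w = 1) and return a 3-tuple."""
    h = (p[0], p[1], p[2], 1.0)
    out = [sum(row[i] * h[i] for i in range(4)) for row in m]
    return (out[0], out[1], out[2])

# Hypothetical left wall of a 10 x 10 cube (assumed values).
ll = (-5.0, -5.0, 5.0)
x_s, y_s, z_s = (0.0, 0.0, -1.0), (0.0, 1.0, 0.0), (1.0, 0.0, 0.0)
eye = (1.0, 0.5, 2.0)

view = view_matrix(eye, x_s, y_s, z_s)
eye_in_screen = transform_point(view, eye)   # expected at the origin
ll_in_screen = transform_point(view, ll)     # expected at (-L, -B, -distance)
print(eye_in_screen, ll_in_screen)
```

For these assumed values L = 3, B = 5.5, and distance = 6, so the lower-left corner lands at (-3, -5.5, -6): on the plane z = -distance, in front of an eye at the origin looking down the negative Z axis. Note that glLoadMatrix() expects the 16 values in column-major order, so a row-major matrix like this one would need to be transposed before loading.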