Tiled multi-projector displays on curved screens (e.g. cylindrical or spherical screens) are becoming more popular for visualization, education, entertainment, training and simulation applications. Their appeal lies in the greater sense of immersion and presence they can create, and at times, the superior aesthetics they provide. Displays are tools for these application users, who are not expected to be experts in setting them up or maintaining them. Unfortunately, most registration algorithms designed for curved displays expect the user to be such an expert. Registering multiple projectors on such a display has been a challenge, primarily because quickly recovering the 3D shape of the display almost always requires attaching fiducials (physical markers) to the screen to provide robust correspondences between the screen and the camera, which is especially obtrusive. Using structured light patterns to achieve the same result is time consuming. Finally, both of these methods require a complex camera calibration, making them too difficult for a layman user to execute successfully.

We seek a simple procedure to register multiple projectors on a curved display that can be used even by a layman user like a doctor in a medical facility, a teacher in a school or a worker in a theme park. We observe that most of the time, geometrically simple surfaces, like partial cylinders (e.g. pillars or surround immersive environments), are used as the display screen. So we impose two simple priors on the screen. First, the screen is a vertically extruded surface: a surface made by sweeping a 2D curve, called the profile curve, along a direction perpendicular to its plane. This covers a large number of shapes that can be built by softly folding a rectangular sheet in one direction (Figure 2); a cylinder is an important special case. Second, we assume the aspect ratio of the planar rectangle formed by the four corners of the extruded surface is known. Such a measurement is easy to provide, even for a layman user. These priors allow us to avoid placing any markers on the display screen and still recover the shape of the display using a single image from an uncalibrated camera. This allows easy set-up and maintenance of such multi-projector displays by the user, even in the face of changes in the display surface or projector configurations and severe non-linearities.

### 1.1 Main Contributions

In this paper we present a new efficient algorithm to register images from multiple projectors on a vertically extruded surface. Using the priors of an extruded shape and the known aspect ratio, we use a single image of the display surface from an *uncalibrated* camera to recover both the camera parameters and the 3D shape of the surface. The display surface is then arc length parameterized in both dimensions. Then we capture a few images of patterns from the projectors to relate the projector coordinates with the display surface points, and represent this relationship using a rational Bezier patch. This relationship is then used to segment the appropriate parts of the image for each projector to register them and create a seamlessly wall-papered projection on the display screen.

To the best of our knowledge, this is the first work that simultaneously achieves the following desirable qualities of geometric registration on these non-planar surfaces. Prior work addresses only one or a few of these qualities; ours addresses all of them at once.

*Markerless:* Using some simple priors on the display surface, we can register images from multiple projectors on a vertically extruded screen without using any correspondence between the 3D display and the camera.

*Uncalibrated Camera:* We show that with some simplifying assumptions on the intrinsic parameters of the camera, we can achieve this registration using an uncalibrated camera.

*View Independent:* Unlike registering with respect to a camera, where the registration is correct only from a single sweet spot, we paste the image like a wallpaper on the display. A wallpapered image does not look perspectively correct from any single viewpoint. Yet, since we are accustomed to seeing wallpapered imagery, human observers easily correct for the existing distortions irrespective of their viewpoint. Hence, by wallpapering we ensure that multiple viewers can observe our display at the same time, making our method view independent.

*Allowing Low-Resolution Sensor:* Since we use a rational Bezier patch to relate the projector to the display parameters, we can achieve a good fit even if we sample the function sparsely. As a result, we can use a relatively low-resolution camera (e.g. VGA camera) to register a much higher resolution display.

*Allowing Non-Linearities:* Further, since our registration depends on a 2D parametrization of the display generated from the recovered 3D surface rather than on auto-calibrating projectors on the 3D surface itself, we can handle severe non-linearities in the projectors (like radial distortion). Thus, we can allow a compact set-up with inexpensive short-throw lenses mounted on the projectors, which usually exhibit non-linear distortions. Non-linear distortions have been addressed when using planar displays [4]. When using non-planar displays, non-linear lens distortion has been addressed in a limited manner by using a high-resolution camera to sample the function relating the projector to the display parameters densely [18], [7]. In contrast, we can correct such distortions even with a low-resolution camera using a sparse sampling of this function.

*Allowing Display Imprecision:* The 2D parametrization additionally assures that a moderate deviation of the screen from being a perfectly extruded surface will not affect the accuracy of the geometric registration. Thus, we can handle manufacturing imprecision in the vertically extruded display surface.

*Accuracy:* Our method assures subpixel accuracy even in the presence of projector non-linearities.

*Efficiency:* Finally, our method can be run in real-time on the GPU making it ideal for interactive video applications.

Camera-based geometric registration of multi-projector displays can be either view-dependent or view-independent. View-dependent registration yields an image on the display that is correct from only one sweet spot, usually the view of the camera. Deviation of the viewer from this location reveals view-dependent distortions. Hence, view-dependent registration is usually appropriate for static single-user applications. On the other hand, view-independent registration pastes or wall-papers the images on the display surface. Since wall-papering is a common way to accommodate multiple viewers, such registration caters to more than one viewer easily. It requires not only registering the projectors in a common camera frame but also a (conformal) parameterization of the shape of the display surface.

There has been a large amount of work on registering images on planar multi-projector displays in a view-independent fashion using linear homographies enabled by the planar screen [11, 5, 14, 10, 18, 19, 9, 3]. Such registration can be achieved in the presence of projector non-linearities using rational Bezier patches [4].

View-dependent registration on a non-planar display has been achieved by using special 3D fiducials and a large number of structured light patterns for a complete device (camera and projector) calibration and 3D reconstruction of the display surfaces, which are then used to achieve the registration [13]. Aliaga et al. in [2], [1] use a similar 3D reconstruction method to achieve a similar registration on complex 3D shapes, but without using any physical fiducials. To constrain the system sufficiently, this method uses completely superimposed projectors and validates results from photometric and geometric stereo, resulting in a self-calibrating system. Raskar et al. in [12] use a stereo camera pair to reconstruct special non-planar surfaces called quadric surfaces (spheres, cylinders, ellipsoids and paraboloids) and propose conformal mapping and quadric transfer to minimize pixel stretching of the projected images in a view-dependent registration.

More recently, there has been work on view-independent registration for the special case of a cylindrical surface rather than a general non-planar surface [7], [17]. Using the fact that cylindrical surfaces are developable, they have achieved a 'wall-paper' registration on such surfaces. However, these methods do not recover the shape of the surface in 3D, but attempt to find its 2D parametrization in the camera space. Hence, they need precise correspondences between the physical display and the observing camera. To achieve this, a precisely calibrated physical pattern is pasted along the top and bottom curves of the cylinder. Using these correspondences, a piecewise linear 2D parametrization of the display is computed and linked to a piecewise linear representation of the projector coordinates via the camera that observes both. This allows segmenting the appropriate parts of the image for each projector using linear/non-linear interpolations to create a wall-papered display. However, to avoid fiducials at a high spatial density, the physical pattern only samples the rims of the display. This insufficient sampling results in distortions or stretching, especially towards the middle of the display surface.

### 2.1 Comparison of Our Method

Unlike earlier methods for view-independent registration of cylindrical displays that assume a piecewise linear representation of the surface to parametrize it in the 2D camera space [7], [17], we recover the 3D geometry of the display. Hence, we can parametrize the display directly in 3D rather than in the camera image space, resulting in a geometric registration of the projected imagery without any stretching or distortions. Use of a perspective projection invariant function, e.g. a rational Bezier function, for interpolation instead of a simple linear interpolation allows us to maintain registration in the presence of severe projector distortions and considerable imprecision in manufacturing of the extruded surface. Further, as shown in [4], unlike a piecewise linear function, a rational Bezier function can be interpolated accurately even from a sparse set of samples. This allows our method to use a low resolution camera while registering a much higher resolution display. Unlike earlier methods for non-planar displays that recover the 3D shape using complex stereo or structured light based procedures [13, 12, 2, 1], we simplify the process using a single image from a single camera position by imposing the prior that the surface is vertically extruded. Finally, we avoid calibrating the camera to recover the 3D shape by using some simplifying assumptions on the intrinsic parameters of the camera and the aspect ratio of the display surface that is provided by the user.

Let the display surface, the image plane of the camera and the image plane of the projector be parametrized by (*s*, *t*), (*u*, *v*) and (*x*, *y*) respectively. We denote the 3D coordinates of the point at (*s*, *t*) on the display by (*X*(*s*, *t*), *Y*(*s*, *t*), *Z*(*s*, *t*)). Since the display is a vertically extruded surface, the four corners of the display lie on a planar rectangle whose aspect ratio *a* is known. We define the world 3D coordinate system with the *Z* axis perpendicular to this plane and *X* and *Y* as the two orthogonal bases of this planar rectangle. We also consider this planar rectangle to be at *Z* = 0. In these 3D coordinates, the top and bottom curves of the surface lie on the *Y* = 1 and *Y* = 0 planes respectively. Hence *Y*(*s*, 0) = 0 and *Y*(*s*, 1) = 1. Further, these two curves are identical except for a translation in the *Y* direction. Therefore, ∀*s*, (*X*(*s*, 0), *Z*(*s*, 0)) = (*X*(*s*, 1), *Z*(*s*, 1)). This is illustrated in Figure 4. We assume that our camera is a linear device without any radial distortion. Note that a distorted camera will still provide good registration, but the wallpapering will be imperfect. Limitations imposed by this assumption are discussed in further detail in Section 5.3. However, our projectors need not be linear devices.

A view-independent geometric registration essentially requires us to define a function from the (*x*, *y*) projector coordinates to the (*s*, *t*) display coordinates. Our method follows three steps to achieve this (Figure 3). First, we use a single image of the display from the uncalibrated camera and the known aspect ratio of the display to *recover the camera properties* (intrinsic and extrinsic parameters) using a non-linear optimization. Using the estimated camera parameters, we next *recover the 3D shape of the display*. We then use the profile curves of the vertically extruded surface to define a 2D parametrization of the display surface based on the arc length of the profile curves flanking the display. After calibrating the camera and reconstructing the display, in the next phase we capture an image of a blob-based pattern from each projector and use these to find samples of the mapping from the projector (*x*, *y*) to the display (*s*, *t*). We then approximate this mapping from these samples by fitting a rational Bezier patch to the correspondences. Assuming that an image pasted on the display has image coordinates identical to the display coordinates (*s*, *t*), this automatically achieves the *geometric registration* by defining the part of the image to be projected by each projector so that the resulting display is seamlessly wallpapered. Each of these steps is described in detail in the following sections.

### 3.1 Recovering Camera Properties

In this step, we use a single image of the display surface (Figure 5) to recover the intrinsic and extrinsic parameters of the observing uncalibrated camera using a non-linear optimization. A large number of image formats, like JPEG or TIFF, store EXIF tags that provide some of the camera parameters used during capture. One of these is the focal length of the camera, the critical component of the intrinsic parameter matrix. As in [16], we use this focal length to initialize the intrinsic parameter matrix in our non-linear optimization. To convert the focal length to pixel units, we divide the resolution of the camera by the CCD sensor size and multiply the result by the focal length specified in the EXIF tags. The sensor size of the camera is available in its specifications.
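As a concrete illustration, the unit conversion described above can be sketched as follows; the numbers in the usage example (an 18 mm focal length, a 4272-pixel-wide image and a 22.2 mm-wide sensor) are hypothetical values, not measurements from our set-up.

```python
def focal_length_pixels(f_mm, image_width_px, sensor_width_mm):
    """Convert an EXIF focal length (mm) to pixel units by scaling it
    with the ratio of image resolution to physical sensor width."""
    return f_mm * image_width_px / sensor_width_mm

# hypothetical example: 18 mm lens, 4272 px wide image, 22.2 mm sensor
f_px = focal_length_pixels(18.0, 4272, 22.2)   # about 3464 pixels
```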

In most cameras today, it is common to have the principal point at the center of the image, no skew between the image axes, and square pixels. Using these assumptions, we express the intrinsic parameter matrix of the camera, *K*_{c}, as
$$K_c = \begin{pmatrix} f & 0 & 0 \\ 0 & f & 0 \\ 0 & 0 & 1 \end{pmatrix}$$
The camera calibration matrix that relates the 3D coordinates with the 2D camera image coordinates (*u* *v*) is given by *M* = *K*_{c}[*R|RT*] where *R* and *T* are the rotation and translation of the camera with respect to the world coordinate system. In this step, we use the initial estimate of *f* and the aspect ratio *a* as input and use a non-linear optimization to estimate seven parameters of the camera calibration matrix − these include the focal length *f*, the three rotations that comprise *R* and the three coordinates of the center of projection of the camera *T*.
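A minimal sketch of assembling *M* from the seven parameters, following the document's convention *M* = *K*_{c}[*R|RT*]; the particular Z-Y-X Euler-angle convention used for *R* is our assumption, not specified in the text.

```python
import numpy as np

def camera_matrix(f, rx, ry, rz, tx, ty, tz):
    """Build M = K_c [R | R T] from the seven parameters: focal length
    f, three rotation angles (radians) and the three components of T.
    The Euler-angle convention is an illustrative assumption."""
    K = np.array([[f, 0, 0], [0, f, 0], [0, 0, 1.0]])
    cx, sx = np.cos(rx), np.sin(rx)
    cy, sy = np.cos(ry), np.sin(ry)
    cz, sz = np.cos(rz), np.sin(rz)
    R = (np.array([[cz, -sz, 0], [sz, cz, 0], [0, 0, 1]]) @
         np.array([[cy, 0, sy], [0, 1, 0], [-sy, 0, cy]]) @
         np.array([[1, 0, 0], [0, cx, -sx], [0, sx, cx]]))
    T = np.array([tx, ty, tz], float)
    return K @ np.hstack([R, (R @ T)[:, None]])

def project(M, X):
    """Project a 3D point X to camera coordinates (u, v)."""
    uw, vw, w = M @ np.append(X, 1.0)
    return np.array([uw / w, vw / w])
```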

Our non-linear optimization has two phases. In the first phase *plane based optimization* (Section 3.1.1), the seven camera parameters are estimated using just the projection of the corners of the display surface on the camera image. These estimates are used to initialize the *extrusion based optimization* (Section 3.1.2) with a more expensive error function to refine the camera parameters.

#### 3.1.1 Plane Based Optimization

We estimate the seven parameters in this step based on the image of the plane formed by the four corners of the screen, whose 3D coordinates are given by (0, 0, 0), (*a*, 0, 0), (*a*, 1, 0) and (0, 1, 0). Consequently, the (*u*, *v*) coordinates in the camera of any 3D point (*X*(*s*, *t*), *Y*(*s*, *t*), *Z*(*s*, *t*)) on the display are given by
$$(uw,\; vw,\; w)^T = M\,(X(s, t),\; Y(s, t),\; Z(s, t),\; 1)^T$$
where (*uw*, *vw*, *w*)^{T} are the 3D homogeneous coordinates corresponding to the camera coordinates (*u*, *v*) and *M* is the 3 *×* 4 camera calibration matrix defined by the seven camera parameters. We estimate the seven camera parameters by using a non-linear optimization method that minimizes the reprojection error *E*_{r}, i.e. the sum of the distances of the projections of these 3D corners on the camera image plane from the detected corners. We initialize the angles of rotation about the X, Y and Z axes that comprise *R* to zero, and *T* to be roughly at the center of the planar rectangle formed by the four corners of the display, at a depth of the same order of magnitude as the size of the display, i.e. (0, 0, *a*).
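The plane based optimization can be sketched as follows; for brevity this sketch uses SciPy's least-squares solver in place of the gradient descent described later, and the ground-truth camera used to synthesize the "detected" corners is hypothetical.

```python
import numpy as np
from scipy.optimize import least_squares

def make_M(p):
    """p = (f, rx, ry, rz, tx, ty, tz) -> M = K_c [R | R T] (3x4).
    The Euler-angle convention is an illustrative assumption."""
    f, rx, ry, rz, tx, ty, tz = p
    K = np.array([[f, 0, 0], [0, f, 0], [0, 0, 1.0]])
    cx, sx = np.cos(rx), np.sin(rx)
    cy, sy = np.cos(ry), np.sin(ry)
    cz, sz = np.cos(rz), np.sin(rz)
    R = (np.array([[cz, -sz, 0], [sz, cz, 0], [0, 0, 1]]) @
         np.array([[cy, 0, sy], [0, 1, 0], [-sy, 0, cy]]) @
         np.array([[1, 0, 0], [0, cx, -sx], [0, sx, cx]]))
    return K @ np.hstack([R, (R @ np.array([tx, ty, tz]))[:, None]])

def residuals(p, corners_3d, corners_uv):
    """E_r: stacked differences between projected and detected corners."""
    M = make_M(p)
    out = []
    for X, uv in zip(corners_3d, corners_uv):
        q = M @ np.append(X, 1.0)
        out.extend(q[:2] / q[2] - uv)
    return np.array(out)

a = 2.0                                            # known aspect ratio
corners_3d = [(0, 0, 0), (a, 0, 0), (a, 1, 0), (0, 1, 0)]
# synthetic "detected" corners from a hypothetical ground-truth camera
p_true = np.array([1000.0, 0.05, -0.1, 0.02, -a / 2, -0.5, 3.0])
corners_uv = [(make_M(p_true) @ np.append(X, 1.0))[:2] /
              (make_M(p_true) @ np.append(X, 1.0))[2] for X in corners_3d]
# initialize as in the text: zero rotations, T roughly at (0, 0, a)
p0 = np.array([900.0, 0, 0, 0, 0, 0, a])
sol = least_squares(residuals, p0, args=(corners_3d, corners_uv))
```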

#### 3.1.2 Extrusion Based Optimization

The seven estimated camera parameters in the plane based optimization are used to initialize the extrusion based optimization that attempts to refine these parameters further. This also uses a non-linear optimization method that minimizes the error *E* = *w*_{r} E_{r} + *w*_{c} E_{c}, where *E*_{r} is the error function from the plane based optimization step, and *E*_{c} is an error function based on the reprojection error in the similarity of the flanking curves of the display as described next, and *w*_{r} and *w*_{c} are the weights to combine them.

The vertically extruded display surface is constrained by the fact that the points on its top curve, when translated by *Y* = *−*1, should lie on the bottom curve. We use the deviation from this constraint to define *E*_{c}. Let the images of the top and bottom boundaries of the vertically extruded display in the camera be *I*_{t} and *I*_{b} respectively. We first use image processing to segment the image and sample the curves *I*_{t} and *I*_{b}. We fit a parametric curve to the samples on *I*_{b}; let us denote it by *ℬ*. We use the current estimate of *M* to reproject *I*_{t} into 3D. This is achieved by casting rays through the sampled points on *I*_{t} and intersecting them with the *Y* = 1 plane. The 3D curve thus obtained is *B*_{t}. We then translate the samples on *B*_{t} by *−*1 along the *Y* direction to get samples on the 3D bottom curve *B*_{b}, and project these samples back onto the camera using *M*, denoted by *M*(*B*_{b}). The sum of the squared distances of these samples from the curve *ℬ* provides the reprojection error of the estimated bottom curve with respect to the detected bottom curve. In the case of perfect estimation, this error is zero. Hence, we seek to minimize *E*_{c} in addition to *E*_{r}.
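A sketch of computing *E*_{c} from the current estimate of *M*; as a simplification of our own, the distance to the fitted curve *ℬ* is approximated here by the distance to the nearest sampled point of *I*_{b}.

```python
import numpy as np

def backproject_to_plane(M, uv, y_plane):
    """Cast a ray from the camera centre through pixel (u, v) and
    intersect it with the plane Y = y_plane (M is the 3x4 matrix).
    The camera centre C satisfies M (C, 1)^T = 0."""
    A, b = M[:, :3], M[:, 3]
    centre = -np.linalg.solve(A, b)
    direction = np.linalg.solve(A, np.array([uv[0], uv[1], 1.0]))
    lam = (y_plane - centre[1]) / direction[1]
    return centre + lam * direction

def extrusion_error(M, top_uv, bottom_uv):
    """E_c sketch: lift the imaged top curve to the Y = 1 plane,
    translate by -1 along Y, reproject, and accumulate squared
    distances to the sampled bottom curve."""
    bottom = np.asarray(bottom_uv, float)
    err = 0.0
    for uv in top_uv:
        Xt = backproject_to_plane(M, uv, 1.0)        # sample of B_t
        q = M @ np.append(Xt - np.array([0.0, 1.0, 0.0]), 1.0)
        uvb = q[:2] / q[2]                           # sample of M(B_b)
        err += np.min(np.sum((bottom - uvb) ** 2, axis=1))
    return err
```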

To solve both the plane and extrusion based optimizations, we use standard gradient descent methods. To assure faster convergence we (a) apply a pre-conditioning to the variables so that the range of values that can be assigned to them is normalized; and (b) use a decaying step size.
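A minimal sketch of such a scheme, using numerical gradients; the constants (initial step, decay rate, iteration count) and the quadratic test function are illustrative assumptions.

```python
import numpy as np

def gradient_descent(f, x0, scale, step0=0.5, decay=0.95, iters=200,
                     eps=1e-6):
    """Minimize f by gradient descent with (a) per-variable
    preconditioning, dividing each variable by its expected range so
    the normalized variables move on comparable scales, and (b) a
    decaying step size (sketch)."""
    scale = np.asarray(scale, float)
    x = np.asarray(x0, float)
    step = step0
    for _ in range(iters):
        g = np.zeros_like(x)
        for i in range(len(x)):
            e = np.zeros_like(x)
            e[i] = eps * scale[i]
            g[i] = (f(x + e) - f(x - e)) / (2.0 * e[i])
        # descend in the normalized variables y = x / scale
        x = x - step * scale ** 2 * g
        step *= decay
    return x
```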

### 3.2 Recovering 3D Display Parameters

After convergence of the optimization process, we use the estimated *M* to reproject samples on *I*_{t} and *I*_{b} into 3D and intersect them with the *Y* = 1 and *Y* = 0 planes to find *B*_{t} and *B*_{b} respectively. Due to accumulated errors, *B*_{t} and *B*_{b} may not be identical. So, we translate both curves onto the *Y* = 0 plane and average them to define *B*_{b}. This average is then translated to *Y* = 1 to define *B*_{t}. This assures that *B*_{t} and *B*_{b} are identical except for a translation along *Y*. We use polynomial curve fitting to find a parametric representation of *B*_{t} and *B*_{b}.

Next, we seek a 2D parametrization of the display *D* with (*s*, *t*). The profile curve *B*_{b} on the XZ plane is arc length parametrized by the parameter *s*. By the vertical extrusion assumption, *X* and *Z* are independent of *t*, and *Y* is independent of *s*. Hence, for a 3D point (*X*, *Y*, *Z*) on the display surface, *X* = *X*(*s*, *t*) = *X*(*s*) and *Z* = *Z*(*s*, *t*) = *Z*(*s*), and, since the extrusion is along the *Y* direction, *Y* = *Y*(*s*, *t*) = *t*.
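The arc length parametrization can be sketched as follows, given dense samples of the recovered profile curve; the piecewise linear interpolation used here is a simplification of the polynomial representation described above.

```python
import numpy as np

def arc_length_point(xs, zs, s):
    """Return the (X, Z) point at normalized arc length s in [0, 1]
    along the profile curve sampled by (xs, zs) on the XZ plane."""
    seg = np.hypot(np.diff(xs), np.diff(zs))       # segment lengths
    cum = np.concatenate([[0.0], np.cumsum(seg)])  # cumulative length
    target = s * cum[-1]
    return np.interp(target, cum, xs), np.interp(target, cum, zs)

def display_point(xs, zs, s, t):
    """Map display parameters (s, t) to 3D: (X(s), t, Z(s))."""
    x, z = arc_length_point(xs, zs, s)
    return np.array([x, t, z])
```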

### 3.3 Geometric Registration

Geometric registration entails defining, for each projector, a function that maps the projector coordinates (*x*, *y*) to the display coordinates (*s*, *t*) via the camera coordinates (*u*, *v*). Mathematically,
$$(s, t) = M_{D \leftarrow C}(M_{C \leftarrow P}(x, y))$$
where *M*_{C ← P} maps (*x*, *y*) to (*u*, *v*) and *M*_{D ← C} maps (*u*, *v*) to (*s*, *t*). As in [4], we use a rational Bezier patch to define *M*_{C ← P}. To find *M*_{C ← P}, we project a number of blobs and use the camera to capture them (Figure 6). The centers of these blobs are known in the projector coordinate space (*x*, *y*). When these centers are detected in the camera space (*u*, *v*), they provide direct correspondences between (*x*, *y*) and (*u*, *v*). We fit a rational Bezier patch to these correspondences using non-linear least squares, solved efficiently by the Levenberg-Marquardt optimization technique. To compute *M*_{D ← C}, for every mapped (*u*, *v*) coordinate in the camera, we cast a ray through the point and find its intersection with the recovered 3D display; we then find the 2D parameter (*s*, *t*) corresponding to this 3D point.
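For concreteness, evaluating a rational Bezier patch such as *M*_{C ← P} at a projector coordinate can be sketched as follows; the control-point layout in the test is an assumption for illustration.

```python
import numpy as np
from math import comb

def bernstein(n, i, t):
    """Bernstein basis polynomial B_{n,i}(t)."""
    return comb(n, i) * t ** i * (1.0 - t) ** (n - i)

def rational_bezier(P, W, x, y):
    """Evaluate a rational Bezier patch at (x, y) in [0, 1]^2.
    P: (n+1, m+1, 2) control points mapping to (u, v); W: weights."""
    n, m = P.shape[0] - 1, P.shape[1] - 1
    num, den = np.zeros(2), 0.0
    for i in range(n + 1):
        for j in range(m + 1):
            b = W[i, j] * bernstein(n, i, x) * bernstein(m, j, y)
            num += b * P[i, j]
            den += b
    return num / den
```

With all weights equal to one and control points on a uniform grid, the patch reproduces the identity map (linear precision), which makes a convenient sanity check before fitting real correspondences.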

Using a rational Bezier to represent *M*_{C ← P} provides two important capabilities to our algorithm, as in [4]. First, we can achieve accurate registration in the face of severe non-linear distortions like lens distortion (barrel, pin-cushion, tangential and so on). Such distortions are common when using inexpensive short throw lenses on projectors to allow a compact setup. The rational Bezier in this case can represent the non-linearities both due to the curved nature of the display and due to projector non-linearities. Second, unlike a previous method [18] that uses a piecewise linear function to represent *M*_{C ← P} and hence requires a dense sampling of the correspondences to estimate it, the rational Bezier can be estimated accurately even from a sparse sampling of the correspondences. This allows the use of a low-resolution camera to calibrate a much higher resolution display. For example, we can achieve calibration on a 3000 *×* 1500 display using a VGA camera (640 *×* 480). Though these two capabilities were demonstrated for planar displays in [4], we demonstrate them for the first time for a class of non-planar displays.

### 3.4 Implementation

We have implemented our method in MATLAB for two types of displays. First, we have used a large rigid cylindrical display - an extruded surface with a radius of about 14 feet and an angle of 90 degrees. Since a cylinder is an extruded surface, our method is applicable. We tiled eight projectors in a casually aligned 2 *×* 4 array to create the display. Second, in order to demonstrate the success of our method on a large number of vertically extruded shapes, we made a flexible display using a rectangular sheet of flexible white styrene. This was supported by five poles to which the styrene sheet was attached (Figures 9 and 10). The shape of the profile curve of this extruded display can be changed by simply changing the position of the poles. Thus, we can create a large number of extruded shapes. We use six projectors on this display in a casually aligned 2 *×* 3 array to create the tiled display. For all the setups, we use Epson 1825p projectors ($600). We show results by using two types of sensors: (a) a high-end high-resolution (13 Megapixel) Canon Rebel Xsi SLR camera ($800); and (b) a low-end low-resolution (0.25 Megapixel) Unibrain camera ($200). We achieve color seamlessness by using the constrained gamut morphing method presented in [15].

Figure 5 shows the single image used to recover the camera and display properties. To find the projector to camera correspondences, we display a rectangular grid of Gaussian blobs whose projector coordinates are known. These are then captured by the camera. We use a 2D stepping procedure where the user identifies the top-left blob and its immediate right and bottom neighbors in camera space. Following this, the method (a) estimates the rough position of the next blob in scan-line order, and (b) searches for the correct blob position using the nearest windowed center-of-mass technique [6]. If this is not possible for extreme projector/screen distortions, one can binary-encode the blobs and project them in a time sequential manner to recover the exact ids of the detected blobs and find the correspondences [13], [18] (Figure 6).
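The nearest windowed center-of-mass refinement of a rough blob position can be sketched as follows; the window radius and the intensity-weighted centroid formulation follow the standard technique of [6], with the specifics here being our assumptions.

```python
import numpy as np

def windowed_center_of_mass(img, cx, cy, r):
    """Refine a rough blob position (cx, cy) to the intensity-weighted
    centroid of a (2r+1) x (2r+1) window around it. Returns (x, y)."""
    y0, y1 = int(cy) - r, int(cy) + r + 1
    x0, x1 = int(cx) - r, int(cx) + r + 1
    win = img[y0:y1, x0:x1].astype(float)
    ys, xs = np.mgrid[y0:y1, x0:x1]
    m = win.sum()
    return (xs * win).sum() / m, (ys * win).sum() / m
```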

Our projectors have relatively large throw-ratios and hence do not exhibit major lens distortions. To demonstrate the capability of our method to handle non-linearities, we chose to simulate the distortion digitally by distorting the input images to the projectors. Such distortions will be common when mounting inexpensive short throw lenses on the projectors to create a compact setup.

**Real time image correction using GPU:** The registration is done offline and takes about five minutes. This generates the rational Bezier patches, (*u*, *v*) = *B*(*x*, *y*), for each projector, which are then used for image correction. We have implemented a real-time image correction algorithm on modern GPUs through Chromium, an open-source distributed rendering engine for PC clusters [8]. A module for Chromium is written that first precomputes the coordinate mappings of all pixels using the rational Bezier parameters. This per-pixel projector-to-screen lookup table is used by a fragment shader to map pixels from the projector coordinate space to the screen coordinate space during rendering.

Figures 1, 7 and 9 show the results of our method on different extruded surfaces, including the most common case of a cylindrical surface. We demonstrate our results on particularly challenging content like text, especially common in visualization applications, and show accurate geometric registration. Figure 10 demonstrates that our method can handle severe projector non-linearities, enabling the use of inexpensive short throw lenses for a compact set-up. Figure 8 shows the two distortions we used in our experiments. Our supplementary video demonstrates the interactive rates that we achieve in all these renderings using our GPU implementation.

The degree of the rational Bezier used to achieve geometric registration depends on the amount of non-linearity present due to the curved screen and the distortions in the projectors. In our set-ups, we used a bicubic rational Bezier representation for the cylindrical surface. For our flexible display, we use a rational Bezier of degree 5 and 3 in horizontal and vertical directions respectively. With large projector distortions and larger curvature of the display, higher order rational Beziers will be more appropriate.

In Figure 11 we compare our method with three different methods. Since ours is the only work that can achieve a markerless and view-independent registration, probably the only fair comparison is with a homography-based registration that assumes a piecewise planar display surface and uses a homography tree to register all the projectors [5]. However, in Figure 11 we also show comparisons with the view-dependent method presented in [18]. View-dependent registration defines a mapping from the projector coordinates (*x*, *y*) to the camera coordinates (*u*, *v*), as opposed to the display coordinates (*s*, *t*), and Equation 3 becomes
$$(u, v) = M_{C \leftarrow P}(x, y)$$

Hence, the distortions of the camera (like its perspective projection) embed themselves in the registered display. Further, this method uses a piecewise linear mapping for *M*_{C ← P}(*x*, *y*) that requires a much denser sampling of projector-camera correspondences than our method. Hence, in the face of severe distortion, even with an order of magnitude more samples, it cannot achieve the accuracy of registration achieved by our method. Finally, the ability to reconstruct the rational Bezier patches from a sparse sampling of the function allows us to use a much lower resolution camera (e.g. a 640 *×* 480 VGA camera) to accurately calibrate a much higher resolution display (e.g. 3500 *×* 1200). Figure 12 compares the geometric registration achieved using a high-resolution vs. a low-resolution camera.

In this section, we discuss the dependency of our method on various parameters like the camera position, precision in the display surface, and the degree of the rational Bezier.

### 5.1 Camera Placement

Our method achieves markerless registration on extruded surfaces using an uncalibrated camera. Even in the presence of the priors on the display surface, there is a set of camera positions that will lead to degeneracy for one or both phases of our non-linear optimization. Consider the plane based optimization stage where the goal is to find the focal length *f* and the extrinsic parameters. Let us assume the camera calibration matrix *C* to be
$$C = \begin{pmatrix} f & 0 & 0 \\ 0 & f & 0 \\ 0 & 0 & 1 \end{pmatrix}\begin{pmatrix} r_1 & r_2 & r_3 & t_x \\ r_4 & r_5 & r_6 & t_y \\ r_7 & r_8 & r_9 & t_z \end{pmatrix}$$
Note that in the plane based optimization we are using four points that have *Z* = 0 and *Y* = 0 or *Y* = 1. Now, consider the case where *r*_{7} = *r*_{8} = 0 and *r*_{9} = 1. This is equivalent to placing the camera on the Z-axis with the normal to the image plane parallel to the Z-axis. In this case, the homogeneous coordinates of the images of the four corners of the plane are given by $(f(Xr_1 + Yr_2 + t_x),\; f(Xr_4 + Yr_5 + t_y),\; t_z)^T$ with *X* ∈ {0, *a*} and *Y* ∈ {0, 1}. Note that these points have a scale factor ambiguity, i.e. multiplying *t*_{z} and *f* by the same scale factor results in the same image coordinates. Intuitively, if the camera is placed on the Z-axis with the image plane parallel to the planar rectangle defined by the extruded surface, moving the camera along the Z-axis creates the same change as scaling its focal length, and we cannot find a unique solution for the camera parameters. Hence, this camera placement should be avoided.
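The scale factor ambiguity can be verified numerically; this sketch fixes the first two rotation rows to the identity for simplicity, an assumption that still satisfies *r*_{7} = *r*_{8} = 0, *r*_{9} = 1, and the translation values are illustrative.

```python
import numpy as np

def image_of_corner(f, tz, X, Y, tx=0.2, ty=-0.1):
    """Image of a display corner (X, Y, 0) for a camera whose third
    rotation row is (0, 0, 1), i.e. image plane parallel to the
    display rectangle."""
    K = np.diag([f, f, 1.0])
    Rt = np.array([[1, 0, 0, tx], [0, 1, 0, ty], [0, 0, 1, tz]], float)
    q = K @ Rt @ np.array([X, Y, 0.0, 1.0])
    return q[:2] / q[2]

# scaling f and t_z by the same factor leaves the corner image unchanged
u1 = image_of_corner(1000.0, 3.0, 2.0, 1.0)
u2 = image_of_corner(2000.0, 6.0, 2.0, 1.0)
```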

Second, let us consider the two 3D curves *B*_{t} and *B*_{b}, where *B*_{t} = *B*_{b} + (0, 1, 0). If the camera placement is such that their images *I*_{t} and *I*_{b} are related by *I*_{t} = *I*_{b} + (0, *k*), where *k* is a translation in the vertical image direction, then the extrusion based optimization becomes redundant. This occurs when the normal to the camera image plane lies on a plane parallel to the XZ plane, i.e. is perpendicular to the Y-axis. Hence, this camera placement should also be avoided. Note that the former placement, which resulted in the scale factor ambiguity, is contained in this latter condition since the Z-axis lies on the XZ plane. Hence, as long as camera placements where the normal to the image plane is parallel to the XZ plane are avoided, our optimization will yield an accurate solution.

### 5.2 Accuracy and Sensitivity

Our system estimates the camera and display parameters and makes assumptions about the type of the display surface. Hence, it is important to answer two questions: (a) how accurate are the camera and display parameters estimated in the non-linear optimization stage?; and (b) how sensitive is the geometric registration to inaccuracies in these estimates or to deviations from the priors imposed on the display surface? It is difficult to analyze all of these issues in real systems; hence, we have conducted extensive analysis in simulation and, whenever possible, in real systems to answer these questions.

First, we study the accuracy of the camera extrinsic parameters estimated by our non-linear optimization process. Our simulations across many different camera and display configurations show that when an accurate intrinsic parameter matrix is given, our estimated extrinsic parameter matrix is very accurate. The error analysis of the deviation of the estimated parameters from the actual parameters is provided in Table 1. For the orientation of the camera, we report the deviation in degrees from the actual orientation. For translation, we report the ratio of the estimation error to the distance from the screen. We also study the accuracy of the estimated 3D profile curves of the display in this setting. To compare the estimated curves with the actual ones, we first sample the estimated curves densely. Then, for each sample, we find the minimum distance to the original curve. The ratio of the maximum of these distances to the length of the original curve is our measure of the accuracy of the geometric reconstruction of the display and is reported in Table 1.
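The curve-accuracy measure described above can be sketched as follows (a hypothetical `curve_error` helper; both curves are approximated by dense point samples):

```python
import numpy as np

def curve_error(estimated, actual, n_samples=1000):
    """Accuracy measure for a reconstructed profile curve.

    `estimated` and `actual` are callables mapping a parameter in
    [0, 1] to a 3D point. We sample the estimated curve densely,
    take each sample's minimum distance to (a dense sampling of)
    the actual curve, and report the maximum such distance as a
    fraction of the actual curve's arc length."""
    u = np.linspace(0.0, 1.0, n_samples)
    est_pts = np.array([estimated(t) for t in u])
    act_pts = np.array([actual(t) for t in u])
    # Arc length of the actual curve, approximated from dense samples.
    length = np.sum(np.linalg.norm(np.diff(act_pts, axis=0), axis=1))
    # For each estimated sample, the minimum distance to the actual curve.
    d = np.linalg.norm(est_pts[:, None, :] - act_pts[None, :, :], axis=2)
    return d.min(axis=1).max() / length

# Example: a semicircular profile curve versus a slightly shifted estimate.
arc = lambda t: np.array([np.cos(t * np.pi), 0.0, np.sin(t * np.pi)])
noisy = lambda t: arc(t) + np.array([0.0, 0.0, 0.002])
err = curve_error(noisy, arc)  # small relative error
assert 0.0 < err < 0.01
```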

We analyzed the validity of our simplifying assumptions for the camera intrinsic matrix by running some experiments. For each of our camera set-ups, we used standard algorithms and toolboxes to accurately estimate the camera's intrinsic matrix [20]. The skew estimated by Zhang's method was always zero, and the principal center deviated from the center of the image by a percentage error within the error tolerance of Zhang's method. These two observations confirm the validity of our use of a simpler intrinsic matrix. Further, we compared the focal length estimated by this method to the focal length estimated by our non-linear optimization, to analyze the accuracy of the estimated intrinsic camera parameters. We found that when provided with a good initial estimate, as is available from the EXIF tags, the focal length estimated by our method is very close to that recovered by Zhang's method, as indicated in Table 1.
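The simplified intrinsic matrix we assume — zero skew, principal point at the image center, square pixels — can be written down directly. The sketch below builds it from a hypothetical EXIF focal length and sensor width (illustrative values only):

```python
import numpy as np

def simple_intrinsics(focal_mm, sensor_width_mm, img_w, img_h):
    """Simplified intrinsic matrix: zero skew, principal point at the
    image center, square pixels. The focal length in pixels is derived
    from the EXIF focal length and the sensor width."""
    f_px = focal_mm / sensor_width_mm * img_w
    return np.array([[f_px, 0.0,  img_w / 2.0],
                     [0.0,  f_px, img_h / 2.0],
                     [0.0,  0.0,  1.0]])

# Hypothetical 24 mm lens on a full-frame (36 mm wide) sensor.
K = simple_intrinsics(focal_mm=24.0, sensor_width_mm=36.0,
                      img_w=3000, img_h=2000)
```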

Next, we analyze the sensitivity of our registration to imprecision in the display surface or to errors in the estimation of the display shape, both of which result from a deviation of the real surface from a perfect extruded surface. Our rational Bezier function provides a particularly robust framework for handling such deviations. A small deviation from extrusion will lead to an erroneous 2D parametrization of the display surface, but overlapping pixels from multiple projectors will still map to the same (*s*, *t*). Hence, an imprecision in the extrusion can create small image distortions but will not lead to any misregistration. This is one of the strengths of our algorithm and is well demonstrated by our flexible display, which shows considerable imprecision due to its make-shift prototype nature, yet almost no misregistration of the projected images is visible even on this display. We quantitatively evaluate the effect of deviation from an extruded surface on the accuracy of the estimated camera parameters in Figure 13. Deviation from extrusion is measured by the maximum difference between the top and bottom curves relative to the curve length. This plot shows that even in the presence of a large deviation of the screen from an extruded surface, our method achieves a reasonable estimate of the camera pose and focal length.
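The deviation measure used in Figure 13 can be sketched as follows (a hypothetical `extrusion_deviation` helper operating on corresponding samples of the two boundary curves):

```python
import numpy as np

def extrusion_deviation(top_pts, bottom_pts):
    """Deviation of a surface from a perfect extrusion: the maximum
    difference between the top curve (shifted down by the extrusion
    vector (0, 1, 0)) and the bottom curve, relative to the bottom
    curve's length.

    `top_pts` and `bottom_pts` are (N, 3) arrays of corresponding
    samples on the two boundary curves."""
    length = np.sum(np.linalg.norm(np.diff(bottom_pts, axis=0), axis=1))
    diff = np.linalg.norm(top_pts - np.array([0.0, 1.0, 0.0]) - bottom_pts,
                          axis=1)
    return diff.max() / length

# A perfect extrusion has zero deviation.
bottom = np.column_stack([np.linspace(0, 1, 11), np.zeros(11), np.zeros(11)])
top = bottom + np.array([0.0, 1.0, 0.0])
assert extrusion_deviation(top, bottom) == 0.0

# Warping one sample of the top curve introduces a measurable deviation.
warped = top.copy()
warped[5, 2] += 0.05
assert extrusion_deviation(warped, bottom) > 0.0
```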

### 5.3 Camera Non-Linearity and Resolution

In Section 3, we assume the camera to be a linear device devoid of any non-linear distortion. Even when this is not true, as with commodity cameras, our method will not result in any pixel misregistration, since the camera non-linearity is absorbed by the fitted rational Bezier patches. However, the camera non-linearity will affect the accuracy of the reconstructed 3D shape of the screen, and hence the final result may not be perfectly wall-papered. Fortunately, the human visual system can tolerate such minor deviations from wall-papering. For verification, we performed our registration using an uncalibrated Unibrain Fire-i webcam at 640x480 resolution (one tenth of our display resolution), which had significant non-linear lens distortion (quadratic coefficient of 0.01 and quartic coefficient of -0.009). We compare the achieved result with the one achieved by our high-resolution camera in Figure 12. Note that the deviation from wall-papering is hardly detectable and the registration is comparable. In the case of more severe camera non-linearities, one can use standard camera calibration techniques to undistort the captured images.
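The radial distortion model behind the quoted coefficients can be sketched as follows; the fixed-point inversion is one standard undistortion approach, not necessarily the one used in any particular calibration toolbox:

```python
import numpy as np

# Radial distortion model r_d = r_u * (1 + k1*r^2 + k2*r^4), using the
# coefficients measured for the webcam described in the text.
K1, K2 = 0.01, -0.009

def distort(p):
    """Apply radial distortion to a normalized image point (x, y)."""
    r2 = p[0]**2 + p[1]**2
    return p * (1.0 + K1 * r2 + K2 * r2**2)

def undistort(p_d, iters=20):
    """Invert the distortion by fixed-point iteration: repeatedly
    divide the distorted point by the distortion factor evaluated at
    the current undistorted estimate."""
    p = p_d.copy()
    for _ in range(iters):
        r2 = p[0]**2 + p[1]**2
        p = p_d / (1.0 + K1 * r2 + K2 * r2**2)
    return p

# Round trip: undistorting a distorted point recovers the original.
pt = np.array([0.4, -0.3])
assert np.allclose(undistort(distort(pt)), pt, atol=1e-8)
```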

### 5.4 User Assistance

Our method needs to detect the four corners and the top and bottom curves of the extruded surface. Since the screen is usually the most distinct white object in the environment, segmenting it is relatively easy if the background provides reasonable contrast. Further, more often than not, a display environment is designed to have relatively diffuse illumination, which does not affect the segmentation adversely. Even in the worst case of low contrast between the screen and the background, one can always use user interaction to improve the segmentation. All other steps of our method are completely automated, as long as the projection areas of the projectors lie entirely within the screen and the screen lies entirely within the camera's field of view.
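A minimal version of this segmentation step can be sketched as follows (a hypothetical `segment_screen` helper; a real pipeline would extract the largest connected bright component and refine its boundary, or fall back to user interaction for low-contrast backgrounds):

```python
import numpy as np

def segment_screen(gray, thresh=0.8):
    """Minimal screen-segmentation sketch: the screen is assumed to be
    the brightest large region in the image. Threshold the normalized
    grayscale image and return the bounding box (x0, y0, x1, y1) of
    the bright pixels, or None if nothing bright enough is found."""
    mask = gray >= thresh
    ys, xs = np.nonzero(mask)
    if len(xs) == 0:
        return None  # nothing bright enough: ask the user for help
    return int(xs.min()), int(ys.min()), int(xs.max()), int(ys.max())

# Toy image: dark background with a bright "screen" patch.
img = np.zeros((100, 100))
img[20:80, 30:90] = 1.0
print(segment_screen(img))  # (30, 20, 89, 79)
```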

In summary, we have presented the first method for markerless, view-independent registration of tiled projection-based displays on extruded surfaces using an uncalibrated camera. We have shown that by imposing practical priors on the display surface, the registration technique can be simplified enough to be used easily by laymen. Our method provides a user-friendly and cost-effective way to sustain such displays in large establishments like visualization centers, museums, and theme parks. Further, our method offers the ability to recalibrate and reconfigure the display at very short notice. This can be especially useful for applications like digital signage and aesthetic projections in malls, airports, and other public places.

In the future, we would like to explore whether similar practical priors lead to easier registration for another kind of widely used non-planar surface: the dome. In recent years, the number of dome installations has surpassed the number of IMAX theater installations (Figure 14). However, there still does not exist an easy way to calibrate these displays. Our goal is to extend our fundamental concept in this direction.

### Acknowledgments

We would like to thank Maxim Lazarov for helping to make the video for this paper and to build the flexible display set-up used in this work. We thank Epson and Canon for their generous donations of the projectors and cameras used in this project. This research is funded by NSF SGER 0743117 and NSF CAREER IIS-0846144.