Online Projector Deblurring Using a Convolutional Neural Network



Yuta Kageyama, Daisuke Iwai, Member, IEEE, and Kosuke Sato, Member, IEEE

Fig. 1. The proposed projector deblurring technique compensates movie contents for defocus blur artifacts using a deep neural network for dynamic projection mapping. (a) Translation movement; (b) rotation movement; (c) experimental setup. The left and middle images show the target images (top) as well as the projected results of the target images (middle) and of the compensation images (bottom) that were computed by the proposed network. The projection surface was (a) translated from far (in focus) to near (out of focus) or (b) rotated from 0 degrees (in focus) to 45 degrees (out of focus) along the yaw axis by using the robotic arm shown in the right image. We observed that the details of the target images were preserved in the projected results of the proposed technique but were missing in the projected results of the direct projection of the target images.
Abstract-Projector deblurring is an important technology for dynamic projection mapping (PM), where the distance between a projector and a projection surface changes in time. However, conventional projector deblurring techniques do not support dynamic PM because they need to project calibration patterns to estimate the amount of defocus blur each time the surface moves. We present a deep neural network that can compensate for defocus blur in dynamic PM. The primary contribution of this paper is a unique network structure that consists of an extractor and a generator. The extractor explicitly estimates a defocus blur map and a luminance attenuation map. These maps are then injected into the middle layers of the generator network that computes the compensation image. We also propose a pseudo-projection technique for synthesizing physically plausible training data, considering the geometric misregistration that potentially happens in actual PM systems. We conducted simulation and actual PM experiments and confirmed that: (1) the proposed network structure is more suitable than a simple, more general structure for projector deblurring; (2) the network trained with the proposed pseudo-projection technique can compensate projection images for defocus blur artifacts in dynamic PM; and (3) the network supports the translation speed of the surface movement within a certain range that covers normal human motions.
This paper proposes a projector deblurring technique that does not require projection of calibration patterns even when the projection surface moves. As the first attempt towards projector deblurring in dynamic PM, we start with a simple assumption that the surface is uniformly white and completely diffuse. The key insight exploited in the technique is that the geometric relationship between the projector and the projection surface does not vary significantly within a video frame (i.e., 1/60 sec in most current video projectors). Therefore, our method computes the compensation image of the current frame using the projected result of the previous frame captured with a camera. Specifically, we applied a deep convolutional neural network (CNN) to generate the compensation image.

As the prime contribution of this research, we devised an effective network structure for projector deblurring, which has two parts: an extractor and a generator. In each frame, the extractor, which consists of two subnetworks, takes a pair of the projection image of the previous frame and its projected result as the input. The first subnetwork estimates a defocus blur map that represents how much each projector pixel is defocused on the surface. The second subnetwork estimates a luminance attenuation map that represents the degree of reduction of the captured luminance of the projected result compared to that of the target luminance due to the inverse square law of light intensity. These two maps are then injected into the middle layers of the generator network, which takes the original target image of the current frame as the input and computes the compensation image to be projected at the current frame.

We also propose to synthesize physically plausible training data to avoid laborious and time-consuming projection data collection. In particular, our pseudo-projection technique generates the projected results by simulating the defocus blur based on the thin-lens model and by simulating the luminance reduction based on the inverse-square law with respect to the depth. Considering actual PM scenarios where the geometric registration of the projector and the camera is potentially inaccurate, we also incorporated warping of the generated image into our pseudo-projection framework.

Through a simulation-based comparison, we show that the proposed network structure is more suitable for compensating dynamic projection contents for defocus blur than a simple, single-network structure. Using a physical projector-camera system, we demonstrate that our projector deblurring technique can compensate for the defocus blur in an actual dynamic PM scenario.
To summarize, our primary contributions from this study are as follows:
• We introduce a CNN-based projector deblurring technique that generates a compensation image for defocus blur artifacts in the projected result in a dynamic PM scenario without requiring offline PSF estimation;
• We find that the combination of extractor subnetworks and a generator subnetwork is an effective structure for projector deblurring, where the outputs of the extractor (the defocus blur map and the luminance attenuation map) are incorporated into the middle layers of the generator;
• We design a pseudo-projection framework to synthesize physically plausible training data, considering inaccurate geometric registration between the projector and the camera; and
• We demonstrate the projector deblurring achieved by the proposed system through a physical dynamic PM experiment.

RELATED STUDIES
There are two major research topics related to this study: projector deblurring and deep learning for radiometric compensation in PM. In this section, we introduce previous studies on these topics and state our contributions compared to them.
These maps represent the degree of defocus blur and luminance attenuation per pixel, respectively. The generator subnetwork generates the compensation image $I_p(t)$ from the target image $I_t(t)$ by incorporating the maps into the middle layers of the network.

Our contribution
The prime contribution of this paper is the realization of a projector deblurring technique in a single-projector approach that works in dynamic PM. Based on an observation that the depth of the projection surface does not significantly change within a video frame (i.e., 1/60 sec), our technique synthesizes the compensation image using the projected information in the previous frame. We designed a DNN structure that synthesizes a compensation image from the target image of the current frame as well as from the projection image of the previous frame and its projected result. In particular, we found that the combination of two network modules (an extractor and a generator) has better compensation performance than a simple single-network structure. We also propose a pseudo-projection technique for synthesizing a projected result from a projection image and a depth map of a surface in order to train the network. We show that the network weights trained by the synthesized data are useful for projector deblurring in physical setups.

PROJECTOR DEBLURRING NETWORK
We propose a DNN that synthesizes a projection image to compensate for defocus blur even in dynamic PM scenarios. This section describes our network and our loss function, which are designed to minimize the difference between a target image and the projected result of the compensation image.

Overview
We assume that a projection surface is (1) uniformly white and completely diffuse, (2) observed with a camera, and (3) within the camera's DoF. Although various optical phenomena can degrade the image quality of a projected result, we found from our preliminary investigation that two of these optical phenomena are dominant factors in the assumed situation. The first optical phenomenon is the projector's defocus blur, which attenuates the high-spatial-frequency components of the projected result according to the distance from the focal plane. The second optical phenomenon is the attenuation of the luminance of the captured projected result according to the distance of the surface from the camera (i.e., according to the inverse square law of light). We designed our network to mitigate the image quality degradation caused by the defocus blur in the projected result without suffering from the luminance attenuation artifacts.
Figure 2 shows the whole structure of our proposed projector deblurring network. In our preliminary investigation, we observed that the projection surface did not significantly move within each video frame in most of the dynamic PM scenarios. Thus, we estimated the extent of the occurrence of the defocus blur and the luminance attenuation of the projected result in the previous frame, and we used that information to generate the compensation image for the current frame. We applied a network structure with two parts, an extractor and a generator, rather than a single network to explicitly estimate the defocus blur and the luminance attenuation. Specifically, the extractor had two subnetworks, one of which estimates the amount of defocus blur, and the other, the luminance attenuation for each projector pixel. The estimated defocus blur map and the luminance attenuation map were then injected into the generator subnetwork, which synthesized the projection image that compensated for the image degradation. We explicitly separated the network structure so that the texture of the previous frame would not affect the compensation image in the current frame.
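To make the two-part structure concrete, the following is a minimal PyTorch sketch. The module names mirror the paper's DefocusNet, LuminanceNet, and CompensationNet, but the layer counts, channel widths, and grayscale pipeline are illustrative assumptions, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

def conv_block(c_in, c_out):
    return nn.Sequential(nn.Conv2d(c_in, c_out, 3, padding=1),
                         nn.ReLU(inplace=True))

class MapEstimator(nn.Module):
    """Extractor subnetwork: (I_p(t-1), captured result) -> one-channel map."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            conv_block(2, 32),   # previous projection image + its captured result
            conv_block(32, 32),
            nn.Conv2d(32, 1, 3, padding=1), nn.Sigmoid())
    def forward(self, prev_proj, prev_captured):
        return self.net(torch.cat([prev_proj, prev_captured], dim=1))

class Generator(nn.Module):
    """CompensationNet: target image + injected maps -> compensation image."""
    def __init__(self):
        super().__init__()
        self.head = conv_block(1, 32)
        self.mid = conv_block(32 + 2, 32)   # maps injected into the middle layers
        self.tail = nn.Conv2d(32, 1, 3, padding=1)
    def forward(self, target, defocus_map, attenuation_map):
        x = self.head(target)
        x = self.mid(torch.cat([x, defocus_map, attenuation_map], dim=1))
        return torch.sigmoid(self.tail(x))

defocus_net, luminance_net = MapEstimator(), MapEstimator()
compensation_net = Generator()
```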
We trained DefocusNet and LuminanceNet to accurately estimate the amount of defocus blur and luminance attenuation in the projected result for each projector pixel. Suppose that the ground truth maps of the PSF variances of the projected pixels and of their distances from the camera are $M_d$ and $M_l$, respectively. Then, we used the following $\mathcal{L}_d$ and $\mathcal{L}_l$ as the loss functions in the training of DefocusNet and LuminanceNet, respectively:

$\mathcal{L}_d = \mathcal{L}_{\ell_1}(\hat{M}_d, M_d) + \lambda_1 \mathcal{L}_{TV}(\hat{M}_d, M_d),$
$\mathcal{L}_l = \mathcal{L}_{\ell_1}(\hat{M}_l, M_l) + \lambda_1 \mathcal{L}_{TV}(\hat{M}_l, M_l),$

where $\mathcal{L}_{\ell_1}(i, j)$ is a loss function that uses the $\ell_1$ norm of the differences between $i$ and $j$. In addition, we applied the total variation [40] loss $\mathcal{L}_{TV}(i, j)$ for the regularization of the estimated maps. $\lambda_1$ is a coefficient that balances the two functions.

Note that we designed our network to compensate for defocus blur caused by spatially varying PSFs, and it properly worked even in a case wherein two consecutive video frames were not similar (e.g., a scene change occurred between them). Without loss of generality, we assumed that the projector and the camera shared the same field of view (FoV). This assumption allowed us to model the projector's PSF without having to consider its distortion on a freeform or tilted surface due to the different perspectives of the two devices. We achieved the FoV sharing in an actual projector-camera setup by applying a beam splitter or by geometrically transforming the captured image using the pose relationship between the two devices.
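A minimal sketch of these map losses follows. The value of $\lambda_1$ is illustrative, and the total-variation term is interpreted here as a smoothness regularizer on the estimated map, a common reading; the paper's exact definition is not reproduced.

```python
import torch

def l1_loss(i, j):
    # Mean l1 norm of the differences between i and j.
    return torch.mean(torch.abs(i - j))

def tv_loss(m):
    # Total variation of a map m of shape (N, 1, H, W).
    dh = torch.mean(torch.abs(m[:, :, 1:, :] - m[:, :, :-1, :]))
    dw = torch.mean(torch.abs(m[:, :, :, 1:] - m[:, :, :, :-1]))
    return dh + dw

lambda_1 = 0.1  # balancing coefficient; illustrative value

def map_loss(estimated, ground_truth):
    return l1_loss(estimated, ground_truth) + lambda_1 * tv_loss(estimated)
```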
We trained CompensationNet to generate a compensation image whose projected result resembles the target image for human observers. Therefore, the following function can be considered the loss in the training:

$\mathcal{L}_c = \mathcal{L}_{\ell_1}\left(\mathcal{P}(\hat{I}_p), I_t\right),$

where $\hat{I}_p$ is the generated compensation image and $\mathcal{P}$ is the pseudo-projection operator described in Sect. 4.

Training strategy
A straightforward (or naive) method of training the proposed network is to update all the weights in the network by flowing the data as depicted in Fig. 2. Specifically, the estimated maps from DefocusNet and LuminanceNet (i.e., $\hat{M}_d$ and $\hat{M}_l$) are directly injected into CompensationNet in the training. However, we expected this method to be unstable and the weights not to converge in a reasonable time frame. In the early stage of the training, DefocusNet and LuminanceNet did not output correct maps; thus, updating the weights of CompensationNet did not make much sense. Therefore, we applied another method that separately trained the three networks. Specifically, we used the ground truth of the defocus blur map $M_d$ and the luminance attenuation map $M_l$ to train CompensationNet instead of the estimated maps, $\hat{M}_d$ and $\hat{M}_l$ (Fig. 3).
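A sketch of one step of this separate training strategy, reusing the hypothetical modules and losses from the earlier snippets; note how the ground-truth maps, not the estimates, are fed to CompensationNet.

```python
import torch

opt_d = torch.optim.Adam(defocus_net.parameters(), lr=1e-3)
opt_l = torch.optim.Adam(luminance_net.parameters(), lr=1e-3)
opt_c = torch.optim.Adam(compensation_net.parameters(), lr=1e-3)

def train_step(I_p_prev, I_c_prev, I_t, M_d, M_l, pseudo_project):
    """One step of the separate training strategy. pseudo_project must be a
    differentiable implementation of the operator P for loss_c to
    backpropagate; M_d and M_l are the ground-truth maps."""
    # Extractor subnetworks: supervised map regression.
    loss_d = map_loss(defocus_net(I_p_prev, I_c_prev), M_d)
    loss_l = map_loss(luminance_net(I_p_prev, I_c_prev), M_l)
    # Generator: the ground-truth maps are injected during training.
    I_comp = compensation_net(I_t, M_d, M_l)
    loss_c = l1_loss(pseudo_project(I_comp, M_d, M_l), I_t)
    # The three networks have disjoint graphs, so each loss steps separately.
    for opt, loss in ((opt_d, loss_d), (opt_l, loss_l), (opt_c, loss_c)):
        opt.zero_grad()
        loss.backward()
        opt.step()
```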

DATASET SYNTHESIS
To train the proposed network, we had to prepare a large set of target images $I_t$, the ground truth of the defocus blur maps $M_d$, and that of the luminance attenuation maps $M_l$. In addition, the training required the projected results of the projection images $I_p$. However, it was impractical to obtain them using actual PM and capturing setups with a large number of projection surfaces of various shapes. Therefore, we synthesized the dataset from a set of target images and a set of depth images that represented the shapes of the projection surfaces. We performed the dataset synthesis in a virtual space with a virtual projector and a virtual camera. Based on the assumption described in Sect. 3.1, the virtual projector and the virtual camera had the same FoV.

Computational model of projector blur
As shown in Fig. 4, based on a thin-lens model, a light point emitted from the imaging plane of a projector is observed as a circle on a projection surface located $v$ away from the projector lens. The diameter of the blur circle $b$ can be computed using geometrical similarity as:

$b = a\,\frac{|v - v_f|}{v_f},$

where $a$ is the diameter of the lens aperture and $v_f$ is the distance from the lens to the focal plane.
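As an illustrative check of this relation, take the experimental values reported in Sect. 5.4 (a 60 mm lens aperture and a 280 mm focusing distance) and the near end of the translation movement, where the surface sits roughly 200 mm from the lens: $b = 60 \times |200 - 280| / 280 \approx 17.1$ mm, i.e., a single projected point spreads over a blur circle of roughly 17 mm at the nearest surface position.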

Synthesis of defocus blur map and luminance attenuation map
We generated the defocus blur map $M_d$ from the depth map of a projection surface that represented the distance from the virtual projector to the surface at each projector pixel (Fig. 5(a)). Specifically, based on the thin-lens model described above, we converted the depth value at each pixel into the corresponding amount of defocus blur.
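A sketch of this map synthesis from a depth image follows, assuming the blur-circle relation above. Mapping the blur diameter directly to the map value, and storing raw depth as the attenuation map, are simplifications of the paper's PSF-variance formulation; the default constants are taken from the experimental setup.

```python
import numpy as np

def synthesize_maps(depth, aperture=60.0, focus_dist=280.0):
    """Synthesize a defocus blur map and a luminance attenuation map from a
    per-pixel depth map (all lengths in mm). The blur map stores the
    thin-lens blur-circle diameter b = a * |v - v_f| / v_f at each pixel;
    the attenuation map stores the per-pixel distance used later for the
    inverse-square falloff."""
    blur_map = aperture * np.abs(focus_dist - depth) / focus_dist  # M_d
    attenuation_map = depth                                        # M_l
    return blur_map, attenuation_map
```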

Pseudo-projection for synthesis of projected results
We developed a pseudo-projection technique to synthesize the projected results, considering both the defocus blur and the luminance attenuation. Suppose that $I_p$ is a projection image; then the defocus-blurred image $I_d$ can be computed by applying the spatially varying PSF derived from the defocus blur map as:

$I_d = G_{M_d} * I_p + n_\sigma,$

where $G_{M_d} *$ denotes convolution with the spatially varying Gaussian PSF determined by $M_d$, and $n_\sigma$ is a Gaussian noise with the standard deviation value of $\sigma$ (Fig. 5(c)). Next, we applied the luminance attenuation to $I_d$ to obtain the projected result $I_r$. According to our assumption of the FoV sharing between the virtual projector and the virtual camera, the apparent size of each projected pixel captured by the virtual camera does not change with respect to the depth variation of the projection surface. On the other hand, the luminous flux that is emitted from the projected pixel and incident into the virtual camera's lens is inversely proportional to the square of the distance from the pixel. Thus, the luminance of the projected pixel captured by the virtual camera attenuates according to the inverse square law. Therefore, the luminance attenuation factor at each pixel is $1/(M_l(x, y))^2$. Suppose the pseudo-projection operator is denoted as $\mathcal{P}$; then the projected result is generated as:

$I_r(x, y) = \mathcal{P}(I_p)(x, y) = \frac{1}{(M_l(x, y))^2}\, I_d(x, y).$
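A sketch of the pseudo-projection operator $\mathcal{P}$ is given below for a grayscale image, approximating the spatially varying PSF by quantizing the blur map into a few Gaussian levels. The number of levels, the noise level, the diameter-to-sigma conversion, and the normalization of the inverse-square factor by a reference distance (to keep intensities in range) are illustrative choices, not the paper's.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def pseudo_project(I_p, blur_map, M_l, sigma_noise=0.01, ref_dist=280.0):
    """Pseudo-projection of a grayscale image I_p in [0, 1]: spatially
    varying defocus blur, additive Gaussian noise n_sigma, and
    inverse-square luminance attenuation."""
    # Quantize the blur diameters into a few bands; blur once per band.
    levels = np.linspace(blur_map.min(), blur_map.max(), 4)
    I_d = np.zeros_like(I_p)
    for lo, hi in zip(levels[:-1], levels[1:]):
        mask = (blur_map >= lo) & (blur_map <= hi)
        sigma = max(0.25 * (lo + hi), 1e-3)  # diameter -> Gaussian std (heuristic)
        I_d[mask] = gaussian_filter(I_p, sigma=sigma)[mask]
    I_d += np.random.normal(0.0, sigma_noise, I_p.shape)  # n_sigma
    I_r = I_d * (ref_dist / M_l) ** 2  # inverse-square attenuation, normalized
    return np.clip(I_r, 0.0, 1.0)
```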

EXPERIMENT
We evaluated the proposed network both in a simulation and a physical dynamic PM setup. In this section, the details of the training are described, followed by the simulation experiment that was conducted to evaluate the validity of the proposed network and the training strategy. Then, the results of the physical experiments, which were conducted to check if the proposed network would work in an actual dynamic PM scenario, are introduced.

Validation of training strategy
As described in Sect. 3.4, we propose to separately train the three subnetworks, DefocusNet, LuminanceNet, and CompensationNet, rather than jointly updating all the weights of the entire network (i.e., as done in the naive training method). In the naive training method, it is not guaranteed that DefocusNet and LuminanceNet provide the correct defocus blur and luminance attenuation maps, respectively, at the early stage of the training. On the other hand, the correct maps are always injected into CompensationNet in the proposed training method. Therefore, we hypothesize that the proposed training method can train the subnetworks more efficiently than can the naive training method. We experimentally tested this hypothesis by comparing the loss values between the naive training method and the proposed training method. Figure 6 shows the loss values at each iteration in the training of the three subnetworks using the naive method and the proposed method. We see that the loss values did not decrease but only fluctuated over the iterations when the naive training method was applied. On the other hand, the proposed training method decreased the loss values of all three subnetworks. Therefore, we confirmed that our hypothesis is correct; and thus, the proposed training strategy is valid.

Validation of network structure
The most important feature of the proposed network is its divided structure with an extractor and a generator. We designed the proposed structure to use the target image of the previous frame and its projected result only to estimate the defocus blur map and the luminance attenuation map. If we did not explicitly separate the network structure, the possibility that the texture of the previous frame would affect the compensation image in the current frame would increase. Therefore, we evaluated our proposed network structure by comparing it with a simpler one that computes the compensation image without the explicit estimations of the defocus blur map and the luminance attenuation map. Specifically, considering the versatile property of ResNet, we applied it, as we did for CompensationNet, to the compared simple network, which takes the target image of the current frame, that of the previous frame, and its projected result as the inputs and generates a compensation image. We trained the simple network using the same dataset and pseudo-projection technique that we used to train the proposed network.
We compared the proposed and simple networks in the simulation using the pseudo-projection technique. Ten video files for the comparison were randomly selected from another dataset (the Moments in Time dataset).

The 20,000 pairs were divided into 1,000 mini-batches, each of which contained 20 pairs. Because training using a large set of video files normally takes a relatively long time, it was not feasible to repeat our training for multiple epochs. Therefore, the epoch number of our training was one. We trained our network using a shared workstation.

To optimize the network, we utilized Adam [29] with a learning rate of 1e-3 and momentum parameters of $\beta_1 = 0.9$ and $\beta_2 = 0.999$. All learnable parameters were initialized using the method of He et al. [16].
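Expressed as a hypothetical PyTorch configuration (reusing the modules from the earlier sketch), these settings look like this:

```python
import torch

def init_he(m):
    # He (Kaiming) initialization [16] for all learnable conv weights.
    if isinstance(m, torch.nn.Conv2d):
        torch.nn.init.kaiming_normal_(m.weight)
        if m.bias is not None:
            torch.nn.init.zeros_(m.bias)

optimizers = []
for net in (defocus_net, luminance_net, compensation_net):
    net.apply(init_he)
    optimizers.append(torch.optim.Adam(net.parameters(),
                                       lr=1e-3, betas=(0.9, 0.999)))
```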

Warping in pseudo-projection
The pseudo-projection process has been implemented so far assuming that the FoV of the virtual projector and that of the virtual camera are identical. However, achieving the perfect alignment in an actual projector-camera setup is difficult [2]. If the projected results in a dataset are synthesized as described above and used to train the proposed network, a slight misalignment in an actual setup potentially causes significant artifacts in the compensation result. Therefore, we propose the use of a geometric warping technique to simulate the misalignment and to incorporate it into the pseudo-projection process.
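One way to realize such a warping is sketched below, perturbing the image corners by a few pixels and applying the resulting homography to the synthesized projected result. The paper does not specify the perturbation model, so the random-corner scheme and the max_shift magnitude are assumptions.

```python
import numpy as np
import cv2

def warp_for_misregistration(image, max_shift=2.0):
    """Simulate an imperfectly registered projector-camera pair by applying
    a random small homography to the synthesized projected result.
    max_shift is the maximum corner displacement in pixels (illustrative)."""
    h, w = image.shape[:2]
    src = np.float32([[0, 0], [w, 0], [w, h], [0, h]])
    dst = src + np.random.uniform(-max_shift, max_shift,
                                  src.shape).astype(np.float32)
    H = cv2.getPerspectiveTransform(src, dst)
    return cv2.warpPerspective(image, H, (w, h))
```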

Dynamic PM in actual setup
We designed our network to compensate for the defocus blur in an actual dynamic PM scenario. In particular, our pseudo-projection technique in the training applies warping to the virtually projected result, considering the potential misregistration in an actual projector-camera setup, as described in Sect. 4.4. We evaluated the compensation performance of the proposed network and the efficacy of the warping process in this setup. We built a physical projector-camera setup, as shown in Fig. 1. We used a DLP projector (BenQ TH682ST, 60 mm lens aperture) and an industrial CMOS camera (FLIR FL3-U3-13S2C-CS). The projection surface was a flat, diffuse surface whose pose was controlled by a robot arm (UFACTORY xArm 7) so that the same sequence of the surface poses could be repeated in different conditions. Because the computation of a compensation image took 71.7 ms (Sect. 5.3), real-time frame-by-frame projector deblurring, which requires completion of the computation within 1/60 s, was difficult to perform in the current setup. Therefore, we merely emulated it using the robotic arm by slowly moving the surface. We performed a manual calibration to obtain the geometric relationships among the projector, the camera, and the surface, by which we were able to geometrically transform the captured image of the projected result such that the camera and the projector shared the same FoV.
We prepared two types of movements for the projection surface: translation and rotation (Fig. 8). For the translation movement, the surface was initially placed 280 mm away from the projector such that the surface was perpendicular to the projector's optical axis. Then, the surface was translated towards the projector along its optical axis (i.e., from far to middle) at the speed of 1 mm per frame for 40 frames. Then, the surface was translated in the horizontal direction at the same speed for 10 frames, during which no depth variation occurred. The surface was then translated towards the projector (i.e., from middle to near) again at the same speed for 40 frames. During the experiment, the projector's focusing distance was fixed at 280 mm from the projector lens (i.e., far). For the rotation movement, the surface was placed 400 mm away from the projector such that the surface was perpendicular to the projector's optical axis. The surface was rotated around the yaw axis at the angular velocity of 1 degree per frame for 45 frames. Then, it was rotated back at the same speed for 45 frames. During the experiment, the projector's focusing distance was fixed at 400 mm from the projector lens.

Results

Figure 9 shows a part of the projected results. The comparison of the results in the compensated and uncompensated conditions shows that the high-frequency components (texture details) were preserved when the proposed compensation technique was applied, while they were missing in the uncompensated condition. Figure 10 shows the SSIM values of the projected results in Fig. 9 compared to the corresponding target video frames (orange: compensated condition; blue: uncompensated condition). The SSIM values in the compensated condition were higher than those in the uncompensated condition in both the translation and rotation movements. The average SSIM values over all the video frames of all the video files in the translation movement were 0.531 in the compensated condition and 0.438 in the uncompensated condition. Those in the rotation movement were 0.804 in the compensated condition and 0.722 in the uncompensated condition. Therefore, we can quantitatively confirm that the proposed network successfully compensated for the defocus blur in an actual PM system.
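The paper does not state which SSIM implementation was used; a common way to reproduce such per-frame scores is scikit-image's structural_similarity:

```python
from skimage.metrics import structural_similarity

def frame_ssim(captured, target):
    # Both images as float arrays in [0, 1]; channel_axis=2 for color frames.
    return structural_similarity(captured, target,
                                 channel_axis=2, data_range=1.0)
```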

Full-color dynamic PM
Full-color images can be compensated using the proposed network independently for each color channel. The full-color compensation image was generated by concatenating the compensation images of the three color channels. Figure 1 shows a part of the projected results captured by the camera. We can see that the high-frequency components were preserved in the results of the compensated condition, while those of the corresponding video frames were missing in the results of the uncompensated condition. Therefore, we experimentally confirmed that the proposed network could compensate full-color images for the defocus blur artifacts.
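A sketch of this per-channel application, reusing the hypothetical compensation_net from the earlier snippet:

```python
import torch

def compensate_full_color(I_t_rgb, M_d, M_l):
    """Run the single-channel compensation network independently on each
    color channel of the target image (shape (N, 3, H, W)) and concatenate
    the results into a full-color compensation image."""
    channels = [compensation_net(I_t_rgb[:, c:c + 1], M_d, M_l)
                for c in range(3)]
    return torch.cat(channels, dim=1)
```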

Validation of warping process in pseudo-projection
We saw the effect of the warping process on the estimated defocus blur and luminance attenuation maps. We trained our network without the warping process in the pseudo-projection technique and conducted the dynamic PM experiment using the same video files and projection surface. We called this the "compensated (without warping)" condition. Figure 11 compares the estimated defocus blur and luminance attenuation maps in the rotation movement between the compensated condition and the compensated (without warping) condition. Because the surface was flat and rotated such that the projected image appeared focused at its right end and was getting defocused towards the left end, the intensities of the defocus blur map and the luminance attenuation map should be linearly decreasing from left to right. In the figure, we see that the target texture is prominent in both maps of the compensated (without warping) condition. This artifact was caused by the inaccurate geometric registration of the actual projector and camera, and thus, these two devices did not perfectly share the FoV. The network trained without the warping process did not assume such a situation, and thus, produced the artifacts. On the other hand, we can observe that the proposed network was less affected by the misregistration, and the texture of the target image is less visible in the maps. The right graphs quantitatively show this trend. The red lines (the compensated condition) decrease more smoothly from 0 to 255 on the horizontal pixel coordinate than the green lines (the compensated (without warping) condition).
A large part of a natural image consists only of low-spatial-frequency components; and thus, projector deblurring does not significantly change the image quality of the projected result of such a low-frequency part. Therefore, we could improve our method in terms of its computational cost and memory usage by applying the compensation network selectively to only the image areas that contain a large amount of high-spatial-frequency components. We may further speed up the inference process and reduce memory usage by incorporating factorization of a two-dimensional PSF into two one-dimensional PSFs [43]. Another solution is the application of the latest multi-scale separable network, which has been proven to deblur a 4K video at the video rate of 35 fps [10]. We believe that projector deblurring for larger image sizes in dynamic PM is an important topic for future study.

CONCLUSION
This paper presented a DNN that can compensate for defocus blur in dynamic PM. The primary contribution of this paper is its development of a unique network structure that consists of an extractor and a generator. The extractor explicitly estimates a defocus blur map and a luminance attenuation map, which are then injected into the middle layers of the generator network that computes the compensation image. We also proposed a pseudo-projection technique for synthesizing physically plausible training data, considering not only the defocus blur and the luminance attenuation but also the geometric misregistration that potentially happens in actual PM use. We conducted simulation and actual PM experiments and confirmed that: (1) the proposed network structure was more suitable for projector deblurring than a simple structure; (2) the network trained with the proposed pseudo-projection technique could compensate projection images for defocus blur and luminance attenuation artifacts in dynamic PM; and (3) the network supported the translation speed of a projection surface within a certain range that covers normal human motions. In our future study, we will test our method in more complex environments, such as one with clear depth discontinuities. We will also conduct a user study to evaluate how much the proposed network improves the projected image quality compared to the simple network in the perceptual space.