Universal Framework for Joint Image Restoration and 3D Body Reconstruction

Recent works have achieved state-of-the-art results in image restoration and in 3D body reconstruction from an input image. The 3D body reconstruction task, however, relies heavily on the quality of the input image. A straightforward way to address this issue is to generate vast degraded datasets and use them to fine-tune an existing body reconstruction network or to train a newly crafted one. However, these datasets may become obsolete in future usage, leaving the newly crafted network outdated. In contrast, we design a universal framework that utilizes prior state-of-the-art restoration works and self-boosts their performance at test time while jointly carrying out 3D body reconstruction. The self-boosting mechanism is realized via test-time parameter adaptation capable of handling various types of degradation. To support this adaptation, we also propose a strategy that generates pseudo-data on the fly at test time, allowing both the restoration and reconstruction modules to be learned in a self-supervised manner. With this advantage, the universal framework enhances performance without involving any new dataset or new neural network model. Our experimental results show that the proposed framework and pseudo-data strategies significantly improve the performance of both restoration and reconstruction.


I. INTRODUCTION
The task of 3D body reconstruction has been gaining popularity in recent years. A 3D model of the human body can be recovered directly from a single input image. Various works [1]-[8] have been proposed to achieve this objective. Unfortunately, these methods rely heavily on the input image being clean. Degraded images negatively affect the body reconstruction output, and most state-of-the-art works do not account for this case.
One may address this issue straightforwardly by synthesizing a large corrupted image dataset and using it to fine-tune an existing body network or to train a newly created one. Another solution is to generate the corrupted images using physical tools [9] and annotate the 3D body data via external devices [10]. However, this pre-generated dataset approach incurs a high labor cost while still being unable to help the network cope with constantly changing real-world data. This situation subsequently leaves the newly crafted network outdated. To tackle this issue, the recent trend has shifted toward life-long learning, which can be applied via test-time adaptation instead of running a single forward pass that depends only on the pre-generated dataset. A recent 3D body reconstruction work [11] followed this pathway by self-adapting its network on the test data via meta-learning [12]. However, it is still constrained to the clean image scenario. Thus, solving restoration and reconstruction jointly and adaptively remains an open problem.
Given the challenges discussed above, we propose a joint restoration and reconstruction framework that relies solely on the input data, processed in a test-time self-improvement fashion. Instead of crafting a new neural network architecture, we construct a framework in algorithmic form into which any deep-learning-based image restoration module can be plugged while it is jointly tasked with the 3D body reconstruction module. For this reason, we call our framework a universal one. To achieve the test-time improvement, we present a self-adaptation algorithm that fits both modules and optimizes them using only the input data. This strategy is made possible by borrowing the concept of model-agnostic meta-learning (MAML) [12] and adjusting it within our algorithm.
Without any new dataset, one might ask how the self-adaptive algorithm can work. In this work, we introduce the term pseudo-data, which is generated on the fly at test time to assist our algorithm. Our straightforward yet unique approach for acquiring the pseudo-data is meant to substitute for the new-dataset approach itself. The pseudo-data, generated directly from the test input image, is split into pseudo-clean and pseudo-corrupted information. Their details, along with our algorithms, are discussed further in the Method section. The scope of our study on the restoration side covers 3 major degradation solvers, namely denoising, deblurring, and super-resolution (SR). Each of these scenarios is handled jointly with 3D body reconstruction under the universal framework.
In the experiments, we show that the test-time self-adaptive capability is preserved even when the joint scenario is applied, yielding better results in both quantitative and qualitative terms. Furthermore, we analyze the characteristics of each restoration scenario along with the 3D body reconstruction results. Future joint framework applications may utilize these findings for further improvements. In summary, our contributions are as follows:
• We introduce a modular plug-and-play universal framework (PPUF) encapsulated within a self-adaptive algorithm that can accept any deep-learning-based image restoration method while jointly carrying out the 3D body reconstruction task.
• We introduce the use of pseudo-data at test time as a cheap alternative to generating a vast new dataset to support our algorithm.
• We show that, using the proposed algorithm and pseudo-data alone within the joint framework, both the restoration and 3D body reconstruction modules work simultaneously while producing significant improvements in quantitative scores and visual quality.

II. RELATED WORKS
Image restoration and 3D body reconstruction. Image restoration has been explored for decades. Restoration methods are mainly categorized into deblurring, denoising, and SR. In the deblurring case, most works focus on motion deblurring. These studies have evolved from traditional methods based on blur kernel prediction [13]-[15] to recent kernel-prediction-free methods based on deep learning [16]-[18], particularly the GAN strategy [19]. The trend continued by equipping them with real-world priors such as human faces [20], [21] and bodies [22], [23]. Denoising and SR works are also evolving. The work of [24] translated a noisy input image directly into its clean version. This approach is made possible by providing pairs of clean and noisy images as a training dataset. Many works [25]-[28] then followed the same strategy with various modifications at the neural network level. In the SR case, the trend moved from hand-crafted function-based approaches, such as random forests [29], [30] and decision trees [31], to the now-common use of deep learning [32]-[34]. Although this work is mainly implemented for the deblurring, SR, and denoising scenarios, we believe that other restoration tasks, such as dehazing [35], [36] and de-raining [37], [38], are also applicable in future implementations.
3D body reconstruction, on the other hand, is a computer vision task that reconstructs a 3D human body model from an input image. Most works build on the statistical human body prior of [39], which is regressed via hand-crafted functions [5] or deep-learning approaches [2], [6]-[8], [40]. The earlier hand-crafted approach by Bogo et al. [5] first estimates 2D body joints and then uses them to regress the final statistical body prior output. The following work by Kanazawa et al. [6] adopted a deep-learning-based implementation via a generative adversarial network [19]. Their work [6] used specific discriminators that constrain the 3D body reconstruction results to realistic human body poses. Its improved version [40] aims to hallucinate temporal 3D body outputs with plausible movements. Kolotouros et al. [8] introduced a straightforward refinement of [6] via a body-fitting mechanism during training. The work of Moon et al. [2] surpassed their performance by using line-based pixel (lixel) information to better regress the 3D body prior output. More recently, some additional works [41]-[43] refine temporal 3D body reconstruction via (i) gated recurrent units that connect related body features in sequential frames [41], (ii) an additional module that penalizes outlier body poses in adjacent frames [42], and (iii) a temporal re-weighting strategy that balances the features of sequential frames [43]. Unlike these human-prior-based approaches, Saito et al. [4] avoided the prior by directly employing a pixel-aligned implicit function to predict the 3D cloth-textured human body output.
Learning strategies for computer vision tasks. Unlike the works discussed above, some works mainly focus on strategies that boost performance. The early work of [44] used on-demand learning to study multiple restoration cases, such as in-painting, pixel interpolation, deblurring, and denoising, each solved separately. [45] then introduced a reinforcement-learning-based restoration method that is able to solve combined cases automatically. Their cases include deblurring, de-raining, denoising, and de-JPEG-ing. The work of [46] constructs an architecture that also solves a similar scenario with the aim of improving the recognition task. Recent trends move toward adaptive learning strategies to meet the life-long learning demand.
This demand emphasizes the capability of any computer vision function to adapt to the unknown test data instead of relying on the pre-generated dataset that might become obsolete in future usage. To achieve it, recent works exploit the model agnostic meta-learning [12] algorithm that is known to induce the self-adaptive capability. These particular works in SR [47]- [49], motion-deblurring [50], video interpolation [51] and body reconstruction [11] utilize and adjust the meta-learning algorithm to boost their performances in the test phase.
Inspired by these studies, our work focuses on boosting the joint task of restoration and 3D body reconstruction without any modification at the network level and without any additional dataset. We describe our strategies in the following section.

III. METHOD
Our PPUF consists of a restoration module, RestoNet (f), followed by a 3D body reconstruction module, BodyNet (Ω), which output restored images and reconstructed bodies, respectively. It is trained in 2 progressive stages, namely (i) initial learning, denoted by the blue arrow, and (ii) meta-transfer learning, denoted by the brown arrow in Figure 1. Once trained, the PPUF is run through our meta-testing algorithm to perform test-time self-improvement, as shown in Figure 2. We detail the training and testing procedures in the following subsections.

A. INITIAL LEARNING
This stage is the initial procedure of our scheme. The aim of this training is to move the untrained weights to their initial trained versions, (f_0, Ω_0) → (f_T, Ω_T), which are robust in solving restoration as well as 3D body reconstruction. The training involves only the Human3.6M dataset [10], as it provides the human keypoint ground truth, while we degrade the images according to the degradation scenario: noisy, blurry, or low-resolution. In the noisy case, the noise sigma is sampled randomly between 10 and 20 during training. In the blurry case, a Gaussian blur with a fixed window size of 5 × 5 and a randomized standard deviation (σ) between 0.05 and 1.2 is used. Finally, for the SR case, we consider 2× downsampling in our study.
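As a rough illustration of the degradation settings above, the following sketch samples per-image degradation parameters and builds the 5 × 5 Gaussian kernel. The function names, the uniform sampling, and the reading of the noise sigma on a 0-255 intensity scale are assumptions made for illustration; the text only states the ranges.

```python
import math
import random


def gaussian_kernel(size=5, sigma=0.8):
    """Return a normalized size x size Gaussian kernel as nested lists."""
    half = size // 2
    weights = [[math.exp(-(x * x + y * y) / (2.0 * sigma * sigma))
                for x in range(-half, half + 1)]
               for y in range(-half, half + 1)]
    total = sum(sum(row) for row in weights)
    return [[w / total for w in row] for row in weights]


def sample_degradation(case, rng=random):
    """Draw degradation parameters for one training image (hypothetical helper)."""
    if case == "denoise":
        # Noise sigma drawn between 10 and 20 (uniform sampling is an assumption).
        return {"noise_sigma": rng.uniform(10.0, 20.0)}
    if case == "deblur":
        # Fixed 5 x 5 window, standard deviation drawn between 0.05 and 1.2.
        return {"kernel": gaussian_kernel(5, rng.uniform(0.05, 1.2))}
    if case == "sr":
        # The study considers 2x downsampling only.
        return {"scale": 2}
    raise ValueError(f"unknown degradation case: {case}")
```

The kernel returned for the blurry case would then be convolved with the clean image, and the sampled sigma would scale zero-mean Gaussian noise in the noisy case.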

B. UNIVERSAL META-TRAINING
The next stage performs meta-transfer training starting from the output weights f_T, Ω_T of the initial learning. This training aims to provide stable initial weights for both modules that are ready to be adapted during meta-testing. Our strategy for this training is visualized in Figure 1 along the brown directional arrow. The figure translates directly to our implementation in Algorithm 1.
Algorithm 1 follows the meta-learning [12] concept, as it is built from 2 important parts: the inner-loop (Lines 8-12) and outer-loop (Lines 14-18) operations. The idea of this training is to find the best weight position for test-time adaptation. To achieve this goal, our training algorithm uses 2 data samples, namely the task-training (T_tr) and task-testing (T_te) batches sampled from a data collection p(T). T_tr and T_te are used within the inner- and outer-loop scopes of Algorithm 1, respectively. Each time the networks are optimized within the inner-loop scope (Lines 11-12), their behavior is supervised by the outer-loop optimizer (Lines 17-18).
We now describe Algorithm 1 in detail. Algorithm 1 is fed the input data of the corrupted image C, the clean image L, and the body ground truth (3D and 2D keypoints) G. These data are re-sampled for the T_tr and T_te batches (Lines 4-6). The process then continues to the inner-loop scope with a pre-determined number of iterations i (Lines 8-12). Inside the inner loop, our algorithm restores the image to obtain R and reconstructs the 3D body parameters S from the restored image R. The next step is a loss calculation (Line 10) that combines the restoration loss L_f and the body reconstruction loss L_Ω. L_f is defined by Eq. 1 and, similarly, L_Ω by Eq. 2, where P_3(.) and P_2(.) denote the 3D keypoint and 2D keypoint extraction functions, obtained from the body parameters of the Skinned Multi-Person Linear model (SMPL) [39] using the 3D body reconstruction network Ω [6]. The variables a and b represent the data batch index inside the inner- and outer-loop scopes, respectively. Instead of using gradient descent in the inner loop, we use the ADAM method with learning rates α and β for the restoration and reconstruction cases, respectively. Once the inner-loop operation is finished, the process proceeds to the outer loop, which follows similar procedures (Lines 14-18).
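The inner/outer-loop structure described above can be sketched on a toy problem. This is not Algorithm 1 itself: it swaps ADAM for plain gradient steps, uses a first-order simplification in which the outer gradient is evaluated at the adapted weights and applied to the initial weights, and replaces both networks with scalar stand-ins (restoration r = w_f · c, body regression s = w_b · r) with squared-error losses. All function names are hypothetical.

```python
def joint_grad(params, batch):
    """Gradient of the toy joint loss (restoration + body) w.r.t. (w_f, w_b).

    batch is a list of (corrupted, clean, body_label) triples.
    """
    w_f, w_b = params
    g_f = g_b = 0.0
    for c, l, g in batch:
        r = w_f * c          # toy restoration output
        s = w_b * r          # toy body output computed from the restored value
        g_f += 2.0 * (r - l) * c + 2.0 * (s - g) * w_b * c
        g_b += 2.0 * (s - g) * r
    n = len(batch)
    return [g_f / n, g_b / n]


def joint_loss(params, batch):
    """Toy joint loss: restoration error plus body error, averaged over the batch."""
    w_f, w_b = params
    return sum((w_f * c - l) ** 2 + (w_b * w_f * c - g) ** 2
               for c, l, g in batch) / len(batch)


def meta_train_step(params, task_train, task_test,
                    inner_steps=1, inner_lr=0.02, outer_lr=0.02):
    """One meta-training step with a first-order outer update."""
    # Inner loop: adapt a copy of the weights on the task-training batch.
    adapted = list(params)
    for _ in range(inner_steps):
        grads = joint_grad(adapted, task_train)
        adapted = [p - inner_lr * g for p, g in zip(adapted, grads)]
    # Outer loop: score the adapted weights on the task-testing batch and
    # move the initial weights with that gradient.
    outer_grads = joint_grad(adapted, task_test)
    return [p - outer_lr * g for p, g in zip(params, outer_grads)]
```

Repeating `meta_train_step` over freshly sampled T_tr and T_te batches moves the initial weights toward a position from which a few adaptation steps already lower the joint loss.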

C. UNIVERSAL META-TESTING
Meta-testing is the main actor in executing the self-improvement strategy. Our meta-testing in the joint framework is given in Algorithm 2. Its scheme is shown in Figure 2, where 3 sub-procedures are run progressively: (i) pseudo-data extraction, followed by (ii) test-time adaptation and (iii) the final meta-testing. These procedures correspond to Lines 3-5, 6-10, and 11-12 of Algorithm 2, respectively. The pseudo-data extraction stage is important, as it provides the cue for self-adaptation of the restoration and reconstruction modules. In our approach, the pseudo-data consists of 2 kinds of information: pseudo-clean and pseudo-corrupted data. The restoration module is supported by the pseudo-corrupted image C_P and the pseudo-clean image L_P, while the reconstruction module is supported by the pseudo-corrupted body S_I and the pseudo-clean body S_P. The pseudo-corrupted data are treated as input data, while the pseudo-clean data are regarded as the label information during self-supervised learning. In the following discussion, we describe our strategies for extracting this information in the various restoration cases. For simplicity, Figure 2 illustrates the denoising case.
Algorithm 1 (excerpt)
4: Sample task batches T_tr and T_te from p(T).
5: Obtain data C_a, L_a, G_a from T_tr.
FIGURE 2. Our universal meta-testing scheme, which executes pseudo-data extraction, test-time adaptation, and the final testing procedure using the adapted weights. Blue arrows indicate the supervision function for loss calculation.
• Denoising case. For the denoising case, we extract C_P by adding consistent severe noise to the original corrupted input image (C_0). The resulting image is treated as the pseudo-corrupted image (C_P), while the intermediate denoised result of the input image (R_0) is treated as the pseudo-clean image (L_P) (Lines 2-3 of Algorithm 2). The pseudo-clean and pseudo-corrupted bodies are extracted from the restored image R_1 (restored from C_P) and the initial corrupted input C_0, respectively. These approaches are visualized in Figure 2 within the chunks of pseudo-data extraction for restoration and pseudo-data extraction for reconstruction.
• Super-resolution case. In the super-resolution case, the main objective is to super-resolve a low-resolution input image to the desired 2× spatial dimensions. Its pseudo-corrupted image (C_P) is obtained by downscaling the low-resolution input by 2×. The input itself is then treated as the pseudo-clean image L_P. With this approach, the pseudo-corrupted image C_P has a size of h/2 × w/2, which is 0.5× the side length of the original low-resolution (h × w) input image (treated as L_P) and 0.25× that of the desired output (2h × 2w). Note that, in the SR approach, the pseudo-clean body S_P and the pseudo-corrupted body S_I are extracted from the re-upsampled versions of L_P (2×) and C_P (4×), respectively. The re-upsampling is applied via classical bilinear interpolation. This approach is visualized in Figure 3 and can replace the pseudo-data extraction for restoration chunk of Figure 2.
• Deblurring case. In the deblurring case, we first obtain the initial deblurring result of the blurry input image (C_0 → R_0). Both the blurry input and the deblurring result are then manually blurred with Gaussian degradation. Specifically, we set the kernel window to 5 with σ = 0.8. The degraded version of the blurry input (C_0 → C_P) and the intermediate deblurring result (R_0) are treated as the pseudo-corrupted (C_P) and pseudo-clean (L_P) images, respectively. Similar to the denoising case, the pseudo-clean and pseudo-corrupted bodies are extracted from the restored image R_1 (restored from C_P) and the initial corrupted input C_0, respectively. This approach is visualized in Figure 4 and can likewise replace the pseudo-data extraction for restoration chunk of Figure 2.
Note that, to generate the pseudo-body, we make use of another identical 3D body reconstruction network (BodyHelp in Figure 2) to produce the pseudo-clean body S_P from the intermediate restored version R_I (Line 4 and the second term of Line 5 of Algorithm 2). The pseudo-corrupted body is extracted from the initial corrupted input C_0 without restoration (first term of Line 5 of Algorithm 2). The BodyHelp module itself is excluded from optimization during test-time backpropagation.
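The image-level pseudo-pairs for the SR and denoising cases can be sketched as follows. This is a minimal illustration under stated assumptions: images are nested lists of grayscale values, the 2× downscaling uses 2 × 2 average pooling (the text does not name the downscaling operator), and the `restore` and `degrade` callables, like the function names themselves, are hypothetical stand-ins for the restoration network and the severe-noise injection.

```python
def downscale_2x(image):
    """Halve both spatial dimensions with 2 x 2 average pooling (assumed operator)."""
    h, w = len(image), len(image[0])
    return [[(image[y][x] + image[y][x + 1]
              + image[y + 1][x] + image[y + 1][x + 1]) / 4.0
             for x in range(0, w - 1, 2)]
            for y in range(0, h - 1, 2)]


def make_pseudo_pair(case, c0, restore=None, degrade=None):
    """Return (C_P, L_P) built only from the test input C_0."""
    if case == "sr":
        # C_P: the input downscaled by 2x; L_P: the low-resolution input itself.
        return downscale_2x(c0), c0
    if case == "denoise":
        # C_P: the input with extra noise; L_P: the intermediate result R_0.
        return degrade(c0), restore(c0)
    raise ValueError(f"unsupported case: {case}")
```

For an h × w input in the SR case, the returned C_P has size h/2 × w/2 while L_P keeps the original size, matching the sizes stated above.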
Once the pseudo-data is ready, the algorithm performs the adaptation procedure, which runs for a number of iterations (N), as highlighted in Lines 6-10 of Algorithm 2. The self-adaptation test loss (Line 7) uses the same formulas as Eqs. 1 and 2. The adaptation is illustrated in the test-time adaptation chunk of Figure 2, where the losses are summed jointly and only RestoNet and BodyNet are backpropagated. In our implementation, we simply use an ADAM optimizer with the learning rate of each respective method (described in the Experiment section). After the self-adaptation procedure, both end-to-end networks (f, Ω) settle at their adapted position. As illustrated in Figure 2, the red-dot object (f_N, Ω_N) has moved to a new location adapted according to the pseudo-data. The final restoration and reconstruction are performed in Lines 11-12 to produce R_N, S_N.
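The adapt-then-infer pattern can be illustrated with a deliberately small example. The sketch below is an assumption-laden toy, not Algorithm 2: the restoration module is a single scalar weight (r = weight · c), the loss is a squared error on the pseudo pair, and plain gradient descent stands in for ADAM. The function name and default values are hypothetical, apart from N = 20.

```python
def adapt_at_test_time(weight, pseudo_corrupted, pseudo_clean,
                       n_iters=20, lr=0.1):
    """Fit the toy restorer r = weight * c on the pseudo pair for n_iters steps."""
    n = len(pseudo_corrupted)
    for _ in range(n_iters):
        # Gradient of the mean squared error between weight * C_P and L_P.
        grad = sum(2.0 * (weight * c - l) * c
                   for c, l in zip(pseudo_corrupted, pseudo_clean)) / n
        weight -= lr * grad
    return weight


# Adapt on the pseudo pair, then restore the actual input with the new weight.
adapted_weight = adapt_at_test_time(1.0, [1.0, 2.0], [0.5, 1.0])
final_restoration = [adapted_weight * c for c in [1.0, 2.0]]
```

The label used during adaptation comes only from the pseudo-clean data, so no ground truth of the test image is needed.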

A. IMPLEMENTATION SETTINGS
In our experiments, several backbone methods are used for the restoration procedures, namely URIE [46] (deblurring and denoising cases) and EDSR [34] (SR case), both of which are available in PyTorch. For the backbone of the 3D body reconstruction function, we use the PyTorch-based SPIN [8] without its additional SMPLify [5] optimization, which is essentially equal to HMR [6]. These PyTorch-based methods were chosen because our algorithms are scripted using the PyTorch library. Human3.6M [10] is the sole dataset used in training the whole framework. The human images in Figures 1 and 2 are taken from the HumanEVA [52] and InstaVariety [6] datasets for clearer illustration. During training, both networks are fed patches of 224 × 224, following the strict requirement of HMR [6], which is plugged into our BodyNet module. For the deblurring and denoising cases, α (learning rate for restoration) is set to 1e-03 using the URIE [46] method, while the SR case uses α = 1e-07 as presented in the EDSR [34] method. For the 3D body reconstruction, β (learning rate for reconstruction) and the weighting constant λ are set to 1e-05 and 5.0, respectively, following the original work [6]. The batch of data sampled from p(T) is set to 16 and divided 50:50 into T_tr (batch size of 8) and T_te (batch size of 8). T_tr and T_te are used within the inner- and outer-loop operations, respectively. The data in T_tr are different from those in T_te. The total time required for the training scheme is around 4 days using a Titan RTX GPU. In testing, the maximum iteration N is set to 20.
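The settings above can be collected in one place, for instance as a small configuration object. This is only a sketch: the class and field names are hypothetical, and only the values are taken from the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PPUFSettings:
    restoration_backbone: str   # "URIE" or "EDSR"
    alpha: float                # learning rate of the restoration module
    beta: float = 1e-5          # learning rate of the reconstruction module
    lam: float = 5.0            # weighting constant lambda
    patch_size: int = 224       # input patch side length required by HMR
    task_batch: int = 16        # samples drawn from p(T) per step
    max_test_iters: int = 20    # maximum test-time iterations N

    @property
    def split(self):
        """Sizes of the task-training and task-testing batches (50:50)."""
        half = self.task_batch // 2
        return half, self.task_batch - half


SETTINGS = {
    "denoise": PPUFSettings("URIE", alpha=1e-3),
    "deblur": PPUFSettings("URIE", alpha=1e-3),
    "sr": PPUFSettings("EDSR", alpha=1e-7),
}
```

Keeping the per-case restoration learning rate next to the shared reconstruction settings makes it explicit that only the restoration backbone and α change between scenarios.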

a: Preliminary
We provide an extensive benchmarking experiment using the 3DPW [53] dataset. We use the validation (-v) and test (-t) sets of 3DPW, as they provide human images together with body ground truth such as 3D keypoints and SMPL [39] parameters. These annotations are used to calculate the mean per joint position error (MPJPE) and reconstruction error (RE) commonly used in SMPL-based benchmarking. Lower scores on these metrics indicate better results. Note that the training set of 3DPW [53] is excluded from the learning process of PPUF. The images in the benchmark sets are randomly pre-degraded before the testing procedure is run. For the noisy scenario, the noise is varied using sigma values from 10 to 20 and added directly to the image. For the blur scenario, the Gaussian sigma is chosen randomly from 0.05 to 1.2 with a kernel size of 5, and the blur is convolved directly with each benchmark image. Finally, for the low-resolution scenario, we consider 2× downsampling applied directly to the input image, following recent studies [47], [48]. The benchmark data of the 3DPW-validation (3DPW-v) and 3DPW-test (3DPW-t) cases total 10,412 and 35,515 images, respectively. The restored images are compared with the clean ground truth using the peak signal-to-noise ratio (PSNR) and structural similarity index (SSIM). Higher scores indicate better results.
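Two of the metrics above have compact standard definitions, sketched below for reference. The peak value of 255 and the nested-list image layout are assumptions; RE is omitted because it additionally requires a rigid (Procrustes) alignment before the joint error is measured, and SSIM is omitted for brevity.

```python
import math


def psnr(restored, reference, peak=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized images."""
    squared = [(a - b) ** 2
               for row_a, row_b in zip(restored, reference)
               for a, b in zip(row_a, row_b)]
    mse = sum(squared) / len(squared)
    if mse == 0:
        return math.inf
    return 10.0 * math.log10(peak * peak / mse)


def mpjpe(pred_joints, gt_joints):
    """Mean per joint position error: average Euclidean distance over joints."""
    distances = [math.dist(p, g) for p, g in zip(pred_joints, gt_joints)]
    return sum(distances) / len(distances)
```

With joint coordinates given in millimetres, `mpjpe` returns the error in millimetres, the unit used in the result tables.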

b: Joint denoising and 3D body reconstruction
We begin our analysis with the restoration results, as restoration is run before the reconstruction task. In the case of denoising, the restoration is consistently improved during test time. This is clearly visible in the quantitative scores presented in Tables 1 and 2, which report the PSNR and SSIM scores, respectively. Here, the denoising cases are shown in the Denoise-v and Denoise-t columns, whose data are obtained from the validation and test images of 3DPW [53]. The images are degraded with noise as discussed in the preliminary section. Thanks to the pseudo-clean and pseudo-corrupted data generated at test time, our PPUF improves the performance by up to +0.9842 dB (Denoise-v) and +1.2731 dB (Denoise-t) over the initial denoising outputs. Similarly, the SSIM gains about +0.04124 (Denoise-v) and +0.07518 (Denoise-t). An interesting finding is that the performance remains relatively stable as the iterations increase. Using the PPUF, neither the PSNR nor the SSIM falls back to the initial scores. For clarity, these PSNR and SSIM scores are re-plotted in Figure 5 (a)-(b). The graphs clearly demonstrate that the scores of Denoise-v and Denoise-t improve sharply within the first 3 iterations, followed by a slight yet consistent improvement as the iterations increase.
We then turn to the 3D body reconstruction results coupled with the denoising task. As shown in Table 3, the MPJPE scores are reduced by 13.843 mm (Denoise-v) and 12.192 mm (Denoise-t) from the initial 3D body reconstruction outputs. Using the PPUF, the MPJPE shows the same characteristic as the restoration module, as the scores drop by around 8.804 mm (Denoise-v) and 9.066 mm (Denoise-t) within only 3 iterations. Similar and consistent results are also reflected in the RE performance, as seen in Table 4. The maximum error reductions are 6.798 mm and 6.406 mm for the Denoise-v and Denoise-t cases. Within only 3 iterations, the RE scores are reduced by 4.475 mm (Denoise-v) and 4.569 mm (Denoise-t). From the analysis above, it is clear that our PPUF helps the denoising and 3D body reconstruction modules to self-improve jointly at test time with the capability of fast adaptation.
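To make the fast-adaptation claim concrete, the reported reductions can be turned into the share of the maximum reduction that is already reached after 3 iterations. The helper name is hypothetical; the numbers are the ones quoted above for the denoising case.

```python
def early_gain_share(gain_after_3_iters, max_gain):
    """Fraction of the maximum error reduction reached after 3 iterations."""
    return gain_after_3_iters / max_gain


# (reduction after 3 iterations, maximum reduction), both in mm.
reported = {
    "MPJPE Denoise-v": (8.804, 13.843),
    "MPJPE Denoise-t": (9.066, 12.192),
    "RE Denoise-v": (4.475, 6.798),
    "RE Denoise-t": (4.569, 6.406),
}

shares = {name: early_gain_share(*pair) for name, pair in reported.items()}
for name, share in shares.items():
    print(f"{name}: {share:.1%}")
```

With these figures, roughly 64% to 74% of the maximum error reduction is obtained within the first 3 iterations.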
The visual performance of the joint denoising and 3D body reconstruction is shown in Figure 6. In these examples, detailed structures that were not present in the initial output are recovered after several iterations. The reconstructed bodies also improve as the number of PPUF iterations increases. The improvements are emphasized in the cropped regions. From our perception, the reconstructed body tends to have a correct initial position while the minor incorrect
VOLUME 4, 2016

c: Joint deblurring and 3D body reconstruction
The deblurring restoration within our PPUF framework also demonstrates its self-improvement capability. In this case, the deblurring also uses the pseudo-clean and pseudo-corrupted data extracted at test time. From Table 1, the PSNR scores are significantly improved by +1.529 dB (Deblur-v) and +1.791 dB (Deblur-t). Similar to the denoising case, large gains are already obtained within 3 iterations, reaching +1.445 dB (Deblur-v) and +1.731 dB (Deblur-t). These results are also reflected in the SSIM, with maximum gains of +0.02162 (Deblur-v) and +0.03463 (Deblur-t), as reported in Table 2.
In the 3D body reconstruction case, consistent performance is also demonstrated in Tables 3 and 4. The predicted body joints are improved, with the error scores reduced by 15.646 mm (Deblur-v) and 11.953 mm (Deblur-t), as shown in Table 3. Similar to the denoising case, significant reductions appear within only 3 iterations, indicating that the deblurring and 3D body reconstruction tasks succeed in performing fast adaptation jointly.
Qualitative results are shown in Figure 7, where weak artifacts are restored after several iterations. The rendered 3D body reconstruction results show a clear difference. Partial body regions are clearly improved as the iterations proceed. Figure 7 shows that the foot positions are even corrected according to the depth position. To our visual perception, most updates occur in the scale as well as in the body-joint rotations. Additional qualitative results for the deblurring and 3D body reconstruction case are provided in the supplementary material.

d: Joint super-resolution and 3D body reconstruction
In terms of super-resolution, our restoration results still maintain high scores. Unlike the denoising and deblurring cases, the super-resolution improves only slightly, by +0.421 dB (SR-v) and +0.029 dB (SR-t), as shown in Table 1. Nonetheless, the gains present at each iteration indicate that the PPUF still maintains the SR module's self-improvement capability, with no tendency toward significantly lower scores as the iterations increase. This indicates that the SR module in PPUF avoids the catastrophic forgetting [54] effect, a performance drop caused by the inability to preserve initial knowledge.
Interestingly, the 3D body reconstruction results in the SR case achieve the best performance among the degradation cases. This indicates that the SR case is relatively easier for the reconstruction module to solve. As shown in Tables 3 and 4, the MPJPE and RE scores of SR-t reach 121.425 mm and 69.889 mm, respectively. These scores are improved by 14.869 mm and 7.819 mm compared to the respective initial outputs. These results are directly reflected in the qualitative outputs in Figure 8, where the restoration quality remains high while the body reconstruction improves significantly over the iterations.
With the extensive experiments above, our work demonstrates that restoration and 3D body reconstruction methods can be coupled to achieve test-time adaptation. The self-improvement capability is achieved thanks to the pseudo-training data introduced at the test stage. Moreover, we demonstrate the fast-adaptation capability of the proposed PPUF, as most of the modules obtain large gains within a few iterations. Using only prior works within our modules, we observe that adaptive learning is feasible as long as they are combined with a correct algorithm and reliable information, such as pseudo-clean and pseudo-corrupted data. More qualitative results for the SR and 3D body reconstruction case are provided in the supplementary material. We suggest that readers check all the visual results on an electronic screen. The supplementary material is provided in video format so that readers can easily view the test-time improvement of the restoration and 3D body reconstruction results.
While we demonstrate the successes above, our method, like other self-adaptive works, requires extra time to perform the test-stage iterations (Lines 6-10 in Algorithm 2). However, our algorithm spends only around 3.5 seconds to run both restoration (SR case using EDSR [34]) and reconstruction (using HMR [6]) for 20 iterations on a 224 × 224 × 3 input image on a Titan RTX machine. Denoising or deblurring (using URIE [46]) jointly with the reconstruction task takes about 2.8 seconds. Thus, future work on automatically cutting the iterations can benefit test-time self-improvement schemes in joint-task scenarios.
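As one possible reading of automatic iteration cutting, the sketch below stops adaptation once the relative improvement of the adaptation loss falls under a threshold, and estimates the average cost per iteration from the reported totals. Both helpers and the 1% threshold are hypothetical and are not part of the proposed method; the per-iteration average also assumes that the reported time is spread evenly over the 20 iterations.

```python
def average_seconds_per_iteration(total_seconds, iterations=20):
    """Average cost of one iteration, assuming an even spread of the total."""
    return total_seconds / iterations


def iterations_to_keep(losses, min_rel_improvement=0.01):
    """Number of iterations to run before the loss stops improving enough.

    losses[i] is the adaptation loss recorded after iteration i + 1.
    """
    for i in range(1, len(losses)):
        prev, cur = losses[i - 1], losses[i]
        if prev <= 0 or (prev - cur) / prev < min_rel_improvement:
            return i
    return len(losses)


# Example: improvement stalls after the third iteration.
kept = iterations_to_keep([1.0, 0.5, 0.3, 0.299, 0.298])
estimated_seconds = kept * average_seconds_per_iteration(3.5)
```

Under these assumptions, stopping after 3 of the 20 iterations in the SR setting would cost about 0.5 seconds instead of 3.5 seconds.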

V. CONCLUSION
We presented a universal framework in algorithmic form that can utilize various prior state-of-the-art restoration modules, particularly for denoising, deblurring, and SR, while jointly tasked with 3D body reconstruction modules. Our work is motivated by the need for life-long learning that focuses on adaptive capability while avoiding the classic approach of collecting vast datasets and crafting specific novel networks to suit them. With the support of the pseudo-training data at test time, our framework can perform temporary self-supervised training that yields a performance gain. With this approach, both the restoration and reconstruction modules are endowed with self-adaptive capability. Our experimental results show that coupling these tasks with a correct framework and reliable pseudo-data obtains significant improvement even within a few iterations, without any performance-drop anomaly. We believe our exploration can be extended to more challenging restoration tasks such as dehazing and de-raining, as synthesizing their data is feasible.