Estimating Physically-based Reflectance Parameters from a Single Image with GAN-guided CNN

We present a method that estimates the physically accurate reflectance of materials from a single image and reproduces real-world materials that can be used in well-known graphics engines and tools. Recovering the BRDF (bidirectional reflectance distribution function) from a single image is an ill-posed problem because the irradiance and geometry information, as well as the samples of the BRDF parameters, are insufficient. The problem can be alleviated with a simplified representation of the surface reflectance such as the Phong reflection model. Recent works have shown that convolutional neural networks can successfully predict the parameters of empirical BRDF models for non-Lambertian surfaces. However, the parameters of a physically-based model span a non-orthogonal space, which makes it difficult to estimate physically meaningful results. In this paper, we propose a method to estimate the parameters of a physically-based BRDF model from a single image. We focus on the metallic property of the physically-based model to enhance the estimation accuracy. Since metals and nonmetals have very different characteristics, our method processes them separately. Our method also generates auxiliary maps using a cGAN (conditional generative adversarial network) architecture to help estimate more accurate BRDF parameters. Based on the experimental results, the auxiliary map is selected to be an irradiance environment map for metallic materials and a specular map for nonmetallic materials. These auxiliary maps help to clarify the contributions of different factors, including light color, material color, specular component, and diffuse component, to the surface color. Our method first estimates whether the material in the input image is metallic or nonmetallic. Then, it estimates the BRDF parameters using a CNN (convolutional neural network) architecture guided by the generated auxiliary maps.
Our results show that our method is effective in estimating BRDF parameters on both synthesized and real images.


I. INTRODUCTION
Extracting material properties has been a classic problem in computer vision and graphics. Image synthesis is the result of complex physics; shape, reflectance, and illumination must all be known throughout the rendering pipeline. The inverse rendering problem, i.e., inferring intrinsic properties from an image, is very difficult to solve under most assumptions, since the same visual result can be rendered from multiple combinations of properties. Estimating reflectance from a single monocular color image under uncontrolled illumination is an ill-posed problem even when prior information about the object's shape is available.
The bidirectional reflectance distribution function (BRDF) is commonly used to encapsulate the reflective characteristics of a material. Data-driven approaches have an advantage in capturing the detailed appearance of real-world materials. However, acquiring and measuring the BRDF is a time-consuming and delicate process due to the high dimensionality of the function, which is at least 4 (BRDF) and up to 8 (BSSRDF). An alternative that avoids this problem is to use simplified analytic BRDF models controlled by a set of parameters. The efficiency and expressiveness of these models are their major merits when commercial graphics rendering applications reproduce a realistic scene. Despite those merits, adjusting the model parameters by hand is a confusing and error-prone task for artists.
Deep learning enables automatic learning of the underlying features of both natural and synthetic images from large training data. Recent works demonstrate its capability to solve the non-linear optimization in classic graphics problems such as parameter estimation for the rendering equation and environment reproduction; furthermore, convolutional neural networks are particularly known for high performance in the context of a single object. [1] and [2] estimate the BRDF parameters of a material from one or more images of specular objects in a specific class, including cars, chairs, and couches. [3] proposed a real-time approach for estimating the surface reflectance from sequential RGB-D images in 90 ms. These prior works assume empirical BRDF models such as the Phong and Blinn-Phong models, which have limited ability to represent the complex appearance of realistic materials.
Our research aims at easy adjustment of material properties for physically-based rendering applications. Empirical BRDF models mostly have straightforward and intuitive parameters, so 3D graphics artists can define materials by adjusting them with little difficulty. However, they have limited expressiveness and do not fully capture the physical reflection on a real material surface, especially at grazing angles. To synthesize photo-realistic results, state-of-the-art rendering applications support various physically-based models. The Disney BRDF [4] is a representative reflectance model developed with a principled and art-directable philosophy, yet it produces sufficiently realistic results. Our goal is to construct an estimation procedure for the major parameters used in the Disney BRDF shading model.
In this paper, we propose a method based on a generative-network-guided convolutional neural network to automatically estimate the physically-based BRDF parameters of an isotropic single material in a spherical shape. We assume that a spherical reflectance map [1] can be acquired prior to our estimation process. We exploit the distinctive properties of metals and nonmetals (dielectrics) by training two different networks. Moreover, we enhance the training of the parameter estimator by generating a spherical auxiliary map of the irradiance environment or the specular reflection, motivated by the cyclical architecture of GANs (generative adversarial networks).
Our contributions are as follows. (1) We propose a deep learning method to estimate the BRDF parameters of a physically-based shading model from a single image. Using this method, we can synthesize realistic images with physically-based materials captured from real-world objects. (2) We design a GAN architecture that not only predicts the irradiance map but also helps to estimate the BRDF parameters simultaneously, by applying a cooperative structure of map generator and BRDF estimator. The generated irradiance map can also be used for reproducing the illumination environment of the input image.

II. RELATED WORK

A. PARAMETRIC BRDF ESTIMATION
Parametric BRDF estimation has been considered a reasonable compromise for obtaining the visual properties of an observed material. Traditional data-driven approaches to acquiring material properties rely on expensive hardware and a slow scanning process in a controlled environment. Ngan et al. [5] compare the ability of various analytic BRDF models to represent real measured samples, with supplementary results of parameter fitting. Ghosh et al. [6] design an optical setup that allows for basis function illumination of BRDF samples. Dupuy et al. [7] use a goniophotometer to simultaneously manage BRDF acquisition and storage for isotropic and anisotropic materials. As alternatives to complicated acquisition setups, a variety of approaches formulate the inverse rendering problem as optimizing parameters over three intrinsic properties, i.e., lighting, geometry, and reflectance. Lombardi et al. [8], [9] jointly estimate the reflectance and illumination from a single image of an object of known shape. Romeiro et al. [10] infer the BRDF of an isotropic surface by exploiting natural illumination instead of an active lighting device. Nielsen et al. [11] simplify the measurement procedure for isotropic BRDF reconstruction by reducing the measurement samples based on the regions of importance.
With the recent progress in deep learning techniques, their performance has been shown to surpass conventional optimization. Li et al. [12] learn the spatially varying surface reflectance of a planar material from a single image using a self-augmented CNN. Kim et al. [3] take RGB-D images as input to estimate the surface reflectance in real time using two network architectures, HemiCNN and Grouplet. Meka et al. [2] also suggest an end-to-end approach for real-time material estimation from a single color image. Georgoulis et al. [1] first estimate material intrinsics using a learning-based sparse data interpolation technique and then reconstruct reflectance parameters from the estimated material intrinsics. While the objective of prior research is to estimate empirical BRDF models, Vadaurre et al. [13] reconstruct a physically-based BRDF from two shots of the material. We present an approach that predicts the parameters used in physically-based rendering software from a single shot of the material.

B. ILLUMINATION ESTIMATION
Several studies have estimated the illumination of outdoor environments [14] and indoor environments [15] using CNN-based methods. Image transformation techniques [16], [17], [18] have inspired novel methods for generating illumination images. Georgoulis et al. [1] estimate an environment map from the color image of a known shape using the U-NET architecture. Boss et al. [19] adopt a conditional adversarial network that generates differentiable illuminations to be used for predicting BRDF parameters. Our method builds on the two-way process of GAN [16], so that illumination estimation and parameter estimation can be performed simultaneously in a single network.

III. PHYSICALLY-BASED RENDERING PARAMETERS
In this section, we introduce the parameters of the physically-based model used for synthesizing the training image data. The Cook-Torrance model has become a common choice in computer graphics since it represents the distinction between metals and nonmetals. The model takes into account both a diffuse component, modeled as Lambertian reflection, and a specular component, modeled as the combination of the Fresnel effect, the microfacet normal distribution, and geometric visibility. The full model is described as a weighted sum of the diffuse and specular components.
The specular component assumes that the probability of microfacets being aligned with the halfway vector h and the visibility of the facets from different view angles contribute to the surface reflection. These factors are represented by the functions D, F, and G. The normal distribution function D(h) is expressed by GGX [20], which is suitable for modeling realistic light reflection and provides a subtly softer highlight than the Beckmann distribution. In D(h), $\alpha$ is the square of the roughness $\sigma$. The geometry function G describes the attenuation of light caused by the self-shadowing of the microfacets. This self-shadowing is approximated through the Smith factor formulation [21] with $k = \sigma^2 / 2$. The Fresnel function F models how the reflectance varies with the angle of incidence and is closely tied to the surface class, i.e., metal or nonmetal. The exact formula is complex and differs between conductive (metal) and dielectric (nonmetal) materials, so Schlick's approximation [21] is widely used thanks to its low computational cost. In this approximation, $F_0$ is the proportion of light reflected when a ray hits the surface perpendicularly.
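For reference, the terms discussed in this section are commonly written as follows in the microfacet literature. This is a sketch in standard notation and not necessarily the exact form used for the numbered equations of this paper: the vectors n, l, and v (surface normal, light direction, and view direction), the diffuse weight $k_d$, the albedo $A_c$ in the Lambertian term, and the separable Smith-Schlick form of G are assumptions, chosen to agree with $\alpha = \sigma^2$ and $k = \sigma^2 / 2$ as stated in the text.

```latex
\begin{align*}
f(\mathbf{l}, \mathbf{v}) &= k_d \,\frac{A_c}{\pi}
  + \frac{D(\mathbf{h})\, F(\mathbf{v}, \mathbf{h})\, G(\mathbf{l}, \mathbf{v})}
         {4\,(\mathbf{n}\cdot\mathbf{l})\,(\mathbf{n}\cdot\mathbf{v})} \\
D(\mathbf{h}) &= \frac{\alpha^2}
  {\pi \left( (\mathbf{n}\cdot\mathbf{h})^2 (\alpha^2 - 1) + 1 \right)^2},
  \qquad \alpha = \sigma^2 \\
G(\mathbf{l}, \mathbf{v}) &= G_1(\mathbf{l})\, G_1(\mathbf{v}),
  \qquad G_1(\mathbf{x}) = \frac{\mathbf{n}\cdot\mathbf{x}}
  {(\mathbf{n}\cdot\mathbf{x})(1 - k) + k},
  \qquad k = \frac{\sigma^2}{2} \\
F(\mathbf{v}, \mathbf{h}) &= F_0 + (1 - F_0)\,(1 - \mathbf{v}\cdot\mathbf{h})^5
\end{align*}
```

With this form, F equals $F_0$ at normal incidence and approaches 1 at grazing angles, which matches the description of $F_0$ above.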

A. ARTIST-FRIENDLY BRDF MODEL
Commercial rendering applications and engines commonly adopt art-directable, though not physically strict, parameterizations of the traditional microfacet models. Our model uses 5 parameters to determine the reflective appearance of a surface: three for the albedo color ($A_c$), one for metallic (m), and one for roughness ($\sigma$). While albedo and roughness range from 0.05 to 1.0, our metallic parameter is either 0 (nonmetal) or 1 (metal). On the periodic table, elements are classified as metals or nonmetals according to the distinctive properties of the two groups. Since most materials in daily life are either metallic or nonmetallic, artist-friendly BRDF models used in modeling tools and game engines use a 0-or-1 metallic variable, which we adopt in this paper as well. We focus on the metallic property, which plays a key role in determining whether incident light is reflected at a surface or transmitted through it.

1) Metallic Surface Property
Metals normally have high reflectivity over most of the visible range of the wavelength spectrum, since light waves do not penetrate the metallic surface. Only a small fraction of light may be absorbed or refracted; for instance, gold absorbs the blue and violet regions of the spectrum, producing a yellow color when illuminated with white light. This optical aspect can be represented by the albedo color in (6), which determines the wavelength distribution with respect to the angle of incidence according to Fresnel reflectance. In (6), the subscript c denotes the color channel. We consider the difference between the reflected light and the environmental source to be the salient feature for inferring the intrinsic color of a material.

2) Nonmetallic Surface Property
Most dielectric (nonmetallic) materials show very different reflectance depending on the angle of incidence. Wood, plastic, rubber, etc. reflect only around 4% of the light at common incident angles, and the reflectance gradually increases up to 100% at the grazing angle. In addition, most dielectric materials have similar reflectance curves. These characteristics are represented by approximating $F_0$ as 0.04 in the Fresnel reflection (6) for nonmetallic surfaces, where m = 0.
In contrast to metals, the color representation depends on the diffuse reflection. Light transmitted into the interior of the surface experiences a decrease in velocity, leading to refraction that varies with the wavelength. Equation (7) shows that the diffuse coefficient is proportional to the absorbed and refracted energy, colored by the albedo for each color channel. Consequently, the full reflection is a combination of the diffuse color representation and the specular reflection from the Fresnel effect. We believe that inferring the intrinsic color would be improved by obtaining separated diffuse or specular components for nonmetallic materials.
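The contrast between the two surface classes can be illustrated with a small sketch. It assumes that the base reflectance is a linear blend between 0.04 and the albedo controlled by m, a convention common in metallic workflows that agrees with the statement that $F_0$ is 0.04 when m = 0; the function names and the gold-like albedo value are illustrative and not taken from the paper.

```python
DIELECTRIC_F0 = 0.04  # reflectance of typical nonmetals at normal incidence


def base_reflectance(albedo, metallic):
    """Per-channel F0: 0.04 for nonmetals (m = 0), the albedo for metals (m = 1).

    The linear blend is an assumed convention, not a formula quoted from the paper.
    """
    return tuple(DIELECTRIC_F0 + (a - DIELECTRIC_F0) * metallic for a in albedo)


def schlick_fresnel(f0, cos_theta):
    """Schlick's approximation of the Fresnel reflectance for one channel."""
    return f0 + (1.0 - f0) * (1.0 - cos_theta) ** 5


if __name__ == "__main__":
    gold_like = (1.0, 0.766, 0.336)  # illustrative albedo only
    print("metal F0:   ", base_reflectance(gold_like, 1.0))
    print("nonmetal F0:", base_reflectance(gold_like, 0.0))
    for cos_theta in (1.0, 0.5, 0.1, 0.0):
        print(cos_theta, schlick_fresnel(DIELECTRIC_F0, cos_theta))
```

For a nonmetal the reflectance starts at 4% at normal incidence and rises toward 100% at the grazing angle, while for a metal the colored albedo itself acts as the specular reflectance.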

IV. BRDF ESTIMATION
Several studies have estimated BRDF parameters from a single image using convolutional neural networks [1], [2], [19]. The prediction performance of BRDF learning is limited by the ambiguity of inverse rendering on naturally illuminated surfaces. The surface color can be interpreted as the result of both the intrinsic albedo of the material and the color of the light source. Since we have only a single input image without lighting information, it is difficult to distinguish the intrinsic color of the material from the colors of the light sources, especially for metallic materials. The learning accuracy could be improved with additional data that differentiate these factors.
Our method generates an additional input, an auxiliary map, to guide the interpretation of the input material image using a conditional generative adversarial network (cGAN) [16]. For metallic materials, we generate an irradiance environment map as the auxiliary map. For nonmetallic materials, estimating irradiance environment maps is more difficult than for metals because much of the incident light is refracted inside the material and scattered out. Learning BRDF parameters can also be limited by the ambiguity between the specular and diffuse components of the radiance. For nonmetals, separating the diffuse reflection from the specular reflection can be an effective clue for reproducing plausible BRDF parameters. Therefore, we propose to use a specular map, which shows only the specular components of the input image, as the auxiliary map for nonmetals. In Section IV-B, we show the results of the experiments that validate this proposition.
Our method first estimates whether the material in the input image is metal or nonmetal. For metals, it generates an irradiance environment map as the auxiliary map and estimates the BRDF parameters by learning from the material images together with the generated auxiliary maps. For nonmetals, it follows the same process except that it uses a specular map as the auxiliary map. Fig. 1 shows the overall process of our method. Our approach to generating auxiliary maps resembles the network of Pix2Pix [16], which is designed for general-purpose image-to-image translation and for generating an image under a given condition. The generator and the discriminator are trained to deceive and to distinguish, respectively, using a loss function between the generated image and the ground-truth image. For estimating the BRDF parameters of the input material, we introduce a cooperative model, named the estimator, that takes the generated auxiliary map concatenated with the material image as input.
We assume that the material in each input image is spherical and homogeneous, as shown in Fig. 1. It is also possible to convert material images of general shape into spherical images. Georgoulis et al. [1] proposed a method to generate a spherical reflectance map from a single-material object of a known class, such as cars. We therefore start from the assumption that we have either spherical objects or objects of general shape transformed into spherical shapes.
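The routing described in this section can be sketched as a small dispatch function. The callables stand in for the trained networks (classifier, map generators, and estimators); their names and signatures are hypothetical and only illustrate the data flow, not the authors' implementation.

```python
def estimate_material(image, classify_metal, generators, estimators):
    """Route one spherical material image through the two-branch pipeline.

    All callables are hypothetical stand-ins for trained networks:
      classify_metal(image) -> bool                  (role of the metallic classifier)
      generators[branch](image) -> auxiliary map     (environment or specular map)
      estimators[branch](image, aux_map) -> (r, g, b, roughness)
    """
    is_metal = bool(classify_metal(image))
    branch = "metal" if is_metal else "nonmetal"

    # Metals get an irradiance environment map, nonmetals a specular map.
    aux_map = generators[branch](image)

    # The estimator sees the material image together with the auxiliary map.
    r, g, b, roughness = estimators[branch](image, aux_map)

    return {
        "metallic": 1.0 if is_metal else 0.0,
        "albedo": (r, g, b),
        "roughness": roughness,
        "aux_map": aux_map,
    }
```

Keeping the two branches in dictionaries keyed by class makes it explicit that each class has its own separately trained generator and estimator.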

A. METALLIC CLASSIFIER
The first step of our method is to classify an input material into one of two categories, metal or nonmetal. The entire subsequent process depends on this classification, since our networks are trained separately for each class. Fig. 2 shows our metallic classifier architecture, MetalNet. It is similar to the VGG architecture [22] and starts with 3x3 convolutional layers followed by batch normalization, ReLU, and pooling layers. After these layers, three fully-connected layers lead to the one-hot encoded output nodes that indicate the metallic parameter. MetalNet uses the binary cross-entropy loss function (8), where $t_j$ is the true metallic parameter for the j-th sample and $p_j$ is its predicted metallic parameter. The network is trained with the Adam optimizer with a learning rate of 0.0001.
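As a minimal illustration of the binary cross-entropy named above, the following sketch computes it for lists of true labels $t_j$ and predicted probabilities $p_j$. Averaging over the samples and clamping the probabilities are assumptions made here for numerical convenience; the exact normalization used in (8) is not restated in the text.

```python
import math


def binary_cross_entropy(targets, predictions, eps=1e-7):
    """Mean binary cross-entropy between labels t_j in {0, 1} and probabilities p_j."""
    total = 0.0
    for t, p in zip(targets, predictions):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total += -(t * math.log(p) + (1.0 - t) * math.log(1.0 - p))
    return total / len(targets)


if __name__ == "__main__":
    labels = [1.0, 0.0, 1.0]
    print(binary_cross_entropy(labels, [0.9, 0.1, 0.8]))  # confident and correct
    print(binary_cross_entropy(labels, [0.4, 0.6, 0.3]))  # mostly wrong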

B. PARAMETER ESTIMATOR
In this paper, we estimate five BRDF parameters: metallic, albedo color (RGB), and roughness. The basic approach to acquiring the 5 parameters is a VGG-like CNN that takes the material image as input (CNN in Table 1). It consists of a series of 3x3 convolutional layers followed by batch normalization and ReLU. Pooling layers are applied after every other convolutional layer, reducing the size of the features by half. The network uses a mean squared loss for regression of the 5 parameters and the Adam optimizer with a learning rate of 0.0001.
As mentioned in Section IV, our method provides an auxiliary map along with the input image to improve the estimation. As the auxiliary map, we use an irradiance environment map or a specular map. The irradiance environment map is a circular image showing the hemispherical irradiance environment, which provides a clue about the incident light. The second and third columns in Fig. 6 show the true and generated environment maps, respectively. The specular map is the specular component of the input image, which provides a clue for separating the specular component from the diffuse component. The second and third columns in Fig. 7 show the true and generated specular maps, respectively.
FIGURE 1. Overall process: Our method first determines whether the input material is metal or nonmetal using a metallic classifier. Metallic materials follow the upper network. First, it generates an auxiliary map, an irradiance environment map for metals, which provides an additional input to the parameter estimator. Then, it estimates the BRDF parameters of the input material. Nonmetallic materials follow the lower network, which is the same as the upper network except that the auxiliary map is a specular map.

In our method, the metallic parameter is estimated by MetalNet as described in Section IV-A. Our BRDF parameter estimator, BRDF-Net, estimates the remaining 4 BRDF parameters, namely the albedo colors (RGB) and roughness. Fig. 3 shows the BRDF-Net architecture, which is similar to the architecture of MetalNet in layer connection, activation, and pooling. The major modification is a concatenated input consisting of a material image and an auxiliary map.
The following layers interpret the relation between the reflection on the material surface and the irradiance environment (or the specular reflection for nonmetals) and transform them into feature maps using the ReLU activation function. They lead to two fully-connected layers that finally predict the 4 BRDF parameters (albedo RGB and roughness) using the sigmoid activation function. BRDF-Net uses the mean squared loss function $L_{L2}$ described in (9) and the Adam optimizer with a learning rate of 0.0001. In (9), x, $y_m$, and $y_p$ denote the input images, true auxiliary maps, and true parameter vectors, respectively, and $E_y$ denotes the estimator trained with the true auxiliary map $y_m$.
We conducted experiments to determine the appropriate selection of the auxiliary map. In these experiments, we use three networks: a conventional CNN, BRDF-Net with irradiance environment maps, and BRDF-Net with specular maps. We use three datasets: metals, nonmetals, and mixed data that contain metals and nonmetals in even distribution. All the images in the datasets are provided with corresponding ground-truth environment and specular maps. We experimented with the possible combinations of networks and datasets. Table 1 shows the results. We found that all the results obtained by providing auxiliary maps along with the input images show smaller errors than the results of the conventional CNN. Metallic materials are best estimated by BRDF-Net with irradiance environment maps. The best estimator for nonmetallic materials is BRDF-Net with specular maps. These results support our hypothesis in Section IV, so this is the most promising combination for BRDF parameter estimation. These two networks are placed after the metallic classifier described in Section IV-A.

FIGURE 3. BRDF-Net: Architecture of the BRDF parameter estimation. Two 128x128x3 input images are concatenated into 6 channels and passed through layers similar to those in Fig. 2, except that the last layer consists of 4 nodes with sigmoid activation.

TABLE 1. Estimation quality comparisons between the basic CNN and our BRDF-Net with different types of input and auxiliary maps: Input data can be mixed (both metals and nonmetals), metal, or nonmetal. The auxiliary map can be env (irradiance environment maps) or spec (specular maps). The basic CNN is only tested with mixed input without auxiliary maps, and our BRDF-Net is tested with every combination. For both training and testing, we used the ground-truth auxiliary maps. Since metals have no diffuse component, we did not test metals with specular maps. p-loss (parameter loss) is the difference between ground-truth BRDF parameters and predicted parameters, and r-loss (render loss) is the difference between ground-truth material images and images rendered with predicted parameters.

C. MAP GENERATOR
Our estimator requires not only an input material image but also an additional environment or specular map to precisely predict the intrinsic BRDF parameters. Instead of obtaining the ground-truth maps, we estimate the auxiliary map from the input material image and use it as an input to BRDF-Net. Our map generator, MapGen, adopts the Pix2Pix approach, a widely known cGAN in which a generative model competes with a discriminative model. The spatial feature resolution of the encoder is gradually reduced by half from 128 to 1, applying 4x4 convolutions with stride 2 for downsampling, each followed by Leaky-ReLU. The first 4 downsampling steps double the number of feature channels and the last 3 layers only reduce the number of feature channels. After a bottleneck layer, the process is reversed. To preserve the low-level information shared between input and output, the network uses skip connections as commonly used in U-NET [17]. Each skip connection concatenates all feature channels at layer i with those at layer n − i. The objective of the map generator can be expressed as

$$L_{GAN}(G, D) = \mathrm{CE}(1, D(x, y_m)) + \mathrm{CE}(0, D(x, G(x, z))) \tag{10}$$

where the discriminator D tries to accept the true map $y_m$ conditioned on the observed input image x, whereas the generator G tries to deceive D into accepting the map G(x, z), generated from a random noise vector z, as a true map. In addition, an L1 loss requires the generated map not only to deceive the discriminator but also to stay close to the true map in the L1 sense. Previous research [18] suggests mixing these two objectives with an empirically determined weight $\lambda_1$ of 100.
We also use the Adam optimizer and follow the suggested learning rate of 0.0002 and momentum term $\beta_1$ of 0.5 for training stability.
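The halving of the encoder resolution and the pairing of skip connections can be checked with a short sketch. It assumes a padding of 1 for the 4x4, stride-2 convolutions (the padding is not stated in the text) and counts n as the total number of encoder and decoder layers; both choices and all function names are illustrative assumptions.

```python
def conv_output_size(size, kernel=4, stride=2, padding=1):
    """Spatial output size of a convolution; padding=1 is an assumed value."""
    return (size + 2 * padding - kernel) // stride + 1


def encoder_resolutions(input_size=128):
    """Resolutions after each stride-2 downsampling step until 1x1 is reached."""
    sizes = []
    size = input_size
    while size > 1:
        size = conv_output_size(size)
        sizes.append(size)
    return sizes


def skip_pairs(n_layers):
    """Pairs (i, n - i) of layers joined by a skip connection, for i before the bottleneck."""
    return [(i, n_layers - i) for i in range(1, n_layers // 2)]


if __name__ == "__main__":
    sizes = encoder_resolutions(128)
    print(sizes)                       # seven halvings from 128 down to 1
    print(skip_pairs(2 * len(sizes)))  # encoder layer i paired with decoder layer n - i
```

Seven downsampling steps are needed to go from 128 to 1, which is consistent with the description of 4 steps followed by 3 further layers.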

D. TRIARCHY GAN
The performance of BRDF-Net is enhanced by the additional input generated by MapGen. The goal of MapGen is to mimic the irradiance environment map of the input image so that it looks plausible. However, since the material image may change drastically with a subtle change in the irradiance environment, a less accurate map generator could mislead the parameter estimator. Therefore, we propose a triarchy GAN architecture, triGAN, that cooperatively estimates auxiliary maps and BRDF parameters. In our architecture, the goal of map generation is to help estimate more accurate BRDF parameters in addition to generating more accurate maps. As the network learns, it repeatedly improves the auxiliary map as well as the BRDF parameters. In our triGAN architecture, MapGen first generates the auxiliary map from the input material image. Then BRDF-Net takes the generated auxiliary map as an input in addition to the material image and estimates the BRDF parameters. The BRDF-Net loss (p-loss) in (13) is added to the MapGen total loss function (14) to improve the map generation. In this way, the map generator MapGen and the estimator BRDF-Net are trained cooperatively: the generator creates an auxiliary map that helps the estimator predict better BRDF parameters, and the estimator loss helps the generator create a more helpful auxiliary map. Fig. 5 shows this process.
The objective function of BRDF-Net follows (13), which uses a generated auxiliary map instead of the true auxiliary map used in (9).
The MapGen loss function additionally adopts the $L_2$ loss of BRDF-Net, with a weight of $\lambda_2 = 1000$ to adjust the scale.
The extrinsic quality of the auxiliary map is not meaningfully improved by the modified objective, but the estimator trained on the generated auxiliary maps produces better BRDF parameters, as shown in Table 3.
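How the three loss terms might be combined on the generator side can be sketched with scalar stand-ins. The adversarial term below uses CE(1, D(x, G(x, z))), the form commonly used in Pix2Pix implementations; this choice, the mean reductions, and all function names are assumptions for illustration, while the weights of 100 and 1000 are the values given in the text.

```python
import math


def cross_entropy(target, prob, eps=1e-7):
    """Binary cross-entropy CE(target, prob) for a single probability."""
    prob = min(max(prob, eps), 1.0 - eps)
    return -(target * math.log(prob) + (1.0 - target) * math.log(1.0 - prob))


def mean_abs_error(a, b):
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def mean_squared_error(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def discriminator_loss(d_real, d_fake):
    """CE(1, D(x, y_m)) + CE(0, D(x, G(x, z))), the form of the GAN objective in the text."""
    return cross_entropy(1.0, d_real) + cross_entropy(0.0, d_fake)


def trigan_generator_loss(d_fake, fake_map, true_map, pred_params, true_params,
                          lambda_1=100.0, lambda_2=1000.0):
    """Adversarial term + lambda_1 * L1 map term + lambda_2 * L2 parameter term (p-loss)."""
    adversarial = cross_entropy(1.0, d_fake)  # assumed generator-side form
    map_term = mean_abs_error(fake_map, true_map)
    p_loss = mean_squared_error(pred_params, true_params)
    return adversarial + lambda_1 * map_term + lambda_2 * p_loss
```

Setting `lambda_2` to 0 in this sketch recovers a plain Pix2Pix-style generator loss, which makes the contribution of the parameter term easy to isolate.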

V. DATASETS
We require a large number of paired spherical material images and auxiliary maps (irradiance environment maps and specular maps). Since acquiring real photos, let alone matched pairs, would require highly elaborate installations, we synthesize rendered images for the training and testing process with 100 HDR irradiance environment maps. The HDR environment maps contain 360° panoramic irradiance information in various indoor conditions. Our synthesized dataset contains 40k images in total, of which 20k are metals and the other 20k are nonmetals. Each material image has a corresponding environment map and specular map. The material images are rendered with uniformly sampled albedo (R, G, and B each) and roughness parameters in the range from 0.05 to 0.95, following the Disney BRDF approach described in Section III-A. The metallic parameter is set to 1.0 for metals and 0.0 for nonmetals. The test database has 3.2k synthesized images and 10 real images of homogeneous spherical materials photographed indoors. The HDR irradiance environment maps for the real photos are captured with a DSLR camera by compositing multiple exposure images. In addition, the test dataset is synthesized with irradiance environments different from those of the training dataset.
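The parameter sampling described above can be sketched as follows. The dictionary layout, function names, and seed handling are illustrative assumptions; only the sampling range of 0.05 to 0.95, the binary metallic value, and the equal split between classes come from the text.

```python
import random


def sample_material(metallic, rng, lo=0.05, hi=0.95):
    """Draw one parameter set with uniformly sampled albedo (RGB) and roughness."""
    return {
        "albedo": tuple(rng.uniform(lo, hi) for _ in range(3)),
        "roughness": rng.uniform(lo, hi),
        "metallic": float(metallic),  # 1.0 for metals, 0.0 for nonmetals
    }


def build_parameter_set(n_per_class, seed=0):
    """Return n_per_class metal and n_per_class nonmetal parameter sets."""
    rng = random.Random(seed)
    metals = [sample_material(1, rng) for _ in range(n_per_class)]
    nonmetals = [sample_material(0, rng) for _ in range(n_per_class)]
    return metals + nonmetals


if __name__ == "__main__":
    params = build_parameter_set(3)
    for p in params:
        print(p)
```

Calling `build_parameter_set(20000)` would give the 20k + 20k split mentioned above; each entry would then be rendered under one of the environment maps by a renderer that is outside the scope of this sketch.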

A. EVALUATION OF ENVIRONMENT AND SPECULAR MAP GENERATION
We evaluate the difference between each ground-truth auxiliary map in the test dataset and the corresponding map generated by MapGen. We compare images using the RMSE and SSIM metrics with a maximum value of 1.0. Table 2 shows the quantitative comparisons in terms of color space similarity (RMSE) as well as structural similarity (SSIM). Input images categorized as mixed are composed of metals and nonmetals in equal ratio. The network trained on the mixed dataset predicts all 5 parameters, including metallic, without using the metallic classifier result. Notice that the results generated for the categorized material datasets show smaller errors than those for the mixed material dataset. We first experimented with map generation using MapGen, which uses the loss function in (12). Auxiliary maps are also generated in our triGAN architecture to guide BRDF estimation, which uses the loss function (14) including the BRDF parameter loss term. Each network is trained separately on irradiance environment maps and specular maps.
The map generator presents improved results on the categorized material datasets. However, the quality difference between the results of MapGen and triGAN is insignificant in both the RMSE and SSIM metrics. Nonetheless, the maps generated in triGAN help the BRDF estimation more than the maps in BRDF-Net, as described in the next section.
In this section, we evaluate our BRDF estimation method using the categorized test data. We calculated the mean absolute error (MAE) between the ground-truth BRDF parameters and the estimated parameters for each of the mixed, metal, and nonmetal test sets. We also tested two networks, BRDF-Net and triGAN, with environment maps as well as specular maps. Table 3 shows the results. As derived in Section IV, our approach of using environment maps for metals and specular maps for nonmetals shows the best results for both BRDF-Net and triGAN. Unlike the BRDF-Net tested with true environment and specular maps as shown in Table 1, the BRDF-Net tested with the auxiliary maps generated by MapGen is less accurate, as shown in Table 3. Whereas BRDF-Net learns the features of true auxiliary maps, which we cannot obtain in practice, triGAN learns the features of generated maps, which our method can obtain. In triGAN, the estimator is trained on the intrinsic features of generated maps, giving better results in both parameter and image spaces.

B. EVALUATION ON SYNTHESIZED DATASET
We synthesized spherical material images with the estimated BRDF parameters using the physically-based rendering method described in Section III. We compare the synthesized results to the ground-truth material images using RMSE and PSNR (peak signal-to-noise ratio). The structural similarity (SSIM) is practically identical across the entire dataset and results, since they share the same spherical shape and illumination. Table 4 shows that our method also produces better synthesized images than previous CNNs. Our final system first determines the metalness of the input image; then it runs triGAN env for metals and triGAN spec for nonmetals. Given the MetalNet accuracy of 0.98, the total RMSE of our system is 0.02937, which is better than the error of triGAN on the mixed inputs.
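The two image metrics used here can be written compactly for images flattened to lists of values in [0, 1]. This is a minimal sketch with the maximum value of 1.0 mentioned earlier; the flattening and the function names are assumptions for illustration.

```python
import math


def rmse(a, b):
    """Root mean squared error between two equally sized sequences of pixel values."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))


def psnr(a, b, max_value=1.0):
    """Peak signal-to-noise ratio in dB; infinite when the images are identical."""
    err = rmse(a, b)
    if err == 0.0:
        return math.inf
    return 20.0 * math.log10(max_value / err)


if __name__ == "__main__":
    truth = [0.2, 0.5, 0.8, 0.4]
    rendered = [0.25, 0.45, 0.8, 0.42]
    print(rmse(truth, rendered), psnr(truth, rendered))
```

Because PSNR is a monotone function of RMSE for a fixed maximum value, the two metrics always rank results in the same order; reporting both mainly aids comparison with other work.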

C. QUANTITATIVE ANALYSIS WITH RESPECT TO MATERIAL ROUGHNESS
We notice that the quality of estimation is highly related to the material roughness. An extremely rough surface appears practically perfectly diffuse, so its metallic class can be misjudged even by human perception. The performance of the metallic classifier consequently affects the performance of the generator and estimator. Fig. 8 depicts both the accuracy and the binary cross-entropy loss of MetalNet with respect to material roughness. The accuracy decreases marginally as roughness becomes larger, whereas the loss increases moderately. The relatively large losses for rough materials raise the question of whether we could obtain similar accuracy on a real dataset. Since metallic classification is crucial to the performance of the subsequent generation and estimation, this question remains for future work.
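An analysis of this kind can be reproduced by grouping classifier outcomes into roughness bins. The sketch below is one possible way to do so; the number of bins, the tuple layout, and the function name are assumptions and do not describe how Fig. 8 was actually produced.

```python
def accuracy_by_roughness(samples, n_bins=5, lo=0.05, hi=0.95):
    """Classification accuracy per roughness bin.

    samples: iterable of (roughness, true_label, predicted_label) tuples.
    Returns a list with one accuracy per bin, or None for an empty bin.
    """
    hits = [0] * n_bins
    counts = [0] * n_bins
    width = (hi - lo) / n_bins
    for roughness, true_label, predicted_label in samples:
        index = int((roughness - lo) / width)
        index = min(max(index, 0), n_bins - 1)  # keep boundary values in range
        counts[index] += 1
        hits[index] += int(true_label == predicted_label)
    return [h / c if c else None for h, c in zip(hits, counts)]


if __name__ == "__main__":
    demo = [(0.1, 1, 1), (0.1, 0, 1), (0.9, 1, 0), (0.5, 0, 0)]
    print(accuracy_by_roughness(demo, n_bins=3))
```

The same binning could be applied to per-sample loss values instead of correctness flags to obtain a loss curve over roughness.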

D. COMPARISON TO THE PREVIOUS WORK
We compared our method to the learning-based approach by Georgoulis et al. [1]. We tested our method on their real dataset, and Fig. 9 shows the results. The first column shows the input images used for testing, and the second column shows the ground-truth images in different environments. The third column shows their result images, and the fourth column shows our results generated from the input in the first column. The images in the second and third columns use the same irradiance environment map. The images in the first column are taken in an environment different from that of the second and third columns. Since we do not have the environment maps used in these images, we use different environment maps to generate our results in the fourth column. Our results show similar or better estimation even though they are rendered in different environments. The last two rows show the results for metallic materials. Since the colors of metals are more strongly affected by the environment, the BRDF estimation might also be affected by the environment. In the third row, the approach by Georgoulis et al. estimates a bluish material, which might be caused by the bluish environment of the input image. Our approach estimates more accurate color and reflections by estimating the environment map as well.

E. RESULTS ON REAL WORLD MATERIALS
Our final test is conducted on real world materials. Because our method restricts the shape of the input material, we collected 10 spherical materials with diverse surface roughness, 5 metals and 5 nonmetals. Fig. 10 depicts the ground-truth input material images and the synthesized images rendered with the estimated BRDF parameters. All materials are photographed in the same outdoor environment. We also captured the irradiance environment where the input photo is taken, and used it for rendering the results. However, the captured environment cannot be exactly the same because of the temporal difference, the photographer and the tripod shown in the images, etc. For example, the photographer and the tripod appear only in the input, and the clouds appear only in the rendered image. Since it is very difficult to create a strictly identical shape for rendering, we show visual comparisons instead of quantitative evaluations here.

VII. CONCLUSION
In this paper, we have presented a two-way approach to estimate the realistic radiance properties of metallic and nonmetallic materials using a cooperative architecture of CNN and GAN. We have shown that a single material image of known shape is sufficient for simultaneously generating an irradiance environment map and estimating material properties. We have also shown that the performance of estimation is improved by the assistance of generated features from the GAN. One explanation is that the two networks, trained on one dataset while exchanging their gradients, together form a deeper model while avoiding vanishing gradients. To prove our hypothesis that the reflective behavior of metals and nonmetals determines the optimization approach for other parametric properties such as albedo colors and roughness, we have synthesized a large-scale rendered dataset with various indoor environments and separated it into two groups. We applied a different pipeline of training networks to each group and compared the estimation quality. We confirmed that metals with environment maps and nonmetals with specular maps are the appropriate combination for our pipeline. Our method shows a meaningful improvement of estimation quality on the synthesized test dataset, and plausible synthesized results from the BRDF estimation of real world materials.
Several aspects of this work remain to be improved in the future. We would like to capture the BRDF parameters of heterogeneous, general-shape materials in uncontrolled photos. Our approach also does not handle BRDF properties such as anisotropy, tint, sheen, clear-coat, and spatially-varying properties. We would like to study these properties for more realistic representations of materials.