Auto Complementary Exposure Control for High Dynamic Range Video Capturing

Recent trends in consumer electronics have led to a proliferation of studies on high dynamic range (HDR) video acquisition and transmission. High-end video capturing systems increase the bit depth of each pixel to adapt to high-contrast ambient light. However, the extended bit depth is still meager compared with the far wider dynamic range of the incident light. Moreover, the increased bit depth may raise incompatibility issues for the subsequent compression and transmission system. In this paper, we propose an auto complementary exposure (ACE) algorithm for optimal exposure parameter estimation under the Siamese Trigger (ST) mode to maximize the entropy of the output HDR image. With the proposed algorithm, neither the sensor nor any other related hardware needs to be modified; only the auto-exposure algorithm in the firmware needs to be updated. In the proposed ACE algorithm, every two consecutive images are treated as Siamese twins, and the exposure parameters for the twin images are calculated simultaneously. The target of the ACE algorithm is not to maximize the entropy of each image but to maximize that of the fused image. Compared with existing auto-exposure or bracketed-fusion algorithms, the proposed algorithm has three significant advantages. First, no extra bit depth is needed to capture the details of the scene, so the image transmission system is not affected. Second, only two raw images are needed for each output fused HDR image, so the computational load is relatively low. Third, the proposed method is suitable for video recording, which has a broader application prospect than HDR still-image capturing. Experimental results on a self-built database and a publicly accessible database demonstrate that the algorithm works well under various types of scenarios. Furthermore, experiments on the real-time system show that the captured video reveals many details in high dynamic range scenes.


I. INTRODUCTION
The auto-exposure control (AEC) algorithm is critical to a camera system. Although it is mature in off-the-shelf cameras, most of them perform poorly in high dynamic range cases. In these scenarios, the intensity of the incident light can vary by up to 10^5 across different regions [1]. For example, when the surveillance camera mounted on a cargo spacecraft shoots the docking target cross fixed in front of the space station, some parts of the target are too bright because of intense reflected sunlight, while other parts vanish into the background. As a result, it is impossible to find a single optimal integration parameter for the whole field of view.
To obtain a high-quality image in similar cases with standard cameras, researchers proposed multiple exposure fusion (MEF) algorithms. In MEF mode, the camera continuously captures a series of images with different integration parameters, and post-processing algorithms fuse these images into a single high-quality output [2]. The MEF approach has two inherent drawbacks. For one thing, the image information is largely redundant among the images captured with the fixed set of integration durations. For another, the total integration duration is too long: when moving objects are present in the scene, extra computation is needed to remove ghosting. Therefore, this operating mode is only applicable to still-image capturing and fails to meet video recording requirements.
This paper introduces a new Siamese Trigger (ST) mode and proposes an auto complementary exposure (ACE) algorithm to solve the problem. The output fused image can achieve quality equivalent to an HDR image produced by the MEF technique, using only a complementary image pair. Working in ST mode, the camera captures two shots for each output HDR image: one image for the dark regions and the other for the bright parts. The two images contain complementary information covering the whole scene. The most critical problem in ST mode is how to set the optimal exposure time for the two images. We studied the statistical characteristics of the incident light and the corresponding output image pixels, and then designed the auto complementary exposure algorithm.
For algorithm design and verification, we constructed a data set with five scenes. For each scene, we captured thousands of images with different exposure parameters. The evaluation results on the test data set demonstrate that the algorithm calculates two near-optimal integration durations. Furthermore, the image fused from the complementary pair captured with these two exposure times has a higher entropy than any single-shot image. Similar performance was achieved on publicly available data sets. We also migrated the code into the camera system and confirmed the algorithm's effectiveness by online testing.
The contributions of this work are as follows.
• A brand-new mesh-based auto complementary exposure scheme is proposed for the first time. Unlike traditional auto-exposure algorithms, the proposed method calculates two complementary exposure parameters designed explicitly for image capturing in HDR scenes.
• We build a database for complementary exposure control. The database consists of five scenes, each providing thousands of images. It is of great value for exposure control algorithm design and verification, and we will release it under open access.
• We propose an efficient auto complementary exposure algorithm. Building on the duration-entropy model and working in the Siamese Trigger mode, our algorithm can continuously output high-quality HDR video. A key practical advantage is that no existing hardware needs to be modified, so the algorithm can be easily integrated into any hardware setup.
The rest of this paper is organized as follows. We review the relevant algorithms in section II and introduce the proposed algorithm in section III. As a complete system, we introduce the duration-entropy model in section III-A, present the Siamese Trigger mode in section III-B, and describe the details of the algorithm in section III-C. The experimental results on both offline and online camera platforms are shown in section IV. We conclude the work in section V.

II. RELATED WORKS
We did not find any complementary exposure algorithm in accessible digital libraries. Nevertheless, there are many research results on closely related topics, including auto-exposure control and multiple exposure image fusion algorithms. Here, we review representative works in brief. For image capturing, 'exposure' is commonly used in the image-processing field, while 'integration duration' is more frequent in the microelectronics field. In this paper, we use the two phrases interchangeably.
Auto exposure control (AEC) is a basic functionality of a camera system. In the 1960s, the American Standards Association proposed the Additive System of Photographic Exposure (APEX) [3] for digitized exposure calculation of black-and-white film. The method takes the F-number, integration duration, mean luminance of ambient light, and camera sensitivity into consideration and formulates the exposure equation in logarithmic form. This mathematical model is regarded as a basic prototype for exposure control. AEC algorithms take statistical data as input, calculate the optimal exposure parameter, and then configure the sensor registers iteratively until the output image converges to the optimum. For simplicity of algorithm design, it is commonly assumed that the mean luminance of the output image is positively correlated with the exposure time [4]. Most AEC algorithms are designed based on this hypothesis. However, the simplified postulate only fits scenes under normal illumination. When the dynamic range of the ambient light is far beyond the photosensitive scope, e.g., when harsh light or dark shadow areas exist in the sensing region, the postulated condition is seriously inconsistent with the facts. Yuan et al. [5] improved the correlation function between mean image luminance and exposure time and used a nonlinear mapping function to obtain a better image. Su et al. [6] designed finer convex and non-convex models to fit different environmental light conditions. In more complicated cases, a video recording system in an HDR environment has to consider the illumination variation over time. For application-specific cases with regions or objects of interest, weighted statistical information [7] can be used for exposure parameter calculation. Aside from regular AEC algorithms, some researchers also consider the impact of exposure parameters on high-level image processing algorithms.
They proposed AEC algorithms specifically for computer vision applications. For example, Shim [8] and Zhang [9] argued that the performance of semantic comprehension algorithms relies heavily on the intensity of local gradients; hence, the target of the AEC algorithm should be providing the image with the largest gradient amplitude, and they designed AEC algorithms accordingly. Mehta [10] further refined the algorithm by adding gain compensation and implemented it on the Zeus platform. Onzon [11] redefined the problem in the HDR scenario specifically for driver assistance systems, proposing a neural auto-exposure method that bundles exposure control and object detection in one unified end-to-end network. These new ideas will further drive the performance of image-based applications.
When the dynamic range of the incident light spans a wide range, most auto-exposure algorithms fail to find a suitable parameter, since the mean luminance value can hardly meet the presumed condition. However, in less severe cases, where the dynamic range of the camera sensor is close to that of the ambient luminance, image quality can be improved by Tone Mapping Operator (TMO) algorithms [12]. These operators remap pixel values non-linearly to achieve pleasant subjective quality in both intensely and weakly lit areas. Since there are many papers on TMO algorithms, we do not cover the topic in detail; interested readers can refer to the recent book by Eilertsen et al. [13], which classifies the mainstream TMO algorithms, introduces their origins and improvement directions, and evaluates the performance of each algorithm elaborately.
In addition to regular operators, some researchers utilize Convolutional Neural Networks (CNN) [14] for the nonlinear mapping. Cai [15] used a CNN to train the mapping function for image enhancement. Marnerides [16] proposed a multi-scale network called ExpandNet to learn an HDR image from a single low dynamic range image. Siyeong [17] proposed another interesting method based on a deep learning framework: several LDR images, each with different exposure parameters and light conditions, are inferred from one input LDR image and then fused to produce an HDR image. Eilersten [18] augmented an existing database by simulating HDR situations with a saturation operation, designed a hybrid autoencoder network, trained its parameters on the augmented database, and output HDR images from the network. To facilitate the training of such networks, Ouyang [19] designed a neural camera simulator that mimics scenes under different exposure parameters to obtain series of image sets; the simulated images can be used for HDR image fusion. Deep learning networks are powerful, but these methods are better suited to post-processing than to online adjustment.
TMO algorithms only do well if the raw image already contains sufficient information that is merely poorly presented. When the dynamic range of the sensor is far from that of the incident light, TMO algorithms predictably fail. In such cases, another mainstream technology is Multiple Exposure image Fusion (MEF) [20]. With this technique, a series of images with different integration durations is collected by the camera hardware and then fused to generate an HDR image. Mann [21] first proposed the concept in 1995. Debevec [22] observed that pixels with mid-range luminance are more credible; he designed a hat function to weight the co-located pixels from different images and obtained pleasing results. Mitsunaga [23] analyzed the set of LDR images using signal processing theory, concluded that the weight factors for distinct images should be correlated with their optical response functions, and accordingly improved the weight allocation strategy.
The focus of common MEF techniques is designing a better fusion algorithm. As for image acquisition, most high-end cameras have a preset working mode for high dynamic range called bracketed exposure mode [2]. In this mode, the user selects one of several typical parameter sets for different light conditions; the camera collects the images with these exposure parameters, and the ISP module fuses them into one final HDR image. In recent years, with the rapid renewal of smartphones, algorithms for portable devices have drawn increasing attention [24]. The MEF technique effectively enhances image quality under HDR conditions, but the current working mode has flaws. Since the images can only be captured sequentially by the hardware, a default premise is that the scene is frozen, including both the light and the objects. Once the light changes fast or a moving object exists, ghost shadows appear in the output image. Some researchers try to suppress the ghost effect with more complicated post-processing. For example, Ma [25] decomposes the images into three channels, i.e., signal strength, local structure, and mean luminance; he fuses each channel from the different images separately, considering structural consistency, and then synthesizes the fused channels into the final HDR image. Although the ghost signal is weakened by such methods, the computational load is much higher, which is still unsuitable for real-time applications. Li [26] and Qi [27] improved Ma's algorithm by adding extra pre-processing and post-processing modules. Li decomposes the input images into texture and cartoon components, performs structural patch decomposition on the texture components, and combines the fused versions into one image, with an added optimization step to refine the final image quality.
Qi's algorithm decomposes the input images into a base layer and a detail layer using a guided filter, similar to Li's algorithm but more lightweight. Another research stream claims that the key to a better output HDR image is selecting a small exposure bracket instead of collecting many input images. For example, Wang [28] proposed a neural network that learns a reinforcement agent for flexible exposure bracket selection, seeking the minimal but optimal combination of input images for a high-quality fused HDR output.
In recent years, application-oriented exposure control has become popular, especially in autonomous driving, where detecting objects of interest matters more than producing a good-looking image. As a representative work, Onzon [11] proposed a neural auto-exposure algorithm targeting object detection in high dynamic range scenes.
In summary, the traditional auto-exposure algorithm cannot solve the HDR video recording problem. Tone mapping operators work only if the logarithmic contrast of the incident light is within the range of the image bit depth, a condition usually not met in HDR scenarios. With the MEF technique, the camera takes too many images, which is redundant and time-consuming. In this paper, we solve the problem from another point of view, focusing on the parameter setting of the bracket. The core idea is that we may not need that many LDR images: to achieve the same HDR image quality, we only need two shots, as long as they contain sufficient complementary information about the scene. The critical problem is how to find the best exposure time settings for reconstructing the final HDR image. We describe the operating mode and the auto-exposure algorithm in detail in the following sections.

III. PROPOSED ALGORITHM
The target of an auto-exposure algorithm is to obtain a high-quality image in a single shot. This is almost impossible in most high dynamic range scenarios: no single integration duration avoids being too long for the bright regions and too short for the dark parts. However, when we shift attention away from the defective areas, we find that most other regions are of good quality under most exposure parameter settings. Based on this observation, we do the opposite. We set up a long exposure parameter for the shadowed region and a short exposure parameter for the lit region, and the camera snaps two images. We do not demand much of the quality of each single image; the parameters are set considering only the two problematic regions, leaving the normal areas aside. Since the two images together contain enough texture information, we can obtain a high-quality HDR image after fusion. In this section, we first introduce the duration-entropy model and the Siamese Trigger mode. Then we propose our patch-based splitting and classification algorithm and an auto complementary exposure algorithm based on the classification result. The framework of the proposed algorithm is shown in Fig.1. It is verified on a self-constructed database and a publicly available database. Furthermore, we set up experimental hardware and test the algorithm on it for real-time performance evaluation. All the tests report good results.

A. DURATION-ENTROPY MODEL
The mean luminance of a captured image increases monotonically with integration duration, which is unsuitable for numerical optimization. Studies have shown that the subjective quality perceived by the human visual system is correlated with the information entropy embodied in the image, so entropy can serve as the foundation of an auto-exposure control algorithm [29]. Rahman [30] showed that entropy is an efficient criterion for auto-exposure parameter setting: the peak entropy matches the best integration duration both theoretically and experimentally. Kim [31] also proposed an auto-exposure algorithm that weights the gradient intensity with image entropy; it was applied in a robot vision system on an autopilot platform to improve the quality of captured images. The advantage of taking entropy as the basis is that entropy is a global statistical metric and the target function is approximately convex. Therefore, when calculating the exposure parameters with numerical optimization methods, one can in theory achieve the optimal value.
Entropy theory is commonly used to measure the uncertainty of information. Let the n-dimensional vector P = (p_1, p_2, ..., p_n) represent the probabilities of occurrence of the events in set S, with 0 ≤ p_i ≤ 1. Then the entropy is defined as

H(P) = -Σ_i p_i log(p_i).  (1)
For a digital image I, suppose the probability of pixels equal to a given gray level l_i is p_i; then the image entropy can be expressed as H(I) = -Σ_i p_i log(p_i). H(I) reaches its maximum when every level's probability is the same.
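As a concrete illustration of the definition above, the entropy of a grayscale image can be computed from its gray-level histogram. The sketch below is ours, not part of the paper's algorithm; it assumes an 8-bit grayscale image stored as a NumPy array.

```python
import numpy as np

def image_entropy(img, levels=256):
    """Shannon entropy H(I) = -sum_i p_i * log2(p_i) of a grayscale image."""
    hist, _ = np.histogram(img, bins=levels, range=(0, levels))
    p = hist / hist.sum()
    p = p[p > 0]  # drop empty gray levels so the logarithm is defined
    return float(-(p * np.log2(p)).sum())
```

A constant image yields zero entropy, while an image using all 256 levels equally often attains the maximum of 8 bits, matching the remark that H(I) peaks when every level is equally probable.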
We designed a set of experiments to study the integration characteristics of sensors in high dynamic range scenes. The camera platform was set up on the roof of a building, shooting toward the corridor of another facility in sunlight. We selected a regular industrial camera whose exposure parameter register can be adjusted by the host computer through the network. We set the initial integration duration to 1000 µs, increased the parameter in steps of 10 µs, and ended at 105880 µs. In this way, we built a new database of 10489 images, named "outsideRoof". The entropy of each image in the set was calculated and is illustrated in Fig.2a; the distribution approximately follows a Poisson distribution. Taking out the image with the maximum entropy, shown in Fig.2b, we can see that it is exactly the best image output by an auto-exposure control algorithm. These experimental results show that the entropy metric can serve as the basis for exposure parameter adjustment.

B. SIAMESE TRIGGER MODE
For a traditional auto-exposure control algorithm, the target is to approximate a preset mean luminance value. A simplified Proportional-Integral-Derivative (PID) control of the integration parameter over time is shown in Fig.3a. The AEC algorithm adjusts the exposure parameter continuously and captures one image per shot. As mentioned before, this cannot produce a good image in an HDR environment.
The MEF mode in most high-end cameras uses bracketed exposure parameters. With this mode, users select a specific scene type from a set of presets through an interactive user interface. The selection is mapped to a bracket of exposure parameters, usually three to seven values. The camera then captures the corresponding set of images and fuses them into an HDR output. The flaw in this mode is twofold. For one thing, the preset fixed exposure times may not fit the ever-changing light environment; when the snapped images fail to record every texture detail of the scene, the user can hardly obtain a good output HDR image. For another, there is a lot of redundant information in the bracket of images, which leads to a heavy computational load, since only a small portion of the pixels contributes to the final image. In other words, we can obtain an equivalently high-quality HDR image from fewer but more informative input images. Moreover, extra effort may be needed to remove ghosting when moving objects appear in the scene. Intuitively, the most effective and direct way to obtain a high-quality image in an HDR environment is to select the optimal exposure parameters rather than to develop a complex fusion algorithm.
FIGURE 1. The framework of the proposed solution. The algorithm updates complementary exposure parameters consecutively. Each captured image is split into patches and then classified into over-exposed, under-exposed, and well-lit types. We feed the entropy variation feature of over-exposed and under-exposed patches into the exposure parameter estimation algorithm and calculate the optimal complementary exposure parameters. The optimal short-exposure and long-exposure parameters are configured into the sensor. Neighboring image pairs are fused to output the high dynamic range image series.
In our opinion, the target should be finding the set of exposure parameters that contributes the most information with the fewest shots. Based on this observation, the proposed Siamese Trigger (ST) mode is shown in Fig.3b.
In ST mode, we shoot the scene twice for each output HDR image. The first shot is configured with a short exposure parameter (SEP), and the recorded image is labeled the Short Exposure Image (SEI), because we are only concerned with the detailed texture of the bright regions in this image. The second shot is configured with a long exposure parameter (LEP), and the output image is labeled the Long Exposure Image (LEI), from which the textures of the dark regions are taken. Importantly, the SEI should be shot first, which helps avoid the motion-ghost problem.
As the simplified graph in Fig.3b shows, the ST mode differs from the traditional working mode in both image capturing and video capturing. In image capturing mode, we acquire the SEI and LEI at t_s^i and t_l^i based on the estimated exposure parameters and output the image I^i by fusing the two. The proposed algorithm cuts down the interference imposed by the redundant information of too many images. In video mode, we shoot the scene with the SEP and LEP alternately and continuously. As shown in the right part of Fig.3b, we get SEI images at t_s^i, t_s^{i+2}, t_s^{i+4}, and LEI images at t_l^{i+1}, t_l^{i+3}, t_l^{i+5}. For each i, the SEI at t_s^i is fused with the LEI at t_l^{i+1} to output v^i, and the LEI at t_l^{i+1} is fused with the next SEI at t_s^{i+2} to output v^{i+1}. In this manner, the output video frame rate equals the raw capture rate. The proposed algorithm is thus competent for HDR video recording, where the traditional mode often fails.
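The alternating fusion schedule of the video mode can be sketched as below. This is a minimal illustration of the scheduling only; the `fuse` operator is a hypothetical stand-in for the actual HDR fusion step, which is not specified at this point in the paper.

```python
def st_mode_fuse(frames, fuse):
    """Fuse every pair of neighboring frames in an alternating SEI/LEI stream.

    frames: raw captures ordered SEI, LEI, SEI, LEI, ...
    fuse:   a two-image fusion operator (placeholder for the real HDR fusion).
    Each raw frame (except the first and last) participates in two outputs,
    so the output rate equals the raw capture rate.
    """
    return [fuse(frames[i], frames[i + 1]) for i in range(len(frames) - 1)]
```

For example, with a toy averaging operator, four raw frames produce three fused outputs, one per capture interval, mirroring the v^i sequence described above.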

C. THE AUTO COMPLEMENTARY EXPOSURE ALGORITHM
The key point of the ST working mode is determining the optimal short and long exposure parameters that yield the best-textured SEI and LEI in the sunlit and shadowed regions, respectively. We first split the image into patches, then classify the patches into over-exposed, under-exposed, and well-lit ones. The well-exposed patches are ignored, while the over- and under-exposed patches are grouped into a Short Exposure Group (SEG) and a Long Exposure Group (LEG). We then design algorithms to obtain the optimal parameters for these two groups separately.
We take the "outsideRoof" data set and demonstrate the procedure as an example to express the idea clearly. As shown in Fig.4a, each image in the data set is split into a 4 × 4 grid, and the pixels in each grid cell are taken as a patch. We calculate the entropy of each patch for every one of the 10489 images. The variation of the 16 entropy values of co-located patches across all images over time is drawn in Fig.4b. From the figure, the entropy variation differs greatly across patch positions. According to the variation characteristics, we classify the patches into over-exposed, under-exposed, and well-lit ones. The labeled patches for brightly and dimly lit regions are illustrated in Fig.5a and Fig.5b. After assembling the patches with the same labels, we draw the entropy variation curves of the SEG and LEG, shown as the red and blue solid lines in Fig.5c. It is easy to see that the curves roughly follow a Poisson distribution; for simplicity, we use a Gaussian function instead. By fitting the entropy variation with the Gaussian model, we obtain the fitted curves shown as green and orange dotted lines for the SEG and LEG, respectively. The model parameters can then be extracted and fed into the algorithm design.
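The patch-splitting step above can be sketched as follows. This is our illustration, assuming a grayscale NumPy image whose dimensions are divisible by the grid size; the function name is ours.

```python
import numpy as np

def patch_entropies(img, grid=4, levels=256):
    """Split img into a grid x grid of patches and return each patch's entropy."""
    H, W = img.shape
    ph, pw = H // grid, W // grid
    out = np.empty((grid, grid))
    for r in range(grid):
        for c in range(grid):
            patch = img[r * ph:(r + 1) * ph, c * pw:(c + 1) * pw]
            hist, _ = np.histogram(patch, bins=levels, range=(0, levels))
            p = hist / hist.sum()
            p = p[p > 0]  # keep only occupied gray levels
            out[r, c] = -(p * np.log2(p)).sum()
    return out
```

Running this over every frame of a sweep like "outsideRoof" gives, for each of the 16 grid positions, an entropy-versus-duration curve of the kind plotted in Fig.4b.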

1) Sample Set Constitution
Suppose we have obtained T images after T exposures in an HDR scene. Each image is split into M rectangular patches. We calculate the entropy of each patch and denote the entropy of the patch at position m in the image captured at time t as E_{m,t}. For fixed m, the series of paired values constitutes one sample of the exposure-entropy curve. The sample set is denoted {(t_i, E_{m,t_i}) | i = 1, ..., T}, which is hypothesized to follow an N(u_m, v_m) curve, where u_m and v_m are the mean and variance. We can calculate the parameters by solving (2).
arg min over (u_m, v_m) of the squared fitting error between the Gaussian model and the samples {(t_i, E_{m,t_i})}, as formulated in (2). In the least-squares sense, we obtain (3). Approximately, we collect the model coefficients for patch m into a vector A_m and, combining the equations for all T images, we get (4). Equation (4) can be simplified as E_m = TA_m, and the least-squares solution is then Â_m = (TᵀT)⁻¹Tᵀ E_m. Collecting the estimated parameters from the M rectangular patches, we obtain M samples, denoted S = {d_j | d_j = (u_j, v_j), j = 1, ..., M}. Each d_j is a two-dimensional vector holding the estimated mean and variance. The set S is used for illumination condition classification.
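One common way to realize this least-squares Gaussian fit, sketched below under our own reading of the derivation, is to take logarithms so the Gaussian becomes a quadratic in t, fit that quadratic linearly, and read the mean and variance off the coefficients. The function name and the log-quadratic formulation are our assumptions, not the paper's exact equations (2)-(4).

```python
import numpy as np

def fit_gaussian_params(t, E, eps=1e-12):
    """Fit E(t) ~ A * exp(-(t - u)^2 / (2 v)) by linear least squares on log E.

    ln E = a2*t^2 + a1*t + a0, with u = -a1/(2*a2) and v = -1/(2*a2).
    """
    y = np.log(np.maximum(E, eps))       # guard against zero entropies
    a2, a1, a0 = np.polyfit(t, y, 2)     # quadratic least-squares fit
    u = -a1 / (2.0 * a2)                 # mean: location of the entropy peak
    v = -1.0 / (2.0 * a2)                # variance (a2 < 0 for a peaked curve)
    return u, v
```

Applied per patch, this yields the (u_j, v_j) pairs that constitute the sample set S.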

2) Illumination Condition Classification
Taking the set S = {d_j | d_j = (u_j, v_j), j = 1, ..., M} as input, we classify the patches into three classes: over-exposed, under-exposed, and well-lit. Once the patches are classified into the correct groups, we can estimate the optimal exposure parameter for each group. The classification is accomplished with a Gaussian Mixture Model (GMM). First, we introduce a hidden variable γ_{j,k} signifying that the jth sample belongs to the kth model, with γ_{j,k} ∈ {0, 1}, 1 ≤ k ≤ 3, and Σ_k γ_{j,k} = 1. p(γ_{j,k} = 1) = π_k is the probability of a sample belonging to class k. The expanded data set is then (d_j, γ_{j,1}, γ_{j,2}, γ_{j,3}), with complete-data likelihood p(d, γ | µ, Σ, π). The target is to find an optimal parameter set (µ*, Σ*, π*) that maximizes p(d, γ | µ, Σ, π). However, only the d_j are observed; the γ are unknown. A good solution to this problem is the EM algorithm. At the Expectation step, we do not try to maximize p(d, γ | µ, Σ, π) directly, which is hard to differentiate. Given an initial parameter set (µ_0, Σ_0, π_0) or a previously updated set (µ_n, Σ_n, π_n), we instead maximize the Q function defined in (9).
Q(µ, Σ, π; µ_n, Σ_n, π_n) = E_γ[ln p(d, γ | µ, Σ, π) | S, µ_n, Σ_n, π_n]  (9)
We follow the formulation of the Q function defined by Dempster [32]. The in-depth mathematical derivation is beyond the scope of this article; here we only use the result, and readers are encouraged to refer to the classical EM paper [32]. In (9), E(γ_{j,k} | d_j, µ_n, Σ_n, π_n) is the estimate of γ based on the result of the nth iteration.
At the Maximization step, we calculate the updated parameters by maximizing the Q function: (µ_{n+1}, Σ_{n+1}, π_{n+1}) = arg max Q(µ, Σ, π; µ_n, Σ_n, π_n). In the E-step, we take the current estimates of µ, Σ, and π for each Gaussian model as input and use the Q function in (9) to estimate each sample's class label γ. In the M-step, we take the updated class labels γ as input and use equations (12) to (14) to refine the parameters µ, Σ, and π.
When the iteration converges, each sample in S is assigned a class label indicating which of the three classes the patch belongs to. At the same time, we obtain the mean exposure values for the over-exposed and under-exposed patches. The procedure is summarized in Alg.1. Since the general Gaussian Mixture Model and the EM algorithm are well known to the community, we do not provide unnecessary derivation details; interested readers are referred to the original literature.
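A minimal EM iteration for the mixture model described above can be sketched as follows. This is an illustrative simplification of Alg.1, not its reproduction: it assumes diagonal covariances, uses a fixed iteration count rather than a convergence test, and works for any number of components K.

```python
import numpy as np

def gmm_em(d, mu0, var0, pi0, iters=50):
    """Minimal EM for a K-component diagonal-covariance Gaussian mixture.

    d: (M, 2) samples d_j = (u_j, v_j). Returns hard labels and the
    updated parameters (mu, var, pi).
    """
    mu, var, pi = mu0.copy(), var0.copy(), pi0.copy()
    M, K = len(d), len(pi)
    for _ in range(iters):
        # E-step: responsibilities gamma_{j,k} = p(class k | d_j)
        logp = np.stack([
            np.log(pi[k])
            - 0.5 * np.sum(np.log(2 * np.pi * var[k]))
            - 0.5 * np.sum((d - mu[k]) ** 2 / var[k], axis=1)
            for k in range(K)], axis=1)
        gamma = np.exp(logp - logp.max(axis=1, keepdims=True))
        gamma /= gamma.sum(axis=1, keepdims=True)
        # M-step: closed-form updates of weights, means, variances
        Nk = gamma.sum(axis=0)
        pi = Nk / M
        mu = (gamma.T @ d) / Nk[:, None]
        for k in range(K):
            var[k] = (gamma[:, k, None] * (d - mu[k]) ** 2).sum(0) / Nk[k] + 1e-6
    return gamma.argmax(axis=1), mu, var, pi
```

With K = 3 and the initial parameters of the next subsection, the returned labels partition the patches into the over-exposed, well-lit, and under-exposed groups.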

3) Initial Parameter Estimation
The EM algorithm is sensitive to initial parameters, and good initial values help convergence. Since the samples in S have specific physical meanings, we can specify meaningful initial parameters. By the definition in (1), patch entropy is highly correlated with texture richness and light intensity. For a clean patch with little texture, the entropy changes little with illumination; therefore, textureless patches tend to have a larger v. When v is small, the samples can be roughly classified by u: a small u corresponds to brightly lit regions, while a large u results from dimly lit areas. Let u_max and u_min denote the maximum and minimum of the M mean values, and set ū = (u_max + u_min)/2. An example set of initial parameters is given in (15): µ_1 and µ_3 are weighted averages of u_max and u_min, and µ_2 is set to ū. We set σ_1 and σ_3 to (u_max − u_min)/4 and σ_2 to (u_max − u_min)/2. The purpose of these meaningful initial values is to avoid convergence to a local minimum and to speed up convergence; we do not pursue a deeper theoretical justification here.
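The initialization recipe can be sketched as below. Since the exact weights of Eq. (15) are not reproduced in this text, the 3:1 blend of u_min and u_max used for µ_1 and µ_3 is our assumption, as is the function name.

```python
import numpy as np

def init_gmm_params(u):
    """Illustrative initial GMM parameters from the patch means u.

    The 3:1 weighted averages for mu_1 and mu_3 are an assumption; the
    sigma settings follow the recipe in the text.
    """
    u_min, u_max = float(np.min(u)), float(np.max(u))
    u_bar = 0.5 * (u_min + u_max)
    mu = np.array([0.75 * u_min + 0.25 * u_max,   # over-exposed (small u)
                   u_bar,                          # well-lit
                   0.25 * u_min + 0.75 * u_max])  # under-exposed (large u)
    sigma = np.array([(u_max - u_min) / 4,
                      (u_max - u_min) / 2,
                      (u_max - u_min) / 4])
    pi = np.full(3, 1.0 / 3.0)                     # uninformative class priors
    return mu, sigma, pi
```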

4) Optimal Sample Exposure Parameter Selection
With the algorithm proposed in the previous sections, the optimal short exposure parameter (SEP) and long exposure parameter (LEP) would seem to be at hand. However, the prerequisite is a group of sample images with known exposure parameters. Since the auto-exposure algorithm must respond quickly, there is no time to collect a big data set for real-time parameter calculation. We have to find a good sparse set of images for parameter estimation, as shown in the left dotted box of Fig.3b. We can then utilize the integration property of the sensor to cut down the total number of sample images while retaining performance.
The mean luminance of a captured image is a monotonic function of the integration duration. This property is widely used in auto-exposure algorithms, and here we take advantage of it to reduce the number of samples. Intuitively, the sample points should be distributed uniformly while covering a sufficient range.
When N − 1 images are fed to the proposed procedure, the luminance values of the images constitute a set l_T = {l_k | 1 ≤ k ≤ N − 1}. According to the modulation transfer function of the sensor, the relationship between integration duration and image luminance can be modeled as a zero-mean symmetric quadratic function. Denoting the coefficient of the quadratic term by c_1 and the constant bias by c_0, the model is formulated as (16), in which c_1 and c_0 are the two coefficients to be determined.
At the very beginning, we set an integration duration t_s1 empirically and capture an image whose mean luminance is l_s1. A second target luminance l_s2 ∈ l_T, the one farthest from l_s1, is then found by (17), and a third target luminance l_s3 ∈ l_T satisfying (18) is also fixed.
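The selection rule of (17) — take the target luminance in l_T farthest from l_s1 — is a one-liner. This is only a sketch of (17); the condition (18) for l_s3 is not reproduced here:

```python
def farthest_target(l_T, l_s1):
    """Pick l_s2 per (17): the target luminance in l_T farthest from l_s1."""
    return max(l_T, key=lambda l: abs(l - l_s1))
```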
Setting l_s2 and l_s3 as targets, we compute two integration parameters t_s2 and t_s3 with the simplest linear mapping algorithm [4]. Configuring the sensor with these two parameters, we retrieve two images whose real mean luminance values are l̃_s2 and l̃_s3.
Collecting the integration durations T_R = [t_s1 t_s2 t_s3]^T, the corresponding real mean luminance values l̃_R = [l_s1 l̃_s2 l̃_s3]^T, and the coefficient vector c = [c_1 c_0]^T, we can calculate the optimal coefficients by solving T_R = [l̃_R 1]c as defined in (19).
Substituting ĉ into model (16), we get the integration duration for each target luminance in l_T. By updating T_R and applying the updated integration durations, we capture the new images I_R = {I_i | i = 1, …, N}. T_R and I_R are then fed into the proposed algorithm to calculate the complementary exposure parameters.
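Equations (16) and (19) together amount to a small least-squares problem. The sketch below assumes (16) has the form t = c_1·l² + c_0, consistent with the quadratic-with-bias description above; the exact form in the paper may differ:

```python
import numpy as np

def fit_exposure_model(t_samples, l_samples):
    """Least-squares fit of the duration/luminance model of (16).

    Assumes (16) has the form  t = c1 * l**2 + c0  (a zero-mean symmetric
    quadratic with constant bias, per the text); the paper's exact form
    may differ.  Returns c_hat = [c1, c0].
    """
    l = np.asarray(l_samples, dtype=float)
    t = np.asarray(t_samples, dtype=float)
    A = np.stack([l ** 2, np.ones_like(l)], axis=1)   # design matrix [l^2, 1]
    c_hat, *_ = np.linalg.lstsq(A, t, rcond=None)
    return c_hat

def durations_for_targets(c_hat, l_targets):
    """Integration durations T_R for the target luminances l_T."""
    l = np.asarray(l_targets, dtype=float)
    return c_hat[0] * l ** 2 + c_hat[1]
```

With three measured (duration, luminance) pairs and two unknowns, the system is slightly over-determined, which makes the fit robust to small luminance measurement noise.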

5) Auto Complementary Exposure Algorithm
The proposed auto complementary exposure algorithm is summarized in Alg.2. Two initial luminance values are calculated from the preset bracket. After the images are captured, we obtain the real luminance values, from which the model coefficients ĉ are estimated. After that, we estimate the initial parameters µ, σ, and π. Executing Alg.1 then yields the final complementary exposure parameters.
The ultimate target of the ACE algorithm is acquiring the optimal twin exposure parameters. We use a fixed number of input patches and produce a fixed number of parameters. All the steps in Alg.2 are linear except the inner call to Alg.1. It is well known that the theoretical time complexity of each EM iteration is linear in both the sample size and the number of clusters; however, as the number of iterations is undetermined, it is hard to bound the overall time complexity of Alg.1. From a practical point of view, the time complexity of our algorithm is scalable: with carefully set initial parameters, we can obtain a satisfactory fused HDR image, and the variation threshold of the EM algorithm can be adjusted to balance image quality against computation load.

Algorithm 2: Auto Complementary Exposure
1: Calculate the target luminance values l_s2 and l_s3 with (17) and (18)
2: Calculate the exposure parameters t_s2 and t_s3, configure the sensor, and calculate the mean luminance values l̃_s2 and l̃_s3
3: Estimate the model parameter ĉ with (19)
4: With ĉ, calculate the exposure parameter for each luminance in l_T
5: Trigger the sensor by T_R to acquire the real image set I_R
6: Calculate the estimated parameters µ, σ, and π by (4)
7: Execute Alg.1 with T_R, I_R, µ, σ, and π to find the optimal parameter set
8: Output t_s = µ_1, t_l = µ_3

IV. EXPERIMENT RESULT
We conducted three sets of experiments to verify the proposed algorithm. The first is a validation study, which was actually carried out along with the algorithm design: we set up a test bench and collected sufficient experimental data to verify each part of our idea. The target of the second track is to ascertain the effectiveness of the proposed algorithm on a publicly accessible database. Last but not least, we test the algorithm on an actual image capturing system: the hardware platform captures images, the ACE algorithm calculates the exposure parameters and configures the image sensor, and at the same time the post-processing functions fuse the complementary images, with every stage running in real time. The objective of this last track is to test the overall performance of our method. The topic of this paper is auto exposure control, not HDR image fusion, and since the proposed method is new, we could not find proper existing solutions for a direct comparison. The experiment set is therefore designed specifically to verify the effectiveness of the proposed method.

A. VALIDATION STUDY

1) Test bench setup
We set up a test bench for algorithm design and verification. The test bench consists of two main parts: the camera and the host computer. The camera is an ordinary industrial camera bought from a web store, as shown in Fig.6a. The image resolution is 2048 by 1536. The advantage of this camera is that all its parameters can be set over the USB cable. We fixed the focal length and aperture of the lens group, disabled auto-focus, auto-gain, and the other image sensor pipeline stages, and changed only the exposure parameter. The host computer is a laptop, on which we developed software to transfer the exposure parameters and store the captured images to disk.
We collected one group of images named 'outsideRoof' for algorithm design and analysis in a fixed environment, and four further groups named 'outsideLab', 'bicycle', 'insideClassroom', and 'satelliteModel' for algorithm verification. The image capturing equipment is a commonly used three-megapixel industrial camera, powered by a USB cable that is used both for working-mode configuration and for image data transmission. We developed software on the host computer for consecutive exposure parameter configuration and successive image capturing. The images captured with the specified exposure parameters are transferred to the host computer, and the parameters for each image are recorded. We thereby obtain a series of images captured with uniformly increasing integration durations. The working bench is shown in Fig.6. The initial integration parameter, increasing step, and maximum integration parameter are listed in Table.1.
The 'outsideRoof' data set was collected outside the corridor at noon under bright sunshine; the target region is the corridor inside a building. Since the sunlight is strong, we set a small increasing step for the exposure. This image set is used for analyzing the statistical properties of images under high-dynamic environmental light, and the test results provide useful clues for algorithm design. The 'outsideLab' and 'bicycle' data sets were captured at the ventilation aisle on the first floor. The 'insideClassroom' set covers the scenario of photographing the scenery outside a window on a sunny day. We also mimic a space-camera scenario, in which a glaring flashlight sheds hard light on a satellite model. Two data sets were collected in this case, with step sizes of 400µs and 1000µs respectively; we merge the two sets and name the union 'Satellite'.

Table 1. Integration parameters (µs) for each data set.
Data set          Initial    Step   Maximum   Images
outsideRoof          1000      10    105880    10489
outsideLab           1100     100     99900      989
bicycle              1100     100     78900      779
insideClassroom      1300     300    299800      996
satelliteModel1      1400     400    601000     1500
satelliteModel2    602400    1000    999400      398

2) Verification results
The validation test is applied to each image set. During testing, one parameter and the corresponding image are randomly selected from the data set and fed into Alg.2. The output parameters are the SEP t_s and the LEP t_l. By fusing the two images collected with these two parameters, we output a final HDR image. The simplest linear average weighting method is chosen to avoid any impact from the fusion algorithm itself. We use the image entropy metric for quantitative evaluation. The bit depth of the image data is 8, and the target luminance bracket for collection is l_T = [30, 55, 80, 105, 130, 155]; the preset values are chosen to make the data uniformly distributed. According to Alg.2, the total number of samples for each set is 8. One intuitive question is whether eight sample images are sufficient for a good estimation, so we design an experiment to test the bias. By executing Alg.2, we get 8 images. Each image is split into 4 × 4 blocks, and the blocks are classified into SEG, LEG, and well-lit regions; the well-lit regions are omitted. The group entropies of the SEG and LEG are calculated separately. The red and blue solid lines in Fig.7 are the entropy curves calculated from the SEG and LEG. For easy comparison, we also fit two Gaussian-style curves using all samples, labeled with green and orange dotted lines in Fig.7. The corresponding Gaussian curves fitted from only the eight sample parameters and images in the bracket of Section III-C4 are plotted as yellow and brown dash-dotted lines. It is easy to see that the optimal short and long exposure times calculated by Alg.2 are quite close to the ground truth. Since the variation of image entropy with integration duration is neither smooth nor convex, and the image entropy is highly context-dependent, we can only estimate the theoretical optimum using the GMM model.
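The evaluation metric and fusion rule used throughout this section can be sketched as follows, assuming 8-bit images, the standard Shannon entropy over the grey-level histogram (consistent with the definition referenced as (1)), and the "simplest linear average weighting" fusion mentioned above:

```python
import numpy as np

def image_entropy(img, bits=8):
    """Shannon entropy (in bits) of the grey-level histogram of an image."""
    hist = np.bincount(np.asarray(img, dtype=np.int64).ravel(),
                       minlength=2 ** bits)
    p = hist / hist.sum()
    p = p[p > 0]                       # 0*log(0) terms contribute nothing
    return float(-(p * np.log2(p)).sum())

def fuse_average(img_a, img_b):
    """Simplest linear average weighting fusion of two exposures."""
    return ((img_a.astype(np.float64) + img_b.astype(np.float64)) / 2.0
            ).round().astype(np.uint8)
```

For an 8-bit image the entropy lies in [0, 8]: a flat image scores 0, while a perfectly uniform histogram scores 8.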
Since the curve in Fig.3 is non-convex, different initial exposure parameters may result in dissimilar final t_s and t_l, so it is necessary to evaluate the degree of influence. We extract 80 equally spaced samples from the 'outsideRoof' data set to test this impact. Selecting each of the 80 samples in turn as the initial exposure parameter, we executed the algorithm and tested the convergence. The test result is shown in Fig.8a, where the horizontal axis is the SEP and the vertical axis is the LEP. The points marked '+' are the initial exposure parameters; since each initial point is a single shot, its x and y values are equal, i.e., it lies on the diagonal. The points marked with red 'o' are the output parameters of the algorithm, whose horizontal and vertical coordinates are the optimal SEP and LEP calculated by our algorithm. We link each initial and final point pair. It is apparent from the result that, regardless of the initial exposure parameter, the final SEP and LEP cluster into a compact region. Admittedly, there is some small variation among the final points obtained from different initial points. The reason is twofold: on the one hand, with different initial exposure parameters, the mean luminance of the captured image is biased relative to the target; on the other hand, the result of (15) varies with the input images. In Fig.9, we show the entropy of the image fused by the algorithm. For better visualization, we first plot a 3D mesh in which the x and y coordinates correspond to the SEP and LEP, respectively, and the z coordinate is the entropy of the fused image: we take the images captured with each SEP and LEP pair as input, calculate the entropy of the fused image, and map the SEP, LEP, and entropy to the three coordinates. Projecting the mesh onto the x-z plane yields Fig.9. We also plot the entropy of the single-shot image whose exposure parameter corresponds to the x coordinate.
This curve is shown as a solid purple line. For each red triangle point, the x coordinate is its initial exposure parameter and the y coordinate is the entropy of the fused image. The black dotted line indicates the highest entropy value achievable by a single shot.
Three conclusions can be drawn from Fig.9. First, the initial exposure parameter has limited influence on the resulting SEP and LEP; the final entropy of the fused image is almost the same. Second, regardless of the initial exposure parameter, the entropy of the image fused by the complementary exposure algorithm is greater than that of a single-shot image. Third, our algorithm always achieves almost the optimal fused image quality compared with fusing any two images of the same scene in an ergodic manner.
We list the detailed entropy values in Table.2. 'AE' is the optimal exposure parameter for a single shot, and the corresponding image entropy is shown in column 'AEE' (short for Auto Exposure Entropy). The maximum entropy obtained by fusing any pair of images in an ergodic manner is listed in the 'Ergo' column. 'CM' is the entropy of the fused image whose two inputs are captured with t_s and t_l respectively. It can be seen that 'CM' is very close to 'Ergo', and both are far higher than 'AEE'.
The experiment results on 'outsideRoof' are shown in Fig.10. (a) and (b) are the two images captured with parameters t_s and t_l. The single-shot image with maximum entropy from Fig.9 is shown in (c), and the image fused from (a) and (b) is shown in (d). To avoid perturbation from the fusion algorithm, the simplest linear combination is applied. Comparing (c) and (d), it is easy to see that the synthesized image fuses the texture information of the SEI and LEI: the texture details at the wall mullion, the satellite model, and the shadowed regions are much richer.

B. TESTING ON PUBLIC DATASET
We also tested the algorithm on publicly accessible HDR image sets. There are many data sets for HDR image testing; however, most of them target algorithm verification, providing only 3 to 5 images for a specific view. For example, the RASCAL data set [33] from Columbia University has only 5 or 6 images per scene. The SIG17HDR image set by UCSD has 74 scenes, but only three images are published for each scene. The multi-exposure image stacks by Cambridge University provide only five images for each of the 36 views. The HDR data set from SFU has 105 groups of images, each consisting of images with different exposure parameters; unfortunately, the purpose of that data set is to verify the consistency of different color channels under varying exposure parameters, and for most scenes there are only 3 to 5 images. The best option for testing a complementary exposure algorithm is the HDRPS image set [34] from RIT, which has 105 groups of HDR images with about nine to sixteen images per scene. Although nine images are not sufficient for a full performance evaluation of the proposed algorithm, they are enough to verify the effectiveness of the optimal SEP and LEP. For each group, we extract eight images as the result of the algorithm proposed in Sec.III-C4 and verify the effectiveness of the other parts. The experiment results on all the scenes are listed in Table.3. The first column of the table is the name of the scene. The 'AE max.' column is the maximum single-shot image entropy for each scene. 'Alg. max[S,L]' is the maximum fused image entropy achieved by the proposed algorithm, and 'Ergo. max[S,L]' is the maximum entropy obtained by fusing any two images; 'S' and 'L' are the sequence numbers of the images used for fusing. The results show that the entropy of the fused images is higher than that of single-shot images. For 85 of the 105 groups, 'Alg. max' is identical to 'Ergo. max'.
For the other 20 groups, the image pair selected by the algorithm and the pair found by ergodic calculation share one common image, and the sequence numbers of the other image are adjacent. We analyzed these cases in depth: the algorithm converges to a correct point, and the difference is caused by discrete sampling bias, which is unavoidable given the data set. We take the 'Lab Window' scene as an example for performance analysis. Each image in the group is split into 4 × 4 blocks, and the blocks are classified as SEG, LEG, and well-lit regions. For the SEG and LEG regions, we estimate the means of the Gaussian model; the result is shown in Fig.11, where the definitions of SEG and LEG are the same as before. Since there are only nine images, we link the nine patch-entropy points as red and blue solid lines. The Gaussian curves modeled by Alg.2 are shown as yellow and brown dotted lines. The Gaussian function models the properties of the SEG and LEG well, and the optimal exposure times are close to the peak values. By picking every pair of images from the 'Lab Window' set and calculating the entropy of the fused image, we can plot a mesh grid in which the x and y axes are the sequence numbers of the two shots and z is the entropy of the fused image. Projecting the mesh onto the x-z plane yields Fig.12, where the junction points of the projected mesh correspond to all possible entropy values. The purple line is the entropy of each single image; for clarity, the maximum achievable entropy is marked with a black dotted line. The image pair selected by our algorithm, together with the entropy of the corresponding fused image, is marked as a red triangle. It is expected that the entropy of most fused images is higher than that of single-shot images; the critical point, however, is that the two images selected by the proposed algorithm achieve almost the maximum fused image entropy.
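The block classification used in this analysis can be sketched as follows. The 4 × 4 split follows the text; the mean-luminance thresholds used here to separate SEG, LEG, and well-lit blocks are illustrative, since the exact SEG/LEG definition is given earlier in the paper:

```python
import numpy as np

def classify_blocks(img, grid=4, low=64, high=192):
    """Split an image into a grid x grid set of blocks and classify each as
    short-exposure-group (SEG), long-exposure-group (LEG), or well-lit.

    The mean-luminance thresholds `low`/`high` are illustrative -- the
    paper's SEG/LEG definition may differ.  Bright blocks benefit from a
    shorter exposure (SEG); dark blocks from a longer one (LEG).
    """
    h = img.shape[0] // grid * grid
    w = img.shape[1] // grid * grid
    blocks = (np.asarray(img[:h, :w], dtype=np.float64)
              .reshape(grid, h // grid, grid, w // grid)
              .transpose(0, 2, 1, 3))          # (row-block, col-block, h, w)
    means = blocks.mean(axis=(2, 3))           # per-block mean luminance
    labels = np.where(means > high, 'SEG',
             np.where(means < low, 'LEG', 'well-lit'))
    return labels
```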
The sequence numbers of the SEI and LEI images voted by our algorithm are 3 and 8, and the entropy of the combined image is 7.4849. The maximum entropy over all possible pairs is 7.5029, obtained with SEI and LEI sequence numbers 3 and 9. Although the sequence numbers differ slightly, the fused image entropy is almost the same.
The selected and output images are shown in Fig.13, where (a) and (b) are the optimal SEI and LEI selected by the proposed algorithm, and Fig.13c is the image fused from the selected pair. If we fuse every possible pair from the data set and pick out the image with maximum entropy, we get (d). There is nearly no difference between (c) and (d).

C. ONLINE TESTING
The real working platform of the ACE algorithm will be the ISP processor: the micro-controller calculates and updates the exposure time in real time and configures the exposure parameters of the sensor chip. To further verify the effectiveness of our algorithm in real-time processing, we ported our code to an online verification platform. The hardware platform is the same as that used in the validation study; however, we developed new software for the calculation and real-time control. During image capturing, we compared the quality of the images captured with the default auto-exposure algorithm and with our ACE algorithm. It should be noted that the image output by our algorithm is the fused version, and for clarity we selected the simplest average-weighted fusion algorithm. During the online test, the captured images are transferred to the host computer through the USB cable, and the algorithm running on the host calculates the exposure parameters for the next image and configures the sensor. Although the throughput is not sufficient for real-time image transfer, it is enough for algorithm verification. The experiment result is shown in Fig.14, where (a) is the best image produced by the default auto-exposure algorithm, (c) and (e) are the two complementary images captured by the proposed algorithm, (g) is the image fused from (c) and (e), and (b), (d), (f), and (h) are the edge texture maps extracted from the corresponding images above.
We mask the edge texture images with different colors for better visualization. In (d) and (f), the red and blue masked regions are the textured and textureless regions. It is easy to see that the sky and the lightly reflective glass region at the upper left corner of the SEI image Fig.14c have abundant texture, but the front door of the library is dark. For the LEI image, the upper region is over-exposed; however, the glass on the exterior wall is clear, and even the reflective glass regions have distinct textures. The fused image is richly textured across the whole field of view. As for the single-shot image in Fig.14b, the green masked region is over-exposed while the glass exterior wall is not well exposed.
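The online loop described above — trigger a short and a long exposure, fuse them, then refresh the parameter pair — can be sketched as a generator of fused frames. `cam.set_exposure`, `cam.grab`, `ace_update`, and `fuse` are hypothetical stand-ins for the USB camera interface, Alg.2, and the fusion step, not an actual vendor API:

```python
def run_online(cam, ace_update, fuse, n_pairs=100):
    """Sketch of the online ST-mode loop: every two consecutive frames are
    a complementary pair, and each pair yields one fused HDR frame.

    cam        : hypothetical camera with set_exposure(t) and grab()
    ace_update : hypothetical stand-in for Alg.2; maps the last short/long
                 frame pair to the next (t_s, t_l) pair
    fuse       : fusion function, e.g. the linear average weighting
    """
    t_s, t_l = ace_update(None, None)               # bootstrap exposure pair
    for _ in range(n_pairs):
        cam.set_exposure(t_s)
        short = cam.grab()                          # short-exposure frame
        cam.set_exposure(t_l)
        long_ = cam.grab()                          # long-exposure frame
        yield fuse(short, long_)                    # one fused HDR frame
        t_s, t_l = ace_update(short, long_)         # refresh the pair
```

Because the pair is refreshed after every fused frame, the loop tracks scene changes at half the raw frame rate, matching the "two raw images per output HDR image" property claimed for ACE.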

V. CONCLUSION
The image quality in HDR mode has been a focus of consumer electronics in recent years. As the demand for high-quality video increases, related techniques such as compression, transmission, and display of HDR video have made significant progress, and HDR has been embraced in almost all newly published video coding and display standards. However, professional equipment is still needed to record HDR video, so only a fraction of high-end companies can produce such HDR videos. One of the most im-