On Box-Cox Transformation for Image Normality and Pattern Classification

A unique member of the power transformation family is known as the Box-Cox transformation. The latter can be seen as a mathematical operation that finds the optimum lambda (λ) value maximizing the log-likelihood function so as to transform data to a normal distribution and to reduce heteroscedasticity. In data analytics, a normality assumption underlies a variety of statistical test models. This technique, however, is best known in statistical analysis for handling one-dimensional data. This paper, therefore, revolves around the utility of such a tool as a pre-processing step to transform two-dimensional data, namely digital images, and studies its effect. Moreover, to reduce time complexity, it suffices to estimate the parameter lambda in real time for large two-dimensional matrices by merely considering their probability density function as a statistical inference of the underlying data distribution. We compare the effect of this lightweight Box-Cox transformation with well-established state-of-the-art low-light image enhancement techniques. We also demonstrate the effectiveness of our approach through several test-bed data sets for generic improvement of the visual appearance of images and for ameliorating the performance of a colour pattern classification algorithm as an example application. Results with and without the proposed approach are compared using state-of-the-art transfer/deep learning and are discussed in the Appendix. To the best of our knowledge, this is the first time that the Box-Cox transformation is extended to digital images by exploiting histogram transformation.


INTRODUCTION
It is not uncommon that image-based computer vision algorithms start with a pre-processing phase whereby images are transformed to prepare the data for further processing. Image transformation may embody contrast stretching of intensity values, histogram equalization or its adaptive version, intensity normalization, point-wise operations (e.g., gamma correction), etc. The colours present in an image of a scene supply information about its constituent elements. However, the richness of this information depends very much on the imaging conditions, such as the illumination conditions [1], which may significantly degrade the performance of a variety of computer vision and pattern recognition algorithms.
To avoid any confusion, we stress, in what follows, that by the term gamma correction we mean the power-law adjustments performed to improve the quality/contrast of images. Gamma correction, like the arcsine transform, is a member of a class of transformations known formally as power transformations, which also encompasses the so-called Box-Cox transformation (BCT), the theme that forms the core of this work. The BCT, as a versatile technique, is most popular within the statistical and information theory communities. It aims at improving normality of a distribution and equalizing variance (reducing heteroscedasticity) to meet statistical assumptions and improve effect sizes for quantitative analysis of data [2].
Traditionally, BCT is applied to a vector (one-dimensional data) but, to the best of our knowledge, it has not been extended to matrices exhibiting adjacency correlation, such as images, except in Bicego and Baldo [3], whose work, unfortunately, provides only a cursory overview of the subject. Besides, the generalization to a d-dimensional set of points that they advocate, which typically consists in performing d one-dimensional transformations, one for each direction of the problem space, is time consuming and not feasible in our case. The other work is that of Lee et al. [4], who exploited the parameter lambda (λ) to extend the classical mixtures expectation-maximization segmentation so as to allow generalisation to non-Gaussian intensity distributions for medical MR images.
The rationale behind our approach is not a quest for gaussianity, since images do not always conform to unimodality, but rather to enhance images and boost class separability. Of the many techniques currently in vogue for image enhancement, we advocate for the use of our approach both in tandem with machine learning and as a general tool for image enhancement. This work is motivated by the scarcity of automatic algorithms that fine-tune the parameter λ in gamma power transformation for image enhancement. Power transformations are ubiquitously used in various fields; however, estimating proper values for λ remains problematic. For instance, Fattal [5] proposed an algorithm that returns the atmospheric light colour (orientation and magnitude) and stated within the implementation that gamma correction might help orientation recovery, suggesting setting it to 1.5. In Ren et al. [6], the authors recommended in their implementation that if the input is very hazy, one can use a large gamma value, but they did not reveal the mechanism. In Berman et al. [7]'s implementation, gamma values of specific images are borrowed from Fattal [5]. In Meng et al. [8]'s implementation, λ = 2 is set as a regularization parameter.
MATLAB's built-in function imadjust (ver. 2019a; https://se.mathworks.com/help/images/ref/imadjust.html#budqw0o-1) defaults λ to 1 (linear/identity mapping) to dictate the shape of the curve describing the relationship between input and output values. MATLAB lets the user fine-tune it to an arbitrary guess; although the software highlights generic intuitive guidelines for setting λ, it does not delve into any insight on how to estimate it automatically. This was partially the impetus behind this study.
In a nutshell, the contributions of this paper can be summarised as follows:
• Extending the statistical method, BCT, to digital images to establish informed statistical inference on how to estimate image transformation.
• Suggesting a simple yet robust, efficient and inexpensive image enhancement technique that is data dependent (i.e., adaptive) and parameter-free.
• Refining a current state-of-the-art colour pattern identification algorithm.
The remainder of the paper is organised as follows: Section II discusses the related work, Section III reviews the BCT algorithm, Section IV discusses the application of BCT to digital images (termed henceforth BCI), Section V presents the experimental set-up as well as the data sets utilized in this study, Section VI is devoted to results and discussion, and Section VII concludes this paper.

II. RELATED WORK
Herein, we list some of the existing and commonly used image enhancement techniques.
Contrast limited adaptive histogram equalisation (CLAHE) [9]: In response to the drawback of global histogram equalisation in giving unfavourable results, the CLAHE operation was proposed with two major intensity transformations. The local contrast is estimated and equalized within non-overlapping blocks in the projection image; subsequently, the intensities are normalized at the border regions between blocks through bilinear interpolation. The name contrast-limited refers to the clip limit, which is set to avoid saturating pixels in the image [10].
Successive means quantization transform (SMQT) [11]: This is an iterative method that can automatically enhance the image contrast. It is capable of performing both a non-linear and a shape-preserving stretch of the image histogram.
Brightness preserving dynamic fuzzy histogram equalization (BPDFHE) [12]: This method enhances images by means of calculating fuzzy statistics from the image histogram and is built on a prior work.
sRGB and Adobe RGB 1998 (aRGB) standards [13]: These are the two standards to transform linear RGB values by applying gamma correction. Lambda is set to a static value in both gamma transforms, as follows (a minimal sketch of both encodings is given at the end of this section):
• sRGB is a colour space conceived by HP and Microsoft cooperatively in 1996 to be used on displays, printers, and on the Web. The gamma correction to transform linear RGB tristimulus values into sRGB is defined by the following parametric curve:

f(u) = \begin{cases} c\,u, & u \le d \\ a\,u^{\lambda} + b, & u > d \end{cases} \qquad (1)

where a = 1.055, b = -0.055, c = 12.92, d = 0.0031308, λ = 1/2.4 and u is the R, G or B colour value.
• The aRGB transform is carried out using a straightforward power function:

f(u) = u^{\lambda}, \quad u \ge 0 \qquad (2)

where λ = 1/2.19921875.
Adaptive Gamma Correction with Weighting Distribution (AGCWD) [14]: Huang et al. presented an automatic transformation technique that improves the brightness of dimmed images via gamma correction and the probability distribution of luminance pixels.
Weighted Variational Model (WVM) [15]: This algorithm estimates both the reflectance and the illumination from a given image, whereby a new weighted variational model is imposed for a better prior representation. The authors claim that their model can preserve the estimated reflectance with more details. However, when we tested it on square matrices of size 2^11 × 2^11, it took 76.59 sec on average to converge using the authors' original implementation.
Low-light Image Enhancement (LIME) [16]: The authors propose a simple yet effective low-light image enhancement method where the illumination of each pixel is first estimated individually by finding the max (R, G, B). Subsequently, the initial illumination map is refined by imposing a structure prior on it to produce the final illumination map. Finally, the enhancement is achieved guided by the obtained illumination map.
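For concreteness, the following minimal Python/NumPy sketch illustrates the two fixed-λ encodings of Eq. 1 and Eq. 2. The helper names are ours and the inputs are assumed to be linear RGB values normalised to [0, 1]; this is an illustration of the formulas above, not a replacement for the official colour-management pipelines.

```python
import numpy as np

def linear_to_srgb(u):
    """Encode linear RGB values in [0, 1] with the sRGB parametric curve (Eq. 1)."""
    a, b, c, d, lam = 1.055, -0.055, 12.92, 0.0031308, 1 / 2.4
    u = np.asarray(u, dtype=float)
    return np.where(u <= d, c * u, a * np.power(u, lam) + b)

def linear_to_argb(u):
    """Encode linear RGB values in [0, 1] with the Adobe RGB 1998 power curve (Eq. 2)."""
    lam = 1 / 2.19921875
    return np.power(np.asarray(u, dtype=float), lam)

if __name__ == "__main__":
    ramp = np.linspace(0.0, 1.0, 5)   # a tiny linear-light ramp
    print(linear_to_srgb(ramp))        # fixed-lambda sRGB encoding
    print(linear_to_argb(ramp))        # fixed-lambda aRGB encoding
```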

III. THE BOX-COX TRANSFORMATION (BCT)
BCT is a parametric non-linear statistical algorithm that is often utilized as a pre-processing step to convert data to normality; it is credited to Box and Cox [15]. The method is part of the power transform family, whose quest is to find the parameter lambda, λ, by which the following log-likelihood is maximized.
\ell(\lambda) = -\frac{n}{2}\,\ln\!\left[\frac{1}{n}\sum_{i=1}^{n}\bigl(\chi_i(\lambda)-\bar{\chi}(\lambda)\bigr)^{2}\right] + (\lambda-1)\sum_{i=1}^{n}\ln \chi_i \qquad (3)

where \bar{\chi}(\lambda) is the sample average of the transformed vector.
There have been different attempts to modify this transform, such as those of John and Draper [16], who introduced the so-called modulus transformation, and Bickel and Doksum [17], who provided support for unbounded distributions; nevertheless, we prefer to stick to the original definition of the transform as given in Eq. 4.
\chi(\lambda) = \begin{cases} \dfrac{\chi^{\lambda}-1}{\lambda}, & \lambda \neq 0 \\ \ln \chi, & \lambda = 0 \end{cases} \qquad (4)

∀ χ ∈ ℝ_{>0}, where χ is a data vector that we wish to transform, and ln is the natural logarithm, applied when λ approaches zero (in our case, invoked arbitrarily when |λ| ≤ 0.01). In practice, the tested λ values are normally bounded (e.g., [-2, 2] and [-5, 5] are two common ranges). The BCT's goal is to ensure that the assumptions for linear models are met so that standard analysis-of-variance techniques may be applied to the data [18]. The algorithm could be a direct possible solution to the automatic retrieval of the value of λ, which somewhat relates to gamma correction. If the parameter λ can be properly determined, then each enhanced pixel brightness can be mapped to the desired value and hence contribute to maintaining the overall brightness [19]. The BCT does not change data ordering, as per Bicego and Baldo [3].
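As a concrete illustration of Eqs. 3 and 4, the following Python sketch estimates λ for a one-dimensional positive vector by maximizing the profile log-likelihood over a bounded grid; scipy.stats.boxcox offers an equivalent built-in estimate and is used here only as a cross-check. The grid range and the |λ| ≤ 0.01 tolerance follow the text; the remaining choices (grid resolution, test data) are our own assumptions.

```python
import numpy as np
from scipy import stats

def boxcox_loglik(lam, x):
    """Profile log-likelihood of Eq. 3 for a positive 1-D sample x."""
    n = x.size
    y = np.log(x) if abs(lam) <= 0.01 else (x**lam - 1.0) / lam  # Eq. 4
    return -0.5 * n * np.log(np.mean((y - y.mean())**2)) + (lam - 1.0) * np.log(x).sum()

def estimate_lambda(x, lam_range=(-2.0, 2.0), steps=401):
    """Grid search for the lambda that maximizes Eq. 3 within a bounded range."""
    grid = np.linspace(*lam_range, steps)
    ll = [boxcox_loglik(l, x) for l in grid]
    return grid[int(np.argmax(ll))]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.lognormal(mean=0.0, sigma=0.6, size=5000)   # skewed, strictly positive data
    lam_grid = estimate_lambda(x)
    _, lam_mle = stats.boxcox(x)                        # SciPy's own maximum-likelihood estimate
    print(f"grid-search lambda = {lam_grid:.3f}, scipy lambda = {lam_mle:.3f}")
```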
Obviously, not all data can be power-transformed to yield normality; however, Draper and Cox [20] argue that even in cases where no power transformation can bring the distribution to exactly normal, the usual estimates of λ can help regularize the data and eventually lead to a distribution satisfying certain characteristics such as symmetry or homoscedasticity. The latter is especially useful in pattern recognition and machine learning (e.g., Fisher's linear discriminant analysis).

IV. BOX-COX FOR IMAGES (BCI)
As mentioned earlier, there is a lack of studies that deal with BCT and its power transformation in conjunction with digital images. BCT is an iterative algorithm, and applying it to large images would take a prohibitively long time to converge (e.g., on a square image of size 2^11 × 2^11 it took the BCT algorithm around 10 sec to converge on our machine, whereas operating at the histogram level is theoretically size independent and took 0.05 sec on the same image). This feature proves its merit in the big data era, where processing large-scale image data sets is a concern. The key idea here is to consider the image histogram as a compressed proxy of the entire data matrix, since it reflects the estimate of the pixel probability distribution as a function of tonal value. In this section, we lay down our algorithm in reference to colour images; the application to grayscale images is encompassed within.
Given a true colour image ℱ in the primary red-green-blue (RGB) colour space, where (u, v) are the pixel spatial coordinates, u = 1, …, U, v = 1, …, V, and (U, V) are the two image dimensions. By referring to Eq. 1, and after the parameter λ has been estimated for an input image, we can experimentally scrutinize the following inference: ℱ′ corresponds to the grey-level (luma) channel as computed in the YCbCr colour space. This colour space is proven to be useful in teasing apart the high-frequency signal from the chroma tones that are blended in the RGB space.
As for finding the transformation parameter, lambda, deriving it from the image matrix ℱ′ or from the image probability function (a.k.a. the histogram), p̂ (see Eq. 6), would yield a quasi-similar gamut enhancement effect in many cases; however, the merits of relying on the histogram are twofold: our empirical observations indicate both the stability and the high gain in time complexity when estimating λ from the histogram. Fig. 1 depicts the Spearman correlation coefficients of both transformations using a sample of 600 randomly selected natural images acquired by several camera models. Although the plot exhibits a high correlation in most cases, the few exceptions (e.g., images 64 and 174), when examined, confirmed the stability of our choice (λ estimated from the histogram). Since the BCT may produce values outside of the image's permissible dynamic range, a rescaling back to that range is invoked in our case. The extension of the Box-Cox transformation to digital images would not be complete without exploring how the estimation of λ affects some image-domain specific applications. The following sections provide insight into two dominant areas: image enhancement and image colour pattern classification using a recent pre-trained model.
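To make the pipeline concrete, the sketch below is a minimal Python rendering of one plausible reading of BCI: the histogram of the luma (Y) channel is expanded into bin centres weighted by their counts, λ is estimated by maximizing a weighted version of the log-likelihood in Eq. 3, and the transformed channel is min-max rescaled back to the 8-bit range. Function names are ours, and the exact weighting, rescaling, and re-application of the luma gain to the colour channels are assumptions rather than the authors' verbatim procedure.

```python
import numpy as np

def estimate_lambda_from_hist(y, lam_range=(-2.0, 2.0), steps=401):
    """Estimate lambda from the 256-bin histogram of an 8-bit luma channel y.

    The histogram is treated as a compressed proxy of the pixel data:
    bin centres act as the sample values and bin counts as their weights
    in a weighted Box-Cox log-likelihood (cf. Eq. 3).
    """
    counts, edges = np.histogram(y, bins=256, range=(0, 255))
    centres = 0.5 * (edges[:-1] + edges[1:]) + 1.0   # +1 keeps all values strictly positive
    w, n = counts.astype(float), float(counts.sum())
    best_lam, best_ll = 1.0, -np.inf
    for lam in np.linspace(*lam_range, steps):
        t = np.log(centres) if abs(lam) <= 0.01 else (centres ** lam - 1.0) / lam  # Eq. 4
        mu = np.average(t, weights=w)
        var = np.average((t - mu) ** 2, weights=w)
        ll = -0.5 * n * np.log(var + 1e-12) + (lam - 1.0) * np.sum(w * np.log(centres))
        if ll > best_ll:
            best_lam, best_ll = lam, ll
    return best_lam

def bci_enhance(rgb):
    """BCI-style enhancement of an 8-bit RGB image (H x W x 3), as sketched in the text."""
    rgb = rgb.astype(float)
    # Luma (Y of YCbCr), i.e. the grey-level channel F' referred to above.
    y = 0.299 * rgb[..., 0] + 0.587 * rgb[..., 1] + 0.114 * rgb[..., 2]
    lam = estimate_lambda_from_hist(y)
    yt = np.log(y + 1.0) if abs(lam) <= 0.01 else ((y + 1.0) ** lam - 1.0) / lam
    # Min-max rescale back to the permissible dynamic range (our assumption).
    yt = (yt - yt.min()) / (yt.max() - yt.min() + 1e-12) * 255.0
    # Re-apply the luma gain to the colour channels (one common choice, also an assumption).
    gain = (yt + 1.0) / (y + 1.0)
    return np.clip(rgb * gain[..., None], 0, 255).astype(np.uint8)
```

Because λ is searched over at most 256 histogram points rather than over every pixel, the cost of the search is independent of the image size, which is the time-complexity gain referred to above.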

A. Image Enhancement
The capability of our proposed approach, BCI, is tested against commonly used methods using the Phos II data set along with images collected from the illumination dataset [21]. Phos II [22] is a colour image database of 15 scenes captured under different illumination conditions. More concretely, every scene of the dataset contains 15 different images: 9 images captured under various strengths of uniform illumination, and 6 images under different degrees of non-uniform illumination. The images contain objects of different shape, colour and texture.

1) Tests and Evaluation Metrics:
a) Probability Distribution Test on Simulated Data:
As a sanity test, we first create a synthetic image (a gradient map) where each row is a vector that defines 257 equally spaced points between 0 and 1, see Fig. 2.
We then contrast our proposed approach, BCI, with enhancements using the methods reported in Section II. To assess the goodness of fit, the QQ plot (quantile-quantile plot) is utilised, which plots the quantiles of the input vector data against the theoretical quantiles of the distribution specified by pd (probability distribution). If the empirical distribution conforms to pd, then the data points shall fall on a straight line. Our choice of pd landed on the Rayleigh distribution for the very reason that it is commonly used in imaging technology [23]-[26].
The Rayleigh distribution is a special case of the Weibull distribution and its probability density function is formally defined as:

f(x \mid \sigma) = \frac{x}{\sigma^{2}}\, e^{-x^{2}/(2\sigma^{2})}, \qquad x \ge 0,

where σ is a scale parameter of the distribution.
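The following sketch (our own, for illustration) builds the synthetic gradient image described above and draws a QQ plot of its pixel values against a fitted Rayleigh distribution using SciPy; the number of rows is an assumption, as the text only fixes the 257 columns.

```python
import numpy as np
from scipy import stats
import matplotlib.pyplot as plt

# Synthetic gradient map: every row holds 257 equally spaced values in [0, 1].
rows = 100                                  # assumed row count (not stated in the text)
gradient = np.tile(np.linspace(0.0, 1.0, 257), (rows, 1))

# QQ plot of the pixel values against a Rayleigh distribution fitted to them.
pixels = gradient.ravel()
loc, scale = stats.rayleigh.fit(pixels)     # fit the location and scale (sigma) parameters
fig, ax = plt.subplots()
stats.probplot(pixels, dist=stats.rayleigh, sparams=(loc, scale), plot=ax)
ax.set_title("QQ plot vs. fitted Rayleigh distribution")
plt.show()
```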
Fig. 3 shows the QQ plots of the synthetic image and of the enhancements using the eight methods (as shown in Fig. 2). The overall impression one gets from this visualisation of goodness of fit is that BCI, as compared to the other methods, is the output that fits the theoretical line most closely, although it is quasi-similar to the fitted distributions of sRGB and aRGB.
In the above experiment, we noticed that the AGCWD's output was far from what we expected from this algorithm. This observation prompted us to extend our experiments by varying the vector length to observe the algorithm's behaviour. The AGCWD results re-affirmed our observation, see Section VI and the web link therein.

b) Quantitative Evaluation Metrics
In this sub-section, we highlight the different statistical metrics that we utilise. The intention here is not to go into details as these metrics are well established popular measurements.
Quantitative evaluation of contrast enhancement is not an easy task. Huang et al. [14] attributed that to the absence of an acceptable criterion by which to quantify the improved perception, quoting also [27], [28]. However, since then, a few image quality evaluator metrics have been proposed and are currently widely used. Hence, to gauge image enhancement efficiency, the so-called blind image quality metrics are adopted.
Naturalness image quality evaluator (NIQE) [29]: This metric compares a given image to a default model derived from natural scene statistics. The score is inversely correlated to the perceptual quality, in other words, a lower score indicates better perceptual image quality.
Perception based Image Quality Evaluator (PIQE) [30]: This metric calculates the score through block-wise distortion estimation. The score is inversely correlated to the perceptual quality.
Blind/Referenceless Image Spatial Quality Evaluator (BRISQUE) [31]: This metric compares a given image to a support vector regressor model trained on a set of quality-aware features and corresponding human opinion scores. The score is inversely correlated to the perceptual quality.

VI. RESULTS AND DISCUSSION
Herein, we warrant the merits of the proposed approach (BCI) by conducting quantitative comparisons. The results give us a cue that BCI can be a potential alternative to existing methods. BCI time complexity should not be a concern since the algorithm, as we stated earlier, operates on the image histogram (≤ 256 points to process) to derive λ. BCI, like any other image enhancement algorithm, alters the colour gamut. Therefore, studies that are interested in the relationship between colours (assuming quantitatively accurate intensity values), such as studies on β-cells' promotion of insulin secretion or protein expression levels [32], should keep this fact in mind when dealing with image enhancement in general. The numerical output that we report here goes in three directions: first, the image enhancement domain; second, the vivid research area of image de-hazing [33]; and finally, the area of machine learning for image classification. Additional results of applying BCI to image de-hazing and deep/transfer learning for image classification (e.g., camera model identification, face recognition, digit recognition, scene classification) are shown in the Appendix.

Image enhancement
In this section, we demonstrate the integrity and stability of our approach against two tests, namely, a quality enhancement test and a colour pattern segmentation test. In the first experiment, we selected 550 images exhibiting non-uniform lighting and contrast conditions. Images are of different sizes and are stored in RGB format. Table 1 tabulates the obtained results averaged across the entire set. It is evident that, on average, BCI outperforms all methods in the quality assessments (i.e., NIQE, PIQE, BRISQUE). It is important to note that a couple of the methods shown in Table 1 operate only on single-channel images (e.g., CLAHE); consequently, we convert the input image to HSV, apply these algorithms on the V channel, and then revert the image back to the RGB space. It is interesting to see, from this analysis, that the SMQT algorithm retains the image statistics, which results in it having the same scores as the original image. Twelve randomly selected samples drawn from the 550-image set are shown in Fig. 4. Given both exposure extremes, BCI can be singled out for giving consistently favourable results in both cases. The methods sRGB and aRGB can handle under-exposure, but for over-exposure the images appear washed out. As for SMQT, imadjust and BPDFHE, the contrary is true: they are prone to severe performance degradation under low exposure. This observation is consistent across additional experiments we conducted as a sanity check. Ultimately, this may be the best use of the BCI transformation technique for those cases where inferences on the optimal transformation can be affected by exposure uncertainty.
Results on a grayscale image are shown in Fig. 5. To not clutter this paper with images, higher-resolution visual qualitative comparisons on RGB still images and on simulated synthetic data (animation) that define different vector lengths (see Sec. IV) are all furnished online through the following page: http://www.abbascheddad.net/BCI.html. It is observed that the LIME algorithm (Fig. 5j) malfunctions around bright-light regions in the image (i.e., it exaggerates the oversaturated/bright areas); this phenomenon was also observed in additional tests that we conducted (data not shown).

Colour pattern segmentation
Pixel-wise colour pattern segmentation has been a longstanding research problem. Weijer et al. [34], [35] proposed a handy algorithm where colours are learned from real-world noisy data. To avoid manual labelling, their learning model is trained on colour images retrieved from the Google image search engine. The algorithm can recognise colour patterns belonging to 11 colour names, namely, black, blue, brown, grey, green, orange, pink, purple, red, white and yellow. In this experiment, we show that BCI does improve the performance of Weijer et al.'s method if incorporated prior to segmentation. In Fig. 6, we provide three examples, showing challenging synthetic chromaticity images.

VII. CONCLUSION
In this paper, we propose a new approach to enhance images by extending the renowned Box-Cox transformation to 2D data. Since the Box-Cox algorithm stems from statistical and probability theories, and since it is formulated to, among other benefits, stabilize the variance in one-dimensional data (e.g., a vector of covariate/confounding variables), extra vigilance should be taken when tackling digital images. Our approach, termed herein BCI, precludes the need to arbitrarily estimate the parameter λ in gamma correction or the need to find limits for contrast stretching an image. When this approach was conceived, we deliberately avoided involving regularization parameters in our algorithm, to reduce complexity and ease replication of results. The proposed scheme is simple and fast, does not require any model training, and we believe that it can complement other existing image enhancement algorithms.
The results lend credibility to the efficiency of our proposed approach and show its stability and robustness compared to commonly used contrast enhancement techniques. Subsequently, we support our approach by improving the performance of a state-of-the-art colour learning algorithm (results on other deep learning algorithms are supplied in the Appendix). This paper warrants a succinct description of the proposed approach; however, due to the page limit, we have omitted other promising results in other domains which could have otherwise instilled even more credibility in the notion of BCI.
The Box-Cox algorithm, as a well-adopted statistical and probabilistic method, is shown in this study to retain its fidelity even on two-dimensional data (i.e., digital images). One of the aims of this paper is to rekindle interest in the Box-Cox algorithm in conjunction with image enhancement. In a wider context, this optimisation algorithm might even help leverage the results of other enhancement algorithms that depend on the parameter λ and/or set it arbitrarily for gamma correction, and in other areas which we did not cover here, such as image retrieval, where informative features are sought [36]. There have been some attempts to devise new methodologies to estimate λ for one-dimensional data transformation, like the work of [37]; however, this work adds to the accrual of evidence regarding the utility of the renowned Box-Cox transformation in the imaging field.
• Image De-hazing: O-HAZE [4]: This is a reference dataset that is commonly used to evaluate objectively and quantitatively the performance of de-hazing algorithms. It consists of a benchmark with hazy and haze-free real outdoor images, 45 samples each. Ancuti et al. [5] organise a challenge around this topic at the CVPR conference. It is of paramount importance to understand that by applying BCI to the de-hazing domain, we do not claim that we are devising the best de-hazing algorithm; rather, we merely show platforms where our method may boost existing or upcoming algorithms in the area.

A) Image Classification Results
Tests were run 10 times for each method on each data set. The average, min, max and standard deviation of the accuracies are shown. The area under the curve (AUC) is derived from the run with the best accuracy using a one-against-all strategy averaged over the number of classes.
Fig. 2A. The confusion matrix in relation to the best accuracy of both methods, with BCI (a) and without (b), as shown in Table 5A for AlexNet Transfer Learning SVM (TransL).
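For reference, the snippet below illustrates one way to compute a macro-averaged one-against-all AUC from class probability scores with scikit-learn; the data set and classifier are placeholders, not the paper's actual experiment.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Placeholder data; the paper's experiments use image features from deep/transfer models.
X, y = load_digits(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

clf = LogisticRegression(max_iter=2000).fit(X_tr, y_tr)
proba = clf.predict_proba(X_te)

# Macro-averaged one-against-all (one-vs-rest) AUC over the number of classes.
auc = roc_auc_score(y_te, proba, multi_class="ovr", average="macro")
print(f"accuracy = {clf.score(X_te, y_te):.3f}, macro OvR AUC = {auc:.3f}")
```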

B) Image de-hazing
Visual quality can be decreased substantially due to adverse weather conditions (e.g., fog), man-made air pollution (fire fumes, smoke bombs by football fans, lachrymators when combating riots), etc. The field of science that deals with the restoration of degraded photographs captured in such situations is known as image de-hazing. It is a vivid research area, as evidenced by its presence in one of the major conferences on computer vision and pattern recognition, namely, the CVPR 2019 workshop on Vision for All Seasons: Bad Weather and Night-time (https://vision4allseasons.net/). The advantage of our approach in boosting the performance of image de-hazing algorithms on the O-HAZE dataset is depicted in the results shown in Table 6A. For a visual inspection, Fig. 3A shows an example of this application on the House image, which is commonly used within the de-hazing research community. Comparing all the tested de-hazing methods, He et al. [6], when incorporating our approach, yields the lowest colour distance error to the ground truth and also exhibits the highest correlations. On the other hand, when Ren et al. [7] takes advantage of BCI, it outperforms the rest of the methods in preserving structural similarities. (*) Row-wise, and for each method, the highest score of each metric is given in bold. (**) Column-wise, the highest score of each metric is given in italic-bold.
Fig. 3A. De-hazing results on the House image; the best-scoring method (Table 6A) still leaves room for improvement (i.e., +BCI). (a) Hazy image - House. (b) Ren et al. [7] + BCI. (c) He et al. [6]. (d) Galdran [9]. (e) Ren et al. [7]. (f) Fattal [10]. (g) Berman et al. [8]. (h) Meng et al. [11].

C) Statistical Metrics
In this sub-section, we highlight the additional statistical metrics that we utilised in the experiments reported in this Appendix. For the de-hazing scenario, the reference-based image quality metrics are used.
-Peak Signal-to-Noise Ratio (PSNR): This ratio is derived from the mean square error expressed in decibels (dB) and is positively correlated with quality. There is no upper bound to this measurement.
-Structural Similarity Index (SSIM) [12]: This is a quality index that aggregates local image structure, luminance, and contrast into a single local quality score. This metric is generally more favoured than the PSNR due to some criticisms of the latter.
-Pearson Correlation Coefficient (Corr): A measure of correlation between an image and a reference which is returned as a numeric scalar.
-Information Content Weighted SSIM (IWSSIM) [13]: Zhou Wang, who proposed the SSIM metric, and his colleague at the Laboratory for Computational Vision (New York University) provided an improvement to the SSIM and PSNR metrics by incorporating the idea of information content weighted pooling. In this work, we use both extensions: IWPSNR and IWSSIM.
-Multi-scale Structural Similarity Index (MSSSIM) [14]: Noticing the dependency of SSIM on image scale, Wang et al. came up with the multi-scale version to circumvent such a shortcoming.
-Euclidean Distance Norm (ED): This is a straightforward calculation of accumulated colour errors using the Euclidean distance between the estimated image and the ground-truth source image in the RGB tristimulus colour space [15].
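As a rough illustration (not the authors' evaluation code), the reference-based PSNR, SSIM, Pearson correlation and Euclidean colour distance described above can be computed with scikit-image and NumPy as follows; averaging the per-pixel Euclidean distance is our own assumption.

```python
import numpy as np
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

def reference_metrics(estimate, ground_truth):
    """Compute a few of the reference-based metrics used in the de-hazing evaluation."""
    est = estimate.astype(float)
    gt = ground_truth.astype(float)

    psnr = peak_signal_noise_ratio(gt, est, data_range=255)
    ssim = structural_similarity(gt, est, channel_axis=-1, data_range=255)
    corr = np.corrcoef(gt.ravel(), est.ravel())[0, 1]       # Pearson correlation
    ed = np.mean(np.linalg.norm(est - gt, axis=-1))         # mean per-pixel RGB Euclidean distance (assumed)
    return {"PSNR": psnr, "SSIM": ssim, "Corr": corr, "ED": ed}
```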
As for evaluating machine learning models, we mainly use the accuracy, the area under the curve (AUC), and the confusion matrix.