Spatial Frequency and the Performance of Image-Based Visual Complexity Metrics

There is a wide range of visual and spatial complexity measurement methods that aim to quantify perceived image complexity. While image-based calculation methods (edge detection, image compression, contrast) characterize a digital image, visual perception studies focus on fundamental visual mechanisms, such as contrast sensitivity and visual task performance. Despite evidence from several vision studies, spatial frequency information has not been widely used to assess image complexity. Previous studies suggest that image-based performance metrics are limited in explaining perceived complexity because of confounding factors, such as context, memory, familiarity, and expectation. Here, a visual experiment is conducted to assess the performance of image-based metrics and spatial frequency information using 16 abstract and natural images. A new image complexity metric ($R_{\mathrm{spt}}$), based on suprathreshold detectability, is proposed to benchmark the performance of existing measures. Forty-four naïve participants used a 5-point Likert-type scale to judge the visual complexity of the images displayed on a tablet. Results indicate that root-mean-square error (RMSE) and $R_{\mathrm{spt}}$ correlate significantly with subjective evaluations. Biological sex did not affect perceived spatial complexity. While RMSE and $R_{\mathrm{spt}}$ can potentially be used to estimate the spatial complexity of display images, the performance of spatial frequency information and image assessment measures under immersive viewing conditions requires further research.


I. INTRODUCTION
Visual (or spatial) complexity is a widely discussed but not precisely defined term. In its simplest form, visual complexity refers to the level of detail within an image. Visual images, such as computer displays or photographs, are widely used to study visual mechanisms because three-dimensional environmental stimuli are reduced to two-dimensional retinal images. The complexity of images can be analyzed by characterizing the stimuli and their impact on higher-level visual processes. Theories explaining visual complexity, such as fractal [1], fuzzy [2], and information theory [3], are typically based on the physical stimulus itself and are computational in nature. For example, Kolmogorov complexity theory [4] measures the computational resources needed to specify an object (the length of the shortest binary computer program that describes it) [5], and it has been applied to image complexity [6], aesthetics [7], and image similarity [8]. In addition to complexity, algorithmic information theory considers the randomness of a string (a sequence of characters) and the probability that an algorithm reproduces it [9]. This computational probability aims to explain the likelihood that a sensory input leads to the perception of an object.
The effect of visual complexity on cognitive and emotional responses can also be examined without a computational approach. For example, people and other animals find visually complex images intrinsically more attractive than simple images [10]. The complexity of stimuli also affects the speed and accuracy of visual search [11], scene preference [12], human cognition, and emotion [13], [14]. In addition, abstract images have been used to estimate the perceived complexity of architectural spaces [15] and renderings [16], building facades [17], and artwork [18]. In imaging research, the complexity of an image is typically called spatial complexity because the stimulus is observed indirectly (observers perceive a stimulus that has been digitally processed and presented through another channel, such as a display). Spatial complexity measurement methods have also been used for satellite image mapping [19], remote sensing for wildlife conservation [20], and quality assessment of imaging systems [21]. However, the use of image-based metrics to quantify the perceived complexity of scenes under projection systems [22] highlighted the need to investigate the accuracy of spatial complexity metrics. Here, the perceived visual complexity of images is investigated through spatial frequency information and spatial complexity metrics.
The associate editor coordinating the review of this manuscript and approving it for publication was Inês Domingues.
VOLUME 8, 2020. This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/

II. BACKGROUND
Visual complexity has been a topic of interest for psychologists since the late 19th century. Several theories and models based on single visual forms, visual arrays, information pickup, visual displays, perceptual learning, and neural circuit theory have been proposed to explain perceived visual complexity [23]. Both empiricist and nativist theories investigated the roots of the perceived quality of images to build a theoretical framework. Empiricist approaches focusing on the discrimination of primitive (basic) attributes of shapes [24], [25] failed to develop a universal metric [26], [27]. On the other hand, from a nativist perspective, two main factors influencing the visual complexity of an image have been proposed: familiarity/novelty and spatial frequency information [27].

A. FAMILIARITY, NOVELTY, AND INTEREST
Familiarity is an instinctive parameter that affects cognitive functions. Both familiar and unfamiliar objects have been used in visual experiments. The analysis of a series of black-and-white drawings of real-life objects resulted in three levels of complexity: low complexity for kitchen utensils, fruits, human body parts, clothing, and furniture; medium complexity for vegetables; and high complexity for vehicles, birds, animals, insects, and musical instruments [28]. Another study showed that participants require more time to encode, mentally rotate, and compare unfamiliar stimuli [29]. Similarly, familiarity and learning (introduced through training) affected the perceived complexity of unfamiliar shapes [30]; this study also highlighted the shortcomings of image-based complexity metrics. Visual complexity is also associated with visual search tasks, which, in turn, are affected by novelty. For example, the search time and accuracy for simple forms depend on the number, spacing, color, shape, and size of the forms that make up an image [31], [32]. While color discrimination showed no effect, spatial frequency discrimination had an intermediate effect on task performance [32].
Complexity is also considered a major determinant of visual interest and pleasure. For example, Berlyne's inverted U-shape curve for pleasure, interest, and complexity [33] predicts higher arousal for medium-complexity images and lower arousal for low- and high-complexity images, as shown in Fig. 1. This curve behaves similarly to the spatial frequency slopes (α) of natural images, in that divergence from medium complexity decreases preference ratings. Berlyne's proposed model has been supported by studies investigating abstract patterns [34], paintings [35], and natural images [36], but the model failed to explain preferences for the visual complexity of websites [37].
FIGURE 1. A modified version of Berlyne's inverted U-shape curve [33] showing the change in preference and arousal with increasing complexity. The inverted U-shape suggests that medium-complexity images are optimal and that a further increase in complexity does not equate to pleasure.

B. SPATIAL FREQUENCY INFORMATION
The physical characteristics of an image can be described using basic attributes, such as contrast, orientation, spatial phase, and spatial frequency. Spatial frequency (f) is the number of cycles of a periodic grating per unit distance on the retina [38]. Although the SI unit of spatial frequency is cycles per meter (c/m), it is commonly reported in cycles per degree (cpd) of visual angle. In vision research, sinusoidal patterns with varying frequencies, amplitudes, and orientations are commonly used to test the visual system's capabilities, especially contrast sensitivity [39], [40] and visual performance [41]. Studies show that spatial contrast sensitivity in adults peaks between 3 cpd and 8 cpd [39], [40], [42].
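Because cycles per degree depend on viewing geometry, a grating shown on a display has to be converted from pixels to visual angle before it can be compared with contrast sensitivity data. The sketch below shows one way to do this in Python; the pixel pitch and viewing distance are hypothetical values chosen for illustration, not the tablet geometry used in this study.

```python
import math


def pixels_per_degree(pixel_pitch_mm: float, viewing_distance_mm: float) -> float:
    """Number of display pixels that subtend one degree of visual angle."""
    # Visual angle (in degrees) subtended by a single pixel at the viewing distance.
    deg_per_pixel = math.degrees(2.0 * math.atan(pixel_pitch_mm / (2.0 * viewing_distance_mm)))
    return 1.0 / deg_per_pixel


def cycles_per_degree(period_px: float, pixel_pitch_mm: float, viewing_distance_mm: float) -> float:
    """Convert a grating period given in pixels to spatial frequency in cycles per degree."""
    cycles_per_pixel = 1.0 / period_px
    return cycles_per_pixel * pixels_per_degree(pixel_pitch_mm, viewing_distance_mm)


# Hypothetical geometry: 0.1 mm pixel pitch viewed from 400 mm (about 69.8 pixels per degree).
ppd = pixels_per_degree(0.1, 400.0)
# A grating with a 20-pixel period then corresponds to roughly 3.5 cpd.
f_cpd = cycles_per_degree(20, 0.1, 400.0)
```

Under these assumed values, a 20-pixel grating falls inside the 3 to 8 cpd band where adult contrast sensitivity peaks; moving the viewer closer lowers the cpd value of the same on-screen grating.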
Experimental observations show that the visual cortex responds strongly to sine-wave gratings, and a multichannel model of human vision can explain how neurons with different receptive field sizes create a neural representation of different grating scales [42]. Psychophysical and electrophysiological evidence supporting this hypothesis indicates that the visual cortex has multidimensional spatial filters with narrow (nonzero) bandwidths [43]. It has been suggested that the spatial frequencies of a complex grating are detected independently [44] and that adaptation to spatial frequency can enhance contrast sensitivity in the short and long term [45], [46]. Different spatial frequencies can also cause visual discomfort, depending on the discomfort group (sensitivity to the physical stimulus). In one study, the low and moderate visual discomfort groups found 8 cpd and 12 cpd stimuli unpleasant, whereas the high visual discomfort group's discomfort peaked around 4 cpd [47]. Another study found that discomfort increased with spatial frequency up to 16 cpd for both the moderate-high and low visual discomfort groups [48].
Spatial frequency analysis based on gratings has been criticized for its dependence on simple forms that are uncommon in the real world. Although complex real-world images can be considered a superposition of a large number of fundamental patterns, models based on simple-stimulus experiments are unlikely to be sufficient to predict the visual quality of complex real-world images [21]. These limitations led researchers to create datasets that include images of both basic and complex patterns [49] and to propose methods for assessing the spatial quality of complex images, such as the blind/referenceless image spatial quality evaluator (BRISQUE), JPEG, and JPEG2000 [50]. However, image quality metrics do not explicitly assess the perceived spatial complexity of an image. In addition, natural image complexity and the decorrelation problem (strong intra- and inter-channel dependencies in natural images) [21] may affect the accuracy of image quality metrics.
The growing evidence that the human visual system functions as a frequency analyzer led to the analysis of natural image statistics using spatial frequency information. Regularities (common patterns) in natural images have been demonstrated in the spatial [51] and wavelet [52] domains. These statistics reflect the non-randomness of natural images and make it possible to predict their visual quality. For example, the spatial frequency spectrum of natural images reportedly follows a power law, $A(f) \propto f^{\alpha}$, where the amplitude A is averaged across all orientations, f is spatial frequency, and α is the (negative) slope on log-log coordinates [53]. The slope α varies from image to image; it typically ranges between −0.7 and −1.5 in achromatic images [53]-[55] and between 0 and −2 in chromatic images [56].
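The slope α can be estimated by averaging the 2-D Fourier amplitude spectrum over all orientations within rings of equal spatial frequency and fitting a straight line on log-log coordinates. The following Python/NumPy sketch assumes a grayscale image, integer-width frequency rings (in cycles per image), and an unweighted least-squares fit; it is not the MATLAB routine used in this study, and choices such as windowing or the fitted frequency range can change the estimate.

```python
import numpy as np


def spectral_slope(image: np.ndarray) -> float:
    """Estimate alpha in A(f) ∝ f**alpha for a 2-D grayscale image."""
    img = np.asarray(image, dtype=float)
    img = img - img.mean()  # remove the DC component before the FFT
    amplitude = np.abs(np.fft.fftshift(np.fft.fft2(img)))

    # Radial spatial frequency (cycles per image) of every Fourier coefficient.
    rows, cols = img.shape
    fy = np.fft.fftshift(np.fft.fftfreq(rows)) * rows
    fx = np.fft.fftshift(np.fft.fftfreq(cols)) * cols
    radius = np.hypot(*np.meshgrid(fy, fx, indexing="ij"))

    # Average the amplitude over all orientations within integer-frequency rings.
    ring = np.round(radius).astype(int)
    freqs = np.arange(1, min(rows, cols) // 2)
    mean_amp = np.array([amplitude[ring == f].mean() for f in freqs])

    # Least-squares line in log-log coordinates; its slope is alpha.
    slope, _intercept = np.polyfit(np.log10(freqs), np.log10(mean_amp), 1)
    return float(slope)
```

As a sanity check, white noise should give α close to 0, and a synthetic image whose amplitude falls as f^−1.2 should give α close to −1.2.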
Studies suggest that slopes in natural images peak between −1.1 and −1.3 [53], [55] and that deviation from natural scene statistics may cause discomfort [56]. The discomfort caused by non-natural image statistics hints at an evolutionary adaptation mechanism. It should be noted that the natural statistics reported in these studies are limited to the datasets analyzed; slopes may therefore vary considerably across images. While spatial frequency theory aims to ground the visual perception of images in a fundamental cognitive framework, the effect of physical stimulus characterization on perceived complexity is still unclear. Although natural image statistics and image-based spatial complexity belong to different research domains, the performance of spatial frequency statistics has not previously been tested for visual complexity. Here, the performance of spatial frequency information, spatial complexity metrics, and an arbitrary complexity metric is analyzed.

A. STIMULI AND COMPLEXITY MEASURES
The performance of nine visual complexity measures was compared using 16 images selected from open-source repositories. The image dataset included natural images (e.g., humans, landscapes, paintings, and human-made objects) as well as abstract images (e.g., diagonal achromatic lines), as shown in Fig. 2. All images were in color except for two (child and lines). Smaller versions of two of the images (pebbles and move it) were added to the set to test the effect of size on perceived and calculated spatial image complexity. Three images consisted of repeating patterns (pebbles, pebbles small, and pepper). Although there are several spatial information and compression-based image quality measures, the analysis was limited to visual complexity metrics and spatial frequency information. The spatial characteristics of the images and the calculated metrics are given in Table 1.
Inspired by the definition of Kolmogorov complexity, the file size of various compressed image formats has been used to estimate image complexity [18], [30], [57], [58]. Here, only two of the most popular image compression formats (GIF and JPEG) were used. GIF compression was performed with a local selective palette of 256 colors and normal row order (GIF itself encodes pixels losslessly with LZW; information is lost only when the colors are reduced to the 256-color palette). Although a large compressed file size is associated with image complexity (simple images contain more redundant information that can be compressed; therefore, they have smaller file sizes than complex images), compressed file size is also affected by other factors, such as luminance and chrominance information [30].
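The compression rationale can be illustrated with a lossless, dependency-free stand-in. The sketch below uses Python's zlib (DEFLATE) instead of GIF or JPEG and reports the compressed size as a fraction of the raw pixel data; both choices are assumptions made for illustration, not the encoders, settings, or absolute file sizes used in this study.

```python
import zlib

import numpy as np


def compression_complexity(image: np.ndarray, level: int = 9) -> float:
    """Compressed size / raw size of an 8-bit image (higher means less redundancy)."""
    # Stand-in for GIF/JPEG file size: DEFLATE-compress the raw 8-bit pixel bytes.
    raw = np.ascontiguousarray(image, dtype=np.uint8).tobytes()
    return len(zlib.compress(raw, level)) / len(raw)


flat = np.zeros((256, 256), dtype=np.uint8)  # highly redundant, "simple"
noise = np.random.default_rng(0).integers(0, 256, size=(256, 256), dtype=np.uint8)
ratios = (compression_complexity(flat), compression_complexity(noise))
```

A uniform image compresses to a tiny fraction of its raw size, whereas uniform random noise barely compresses at all; real photographs fall in between, and, as noted above, luminance and chrominance content also shift the result.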
The RMSE (the difference between the original image and the lossy compressed image), spatial frequency slope α, and entropy E were calculated with the MATLAB® Image Processing Toolbox™. Fractal dimension D was calculated with a box-counting method in ImageJ, a public-domain image-processing program. Edge detection algorithms commonly used in spatial complexity and quality assessment, such as the Canny and Sobel methods [30], [57], [60], were also analyzed. In addition to the existing complexity measures, a new arbitrary metric ($R_{\mathrm{spt}}$), based on suprathreshold detectability, was introduced to benchmark existing spatial complexity assessment measures. The calculation of $R_{\mathrm{spt}}$ is explained in the following section.
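For readers without MATLAB or ImageJ, the following NumPy sketch shows simplified versions of three of these measures: RMSE between a reference image and its decoded compressed version (assumed to have the same shape), Shannon entropy of a 256-bin histogram of an 8-bit grayscale image (the quantity MATLAB's entropy function reports for uint8 images), and a box-counting fractal dimension of a binary image. The power-of-two box sizes and the cropping to a square region are assumptions of this sketch; ImageJ's box-counting tool uses its own box sizes, so values will not match exactly.

```python
import numpy as np


def rmse(original: np.ndarray, compressed: np.ndarray) -> float:
    """Root-mean-square error between a reference image and its compressed version."""
    diff = np.asarray(original, dtype=float) - np.asarray(compressed, dtype=float)
    return float(np.sqrt(np.mean(diff ** 2)))


def shannon_entropy(gray: np.ndarray) -> float:
    """Entropy in bits of the 256-bin histogram of an 8-bit grayscale image."""
    counts = np.bincount(np.asarray(gray, dtype=np.uint8).ravel(), minlength=256)
    p = counts[counts > 0] / counts.sum()
    return float(-np.sum(p * np.log2(p)))


def box_counting_dimension(binary: np.ndarray) -> float:
    """Fractal dimension D of a binary image via power-of-two box counting."""
    mask = np.asarray(binary, dtype=bool)
    if not mask.any():
        raise ValueError("box counting needs at least one foreground pixel")
    n = 2 ** int(np.floor(np.log2(min(mask.shape))))
    mask = mask[:n, :n]  # crop to a square power-of-two region
    sizes, counts = [], []
    size = n
    while size >= 1:
        # Split into (n/size x n/size) boxes; count boxes holding any foreground pixel.
        boxes = mask.reshape(n // size, size, n // size, size).any(axis=(1, 3))
        sizes.append(size)
        counts.append(int(boxes.sum()))
        size //= 2
    # N(s) ∝ s**(-D), so D is minus the slope of log N(s) against log s.
    slope, _ = np.polyfit(np.log(sizes), np.log(counts), 1)
    return float(-slope)
```

A completely filled region yields D = 2 and a single straight line yields D = 1, which bracket the values expected for natural edge maps.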
The correlation coefficients between all complexity measures are given in Table 2. The highest correlation was between spatial frequency slope and RMSE (0.71). JPEG correlated highly with GIF and $R_{\mathrm{spt}}$ (0.60 and 0.66, respectively). While GIF correlated highly with Canny-GIF (0.67), it correlated negatively with Sobel-GIF (−0.51). The correlation between GIF and spatial frequency slope was also negative (−0.62). The p-values of the multiple correlations were adjusted with the Benjamini-Hochberg method [61], a powerful procedure for controlling the false discovery rate. At the p < 0.01 level, statistically significant correlations were found for only two pairs: JPEG-$R_{\mathrm{spt}}$ and GIF-Canny-GIF.
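The Benjamini-Hochberg step-up procedure takes only a few lines. This Python sketch returns which of m p-values are rejected at a chosen false discovery rate q; the p-values in the usage note are hypothetical, not the values behind Table 2.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Return a list of booleans marking hypotheses rejected at FDR level q."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest 1-based rank k whose sorted p-value satisfies p_(k) <= k * q / m.
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * q / m:
            k_max = rank
    # Reject every hypothesis whose p-value ranks at or below k_max.
    rejected = [False] * m
    for idx in order[:k_max]:
        rejected[idx] = True
    return rejected
```

For example, with q = 0.05 the hypothetical p-values [0.04, 0.045] are both rejected even though 0.04 exceeds its own threshold of 0.025, because the procedure steps up from the largest p-value that meets its threshold (0.045 ≤ 0.05).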

B. A NEW VISUAL COMPLEXITY MEASURE
Goodhart's law implies that "when a measure becomes a target, it ceases to be a good measure" [62]. To address the potential misuse of image quality assessment methods, an arbitrary visual complexity metric ($R_{\mathrm{spt}}$) was introduced as a pseudo-random benchmark.
The first step in calculating $R_{\mathrm{spt}}$ is to convert an image to grayscale and then binarize it with the MATLAB® Image Region Analyzer app, which uses an adaptive thresholding method [63]. A series of low- to high-complexity images was visually judged by the author to identify the smallest detectable region. A detection threshold of 1 in 25,000 pixels was found to be reasonably accurate for a variety of images. The number of suprathreshold regions, approximating the perceptual complexity of an image, is then calculated as
$$R_{\mathrm{spt}} = \sum_{i} \mathbb{1}\left[\frac{PR_i}{PR_{\mathrm{total}}} > \frac{1}{25000}\right],$$
where $PR_i$ is the size (in pixels) of region $i$, $PR_{\mathrm{total}}$ is the total number of pixels in the image, and $\mathbb{1}[\cdot]$ equals 1 when the condition holds and 0 otherwise. $R_{\mathrm{spt}}$ thus counts the detectable regions in an image and treats an image with many regions as more complex. The suprathreshold metric was tested against a set of images previously used in visual clarity and blur perception research [64], as shown in Table 3. An empty white or black image ($R_{\mathrm{spt}} = 0$) and Jackson Pollock's move it ($R_{\mathrm{spt}} = 466$) are the known boundaries of the $R_{\mathrm{spt}}$ scale. All of the tested images (i.e., natural scenes and paintings) lie within these boundaries; approximately, $R_{\mathrm{spt}} < 100$ denotes visual simplicity and $R_{\mathrm{spt}} > 300$ denotes high complexity.
Participants were instructed to judge the visual complexity of the images using a 5-point Likert-type scale with a neutral midpoint. The scale points were "very complex," "complex," "medium," "simple," and "very simple." The order of the images was randomized for every participant. Participants were allowed to move back and forth between images and change their judgments (anchoring was allowed). Although there was no minimum or maximum time limit for judgments, participants did not spend more than three minutes on any image.
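A minimal Python sketch of the $R_{\mathrm{spt}}$ count follows. Two assumptions differ from the MATLAB workflow: binarization uses a global mean threshold as a stand-in for the adaptive thresholding of the Image Region Analyzer, and regions are foreground components found with 8-connectivity by a breadth-first flood fill. The function and constant names are hypothetical.

```python
from collections import deque

import numpy as np

DETECTION_THRESHOLD = 1 / 25000  # smallest detectable region as a fraction of all pixels


def count_suprathreshold_regions(gray: np.ndarray, threshold: float = DETECTION_THRESHOLD) -> int:
    """Count 8-connected foreground regions whose area fraction exceeds `threshold`."""
    img = np.asarray(gray, dtype=float)
    # Stand-in binarization: global mean threshold (the study used adaptive thresholding).
    binary = img > img.mean()
    rows, cols = binary.shape
    total = rows * cols
    seen = np.zeros_like(binary, dtype=bool)
    r_spt = 0
    for r0 in range(rows):
        for c0 in range(cols):
            if not binary[r0, c0] or seen[r0, c0]:
                continue
            # Breadth-first flood fill measures the area PR_i of this region.
            area = 0
            queue = deque([(r0, c0)])
            seen[r0, c0] = True
            while queue:
                r, c = queue.popleft()
                area += 1
                for dr in (-1, 0, 1):
                    for dc in (-1, 0, 1):
                        rr, cc = r + dr, c + dc
                        if 0 <= rr < rows and 0 <= cc < cols and binary[rr, cc] and not seen[rr, cc]:
                            seen[rr, cc] = True
                            queue.append((rr, cc))
            # Count the region only if PR_i / PR_total exceeds the detection threshold.
            if area / total > threshold:
                r_spt += 1
    return r_spt
```

An all-black or all-white image produces no foreground pixels above the mean and therefore returns 0, matching the lower boundary of the scale. A library labeling routine would be faster than the pure-Python loop on full-resolution photographs.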

A. QUANTITATIVE IMAGE COMPLEXITY ASSESSMENT
Participants' subjective evaluations were tested for normality with the Shapiro-Wilk test at a significance level of 0.01. The test showed that the subjective visual complexity evaluations were not normally distributed. Because of this non-normality, interobserver differences were tested with the Kruskal-Wallis test. The null hypothesis was rejected (χ² = 267.77, p < 0.001, df = 703), which indicates statistically significant variation in interobserver judgments, as shown in Fig. 3. However, multi-sample comparisons with the Kruskal-Wallis test can inflate Type I errors (rejection of a true null hypothesis), so the rejection of the null hypothesis should be interpreted with caution. Spearman's rank correlations between subjective judgments and the computational measures for all images are given in Table 4. Only two metrics correlated significantly with the subjective assessments: RMSE (ρ = 0.92, p < 0.001) and $R_{\mathrm{spt}}$ (ρ = 0.62, p = 0.01).
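Because Likert-type ratings contain many ties, Spearman's ρ is computed here as the Pearson correlation of tie-averaged ranks. The following standard-library Python sketch illustrates this; it returns ρ only, and the p-values reported in Table 4 would require a separate significance test.

```python
def average_ranks(values):
    """1-based ranks in which tied values receive the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho as the Pearson correlation of tie-averaged ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)
```

For instance, `average_ranks([3, 1, 3, 2])` assigns the two tied ratings of 3 the shared rank 3.5, which is how repeated Likert scores enter the correlation.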
The influence of biological sex was investigated by comparing female (n = 23) and male (n = 17) responses. Spearman's rank correlations between the image complexity metrics and subjective ratings did not differ between female and male participants; the only small difference was a higher statistical significance for $R_{\mathrm{spt}}$ among males than among females, as shown in Table 5. The effect of size on perceived spatial complexity was also analyzed with the Wilcoxon-Mann-Whitney two-sample rank-sum test, with the large and small versions of two images (move it and pebbles) tested separately. The difference between move it and move it small was not statistically significant (U = 957.5, n₁ = n₂ = 44, z = 0.10, p = 0.46). Similarly, the difference between pebbles and pebbles small was not statistically significant (U = 790, n₁ = n₂ = 44, z = 1.56, p = 0.12). The effect sizes for both comparisons were also small (r = 0.01 and r = 0.23, respectively). Image size did not have a significant effect on either female or male participants' subjective evaluations.
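The Mann-Whitney comparison and its effect size r can be sketched in standard-library Python as follows. This version uses the normal approximation with a tie correction, reports U as the smaller of U₁ and U₂, and computes r = |z|/√N with N = n₁ + n₂. These are common conventions assumed here; statistical packages differ, for example in continuity correction or in which U they report.

```python
import math
from collections import Counter


def mann_whitney_u(x, y):
    """Return (U, z, r) for two independent samples (normal approximation, tie-corrected)."""
    n1, n2 = len(x), len(y)
    n = n1 + n2
    pooled = list(x) + list(y)
    order = sorted(range(n), key=lambda i: pooled[i])
    ranks = [0.0] * n
    i = 0
    while i < n:  # assign average ranks to runs of tied values
        j = i
        while j + 1 < n and pooled[order[j + 1]] == pooled[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    u = min(u1, n1 * n2 - u1)
    mean_u = n1 * n2 / 2
    # Variance of U, reduced for tied values (t = size of each tie group).
    tie_term = sum(t ** 3 - t for t in Counter(pooled).values())
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    z = (u - mean_u) / math.sqrt(var_u)
    r = abs(z) / math.sqrt(n)  # effect size r = |z| / sqrt(N)
    return u, z, r
```

Two completely separated samples, such as [1, 2, 3] and [4, 5, 6], give U = 0 and a large r, whereas identical samples give z = 0 and r = 0.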
The correlations between image complexity metrics and subjective evaluations of paintings (Mondrian, Dutch, Gothic, move it, move it small) and natural images (architecture, balloons, child, coast, park, pebbles, pebbles small, pepper, water, woman-car) are given in Table 6. Although subjective ratings correlated highly with several metrics, the correlations were not statistically significant (for D, ρ = 0.…). Previous studies recorded medium correlations with JPEG and mixed results for GIF [18], [57]. Compressed file formats correlated significantly with subjective complexity judgments of representational paintings [57], whereas edge detection methods correlated moderately with subjective measures for both environmental scenes and paintings [30]. The results of this study do not strongly support these findings; instead, they support evidence from another study in which RMS measures outperformed JPEG compression [58]. This discrepancy may be attributed to differences in experimental procedures. Training and elaborate explanations of the target terms allow participants to be more consistent in their responses [65], so studies that use such precautions may produce different results from studies that collect data with less restrictive methods. The wording of research questions or statements can also cause bias [66]. In this study, the question was limited to "visual complexity," and no training was provided prior to the experiment.

B. QUALITATIVE IMAGE COMPLEXITY ASSESSMENT
In a preliminary study, a small group of participants was presented with the same stimuli (16 images) to examine the perception and definition of visual complexity. When complexity was defined as "ease of remembering details," participants reported that they would respond differently than when no definition was given (e.g., Mondrian might be easy to look at but hard to remember, and therefore complex). Participants appear to hold non-universal, intrinsic definitions of complexity.
In the actual experiment, participants were asked to judge only the "visual" aspects of the presented images rather than their semantic complexity. Some participants reported that certain features of an image were whole objects (e.g., a human face), and they did not consider whole objects to be complex. Some participants thought that abstract paintings (e.g., move it) have no identifiable objects and considered the abstract structure to be background; therefore, they did not find these paintings complex. Some images (e.g., lines, architecture) caused nausea and dizziness and were considered complex because of the reaction they caused, not because of their structure. This finding supports the notion that, although the human visual system is attuned to horizontal and vertical gratings [67], such gratings may cause somatic and perceptual side effects [68]. On the other hand, one participant reported that the park was "predictable" and therefore considered it relaxing and not complex.
Images with repetitive objects (e.g., pebbles, lines) caused the highest disagreement among participants. Pebbles small was considered complex by some participants but not by others. Although most participants did not report color as an influencing factor, the high variation in ratings of the woman-car image hints at the role of color contrast in visual cognition, likely because of the influence of color contrast on visual performance [69].
Three paintings (Mondrian, Dutch, Gothic) used in a previous experiment [64] showed high correlations with their intended use. Participants typically judged the Dutch image to be the most complex in the survey (even more complex than move it), possibly because they considered the number of interesting objects (i.e., relatable objects such as people) rather than abstract shapes (gratings). Gothic was considered moderately complex, and Mondrian was generally judged to be visually simple, with some noted exceptions.

C. IMPLICATIONS FOR IMMERSIVE VIEWING CONDITIONS
Subjective evaluations of the 16 display images supported, with one caveat, the notion that image-based complexity metrics are limited in quantifying perceived complexity. While RMSE performed well in predicting perceived visual complexity, it may not be possible to use root-mean-square analysis in realistic (immersive) environments because it depends on a reference condition (the undistorted image). In immersive viewing conditions, the reference point of a subjective evaluation is typically stored in participants' short- or long-term memory, and this spatial memory reference is often distorted over time, even within a very short period [70]. The lack of a meaningful reference in realistic, immersive environments limits the use of reference-based image assessment measures.
Another important difference between displayed images and immersive environments is the visual angle. The images displayed in these experiments force participants to make judgments based on a narrow-to-medium field of view; therefore, the effects of the background and surrounding field may not be accounted for by the image assessment metrics. Visual acuity peaks in the fovea, but the optical quality of the retinal image declines slowly with peripheral angle, and spatial frequency sensitivity changes with visual eccentricity [71]. These changes in the visual system with visual angle highlight the importance of experimental conditions in image quality assessment.
The discrepancies between the metrics and subjective assessments can also be linked to other endogenous and exogenous factors. Image-based metrics do not consider internal factors, such as context, memory, familiarity, and expectation [30]. Several external factors, such as illumination levels [72], the spectral power distribution of a light source [73], and adaptation [74], may affect subjective and objective judgments of visual perception for both display images and real scenes [75]. These limitations are highly relevant for integrative lighting systems based on modeling the spatial and spectral sensitivity of the visual system. For example, one can conceptualize a real-time integrative system that detects the physical characteristics of the built environment using sensors (CCD or CMOS sensors for spectral, spatial, and luminance imaging of the built environment [76], [77]) and optimizes the light output for energy efficiency, visual comfort, and visual performance. Such a system could estimate the perceived spatial quality and complexity of the built environment and make predictions through computational models based on machine learning and optimization algorithms. However, the performance of such an integrative system depends on the accuracy and precision of its mathematical models and parameters.

V. CONCLUSIONS
Image quality assessment measures have been widely used to assess the quality of display images, and their performance varies depending on the dataset and experimental methods. A visual experiment with 16 images consisting of paintings, natural scenes, and abstract structures was conducted to assess the performance of spatial frequency information and spatial complexity measures. The images used in this experiment included unusually repetitive structures, varying sizes, and varied chromatic characteristics (i.e., not all of the images were natural). This wide variety of stimulus characteristics enabled a rich exploration, in which root-mean-square error (RMSE) and the newly proposed $R_{\mathrm{spt}}$ correlated significantly with subjective assessments. However, visual complexity judgments are likely influenced by higher-level processes, such as grouping, object identification, and emotional responses.
It should be noted that the reported findings are limited to visual complexity. Although image complexity and image quality are interrelated, they are not interchangeable concepts. Future research should investigate the use of image-based quality assessment measures and spatial frequency information for other dimensions of visual perception, such as overall image quality and visual clarity.
In this experiment, the spatial frequency slope did not correlate significantly with subjective evaluations of visual complexity. Two factors may explain this lack of correlation: the high variation in the image dataset and the experimental conditions. Spatial frequency information is widely used to estimate contrast sensitivity and visual performance for natural and simple images; however, the dataset used here was not limited to natural images.
RMSE and $R_{\mathrm{spt}}$ performed well in this study, and other studies may find statistically significant results for other complexity measures. Image complexity metrics are typically used to quantify the perceived complexity of images displayed on a screen; however, visual tasks in the real world can be harder to complete and model [78].
Moreover, immersive environments illuminated by integrative lighting systems based on light projection [79], [80] require further investigation. Future research aims to address the effects of exogenous elements, such as the spectrum and intensity of the illumination and spatial frequency information, on visual perception.