How Visual Discomfort Is Affected by Colour Saturation: A fNIRS Study

Wide colour gamut displays have greatly improved the viewing experience. However, highly saturated colours may also cause problems such as visual discomfort. In this study, the influence of colour saturation on visual discomfort is investigated. Visual stimuli sequences with different images and saturation levels were presented in subjective and objective experiments. The scores of image quality, visual comfort, and preference were collected in questionnaires. The peak amplitude of the haemodynamic response was extracted from fNIRS. The results show that, at the mid saturation level, the subjective evaluation of visual discomfort was the lowest and the smallest peak amplitude of the haemodynamic response was observed, whereas image quality and preference increased as the saturation increased. For wide colour gamut displays, the visual discomfort induced by highly saturated colours should be taken into consideration.


I. INTRODUCTION
IN RECENT years, consumers' and manufacturers' pursuit of a better viewing experience has given rise to many new display technologies. Wide colour gamut (WCG) is among the most representative of these [1]. WCG displays offer an extension of the colour gamut. It is suggested that WCG could match what we see in the real world better than conventional displays. A wider colour gamut is considered to provide more realistic, more natural and more vivid colours, and could enhance image quality [2]. However, it has been found that highly saturated colours in WCG displays do not always result in a better experience. Based on research from Kumakura et al. [3], the viewers' preference for saturation varied across images. Besides, there was no decrement of subjective valence and arousal when the saturation was not the most preferred. According to François et al. [4], although studies on the power consumption of WCG displays are abundant, the visual comfort aspect has not yet been fully studied. The authors suggest that the properties of the human visual system might be a reasonable limit to future development. Thus, it is meaningful to explore various viewing experiences of WCG displays.
Visual discomfort is a major area of interest within the field of viewing experience. It is defined as a complex symptom referring to the viewers' unpleasant feelings when watching visual stimuli, and it causes problems in visual performance [5]. For subjective measurements, questionnaires such as the Visual Discomfort Scale (VDS) [6] and the Simulator Sickness Questionnaire (SSQ) [7] are widely used. For objective measurements, according to Wilkins et al. [8], brain activity is related to visual discomfort from images. Previous studies have reported that electroencephalography (EEG) and functional near-infrared spectroscopy (fNIRS) features are correlated with brain activities.
Watching urban photos whose spatial frequency distributions were unnatural caused visual discomfort and elicited large haemodynamic responses in the visual cortex [9]. Gratings with larger chromatic separations evoked increased haemodynamic response amplitude and subjective discomfort evaluation, and moving gratings evoked steeper response decrement after stimulus offset [10], [11]. Besides, gratings with certain colour pairs evoked lower alpha power and greater alpha desynchronization in EEG and resulted in stronger visual discomfort [12]. Together, these studies show that the cortical haemodynamic response measured using fNIRS is sensitive to visual discomfort, and can be considered an indicator.
Many authors have reported the influence of colour saturation on viewing experience. Gao [13] performed an experiment showing that overly saturated images impaired visual comfort and caused suppression of sympathetic and parasympathetic activity, as reflected by electrocardiogram (ECG) features. Besides, a series of studies on conventional displays revealed that, for extremely saturated colours, subjective evaluations of naturalness, image quality, and likeness would decrease [14], [15], [16], [17]. However, most studies in the field of colour saturation have tended to focus on subjective evaluations rather than brain activity.
In this paper, the influence of colour saturation of WCG display on visual discomfort is studied. Data are collected using fNIRS and subjective questionnaires. The fNIRS is chosen for two reasons. Firstly, the cortical haemodynamic response measured with fNIRS is sensitive to visual discomfort induced by display parameters. Our previous study on display luminance [18] reported that increased visual discomfort was accompanied by larger haemodynamic response. Secondly, compared to other brain activity measurements, fNIRS has several advantages. It offers better temporal resolution and tolerance for head movement than functional magnetic resonance imaging (fMRI), and higher spatial resolution and robustness to movement artifacts than EEG [19]. The peak amplitude of HbO2 response was extracted from fNIRS, and the subjective questionnaire included three items: image quality, visual comfort, and preference.

II. METHODS
A. Display and Visual Stimuli Design
For the display device, a 65-inch SAMSUNG WCG QLED TV (QA65Q8CAM) was selected. When showing a full-screen white field, the mean luminance was 650 cd/m² and the CIE (x,y) chromaticity coordinates of the white point were (0.2880, 0.2883). The colour gamut of the display was 0.92 of NTSC (National Television Standards Committee). For the visual stimuli, three high-saturation 4K images with similar luminance, named Parrot, People, and Flower, were selected for the exploratory study. It should be noted that the purpose of this study was to check whether the haemodynamic response was sensitive to the changes of saturation in natural images. As this was a preliminary exploratory experiment, only a limited number of images were selected. For a more general evaluation, more images would be necessary.
For different saturation levels, the process of colour adjustment is presented in Fig. 1. Firstly, gamma look-up tables for red, green, blue, and white were measured, and the colour coordinates were specified respectively. With the gamma correction, the R_i G_i B_i values of the original image were transformed to linear space, normalized, and stored as floats to improve accuracy. Then, the image was transformed from the RGB to the XYZ colour space using two matrices: the matrix C contained the chromaticity coordinates of the measured red, green, and blue primaries, and the matrix T contained the proportional constants for the corresponding primaries under the specified white point. After that, the image was transformed from the XYZ to the L*a*b* colour space, where X_n, Y_n, Z_n were the tristimulus values of the specified white point. These colour space conversions were based on the method from Benson [20]. Next, the components lightness (L), chroma (C_i), and hue (H) were calculated. In every pixel, C_i was adjusted to C_o according to the saturation ratios, as shown in the equation in Fig. 1. Three saturation levels were adopted: low, mid, and high, corresponding to saturation ratios of 0.3, 0.7, and 1.0 (original) relative to the original image.
The colour gamuts of the display at the different saturation levels are presented in Fig. 2. The colour gamut reduced with lower saturation: 0.13, 0.73, and 0.92 of the NTSC colour gamut in low, mid, and high saturation, respectively. The white points were consistent, around (0.2880, 0.2883). The images at the different saturation levels are plotted in Fig. 3. Here, the NTSC colour gamut was selected as a reference for comparing the colour gamuts at the three saturation levels, because it was a standard gamut for television broadcasts in most areas.
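As a rough illustration of the per-pixel adjustment described above, the following Python sketch converts XYZ to L*a*b* and scales the chroma while keeping lightness and hue fixed. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names are hypothetical, the multiplicative form C_o = ratio × C_i is assumed because the equation in Fig. 1 is not reproduced in the text, and the D65 white point and the sample pixel are illustrative values rather than the measured ones.

```python
import math


def xyz_to_lab(x, y, z, white):
    """Convert CIE XYZ to CIE L*a*b* relative to a white point (Xn, Yn, Zn)."""
    def f(t):
        delta = 6.0 / 29.0
        if t > delta ** 3:
            return t ** (1.0 / 3.0)
        # Linear segment used by CIELAB for very dark colours
        return t / (3.0 * delta ** 2) + 4.0 / 29.0

    xn, yn, zn = white
    fx, fy, fz = f(x / xn), f(y / yn), f(z / zn)
    return 116.0 * fy - 16.0, 500.0 * (fx - fy), 200.0 * (fy - fz)


def scale_chroma(lab, ratio):
    """Scale chroma C = sqrt(a^2 + b^2) by `ratio`; lightness and hue are kept."""
    lightness, a, b = lab
    chroma_in = math.hypot(a, b)
    hue = math.atan2(b, a)
    chroma_out = ratio * chroma_in  # assumed form of the equation in Fig. 1
    return lightness, chroma_out * math.cos(hue), chroma_out * math.sin(hue)


# Illustrative values only: D65 white and an arbitrary pixel, not measured data.
white = (0.95047, 1.0, 1.08883)
pixel_lab = xyz_to_lab(0.30, 0.20, 0.05, white)
for ratio in (0.3, 0.7, 1.0):  # low, mid, high saturation ratios
    print(ratio, scale_chroma(pixel_lab, ratio))
```

Working in the lightness, chroma, and hue representation means that only one component changes between saturation levels, which is consistent with the similar luminance reported for the three versions of each image.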
As to the changes in colour, the CIE ΔE94 colour differences between the adjusted images and the original images (high saturation) were calculated [21]. According to a study on the just noticeable difference (JND) [22], the JND of saturation was ΔE94 = 0.8±0.3. In this study, the mean ΔE94 was 9.75±0.84 for low saturation (ratio = 0.3) and 3.38±0.22 for mid saturation (ratio = 0.7), ensuring that the saturation differences could be perceived. The averaged luminance of the images at the different saturation levels was similar: 138.36 cd/m², 137.32 cd/m², and 139.11 cd/m² in low, mid, and high saturation, respectively.
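For reference, the CIE94 colour difference between two L*a*b* values can be computed as in the sketch below. The weighting constants (k_L = 1, K1 = 0.045, K2 = 0.015, the graphic-arts values) and the choice of the original pixel as the reference colour are assumptions, since the text does not state which parameters were used; the function name is hypothetical.

```python
import math


def delta_e_94(lab_ref, lab_test, k_l=1.0, k1=0.045, k2=0.015):
    """CIE94 colour difference, with lab_ref taken as the reference colour."""
    l1, a1, b1 = lab_ref
    l2, a2, b2 = lab_test
    c1 = math.hypot(a1, b1)
    c2 = math.hypot(a2, b2)
    d_l = l1 - l2
    d_c = c1 - c2
    # Hue difference squared; clamped at zero to absorb rounding error
    d_h_sq = max((a1 - a2) ** 2 + (b1 - b2) ** 2 - d_c ** 2, 0.0)
    s_l = 1.0
    s_c = 1.0 + k1 * c1
    s_h = 1.0 + k2 * c1
    return math.sqrt((d_l / (k_l * s_l)) ** 2 + (d_c / s_c) ** 2 + d_h_sq / s_h ** 2)


# Illustrative pixel: chroma 40 reduced to 28 (ratio 0.7), same lightness and hue.
print(delta_e_94((50.0, 40.0, 0.0), (50.0, 28.0, 0.0)))
```

When only chroma is scaled, the lightness and hue terms vanish, so the difference reduces to the chroma term divided by 1 + K1 × C of the reference colour.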

B. Participants
The participants were 16 volunteers (10 males and 6 females, age: 24.69±1.54 years) from Southeast University, Nanjing. All the participants had normal or corrected-to-normal vision of over 1.0 (decimal), which was verified with an intelligent refractor RT-5100 with chart SSC-370 (NIDEK, Japan), and normal colour vision, which was verified with Ishihara tests. The participants were asked to ensure 8 hours of sleep and to avoid food or drinks containing caffeine or alcohol in the 24 hours before each experiment. The research complied with the tenets of the Declaration of Helsinki and was approved under the ethical regulations of Southeast University. Informed consent was obtained from each participant.

C. Environmental Design and Procedure
The experiment was conducted in a dark environment (illuminance at the eye level < 0.1 lx, in the absence of the display). The participant sat in a comfortable chair with adjustable height to ensure an eye level of 120 cm from the ground, the same height as the centre of the display. The viewing distance was 240 cm, three times the screen height.
A within-subject design was employed. For every participant, the study included all the images and saturation levels and was divided into two parts: objective and subjective. To avoid the learning effect, the interval between the two parts was at least seven days. To avoid visual fatigue accumulation, the objective part was further divided into three sessions, each lasting approximately 24 minutes. The interval between two objective sessions was at least one day so that participants could fully recover from the previous session. The experiments were run using E-prime 2.0 (Psychology Software Tools, USA). The procedure details are presented in Fig. 4.
In the objective part, fNIRS was used as the measurement. In each session, the visual stimuli were one randomly selected image at all the saturation levels. First, the participant adapted to the dark environment and the display with a full-screen gray field (average luminance = 94.49 cd/m²) for 5 minutes. Then, 1 image × 3 saturation levels × 8 repeats = 24 trials were presented in pseudo-random order. The signal-to-noise ratio of fNIRS was limited, so repeated trials were essential to yield reliable averaged responses [23]. Based on other researchers' work and pre-test results, 8 repeats were chosen. In each trial, a white fixation cross was presented in the centre of a black background for 1 second to catch the participant's fixation. Then the visual stimulus was displayed for 16 seconds to ensure that the haemodynamic response reached its peak during stimulus presentation, based on existing fNIRS studies [10]. After that, the inter-stimulus interval (ISI) with a full-screen gray field lasted for a random duration of 27 to 36 seconds, to wait for the responses to return to the baseline.
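The timing of one objective session can be sketched as follows. This is an illustration only: the experiment itself was run in E-prime, the plain shuffle stands in for whatever pseudo-random ordering constraints were actually applied, the uniform ISI distribution is an assumption, and all names are hypothetical.

```python
import random

LEVELS = ("low", "mid", "high")
REPEATS = 8
ADAPTATION_S = 5 * 60      # dark and display adaptation
FIXATION_S = 1             # fixation cross
STIMULUS_S = 16            # image presentation
ISI_RANGE_S = (27, 36)     # gray field between trials


def build_session(seed=None):
    """Return a list of (saturation level, ISI in seconds) for one session."""
    rng = random.Random(seed)
    trials = [level for level in LEVELS for _ in range(REPEATS)]
    rng.shuffle(trials)  # assumed: unconstrained shuffle
    return [(level, rng.uniform(*ISI_RANGE_S)) for level in trials]


def session_duration_s(schedule):
    """Total session length, including the initial adaptation period."""
    return ADAPTATION_S + sum(FIXATION_S + STIMULUS_S + isi for _, isi in schedule)


schedule = build_session(seed=1)
print(len(schedule), "trials,", round(session_duration_s(schedule) / 60, 1), "minutes")
```

With an ISI at the midpoint of its range (31.5 s), each trial takes 48.5 s, so a session lasts about 300 + 24 × 48.5 = 1464 s, which agrees with the reported session length of approximately 24 minutes.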
In the subjective part, a questionnaire was used as the measurement. All the images at all the saturation levels were included. First, the participant went through a 5-minute environmental adaptation. Then, 3 images × 3 saturation levels × 3 repeats = 27 trials were displayed in pseudo-random order. In each trial, similarly, a fixation cross was first presented for 1 second, followed by the visual stimulus displayed for 5 seconds. The display duration was sufficient for subjective evaluations, based on pre-test results and existing studies on visual discomfort [24]. Then, a questionnaire with a gray background was presented in the centre of the screen, and the participant answered the items using a keyboard. Once the questionnaire was completed, the next trial began automatically.

D. Data Collection
The questionnaire consisted of three items: image quality, visual comfort, and preference, with 5-point scale designs, as shown in Table I. The higher the score, the better the evaluation was.
The fNIRS data were recorded by a portable fNIRS device OctaMon+ with a sampling rate of 50 Hz, and the software Oxysoft (Artinis Medical Systems, The Netherlands). Two transmitters (Tx1 and Tx2) and one receiver (R1) were placed according to the standard 10-20 EEG layout. Hereafter, the occipital left channel (Tx1-R1) was abbreviated as OL, and the occipital right channel (Tx2-R1) was abbreviated as OR. The optode placement is shown in Fig. 5.

E. Data Process and Analysis
The fNIRS data were processed using Matlab R2021a (MathWorks, USA) with the Signal Processing Toolbox. First, the signal was filtered using a fifth-order Butterworth bandpass filter with cutoff frequencies of 0.01 Hz and 0.5 Hz [25]. After the filtering, epochs lasting from 5 seconds before the stimulus onset to 30 seconds after the stimulus onset were extracted from the fNIRS signal. In each epoch, data in the first 5 seconds were defined as the baseline, which was subtracted from the epoch to obtain the haemodynamic response. Finally, data with obvious drifts, artifacts, and spikes were manually removed based on visual inspection.
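The epoch extraction and baseline correction can be sketched in Python as below, assuming the signal has already been band-pass filtered. The original processing was done in Matlab; this sketch assumes that the baseline is the mean of the 5 seconds before stimulus onset, which the text does not state explicitly, and the function name is hypothetical.

```python
FS_HZ = 50      # sampling rate of the recording
PRE_S = 5       # seconds kept before stimulus onset (baseline window)
POST_S = 30     # seconds kept after stimulus onset


def extract_epoch(signal, onset_index, fs=FS_HZ, pre_s=PRE_S, post_s=POST_S):
    """Cut one epoch around a stimulus onset and subtract its baseline mean."""
    start = onset_index - pre_s * fs
    stop = onset_index + post_s * fs
    if start < 0 or stop > len(signal):
        raise ValueError("epoch exceeds the recording")
    epoch = signal[start:stop]
    n_baseline = pre_s * fs
    baseline = sum(epoch[:n_baseline]) / n_baseline  # assumed: mean of pre-onset data
    return [sample - baseline for sample in epoch]


# Synthetic step signal for illustration: level 1.0, rising to 3.0 at onset.
signal = [1.0] * 1000 + [3.0] * 2000
epoch = extract_epoch(signal, onset_index=1000)
print(len(epoch), epoch[0], epoch[-1])
```

At 50 Hz each epoch holds 35 × 50 = 1750 samples, of which the first 250 form the baseline window.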
After this processing, 114 epochs had been discarded in total. Then, the HbO2 responses, in μmolar units, were calculated from the epochs based on the modified Beer-Lambert law (MBLL), with a differential path length factor of 6.26. The differential form of the MBLL indicated that light attenuation changed in proportion to the concentrations of tissue chromophores, mainly oxy- and deoxy-haemoglobin. Based on the MBLL, the fNIRS signal could be transformed to optical density, and then to the cortical haemodynamic response [26]. After the transformation, the HbO2 responses of the 8 repeats were averaged for each participant, image, saturation, and channel separately. Then the peak amplitude (PA, unit: μmolar) from 5 seconds to 25 seconds after the stimulus onset was calculated for every HbO2 response. SPSS Statistics 21.0 (IBM, USA) and Matlab R2021a were used for statistical analysis.
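The averaging over repeats and the peak-amplitude extraction can be illustrated with the short Python sketch below. It assumes that the PA is the maximum of the averaged, baseline-corrected response within the 5 to 25 second window, and that each epoch starts 5 seconds before stimulus onset at 50 Hz; the function names are hypothetical.

```python
def average_epochs(epochs):
    """Sample-wise mean over the retained repeats of one condition."""
    n = len(epochs)
    return [sum(samples) / n for samples in zip(*epochs)]


def peak_amplitude(response, fs=50, pre_s=5, window_s=(5, 25)):
    """Maximum of the response between window_s[0] and window_s[1] after onset.

    Index 0 of `response` is assumed to lie `pre_s` seconds before onset.
    """
    start = (pre_s + window_s[0]) * fs
    stop = (pre_s + window_s[1]) * fs
    return max(response[start:stop])


# Two synthetic epochs (1750 samples each) peaking 10 s after stimulus onset.
epoch_a = [0.0] * 1750
epoch_b = [0.0] * 1750
epoch_a[750] = 1.0
epoch_b[750] = 3.0
print(peak_amplitude(average_epochs([epoch_a, epoch_b])))
```

Because some epochs were discarded, the number of repeats entering each average may be smaller than 8; the averaging function therefore divides by the number of epochs actually supplied.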

III. RESULTS
A. Haemodynamic Response Results
The HbO2 responses of two participants (No.5 and No.6) are plotted in Fig. 6, to show the similarities and differences among individuals. After the visual stimulus was presented, the HbO2 responses gradually increased, reached a peak in 5 to 20 seconds, and then fell back to around the baseline. However, the waveform shapes of HbO2 responses varied among individuals.
The averaged PA and PT of HbO2 responses are shown in Fig. 7. The PA was the lowest in mid saturation, and the highest in high saturation. The PA of Flower was slightly larger than other images. The PA values between channels were similar. No obvious differences among images, saturation, and channels were observed in PT.
The results of the Kolmogorov-Smirnov test and the Levene test showed that the PA data were normally distributed and had equal variances. Thus, repeated measures ANOVA was used to compare the dependent variables PA and PT. Saturation, image, and channel were fixed factors. The significance level was set at 0.05. Greenhouse-Geisser correction was applied to adjust for lack of sphericity. The effect size was evaluated using partial eta squared (ηp²), with values of 0.01, 0.06, and 0.14 regarded as small, medium, and large, respectively [27]. Tukey's-b method was used for post hoc tests.
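For readers unfamiliar with the effect-size measure, partial eta squared is the effect sum of squares divided by the sum of the effect and error sums of squares, and it can equivalently be recovered from an F statistic and its degrees of freedom. The Python sketch below illustrates both forms; treating the three benchmark values as lower bounds of each category and the label "negligible" below 0.01 are assumptions of this sketch, and the function names are hypothetical.

```python
def partial_eta_squared(ss_effect, ss_error):
    """Partial eta squared from the effect and error sums of squares."""
    return ss_effect / (ss_effect + ss_error)


def partial_eta_squared_from_f(f_value, df_effect, df_error):
    """Equivalent form using the F statistic and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


def effect_size_label(eta_p2):
    """Classify using the 0.01 / 0.06 / 0.14 benchmarks as lower bounds."""
    if eta_p2 >= 0.14:
        return "large"
    if eta_p2 >= 0.06:
        return "medium"
    if eta_p2 >= 0.01:
        return "small"
    return "negligible"


# Illustrative numbers only, not results from this study.
value = partial_eta_squared(ss_effect=2.0, ss_error=8.0)
print(value, effect_size_label(value))
```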
Repeated measures ANOVA results are shown in Table II. There was a significant influence of saturation on the PA. The post hoc results are presented in Table III. The PA was the highest in high saturation, followed by low saturation, and the lowest in mid saturation.

B. Questionnaire Results
In the data analysis, the averaged results of the three repeats were used. Fig. 8 presents the questionnaire results. As to image quality, the score was the lowest in low saturation, between 'poor' and 'neutral'. As to visual comfort, the score was the lowest in high saturation, from 'uncomfortable' to 'neutral'. The scores in low and mid saturation were higher than 'neutral'. As to preference, the score was the lowest in low saturation, from 'dislike very much' to 'neutral'.
Considering the ordinal scores, the Friedman test, a nonparametric analysis method, was applied. The scores of image quality, visual comfort, and preference were dependent variables. Saturation and image were fixed factors. The significance level was set at 0.05. The effect size was evaluated using epsilon squared (ε²), with values of 0.01, 0.04, and 0.16 regarded as small, medium, and large, respectively [28]. Dunn's test was used for post hoc tests.
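The Friedman statistic ranks the scores within each participant and compares the rank sums across conditions. A minimal Python sketch is given below, with tied scores receiving average ranks but without the tie correction that statistical packages usually apply; the scores are made-up values for illustration, not data from this study, and the function names are hypothetical.

```python
def average_ranks(values):
    """Rank values from 1 upwards, giving tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def friedman_statistic(table):
    """Friedman chi-square for rows = participants, columns = conditions."""
    n, k = len(table), len(table[0])
    rank_sums = [0.0] * k
    for row in table:
        for col, rank in enumerate(average_ranks(row)):
            rank_sums[col] += rank
    return 12.0 / (n * k * (k + 1)) * sum(r * r for r in rank_sums) - 3.0 * n * (k + 1)


# Hypothetical scores of four participants at low, mid, and high saturation.
scores = [[2, 4, 5], [1, 3, 4], [2, 3, 5], [1, 4, 5]]
print(friedman_statistic(scores))
```

The statistic is referred to a chi-square distribution with k − 1 degrees of freedom, where k is the number of conditions.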
Friedman test results are presented in Table IV. The influence of saturation was statistically significant on all items, with large effect sizes. The influence of image was not significant. The post hoc results are presented in Table V. In low saturation, the scores of image quality and preference were the lowest. The lowest score of visual comfort was found in high saturation. The subjective results accorded with the HbO2 response results: high saturation induced the largest PA and the most severe feelings of visual discomfort.

IV. DISCUSSION
The tendencies of the HbO2 response PA and the questionnaire scores with saturation are shown in Fig. 9. Since the differences among images were not statistically significant, only the averaged values at the different saturation levels were plotted. When the saturation increased from low (0.13 of NTSC) to mid (0.73 of NTSC), PA decreased from 0.30 to 0.24 and the visual comfort score increased gradually from 3.13 to 3.45, between 'neutral' and 'comfortable'. When the saturation increased further to high (0.92 of NTSC), the changes were the opposite and occurred at a much greater rate: PA increased to 0.37 and the visual comfort score decreased to 2.54, between 'uncomfortable' and 'neutral'. The negative correlation between PA and visual comfort was shown clearly. In contrast, as the saturation increased, the score of image quality increased from 2.27 to 3.75, then to 3.85, close to 'good'. Thus, it was possible to assume a saturation range from 0.6 to 0.8 of NTSC, with lower visual discomfort and reasonable image quality and preference. When the saturation was larger than 0.8 of NTSC, the image quality and preference could be ensured while the induced uncomfortable visual feelings increased. On the other hand, if the saturation was smaller than 0.6 of NTSC, the visual comfort could be ensured while the image quality and preference decreased.
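The direction of the relationship between PA and visual comfort can be checked numerically from the three level means quoted above. The Python sketch below computes a Pearson coefficient from those means; with only three aggregate points it is a descriptive illustration of the trend, not a statistical test, and the function name is hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_yy = sum((y - mean_y) ** 2 for y in ys)
    return s_xy / math.sqrt(s_xx * s_yy)


# Level means quoted in the text, in the order low, mid, high saturation.
peak_amplitude = [0.30, 0.24, 0.37]
visual_comfort = [3.13, 3.45, 2.54]
print(pearson_r(peak_amplitude, visual_comfort))
```

The coefficient is strongly negative, in line with the observation that higher peak amplitudes accompany lower comfort scores.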
Considering the visual discomfort, the results of the HbO2 response and the subjective evaluation matched well: more visual discomfort was accompanied by higher HbO2 responses. In addition, the tendencies of the image quality score and the preference score were similar to each other and different from that of visual comfort. When the saturation increased, the scores of both image quality and preference increased at first and then stayed relatively constant. Our findings confirmed the association between visual discomfort and haemodynamic response.
Although research on natural images is limited, prior studies using coloured patterns showed similar findings. Gratings with larger chromaticity separations [10], [12] were found to result in increased HbO2 response amplitude and worse ratings for visual comfort.
The visual discomfort was associated with haemodynamic response through brain activity. Increased activity of cortical neurons and demand for oxygen could be reflected by a stronger HbO2 response [8]. Similar results were found in studies using other brain activity measurements, including EEG and fMRI, as shown in Table VI. With larger colour differences, greater neural response [30] and cortical activations [12], [31] were found, along with stronger visual discomfort.
Our results also partly matched the influence of image colours on visual discomfort observed in an earlier study. Penacchio et al. [29] reported that visual discomfort kept rising when the average CIELUV chromaticity difference of art images increased from 0.005 to 0.025. They suggested that overly saturated colours deviated from those found in nature, resulting in inefficient coding of the scene. In this study, the range of average CIELUV chromaticity difference was close to that in Penacchio et al.'s research: 0.007 for low saturation, 0.014 for mid saturation, and 0.018 for high saturation. Correspondingly, the visual discomfort was stronger in high saturation, with the largest chromaticity difference. Furthermore, in low saturation (0.13 of NTSC), compared to mid saturation (0.73 of NTSC), a larger PA was found, while the questionnaire results on visual comfort showed no difference. It could be deduced that the HbO2 response was more sensitive than the subjective evaluation. Low saturation might interfere with the efficiency of visual information processing. Such a change might be too weak to be detected by subjective feeling, while the HbO2 response PA increased because of the slightly influenced visual perception. However, further work on the visual mechanisms with more data is required.
The subjective evaluations of image quality and preference changed differently, compared to the HbO2 response and visual comfort scores. Image quality and preference in mid and high saturation were improved significantly compared with low saturation, whereas the difference between mid and high saturation was not significant. This finding supported the idea that the HbO2 response was specifically related to visual discomfort/comfort, not to other subjective viewing experiences.
It was clear that a larger colour gamut did improve image quality and preference. However, in this study, larger colour gamut (0.92 of NTSC) was found to cause visual discomfort. The results of this study indicated that mid saturation could achieve a better balance: the image quality and preference were ensured, and the visual discomfort was low, reflected by subjective feelings and brain activities.
This does not mean that a larger colour gamut would certainly induce visual discomfort. Very saturated colours are rare in natural scenes and would not occupy extremely large areas [32]. Thus, it was possible that in highly saturated WCG displays, the increasing saturated areas made the images on the screen look far from the corresponding scenes in nature, and led to feelings of visual discomfort. Manufacturers should avoid producing too many saturated colours merely in pursuit of visual effects with WCG displays.
However, only three saturation levels, three highly saturated images, a limited number of participants, and one display with a certain colour gamut were tested here. Besides, the saturation variance between display and nature was not taken into consideration. Thus, the recommendations on colour saturation had some restrictions. Further studies, which take more variable levels into account, should be undertaken. Besides, an extended image database should be applied for a more general evaluation.

V. CONCLUSION
The influence of colour saturation on visual discomfort was studied using haemodynamic response and a subjective questionnaire. The results showed that increasing saturation (from 0.13 to 0.73 of NTSC) improved image quality and preference, while high saturation (0.92 of NTSC) could ensure the image quality and preference but induced visual discomfort. Low subjective visual discomfort was found in mid saturation (0.73 of NTSC), accompanied by a lower peak of the HbO2 response. In contrast, image quality and preference increased as saturation increased. We suggest that, for WCG displays, when the saturation of images is increased, visual discomfort should be considered alongside the improved image quality and preference, so as to avoid too many saturated colours. The results also showed that the peak amplitude of the HbO2 response was related to visual discomfort and was more sensitive than subjective evaluation.