Group Perceptual Quality Optimization for Multi-Channel Image Encoding Systems Based on Adaptive Hyper Networks

Images and short videos produced by social networks have surged in recent years. Image/video encoders, such as JPEG and H.264, are indispensable for reducing the transmission bandwidth. However, based on our observation, the encoding parameters and their candidates are often preset to fixed values (or fixed candidate values) in real-world scenarios, which might not yield the optimal bandwidth allocation strategy. Considering that, we propose an efficient group quality optimization (GQO) framework for multi-channel image/video encoding systems, in which the encoding parameters are configured in a perceptual-quality-driven manner. The GQO framework employs an adaptive hyper network to predict the relationships between encoding parameters, transmitting resources, and perceptual qualities, i.e., taking only the pristine image as input, the adaptive hyper network can accurately yield a global overview of how perceptual quality and transmitting resource vary with the encoding parameters. A step-by-step optimization procedure is then employed to search the optimal encoding parameter for each channel so that the overall perceptual quality is maximized under limited transmitting resources. Experimental results demonstrate that the proposed GQO framework achieves higher perceptual quality while maintaining the same bandwidth compared to traditional allocation strategies where encoding parameters are preset.


I. INTRODUCTION
Recent years have witnessed the significant growth of social networks, which produce massive amounts of images and short videos. These digital images and videos can get contaminated at any stage of their lifecycle, such as acquisition, compression, and transmission. Consequently, research on Image Quality Assessment (IQA) is in great need to automatically evaluate the perceptual quality of images, based on which the Quality of Service (QoS) of video transmission systems (e.g., social networks and live video apps) can be monitored, guaranteed, and optimized.
Despite increasing interest in exploiting deep convolutional neural networks (DCNN) for designing no-reference/full-reference image quality assessment (NR/FR-IQA) models and the notable success achieved [1], there has largely been a disconnection between research aimed at evaluating perceptual quality and research aimed at other downstream image processing tasks [2,3]. For example, in the field of IQA, researchers have always focused on how to precisely calculate the perceptual quality of images alone and have somewhat neglected to optimize, based on their promising work, the perceptual quality of real-world transmitting systems.
(The associate editor coordinating the review of this manuscript and approving it for publication was Chun-Wei Tsai.)
The disconnection between the IQA field and other downstream image processing tasks makes it difficult to incorporate sophisticated IQA models into those tasks, which might yield exciting results with the help of accurate quality control. Specifically, given a situation that needs to compress N images or N videos and transmit them from locale X to locale Y via an approximately lossless network with limited bandwidth B, IQA models are capable of monitoring the perceptual quality of each image/video stream by applying FR-IQA evaluation at locale X or NR-IQA evaluation at locale Y. But that is far from satisfactory: the IQA model can measure the current quality of each image/video stream yet fails to figure out how to re-allocate the bandwidth resources to achieve better overall perceptual quality. Therefore, our work analyzes how to incorporate IQA models into multi-channel image encoding systems to re-allocate the bandwidth resources of each channel so that the overall perceptual quality of the output is optimized. Such research is important because existing popular social networks often contain multiple images rather than a single image in a webpage, and video content providers also have to compress multiple video streams simultaneously for users. Moreover, encoding parameters such as the Q value for JPEG image encoders and the bitrate/QP for video encoders cannot precisely represent the perceptual quality; therefore, incorporating IQA models as the quality criterion and exploiting a reasonable bandwidth re-allocating strategy based on such IQA quality scores would benefit both the servers (saving transmitting resources) and the clients (improving perceptual quality).
It is difficult to derive a bandwidth re-allocating strategy given only the current output of the encoders. That is mainly because, by feeding the current output into an IQA model, we could only get the perceptual quality under the current parameter settings and would know nothing about which direction the encoding parameters should move towards.
Considering the obstacle mentioned above and inspired by the rate-distortion curve used for testing the performance of video encoders, our work tries to get a global view of the relationship between bandwidth and perceptual quality for each image channel by compressing the source into different versions using various encoding parameters and calculating the IQA prediction score for each of the distorted versions (shown as Fig.1). Given the quality distribution O and transmitting resource distribution B, it is feasible to figure out the best bandwidth allocating strategy by optimizing:

\max_{K} \sum_{i=1}^{N} o^{i}_{k_i} \quad (1)

\text{s.t.} \quad \sum_{i=1}^{N} b^{i}_{k_i} \le B_L \quad (2)

where N denotes the number of image channels, M denotes the number of selectable encoding parameters, B_L denotes the limitation of the transmitting resources, o^i_j and b^i_j denote the IQA prediction score and transmitting resources of the i-th image channel under the j-th encoding parameter respectively, and k_i denotes the index of the selected encoding parameter of the i-th channel.
In this work, the coupled quality distribution O and transmitting resource distribution B is called the IQA-incorporated rate-distortion curve, abbreviated as IQA-curve. The overall quality of a multi-channel image encoding system can be optimized with such IQA-curves according to Eq.1 and Eq.2. However, calculating the rate-distortion curve for each channel is time-consuming: the images have to be compressed M × N times and the IQA model also has to be called M × N times.
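As an illustration of how such a curve could be computed by brute force, the sketch below assumes the Pillow library for JPEG encoding and substitutes PSNR for the learned FR-IQA model used in this work:

```python
import io

import numpy as np
from PIL import Image

def psnr(ref, dist):
    # Stand-in FR-IQA metric; the paper's PTMMF-FRIQA model would be used instead.
    mse = np.mean((ref.astype(np.float64) - dist.astype(np.float64)) ** 2)
    return 100.0 if mse == 0 else 10.0 * np.log10(255.0 ** 2 / mse)

def iqa_curve(image, q_values=range(2, 101, 2)):
    """Compress `image` at each candidate Q and record (quality, file size)."""
    ref = np.asarray(image)
    O, B = [], []
    for q in q_values:
        buf = io.BytesIO()
        image.save(buf, format="JPEG", quality=q)
        B.append(buf.tell())              # transmitting resource b_j (bytes)
        buf.seek(0)
        O.append(psnr(ref, np.asarray(Image.open(buf))))  # quality o_j
    return np.array(O), np.array(B)
```

Each call performs M = 50 compressions and M metric evaluations per image, which is exactly the cost the adaptive hyper network is later introduced to avoid.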
In order to save time and computing resources, this work tries to predict the IQA-curve by resorting to deep learning techniques, i.e., we design a DCNN that takes the original image as input and predicts the IQA-curve directly, without compressing the original image and evaluating the distorted versions with the IQA model multiple times.
The main contributions of our work are summarized as follows: (1) We propose an effective group quality optimization framework for multi-channel image encoding systems that re-allocates the transmitting resources of each channel via IQA-curves.
(2) We design an adaptive DCNN that predicts the IQA-curves in one shot, which significantly saves computing time and resources.
(3) We implement our GQO framework for multi-channel JPEG image encoding systems and achieve satisfactory results, indicating the proposed work is also promising for multi-channel video encoding systems.
It should also be noticed that the proposed GQO framework is quite different from prevalent dynamic adaptive streaming techniques. Firstly, existing adaptive streaming techniques such as HTTP-DASH [6] are designed to fit various network environments, e.g., when the transmitting bandwidth is limited, HTTP-DASH switches to a lower bit-rate to guarantee the fluency of the playback, whereas our proposed GQO framework is designed for the multi-channel encoding system to make full use of the limited storage (or transmitting bandwidth) and to yield the optimal overall perceptual quality; i.e., HTTP-DASH is driven by the transmitting environment but our GQO framework is driven by perceptual quality. Secondly, although HTTP-DASH switches between different bit-rates according to the network environment of the clients, the candidate bit-rates are still preset and fixed. That means if the candidate bit-rates are preset to 100 kbps, 500 kbps, and 2000 kbps, the HTTP-DASH server can only switch between those three candidates and cannot jump to other bit-rates; in contrast, the GQO framework analyzes the IQA-curve of each input to pursue the most suitable encoding parameters.
VOLUME 9, 2021
The rest of this paper is organized as follows: Section II introduces several related IQA works; Section III illustrates our proposed Group Quality Optimization framework and its specific version for multi-channel JPEG image encoding systems; Section IV shows the experimental results; and Section V concludes the paper.

II. RELATED WORKS
With the help of Deep Neural Networks, numerous fascinating BIQA models have been proposed in recent years. For example, Kang et al. [7], [8] proposed a multi-task shallow CNN to learn both the distortion type and the quality score; Kim and Lee [9] applied state-of-the-art FR-IQA methods to provide proxy quality scores for each image patch as the ground-truth labels in the pre-training stage, and the proposed network was fine-tuned on subjective annotations. Similarly, Pan et al. [10] employed a U-Net to learn local quality prediction scores previously calculated by full-reference IQA methods, and several dense layers were then incorporated to pool the local quality prediction scores into an overall perceptual quality score; Liang et al. [11] tried to utilize a similar scene as reference to provide more prior information for the IQA model; Liu et al. [12] proposed to use RankNet to learn the quality rank information of image pairs in the training set, and then used the output of the second-to-last layer to predict the quality score; Lin and Wang [13] tried to learn the unknown reference image from the distorted one by resorting to Generative Adversarial Networks, and to assess the perceptual quality by comparing the hallucinated reference image with the distorted image; Chiu et al. [1] proposed a new IQA framework and corresponding dataset that link the IQA issue to two practical vision tasks, namely image captioning and visual question answering; Su et al. [14] employed a self-adaptive hyper network whose parameters adjust according to the image content; Zhu et al. [15] leveraged meta-learning to learn a general-purpose BIQA model from training sets of several specific distortion types.
As described above, recent state-of-the-art BIQA methods focus on predicting the quality of the distorted image alone and give little consideration to how their proposed models could be incorporated into other downstream vision tasks. Amongst the IQA works above, only [1] tries to link IQA with other image vision tasks. Our work exploits how to leverage IQA models to optimize the perceptual quality of multi-channel transmitting systems, which is of much importance because such work not only optimizes the transmitting system for both servers and clients but also represents a beneficial attempt at bridging the gap between IQA and other image vision tasks.

III. PROPOSED METHODS
In this section, we detail our group quality optimizing (GQO) framework based on learning IQA-curves. The proposed GQO framework is capable of optimizing the overall perceptual quality for multi-channel image transmitting systems.
We implement our GQO framework for multi-channel JPEG image encoding systems to verify its superiority; therefore, the encoding parameter is the commonly used Q value of JPEG compression.
It should be noticed that the proposed GQO framework could also be applied to multi-channel video systems with limited adjustment of the DCNN, but the implementation in this work focuses on the JPEG image encoding system for consistency of narrative and convenience of implementation.

A. PROBLEM FORMULATION
For a multi-channel JPEG image encoding system, we try to optimize the overall perceptual quality given limited transmitting resources, i.e.,

\max \sum_{i=1}^{N} o_i \quad \text{s.t.} \quad \sum_{i=1}^{N} b_i \le B_L

where N denotes the number of channels in the encoding system, meaning that N images are compressed at one time, B_L denotes the limitation of the transmitting resources (or storage), and o_i and b_i denote the perceptual quality and transmitting resources of the i-th channel. As described earlier in Section I, just calculating the IQA prediction score for the current multi-channel output is not enough to optimize the overall perceptual quality. That is mainly because we still cannot know how the overall perceptual quality would change if the transmitting resources were re-allocated. Therefore, we leverage the IQA-incorporated rate-distortion curve (abbr., IQA-curve) to take a global view of the relationship between bandwidth and perceptual quality. The IQA-curve is calculated as shown in Fig. 1. Specifically, supposing the Q value candidates range from 2 to 100 with step 2 (i.e., encoding parameters P = [p_1, p_2, ..., p_M], with M = 50, p_1 = 2, p_2 = 4, ...), then the IQA-curve of the i-th channel is comprised of the quality distribution O^i = [o^i_1, o^i_2, ..., o^i_M] and the transmitting resource distribution B^i = [b^i_1, b^i_2, ..., b^i_M], where o^i_j is obtained by feeding the j-th distorted version (compressed under the j-th Q value) and the corresponding i-th original image into the FR-IQA model, and b^i_j is the file size of the j-th distorted version of the i-th original image.
Based on the IQA-curves of each channel, the optimized overall performance can be derived by searching an index vector K = [k_1, k_2, ..., k_N] (denoting the selection of Q value for each channel) that maximizes \sum_{i=1}^{N} o^{i}_{k_i} subject to \sum_{i=1}^{N} b^{i}_{k_i} \le B_L. In order to accelerate the time-consuming calculation of the IQA-curve for each channel, a self-adaptive hyper network is designed inspired by [14], which can precisely predict the IQA-curve in one shot and get rid of M × N rounds of JPEG compression and IQA prediction.
The overall procedure of our proposed group quality optimization framework for a multi-channel JPEG image encoding system is shown in Fig.2, and is mainly composed of two steps: a) using the self-adaptive hyper network to predict the IQA-curve for each channel, and b) optimizing the overall perceptual quality according to the predicted IQA-curves, after which the encoding system operates with the optimized encoding parameters.

B. HYPER NETWORK FOR PREDICTING IQA-CURVE
Inspired by [14], a self-adaptive hyper network is designed to accurately predict the IQA-curve, and its pipeline is illustrated in Fig.3. Multi-scale features extracted by ResNet-50 are employed to regress the IQA-curves. Rather than directly feeding the multi-scale features into a traditional regression network whose parameters are fixed for all images, our work employs a content-understanding hyper network to generate customized regression parameters for each test image. The reasons we employ the content-understanding hyper network are summarized as follows: Firstly, such a model has been applied for blind IQA score prediction in [14] and achieved satisfactory performance, and the prediction of IQA scores and IQA-curves are related tasks; secondly, different image contents yield discriminative patterns of IQA-curves, e.g., the first-order derivatives of IQA-curves obtained from different source images exhibit quite dissimilar patterns; thirdly, based on our experience, using fixed parameters for all test images to predict a multi-dimensional curve often leads to overfitting, which can be avoided by generating customized parameters from the hyper network.
As described in Fig.3, the overall pipeline of our IQA-curve prediction model is similar to [14], but it is comprised of two parallel dataflows: the upper one is employed for predicting the transmitting resource distribution whilst the lower one is employed for predicting the perceptual quality distribution. The overall structures of the two data streams are similar, so the illustration focuses on the lower one, which predicts the perceptual quality distribution.
Specifically, given an input image denoted as x, its semantic features are extracted by ResNet-50. The semantic features extracted from the last convolutional layers, denoted as S(x), are employed as the input of the hyper network, whilst the multi-scale semantic features extracted from each conv-block, denoted as S^O_ms(x), are employed as the input of the target network. The parameters of the target network, denoted as \theta^O_x, are generated by the hyper network:

\theta^{O}_{x} = H^{O}(S(x); \gamma^{O})

where H^O denotes the mapping function of the hyper network and \gamma^O represents the hyper network parameters. Denoting the target network as \varphi^O, the prediction of the quality distribution O_p can then be described as:

O_p = \varphi^{O}(S^{O}_{ms}(x); \theta^{O}_{x})

Based on a similar structure, the distribution of transmitting resources B_p can also be predicted, and the predicted IQA-curve can be drawn given B_p and O_p.
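The weight-generation mechanism can be sketched as follows; this is a minimal numpy illustration with hypothetical dimensions and random features, not the actual ResNet-50 feature sizes or the trained networks of this work:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions: d_s = size of the semantic feature S(x) fed to the
# hyper network, (d_in, d_out) = one fully-connected layer of the target network.
d_s, d_in, d_out = 64, 32, 16

# Hyper-network parameters gamma: linearly map S(x) to the target layer's weights.
gamma_W = rng.standard_normal((d_s, d_in * d_out)) * 0.01
gamma_b = rng.standard_normal((d_s, d_out)) * 0.01

def hyper_generate(s_x):
    """theta_x = H(S(x); gamma): per-image weights for one target layer."""
    W = (s_x @ gamma_W).reshape(d_in, d_out)
    b = s_x @ gamma_b
    return W, b

def target_forward(ms_feat, theta):
    """phi(S_ms(x); theta_x): the target network runs with generated weights."""
    W, b = theta
    return np.tanh(ms_feat @ W + b)

s_x = rng.standard_normal(d_s)        # stands in for the semantic feature S(x)
ms_feat = rng.standard_normal(d_in)   # stands in for the multi-scale feature S_ms(x)
out = target_forward(ms_feat, hyper_generate(s_x))
```

The point of the construction is that two images with different semantic features S(x) are regressed by different target-network weights, which is what distinguishes the hyper network from a fixed-parameter regressor.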
We make several modifications to ensure the self-adaptive model fits our problem. The modifications are summarized as follows: (1) Rather than using only the L1 norm to measure the difference between the ground truth and the prediction, our work adds the KL-divergence as another cost item to ensure the overall distributions of the prediction and the ground truth are similar; e.g., when predicting the quality distribution O_p, the loss is defined as:

L_O = \|O_p - O_{gt}\|_1 + \lambda \, D_{KL}(O_{gt} \,\|\, O_p)

where O_{gt} denotes the ground-truth quality distribution and \lambda balances the two terms. (2) The model proposed in [14] limits the input size to 224 × 224, resulting in a massive loss of local features when the original input is HD (1920 × 1080). Considering that, the input size is modified to 640 × 384: the input HD image is first resized to 640 × 360 and then padded at its top and bottom to reach 640 × 384.
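The combined cost item could look as follows; this is a sketch, and the balancing weight `lam`, the curve normalization, and the epsilon smoothing are our assumptions rather than values reported in this work:

```python
import numpy as np

def curve_loss(o_pred, o_gt, lam=1.0, eps=1e-8):
    """L1 term plus KL-divergence between the two quality distributions.

    The curves are renormalized to sum to one before the KL term, so that
    they can be treated as probability distributions.
    """
    l1 = np.abs(o_pred - o_gt).sum()
    p = o_gt / (o_gt.sum() + eps)
    q = o_pred / (o_pred.sum() + eps)
    kl = np.sum(p * np.log((p + eps) / (q + eps)))
    return l1 + lam * kl
```

The L1 term penalizes pointwise errors while the KL term penalizes mismatches in the overall shape of the 50-dimensional distribution.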
(3) Based on our observation, when predicting the IQA-curves, shallower layers convey more detailed feature information, which is more useful than higher-level features; therefore the input of the target network, i.e., S_ms(x), is extracted from the first three conv-blocks (in [14], features from all conv-blocks are extracted).
(4) The size of the target network is adjusted from 224-112-56-28-14-1 to 480-960-480-240-50. The last layer is set to 50 because in our implementation the encoding parameter P (i.e., the Q value for JPEG compression) has 50 candidates in total, which means the dimension of a quality distribution O or a transmitting resource distribution B is 50 (i.e., M = 50). The other values are all carefully selected: since the output of the conv-layers of the hyper network is assigned to the target network, the dimension of each layer in the target network should be divisible by the width multiplied by the height of the feature map in the last conv-layer of ResNet-50. The feature map size is 20 × 12 given the input size 640 × 384, therefore the dimensions should be divisible by 240.

C. IQA CRITERIA SELECTION
There is still a vital issue to be figured out, i.e., which IQA criterion should be selected to depict the quality distribution. Since the original image is available in the compression system, full-reference (FR) IQA is preferred.
There are several prevalent FR-IQA metrics, e.g., SSIM [16], VIF [17], GMSD [18], FSIM [19], VSI [20], and MAD [21]; almost all of them achieve satisfactory accuracy on single distortion types (e.g., on the TID2013 image database, the Pearson Linear Correlation Coefficients of nearly all the above FR-IQA metrics with the subjective scores reach 0.9 on JPEG-compressed images alone). However, we would like to search for a more effective FR-IQA method that predicts the subjective score as accurately as possible. Based on [22] and [23], a nonlinear combination of multiple FR-IQA metrics achieves better performance by resorting to machine learning techniques; at the same time, [24] points out that such a nonlinear combination might lead to a decline of the generalization ability of the combined FR-IQA model.
In order to improve the accuracy of FR-IQA via nonlinear combination whilst maintaining the generalization ability, this work proposes a pre-training-engaged multi-method-fusion FR-IQA model (PTMMF-FRIQA) and first pre-trains it with prior quality knowledge to pursue better generalization. Specifically, 1000 pristine images collected from the internet are compressed by JPEG using 3 different Q values, namely 10, 35, and 80, representing 'Poor', 'Acceptable', and 'Satisfactory' respectively. A classification network is then pre-trained by supervising it to correctly classify inputs (combinations of FR-IQA scores) into the quality labels ('Poor', 'Acceptable', 'Satisfactory'), as shown in Fig.4. After that, the last layer is substituted by a fully connected layer whose output size is 1 and which is activated by a sigmoid, making it more suitable for the regression task. At last, training samples with subjective scores are employed to fine-tune the regression network to accurately predict the perceptual quality. Ablation experiments demonstrate that such a pre-train and fine-tune pipeline yields a satisfactory multi-method-fusion FR-IQA metric.
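Structurally, the pre-train-then-swap-head pipeline can be sketched as follows; the weights here are random placeholders for illustration only, and the actual training of the 8-56-28-6-3 network is omitted:

```python
import numpy as np

rng = np.random.default_rng(0)

def make_mlp(sizes):
    """Random-init weight list for a fully-connected net, e.g. 8-56-28-6-3."""
    return [rng.standard_normal((a, b)) * 0.1 for a, b in zip(sizes, sizes[1:])]

def forward(layers, x, last_sigmoid=False):
    for W in layers[:-1]:
        x = np.maximum(x @ W, 0.0)          # ReLU hidden layers
    x = x @ layers[-1]
    return 1.0 / (1.0 + np.exp(-x)) if last_sigmoid else x

# Pre-training stage: 8 FR-IQA scores in, 3 coarse quality classes out.
clf = make_mlp([8, 56, 28, 6, 3])
# Head swap for fine-tuning: replace the 6-3 layer with a 6-1 sigmoid layer,
# turning the classifier into a quality-score regressor in [0, 1].
reg = clf[:-1] + [rng.standard_normal((6, 1)) * 0.1]
score = forward(reg, rng.standard_normal(8), last_sigmoid=True)
```

Only the last layer changes between the two stages, so the fused-metric representation learned during pre-training is carried over into fine-tuning.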

D. GROUP QUALITY OPTIMIZATION PROCEDURE
The group quality optimization procedure is conducted based on the predicted IQA-curves. The implementation above samples 50 encoding parameters (Q values) ranging from 2 to 100, denoted as P = [p_1, p_2, ..., p_M] where M = 50, p_1 = 2, p_2 = 4, and so on. The predicted IQA-curve therefore contains two 50-dimensional vectors representing the quality distribution and the transmitting resource distribution respectively.
Supposing N images are engaged in the GQO, the predicted quality distribution and transmitting resource distribution of the i-th image are denoted as O^(i)_p and B^(i)_p respectively. Given a bandwidth limitation B_L, the GQO procedure is described as follows: (1) The selection of encoding parameters for the N channels is denoted as K = [k_1, k_2, ..., k_N] and initialized with all elements set to 1 (i.e., every channel starts from the lowest-bandwidth candidate). (2) A gain matrix G is computed, in which G(i, j) measures the quality gain of moving the i-th channel from its current candidate k^(t)_i to the (k^(t)_i + j)-th candidate; elements whose corresponding update would exceed the bandwidth limitation B_L are set to −1.
(3) The selection of encoding parameters is then updated by K^(t+1) = K^(t) + j* × δ(i*), where i* and j* are selected by <i*, j*> = argmax_{<i,j>} G(i, j), and δ(x) denotes an N-dimensional vector whose elements are all zeros except that the x-th element is 1.

(4) When no element of G is positive, i.e., no further update could improve the overall quality within the bandwidth limitation, the update stops and K^(t) is returned as the optimal parameter setting, i.e., the i-th channel uses the k^(t)_i-th candidate encoding parameter to conduct the image encoding; otherwise the update procedure jumps back to step (2) and continues.
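Under our reading of the steps above, the search amounts to a greedy ascent over the predicted IQA-curves. A sketch follows; defining the gain as quality improvement per extra byte is our assumption, since the paper does not spell out the gain formula:

```python
import numpy as np

def gqo_greedy(O, B, budget):
    """Greedy group-quality optimization.

    O, B: (N, M) predicted quality / file-size distributions per channel,
    columns ordered by ascending Q. Returns per-channel candidate indices K
    (0-based here, where the paper's description is 1-based).
    """
    N, M = O.shape
    K = np.zeros(N, dtype=int)            # start every channel at the lowest Q
    while True:
        used = B[np.arange(N), K].sum()
        best, move = 0.0, None            # only strictly improving moves count
        for i in range(N):
            for j in range(K[i] + 1, M):
                extra = B[i, j] - B[i, K[i]]
                if extra <= 0 or used + extra > budget:
                    continue              # infeasible move: would exceed B_L
                gain = (O[i, j] - O[i, K[i]]) / extra
                if gain > best:
                    best, move = gain, (i, j)
        if move is None:                  # no feasible improving step remains
            return K
        K[move[0]] = move[1]
```

Each iteration applies the single best feasible move, mirroring the K^(t+1) = K^(t) + j*·δ(i*) update, and the loop terminates because every accepted move strictly increases some k_i.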

IV. EXPERIMENTS
The training and testing images involved, the performance of each component, and the overall framework are illustrated in this section.

A. TRAINING SET CONSTRUCTION
Both the proposed adaptive hyper network in Section III-B and the multi-method fusion FR-IQA model in Section III-C need large amounts of training samples.
We therefore collected up to 1000 high-resolution images from digital cameras and online image resources. The collected images are cropped and rescaled to 1920 × 1080 while their original aspect ratios are maintained. Snapshots of part of the involved training samples are shown in Fig.5, which demonstrates that the established training set contains various content types ranging from ceremonies, sceneries, people, sculptures, and animals to buildings, and so on. The established training set is mainly used for the training of the adaptive hyper network and the PTMMF-FRIQA model. As for the fine-tuning of the PTMMF-FRIQA model, image samples with sophisticatedly annotated quality labels are needed. Considering that, several popular publicly available image quality databases (CSIQ [21], LIVE [25], and TID2013 [26]) are involved in the fine-tuning of the multi-method-fusion FR-IQA model, as listed in Table 1. Since the subjective annotations in different databases have different types and ranges (a higher MOS value means higher perceptual quality whilst for DMOS it is the opposite), we linearly mapped the subjective annotations into the range [0,1] and then reversed the normalized DMOS values by x' = 1 − x.
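The annotation normalization can be sketched as follows; mapping by the observed minimum and maximum of each database is an assumption here, and the databases' nominal score ranges could be used instead:

```python
import numpy as np

def normalize_annotations(scores, is_dmos=False):
    """Linearly map subjective scores to [0, 1]; flip DMOS so higher = better.

    Assumes the scores are not all identical (otherwise min == max).
    """
    s = np.asarray(scores, dtype=float)
    s = (s - s.min()) / (s.max() - s.min())
    return 1.0 - s if is_dmos else s
```

After this step, MOS-annotated and DMOS-annotated databases share a common [0, 1] scale where 1 is the best perceptual quality.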

B. PERFORMANCE EVALUATION FOR PTMMF-FRIQA
As described in Section III-C, the PTMMF-FRIQA is employed to accurately annotate the perceptual quality of JPEG-distorted images. A classification network is first pre-trained on the established training set containing up to 1000 images, and after substituting the last layer, the changed network is fine-tuned on the IQA quality databases. The inputs of the networks are 8 traditional FR-IQA scores, including PSNR, SSIM, VIF, GMSD, FSIM, FSIMc, MAD, and VSI. The baseline classification network is a shallow BP network with size 8-56-28-6-3, as shown in Fig. 4. Images in the training set are first compressed by JPEG with Q values 10, 35, and 80, representing 'Poor Quality', 'Acceptable Quality', and 'Satisfactory Quality' respectively. The classification network is then pre-trained to predict the quality labels of the 3000 distorted images.
After the pre-training procedure, the last layer is substituted by a 6-1 dense layer activated by a sigmoid, and this new network is then fine-tuned on the training samples of the IQA quality databases. We employ 5-fold cross validation to demonstrate the effectiveness of the proposed multi-method-fusion FR-IQA model, i.e., for each IQA database, 80% of the JPEG-distorted images are employed as the training set and the others as the testing set; this procedure repeats 5 times to guarantee all images are involved in a testing set. The prediction scores on the testing sets are shown in Fig.6 and Table 2. In order to further validate the multi-method-fusion FR-IQA model, cross-database validation is conducted, in which the JPEG-distorted images in LIVE are employed as training samples whilst the JPEG-distorted images in TID2013 are employed as testing samples. The testing results are shown in Fig.7.
It should be noticed that the above results demonstrate that the employed lightweight PTMMF-FRIQA can accurately predict the perceptual quality of JPEG-distorted images and outperform the other metrics. That is mainly because PTMMF is trained specifically for JPEG distortion whilst the other metrics target a more general purpose. However, considering that the proposed group quality optimization framework is for the server side, where the distortion types are fixed beforehand (i.e., JPEG in this implementation), the proposed PTMMF is the most suitable IQA metric for our GQO framework.

C. PERFORMANCE EVALUATION FOR ADAPTIVE HYPER NETWORK
The adaptive hyper network takes a pristine image as input and yields its IQA-curve by predicting the quality distribution and the transmitting resource distribution respectively. Therefore, we construct the training samples as follows: each image in the training set is compressed by JPEG with Q values ranging from 2 to 100 (with interval 2). The perceptual quality of each distorted image is predicted by the multi-method-fusion FR-IQA model, and its file size is also recorded. Letting o^i_j and b^i_j denote the perceptual quality and file size of the j-th distorted version of the i-th image, the ground-truth quality distribution and transmitting resource distribution of the i-th image are then denoted as O^(i) = [o^i_1, ..., o^i_M] and B^(i) = [b^i_1, ..., b^i_M] respectively. The adaptive hyper network is trained on one Tesla V100 GPU card with 32 GB memory installed on Dell Quarterly Server Tracker; the mini-batch size is set to 16 and the other settings are similar to [14].
The 5-fold cross validation is also employed to verify the IQA-curve prediction accuracy of the adaptive hyper network. The experimental results are shown in Fig.8. It should be noticed that the file size and perceptual quality are normalized.
As shown in Fig.8, the adaptive hyper network can accurately predict the IQA-curve given only one pristine image, without cumbersome JPEG compression and PTMMF calculation. In addition, the time cost of the proposed adaptive hyper network is also tested: on average, predicting the IQA-curve via the adaptive hyper network takes only 0.037 seconds per image (tested on an Intel i7-8700@3.2GHz with a GTX2080 using PyTorch), whilst generating the IQA-curve via multiple rounds of JPEG compression and PTMMF calculation costs up to 76.325 seconds (M = 50, tested on a similar host PC using MATLAB with GPU acceleration on). Therefore, predicting the IQA-curve via the DCNN significantly reduces the time cost.

D. PERFORMANCE ANALYSIS OF GROUP QUALITY OPTIMIZATION FRAMEWORK
Experimental results in Sections IV-B and IV-C indicate that each component of our GQO framework is sensible and effective. The overall performance of the GQO framework is evaluated in this section.
The proposed Group Quality Optimization framework (shown in Fig.2) aims to allocate the transmitting resources of each encoding channel in a perceptual-quality-driven manner, i.e., by predicting the IQA-curve of each channel, the GQO framework can intelligently yield customized encoding parameters for each channel so that the overall quality of the output is optimized under the limited bandwidth.
In contrast, existing multi-channel encoding frameworks neglect the diversity of IQA-curves between various image contents and tend to compress the images via fixed (or preset) encoding parameters (Q value or output file size).
In order to demonstrate the superiority of the proposed GQO framework, two prevalent allocation strategies for multi-channel image encoding systems are involved for comparison, i.e., the 'Q-fixed' mode and the 'B-fixed' mode. In the 'Q-fixed' mode, all inputs of the multi-channel encoding system are compressed with the same Q value, and in the 'B-fixed' mode, all inputs are compressed to nearly the same bandwidth (i.e., output file size).
The evaluation procedure is shown in Fig.9 and described as follows. The number of encoding channels N is set to 40, i.e., 40 images are encoded simultaneously by the multi-channel encoding system. 100 images with various contents are downloaded from publicly available websites as the candidate inputs of the multi-channel encoding system. The evaluation procedure repeats for 3 epochs. For each epoch, 40 images are randomly sampled from the 100 downloaded images. In each evaluation epoch, the 40 involved images are first compressed via the 'Q-fixed' mode. The multi-channel encoding in 'Q-fixed' mode repeats for 8 periods with ascending Q values, i.e., in the 1st period the Q value of all channels is set to 10, in the 2nd period the Q value of all channels is set to 20, etc. The overall output file size of the 'Q-fixed' mode in each period is recorded, which is then employed as the bandwidth limitation of the GQO framework and as the overall output file size of the 'B-fixed' mode.
The 'Quality Loss' is introduced to measure the performance of the different modes of the multi-channel image encoder. Specifically, considering that the proposed PTMMF-FRIQA value (ranging from 0 to 1) can accurately assess the perceptual quality of JPEG-compressed images and a higher PTMMF-FRIQA value means better perceptual quality, the output of the i-th channel coupled with its original input is fed into the PTMMF-FRIQA model to yield o_i, and the value \sum_{i=1}^{40} (1 - o_i) is then employed to measure the quality loss, i.e., a higher value means the output of the multi-channel image encoding system suffers more quality degradation. The quality losses of the 'Q-fixed' mode, 'B-fixed' mode, and GQO mode are denoted as L_Qf, L_Bf, and L_GQO respectively.
Based on the 'Quality Loss', we conduct a more intuitive measurement to illustrate the superiority of the GQO framework, called the 'Quality Gain'. The Quality Gain of the GQO framework compared to the 'Q-fixed' mode is calculated by (L_Qf − L_GQO)/L_GQO; similarly, the Quality Gain compared to the 'B-fixed' mode is calculated by (L_Bf − L_GQO)/L_GQO. A higher quality gain means the proposed GQO framework preserves more perceptual quality compared to the 'Q-fixed' or 'B-fixed' mode.
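The two measurements can be transcribed directly:

```python
def quality_loss(scores):
    """Sum of (1 - o_i) over all channels; higher means more degradation."""
    return sum(1.0 - o for o in scores)

def quality_gain(loss_baseline, loss_gqo):
    """Relative gain of GQO over a baseline mode, e.g. (L_Qf - L_GQO) / L_GQO."""
    return (loss_baseline - loss_gqo) / loss_gqo
```

For instance, if a baseline mode loses 0.6 units of quality while GQO loses 0.4 under the same bandwidth, the quality gain is 0.5, i.e., the baseline suffers 50% more loss than GQO.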
Experimental results of the 1st epoch, 2nd epoch, and 3rd epoch are shown in Fig.10, Fig.11, and Fig.12 respectively, demonstrating that the proposed GQO framework can reduce the overall quality loss by 1.5% to 49.2% compared to the traditional Q-fixed and B-fixed modes, regardless of the image contents.
As shown in Fig.10 to Fig.12, although various image contents achieve quality gains via the GQO framework, the effects are quite different; e.g., the results in Fig.10-(b) are much more obvious than those in Fig.12-(b). Further analysis is conducted to investigate what factors lead to such diversity of effects. Intuitively, if the inputs of the multi-channel encoding system are all the same, their IQA-curves are also the same, and the GQO framework could gain nothing. Therefore, we measure the difference of IQA-curves amongst all input channels (denoted as D) as follows: let O^(i) and B^(i) denote the quality distribution and transmitting resource distribution of the i-th channel; each IQA-curve is resampled via bilinear interpolation into 100 points, yielding Y = {y^i_j}, where y^i_j denotes the j-th resampled quality value of the i-th channel. The difference of IQA-curves D is then obtained by calculating the standard deviation of Y along axis i and then taking the mean of those standard deviations, i.e., supposing N images are involved in the multi-channel image encoding, D = mean(std_i(y^i_1), std_i(y^i_2), ..., std_i(y^i_100)). The relationship between D and the quality gain in the above three experiments is shown in Fig.13, demonstrating that a larger difference between IQA-curves tends to lead to a more obvious quality gain in our GQO framework.
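The difference measure D can be sketched as follows, assuming the IQA-curves have already been resampled to a common grid of 100 points:

```python
import numpy as np

def curve_difference(Y):
    """D for a set of resampled IQA-curves.

    Y: (N, 100) array, row i holding the 100 resampled quality values of the
    i-th channel. D is the across-channel standard deviation at each of the
    100 points, averaged over the points.
    """
    return np.std(Y, axis=0).mean()
```

Identical inputs give D = 0 (no room for re-allocation), while heterogeneous contents give D > 0, which is the regime where the GQO framework is observed to help most.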

V. CONCLUSION
This work presents an efficient group quality optimization framework for multi-channel image encoding systems. Specifically, the GQO framework for a multi-channel JPEG image encoding system is implemented and evaluated. Without any modification of the native image encoder, the proposed GQO framework can effectively analyze the IQA-curves and improve the overall perceptual quality of the output given limited transmitting resources. The time cost is imperceptible thanks to the acceleration by the adaptive hyper network. We therefore believe the proposed work is capable of being deployed into real-world multi-channel encoding systems with considerable benefit. The implementation of GQO for multi-channel H.264/HEVC video encoding systems will be our future work.

YONG CHEN is currently pursuing the M.B.A. degree with Zhejiang University. He has worked in the field of image and video encoding for more than 20 years. He is the Vice President of Hangzhou Arcvideo Technology Company Ltd., and a Senior Engineer. He has filed more than 30 patents relevant to image and video encoding techniques.
DINGGUO YU was born in 1976. He received the Ph.D. degree in computer application technology from Tongji University, China, in 2011. He is currently a Professor and the Director of Key Laboratory of Film and TV Media Technology, Zhejiang, China. His research interests include media fusion technology, big data, and artificial intelligence for media.