Product Design Award Prediction Modeling: Design Visual Aesthetic Quality Assessment via DCNNs

Visual aesthetics is a crucial determinant of product design evaluation. Analyzing image features allows us not only to evaluate the aesthetic level of a design proposal but also to reveal its overall quality. We hypothesize that visual aesthetics can serve as a cue for award classification modeling and thereby help predict the ultimate success of a product design proposal. To test this hypothesis, we investigate a dataset of 10,003 design submissions to an annual design competition held from 2008 to 2018. Motivated by the remarkable performance of deep convolutional neural networks (DCNNs), we compare seven deep learning methods to find an optimal model for design award prediction based on product image analysis. The experimental results indicate that the proposed method achieves competitive accuracy in design award prediction, with the best classification accuracy of 70.79% obtained by SEFL-ResNet (Squeeze and Excitation – Focal Loss – ResNet).


I. INTRODUCTION
A high aesthetic quality of product appearance can promote commercial sales tremendously. Consequently, a large amount of investment is put into the visual aesthetic design and testing of new products before they reach the market [1], [2], since visually appealing commodities and aesthetically pleasing packaging attract consumers to purchase from offline and online retailers. According to the research of Bloch et al., visual aesthetics is among the top three critical attributes in product choice [3]. Aesthetics gives a product competitiveness and establishes differentiation beyond basic attributes, which lets visually appealing products compete with other products of similar functionality [3], [4]. Generally, aesthetics has been regarded as a significant strategy for product research and development, marketing, and brand promotion [5]-[8]. Presently, the evaluation of product design mainly relies on the manual judgment of design experts, which is inevitably influenced by subjective preference.
The associate editor coordinating the review of this manuscript and approving it for publication was Alberto Cano.
Why are some products so popular that they transcend their simple commodity category? Why are some product design schemes chosen as winners among all the candidate proposals? Can we model design award prediction with machine learning methods? Various factors affect whether a design proposal wins an award. Among them, visual aesthetics has been shown to be critical in product design evaluation by statistical analysis and user studies [9]-[11]. We therefore hypothesize that visual aesthetics can serve as a critical cue for classification modeling and can help predict the success of a product design proposal.
In this study, the visual aesthetic level of a product design scheme is represented by whether it was awarded in the competition, since an excellent design proposal is usually also outstanding in its visual aspect. Image aesthetics analysis provides an approach to predict the aesthetic level of a product design. Indeed, image aesthetics computing has attracted the attention of many researchers around the world [12]-[16]. Here, the relationship between visual aesthetics and image features is estimated empirically via computational algorithms. We trained an aesthetic computing model on a dataset of 10,003 product design images labeled with ground-truth categories: 835 samples in the awarded class (high aesthetic level), 4,990 samples in the qualified class (middle aesthetic level) and 4,178 samples in the eliminated class (low aesthetic level). Image features were extracted by DCNNs. We applied ResNet-50, InceptionNet, MobileNet, DenseNet-201, VGG-19, EfficientNet and SEFL-ResNet (Squeeze and Excitation – Focal Loss – ResNet) to classify the design images. The results suggest that deep learning offers good utility and great potential for evaluating product aesthetic design automatically.
The main contributions of this research are summarized as follows. (1) A total of 10,003 product design images with ground-truth award category annotations were collected from a design competition held from 2008 to 2018 to form a novel database. (2) Multiple deep learning methods were compared to find the optimal model for design award prediction, including ResNet-50, InceptionNet, MobileNet, DenseNet-201, VGG-19, EfficientNet and SEFL-ResNet. The best result was obtained by SEFL-ResNet, with an average classification accuracy of 70.79%. (3) The ratio of design samples in the awarded, qualified and eliminated classes was approximately 1:6:5. To train our aesthetic model on this imbalanced dataset, a re-sampling approach to imbalanced learning was studied. The roadmap of this study is illustrated in Fig. 1.
This article is structured as follows: Section 2 reviews the related studies in the existing literature; Section 3 introduces the applied DCNNs (ResNet-50, InceptionNet, MobileNet, DenseNet-201, VGG-19, EfficientNet and SEFL-ResNet), including the methodology framework and feature analysis process; Section 4 provides the experimental procedure, including dataset construction and product aesthetic classification modeling; Section 5 discusses the experimental results; and Section 6 concludes the study and suggests potential research directions and opportunities for further study.

II. RELATED WORKS
Various machine learning and statistical methods have been used in aesthetic computing and have been proved effective by existing studies. The existing literature provides valuable classification methods, feature processing procedures and analyses of the crucial factors influencing aesthetic judgments. Most of the research is based on existing standard databases, such as AVA, Datta and CUHKPQ, and studies aesthetic-level classification and score prediction through statistical methods and machine learning algorithms. However, existing visually-aware product recommendation systems rarely take the aesthetic aspect into consideration. Related research on product aesthetic assessment can be summarized in two parts: (1) product visual appearance aesthetic quality computing and key design element analysis; (2) aesthetic image feature analysis and image aesthetic evaluation with various machine learning algorithms.

A. PRODUCT AESTHETIC MODELING
Here we review the related works on aesthetic evaluation for product design from various aspects, including aesthetic factor analysis by statistical methods and aesthetic quality assessment by computer algorithms (see Table 1). W. Yu et al. incorporated the aesthetic factor into the prediction of users' clothing preference. Clothing features of color (RGB, HSV, etc.) and semantic information (collar, hemline, fabric texture, and shape) were used in their study. The results show that VRA-NMPR (Visually-aware Recommendation model-Neighbor-enhanced Multi-object Faced Personalized Ranking) outperforms the VRA-MPR method by 4.7% in F1-score [17]. M. Berghman et al. conducted two surveys to test the principles of the Unified Model of Aesthetics, and unity-in-variety was shown to have the strongest impact on aesthetic pleasure [18]. Unity-in-variety is considered a well-known principle of beauty. R. A. G. Post et al. investigated the influence of this factor on aesthetic appreciation. They found that unity is the dominant factor and that it facilitates the appreciation of variety. Moreover, unity and variety both have a positive effect on aesthetic appreciation. They conducted three studies to test the factor correlation, using photos of lamps, espresso machines, car interiors, tables, and USB sticks as stimuli [19]. Aesthetic appeal follows certain rules of design. F. A. López et al. found that symmetrical objects that are square in shape or dominated by straight lines usually achieved greater acceptance, while arrhythmic objects with low color contrast in which circular or curved forms prevail were less acceptable in the experiment. General elements of forms and shapes were selected as the stimuli for the aesthetic acceptance study [20]. Aesthetics can be perceived not only in visual images but also in music and sound.
VOLUME 8, 2020
The relationship between the complexity of a sound stimulus and sound preference has been described as an inverted-U pattern, which reaches an optimal point. J. Delplanque et al. studied sound complexity and its relation to aesthetic preference and confirmed this century-old pattern [21]. Apart from multimedia content, we are also exposed to the influence of aesthetic labor in various services. S. Tsaur et al. discussed the link between positive emotions, aesthetic labor and behavioral intentions, and found that aesthetic labor has a significant effect on the other two factors [22]. P. Shamoi et al. applied the FHSI color model with fuzzy linguistic variables and hedges to propose a classification method for aesthetic judgment. The model can predict the preference level for clothing colors. In their study, a total of 10,000 images of fashion looks were collected to form the fashion database. The results show that users' preference for color schemes is influenced by their preferences for the component basic colors and by ratings of color harmony [23]. N. Myszkowski et al. studied the Visual Aesthetic Sensitivity Test (VAST), which is designed to measure aesthetic judgment. They revised the VAST proposed by Götz in 1985 to achieve substantially improved unidimensionality and structural validity [24]. V. Böß proposed an innovative method for the coating process of luxury yachts, in which aesthetic criteria were discussed and applied to create mathematical features and constraints for outer surface creation [25].
Aesthetic experience arises not only in the appreciation of images but also in various multimedia content. S. G. B. Johnson et al. conducted three studies to explore how people rated the similarity of simple mathematical arguments to landscape paintings and piano music, and found that participants' ratings mainly relied on three dimensions of beauty judgment: elegance, profundity and clarity [26]. Aesthetic experience is also important in website usability assessment. Y. Liu et al. discussed the principles of aesthetics in interface design in several aspects, including friendly design, color simplicity, proportion and symmetry [27].
Moreover, aesthetic evaluation can also be applied to visual attention region detection. D. N. Anh et al. presented a geometric aesthetic approach for extracting visual attention regions from images, which can identify the region of interest in an image. In this study, golden ratio features, balance of lighting, and non-local difference were used to predict the attraction region with an SVM. The method proved effective, with an MSE of 0.01 in the modeling result [28]. Aesthetic perception also exists in the art of linguistics, where metaphor displays an aesthetic effect in various scenarios. Q. Yang et al. investigated the attractiveness of metaphors and found that attractiveness is positively correlated with figurativeness, imageability, romance and arousal, while it is negatively associated with familiarity. A hierarchical regression model and one-way ANOVA were used for the analysis [29].
Additionally, physiological feature analysis can be a way to detect human aesthetic perception of images. M. Cheung et al. collected 80 artistic paintings and 80 images of fashion window displays to explore the classification of beautiful/not beautiful images. They measured human EEG responses to these image stimuli and found that positive frontal alpha asymmetry was elicited by beautiful commercial stimuli [30].
How is a product perceived as pleasurable? P. Lin et al. studied this question and investigated the eight pleasure factors of product design based on Jordan's theory. A total of eight images of wooden crafts were used for pleasure level rating in this study. Conjoint analysis indicates that the most important attribute is color (49.5%), followed by proportion (26.7%) and shape (23.8%) [31]. A product design with aesthetic improvement can increase commercial sales by 30% or more, so aesthetic value is critically important to a product's commercial value. The researchers in [2] applied machine learning to augment user evaluation in the aesthetic design process of new products. A probabilistic variational autoencoder (VAE) and generative adversarial networks (GAN) were combined and trained on an image dataset of 1,836 rated car design schemes. The method performed well in aesthetic appeal prediction, with a 38% improvement over conventional machine learning methods. New design proposals were also generated by the GAN for consideration, which offers a promising method for automatic aesthetic product design [2].
Aesthetic evaluation of product design is also studied as a decision-making activity, which offers a new angle on the problem of aesthetic assessment. J. C. Arbeláez et al. used CARE to gather feedback from users through their mobile devices. The study mainly developed an application to collect users' aesthetic feedback with CARE, which promises to let designers obtain users' perceptions in a collaborative way [32]. Y. Wu et al. used a continuous fuzzy Kano model to handle the ambiguity of users' needs, and then applied FAHP (Fuzzy Analytic Hierarchy Process) to compute the weights of the evaluation criteria. The study identifies the crucial factors that influence the attractiveness of electric scooter design, which can provide a reference for consumer perception analysis [33]. In the experiment conducted by C. Spence et al., the results emphasize the similarity in aesthetic preferences for horizontal/vertical alignment in the visual perception of both paintings and plates of food, and show that people prefer linear food elements. The participants were asked to rotate the food by moving the cursor around the center of the image until it achieved the best aesthetic quality [34]. An aesthetic-driven image cropping method was proposed based on a regression network. It outperforms the state-of-the-art methods, with an IoU (Intersection over Union) of 0.843 and a BDE (Boundary Displacement Error) of 0.029 [35]. L. Deng et al. collected 24 homepages of e-commerce websites to analyze their aesthetic quality. Complexity and order were selected as the salient aesthetic features of web pages. The study reveals that complexity and order have a significant effect on customers' preferences. Motivational orientation moderates consumers' preference for web complexity, while it has no such influence on the preference for web order [36]. W. Huang et al. revisited the relationship between novelty and aesthetic preference in 2012. A total of 213 photos of chair designs were selected as stimuli. They employed three dimensions of product semantics: trendiness, complexity and emotion. Both complexity and emotion show inverted-U relationships with user aesthetic preference, and trendiness presents a small positive linear relationship with aesthetic preference. The trendiness dimension has the greatest effect on novelty. Besides, a moderate level of novelty can trigger higher aesthetic preference than products that are very typical or very novel [37]. C. Lo et al. used a genetic algorithm to optimize product form in 2015, incorporating aesthetic measurement principles to improve the aesthetic quality of product appearance [38]. Similarly, V. Cheutet et al. introduced and classified fully free-form deformation features in 2005, with the aim of providing efficient access to the desired shape in CAS/CAD systems [39]. R. A. G. Post et al. studied the aesthetic appreciation of six types of products. They found that unity is the dominant factor and that it facilitates the appreciation of variety. Both unity and variety have a positive effect on aesthetic appreciation [19]. The most closely related studies are those of W. Yu et al. [17], A. Burnap [2] and P. Lu et al. [35], which used DCNN methods to build aesthetic level evaluation models based on image datasets. Studies of aesthetic assessment of product appearance have mainly used statistical methods and factor analysis to investigate the relationship between design factors and a product's aesthetic level. Computational models for aesthetic assessment of industrial product design based on large datasets are less studied in the literature.

B. IMAGE AESTHETIC MODELING
Apart from studies of aesthetic perception by psychological research methods, image aesthetics can be studied by analyzing subjective visual attributes [40]-[44]. Image aesthetic evaluation provides basic methods and empirical research procedures [45]-[50] that can be used as references in product aesthetic evaluation. Machine learning algorithms, especially deep learning networks, have been widely used in image aesthetic score prediction and aesthetic quality classification [51]-[54]. We have reviewed the existing studies of image aesthetic computing and listed them in Table 2. C. Zhang et al. proposed a CNN model for aesthetic classification. Global average pooling is employed to generate an aesthetic activation map and an attribute activation map, which represent the likelihood of aesthetic quality at each spatial location and the likelihood of different attributes. They built the classification model to identify images of high and low aesthetic quality based on the AVA dataset and achieved an accuracy of 78.87% [55]. As early as 2014, L. Guo et al. used an SVM classifier on the CUHK dataset and achieved their best classification performance by combining hand-crafted and semantic features [56]. Aesthetic perception is also style-adaptive. F. Gao et al. addressed the limited availability of aesthetic annotations and proposed a novel automatic aesthetic rating method. Aesthetic-aware features were extracted by a CNN, and an SVM was then applied for style classification. Finally, they explored a multi-task learning method to learn a style-specific aesthetic evaluation model with an accuracy of 79.2% [57]. Y. Kao et al. developed an A&C CNN, which can simultaneously assess aesthetic quality and classify the image category. The A&C CNN framework has three specific CNNs for different categories. The classification accuracy reached 86.2% on the AVA dataset [44]. X. Zhang et al.
combined classification and regression in a multi-task learning framework to identify high/low aesthetic quality. This study addresses the challenges of using fixed-size patches as training samples and the neglect of ordinal questions in user aesthetic evaluation. GLFN-Net is equipped with a random cropping method to extract local fine-detail information and obtains an accuracy of 82.95% in the experiment [58]. F. Lemarchand et al. built a cross-dataset aesthetic classifier based on brain-inspired image features, extracting percentage distributions for orientation curvature, color and global reflectional symmetry. They conducted the experiment on the Datta and AVA datasets using a DNN method and achieved an accuracy of 71.63% [59]. [...]niques [62]. Y. Chen et al. proposed a CNN-based framework that computes textual and visual attributes with a graphlet-based weakly supervised attribute learning method guided by the corresponding textual attributes. The experiment demonstrates the effectiveness and inseparability of the modeling components, including sparsity-constrained textual attributes, weakly supervised visual attribute localization and normalized CNN training [63]. F. Gao et al. proposed DeepSim based on VGGNet, which can accurately predict image quality across the CSIQ, LIVE, LIVEMD and TID2013 image datasets [64]. R. Datta et al. downloaded a total of 3,581 photos from Photo.net to build the Datta dataset and discriminated aesthetically pleasing from displeasing images with SVM and linear regression, achieving a classification accuracy of 70.12% [65]. E. Mavridaki et al. combined the image features of simplicity, colorfulness, sharpness, pattern and composition with generic images to enhance SVM-based modeling, achieving an accuracy of 85.02% in the experiment [66]. Y. Kao et al. showed that semantic information is beneficial to aesthetic feature learning and that high-level features are important in aesthetic quality assessment [67]. Y. Tan et al. proposed an improved neural network for the aesthetic computing of photos in 2016 [68]. Holding that a single patch cannot represent the whole image, they then cropped 10 patches from each image and trained a fine-tuned network to predict the aesthetic level of photos [69]. X. Tian et al. proposed a query-dependent model using a DCNN with fewer parameters and fewer convolutional layers, which achieves better performance. Its classification accuracy is 80.38% on the AVA dataset and 91.94% on the CUHKPQ dataset [70].
In general, image aesthetic computing has been investigated with different methods, and the crucial elements of visual aesthetic perception have been analyzed with factor analysis and conjoint analysis. The existing studies provide useful techniques and suggest design factors on which to focus. By comparison, our work makes three major contributions: (1) this study focuses on product design aesthetic evaluation modeling, and a novel real-world image dataset of product visual appearance was built for the experiment; (2) DCNN methods were used to predict the visual aesthetic quality of products; (3) we trained our model on the novel product design dataset, whose ground-truth annotations were collected from product competition award results. This comprehensive study, which compared advanced machine learning algorithms and used the implicit aesthetic annotation of awarded/qualified/eliminated, proved effective for product design aesthetic assessment.

III. METHODOLOGIES
Computational aesthetic methods that can efficiently judge the aesthetic level of designs in a large dataset are in great demand in many scenarios, including new design concept evaluation, design competition review, and online product purchasing. Computational evaluation can filter out designs with a low aesthetic level and free people from a great deal of preliminary reviewing work. The framework of the experiment comprises two stages. We first resized the collected product design proposal images to the standard input size required by each network. Then, we used multiple DCNN methods to extract deep features and form the image feature set. Afterwards, we conducted a three-class aesthetic quality classification of awarded/qualified/eliminated product designs based on the feature set to make a rough identification of high aesthetic quality proposals.
Many existing studies have used diverse DCNNs and obtained satisfactory results in aesthetic computing. Consequently, we extracted image features using ResNet-50, InceptionNet, MobileNet, DenseNet-201, VGG-19, and EfficientNet, and then applied a fully connected network in the training process. Meanwhile, SEFL-ResNet was proposed by improving ResNet-50 with an extended SE block. The experiment was implemented as a three-class classification to separate the design proposals of different aesthetic levels in the competition selection, emulating the procedure of a real-world design competition. The image feature extraction approach and the selected methods are introduced below.
[...] Table 3. A total of 11,507 product design images were collected in the original dataset. After removing low-resolution images, damaged image files and duplicate images, a total of 10,003 high-quality images (300 dpi) remained to build the database for modeling. The data acquisition for this aesthetic modeling study was permitted by the Hardware Product Design Competition committee.

B. FEATURE EXTRACTION VIA FINE-TUNED DCNNs
In this work, we compared multiple DCNN models to learn product visual aesthetic quality classification and make a ranking prediction. We fine-tuned ResNet-50, InceptionNet, MobileNet, DenseNet-201, VGG-19, EfficientNet and SEFL-ResNet on the aesthetic computing task. For each network, we set the corresponding model parameters and chose a subset of the product aesthetic dataset for fine-tuning. Finally, we obtained the deep image features as the output of a fully connected layer for model construction in the next step. The feature extraction procedure is as follows. Bilinear interpolation was used to resize the input images for each network. Bilinear interpolation is an image preprocessing method used to calculate pixel values during image resizing. Interpolation uses known data to estimate values at unknown points, which mitigates the loss of pixel values that occurs when an image is resized [71]. The Rectified Linear Unit (ReLU) was used as the activation function in the network layers. The extracted image features were reduced to a 512-dimensional feature vector as the output for aesthetics learning.
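As a minimal sketch of the resizing step, the following pure-Python function resizes a small grayscale image with bilinear interpolation. The function name `bilinear_resize`, the align-corners coordinate mapping and the toy 2 × 2 input are illustrative assumptions; the study does not specify its implementation, and a practical pipeline would typically rely on an image library instead.

```python
def bilinear_resize(image, out_h, out_w):
    """Resize a 2-D grayscale image (list of rows) with bilinear interpolation.

    Assumption: output corners map exactly onto input corners
    (the "align corners" convention).
    """
    in_h, in_w = len(image), len(image[0])
    resized = []
    for i in range(out_h):
        # Fractional source row for this output row.
        y = i * (in_h - 1) / (out_h - 1) if out_h > 1 else 0.0
        y0 = int(y)
        y1 = min(y0 + 1, in_h - 1)
        dy = y - y0
        row = []
        for j in range(out_w):
            # Fractional source column for this output column.
            x = j * (in_w - 1) / (out_w - 1) if out_w > 1 else 0.0
            x0 = int(x)
            x1 = min(x0 + 1, in_w - 1)
            dx = x - x0
            # Interpolate along x on the two neighbouring rows, then along y.
            top = image[y0][x0] * (1 - dx) + image[y0][x1] * dx
            bottom = image[y1][x0] * (1 - dx) + image[y1][x1] * dx
            row.append(top * (1 - dy) + bottom * dy)
        resized.append(row)
    return resized


# Toy example: upsample a 2 x 2 image to 3 x 3.
small = [[0.0, 10.0],
         [20.0, 30.0]]
resized = bilinear_resize(small, 3, 3)
```

In this toy case the new centre pixel is the average of its four known neighbours, which illustrates how unknown pixel values are estimated from known ones.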

C. DEEP NEURAL CLASSIFICATION MODELS
In this study, we trained seven models for comparison on the product aesthetic database: InceptionNet-V3, MobileNet, ResNet-50, DenseNet-201, VGG-19, EfficientNet and SEFL-ResNet. According to existing related studies, these methods have been used in image aesthetic modeling and have achieved strong performance [35], [57], [62], [64], [72]. This section summarizes the characteristics and general framework of the applied deep learning networks.

1) INCEPTIONNET
InceptionNet, proposed by Christian Szegedy et al. in 2016 in pursuit of higher performance and better efficiency with fewer network weights, is a benchmark in deep learning network development [73]. It has been successfully applied to a large variety of computational tasks, including image aesthetic computing. It has lower computational cost and network complexity than standard deep convolutional networks such as VGGNet and AlexNet. It has a total of 22 layers. The structure consists of six layers of 3 × 3 convolutions and one pooling layer, followed by three layers of inception modules, one pooling layer and a logits layer, with softmax as the final output layer. It combines fewer parameters with effective regularization through batch normalization and label smoothing, yielding a high-quality network. The inception network pipeline is described in Table 4.
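Label smoothing, one of the regularizers mentioned above, can be illustrated with a short sketch. It replaces a one-hot target with a mixture of the one-hot vector and a uniform distribution over the classes. The helper name `smooth_labels`, the smoothing value of 0.1 and the three-class setting (awarded, qualified, eliminated) are assumptions for illustration; the paper does not report the value it used.

```python
def smooth_labels(true_class, num_classes, epsilon=0.1):
    """Return a label-smoothed target distribution.

    Each class receives epsilon / num_classes, and the true class
    additionally receives the remaining (1 - epsilon) of the mass.
    """
    uniform_share = epsilon / num_classes
    return [
        (1.0 - epsilon) + uniform_share if k == true_class else uniform_share
        for k in range(num_classes)
    ]


# Hypothetical example: class 0 ("awarded") among three classes.
targets = smooth_labels(true_class=0, num_classes=3, epsilon=0.1)
```

The smoothed target still sums to one, but the network is no longer pushed toward fully confident predictions.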

2) MOBILENET
MobileNet was presented by Andrew G. Howard et al. in 2017 [74]. It was developed for mobile and embedded vision applications based on a streamlined architecture, aiming to provide a lightweight deep learning method renowned for its efficiency. It has two global hyperparameters that trade off latency against accuracy. It has shown strong modeling performance across various experiments and learning tasks in image analysis. The framework of MobileNet has 28 layers. Its architecture is based on depth-wise separable convolutions, described in Table 4. A depth-wise separable convolution is formed of two layers: a depth-wise convolution, which applies a single filter to each input channel, and a point-wise (1 × 1) convolution, which combines the outputs of the depth-wise convolution. Batch normalization and the Rectified Linear Unit (ReLU) are used after both layers. Compared with standard convolutional networks, MobileNet has far fewer parameters, a lower computational load and competitive accuracy, which constitute the advantages of this method. The network structure of MobileNet is shown in Table 5.
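The saving obtained from depth-wise separable convolutions can be checked with simple arithmetic. The sketch below counts multiply-accumulate operations for a standard convolution and for its depth-wise separable counterpart. The function names and the layer sizes (3 × 3 kernel, 64 input channels, 128 output channels, 56 × 56 feature map) are hypothetical values chosen for illustration, not figures from this study.

```python
def standard_conv_cost(kernel, in_ch, out_ch, fmap):
    """Multiply-accumulates of a standard convolution (stride 1, same padding)."""
    return kernel * kernel * in_ch * out_ch * fmap * fmap


def separable_conv_cost(kernel, in_ch, out_ch, fmap):
    """Multiply-accumulates of a depth-wise separable convolution."""
    depthwise = kernel * kernel * in_ch * fmap * fmap  # one filter per input channel
    pointwise = in_ch * out_ch * fmap * fmap           # 1 x 1 convolution mixes channels
    return depthwise + pointwise


# Hypothetical layer sizes.
kernel, in_ch, out_ch, fmap = 3, 64, 128, 56
ratio = (separable_conv_cost(kernel, in_ch, out_ch, fmap)
         / standard_conv_cost(kernel, in_ch, out_ch, fmap))
```

The ratio equals 1/out_ch + 1/kernel², so with a 3 × 3 kernel the separable form needs roughly one eighth to one ninth of the computation of the standard convolution.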

3) RESNET-50
In 2015, Kaiming He et al. proposed ResNet, a residual learning network to ease the network training optimization, which was deeper than the existing networks at that stage, and won various prizes on the tasks of image detection and localization [75]. ResNet has two basic blocks, the identity block and the convolution block.
Existing DCNNs such as AlexNet and VGG are usually constructed to learn the mapping between input and output directly, while ResNet uses multiple layers to learn the residual representation between input and output. Experiments show that learning residuals with this structure is much easier (convergence is faster) and more effective (higher classification accuracy can be achieved by using more layers). However, the training time of ResNet is relatively long, which limits its application. The network structure of ResNet-50 is presented in Table 6.
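The idea of learning a residual can be sketched in a few lines. In the toy block below, two small linear maps stand in for the convolutional layers (a simplifying assumption), and the input is added back through an identity shortcut before the final activation. The names `residual_block`, `linear` and `relu` are illustrative and not taken from the paper.

```python
def relu(vector):
    """Element-wise rectified linear unit."""
    return [max(0.0, value) for value in vector]


def linear(vector, weights):
    """Matrix-vector product; stands in for a convolutional layer here."""
    return [sum(w * v for w, v in zip(row, vector)) for row in weights]


def residual_block(x, w1, w2):
    """Compute relu(F(x) + x), where F is two linear maps with a ReLU between."""
    residual = linear(relu(linear(x, w1)), w2)  # F(x): the learned residual
    return relu([r + v for r, v in zip(residual, x)])  # identity shortcut


# With all-zero weights the residual F(x) vanishes and the block
# simply passes its (non-negative) input through unchanged.
x = [1.0, 2.0, 3.0]
zero_weights = [[0.0] * 3 for _ in range(3)]
out = residual_block(x, zero_weights, zero_weights)
```

Because a block only has to learn the deviation from the identity, stacking many such blocks does not force each one to relearn the full mapping, which is the usual explanation for the easier optimization of deep residual networks.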

4) DENSENET-201
DenseNet was proposed by Huang et al. in 2018 [76]. Compared with ResNet and InceptionNet, it has a new structure that is simple and effective. Its advantages include alleviation of the vanishing-gradient problem, strengthened feature propagation between layers, efficient feature reuse, and fewer network parameters. DenseNet has an innovative dense block structure, so that the number of output feature maps of each layer can be very small. Implicit deep supervision is obtained because each layer has direct access to the gradients from the loss function and to the original input data, which reduces the vanishing-gradient problem. The specifications of the network layers are shown in Table 7.

5) VGG-19
[...] [77]. VGG explores the relationship between the depth of a convolutional neural network and its performance. By repeatedly stacking small 3 × 3 convolutional kernels and 2 × 2 max pooling layers, it successfully constructs convolutional neural networks with a depth of 16-19 layers, improving performance by continuously increasing the depth of the network structure. The VGGNet structure has five segments. Each segment contains several 3 × 3 convolutional layers followed by a max pooling layer. These are followed by three fully connected layers and a softmax layer for the final output. The details of the network layers are shown in Table 8.
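The five-segment layout of VGG can be made concrete with a short calculation. The sketch below uses the standard VGG-19 configuration (2, 2, 4, 4 and 4 convolutional layers per segment) and assumes a 224 × 224 input, the usual ImageNet setting; the input size used in this study may differ, and the names `VGG19_SEGMENTS` and `vgg19_summary` are illustrative.

```python
# (number of 3 x 3 conv layers, output channels) for each of the five segments.
VGG19_SEGMENTS = [(2, 64), (2, 128), (4, 256), (4, 512), (4, 512)]
NUM_FC_LAYERS = 3


def vgg19_summary(input_size=224):
    """Return (weight-layer depth, spatial size after the last pooling layer)."""
    size = input_size
    conv_layers = 0
    for num_convs, _channels in VGG19_SEGMENTS:
        conv_layers += num_convs  # padded 3 x 3 convs keep the spatial size
        size //= 2                # each 2 x 2 max pool halves the spatial size
    return conv_layers + NUM_FC_LAYERS, size


depth, final_size = vgg19_summary()
```

Sixteen convolutional layers plus three fully connected layers give the 19 weight layers in the network's name, and five pooling steps reduce a 224 × 224 input to 7 × 7 feature maps.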

6) EFFICIENTNET
EfficientNet was proposed by M. Tan et al. in 2019 and balances the network dimensions for better performance [78]. EfficientNet is based on a new scaling method, which uses a simple and efficient compound coefficient to enlarge the network in three dimensions: depth, width and resolution. Rather than scaling the network dimensions arbitrarily, it obtains the best set of scaling coefficients based on neural architecture search. EfficientNet achieves fast computation and strong model performance compared with existing networks. The network specification of EfficientNet is presented in Table 9.
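Compound scaling can be illustrated numerically. In the sketch below, depth, width and resolution are multiplied by α^φ, β^φ and γ^φ for a compound coefficient φ, using the base coefficients α = 1.2, β = 1.1 and γ = 1.15 reported in the original EfficientNet paper. Whether this study used the same coefficients is not stated, so they are an assumption here, as are the function names.

```python
# Base scaling coefficients as reported in the original EfficientNet paper
# (assumed here; this study does not list the values it used).
ALPHA, BETA, GAMMA = 1.2, 1.1, 1.15


def compound_scale(phi):
    """Multipliers for depth, width and resolution at compound coefficient phi."""
    return {
        "depth": ALPHA ** phi,
        "width": BETA ** phi,
        "resolution": GAMMA ** phi,
    }


def flops_factor(phi):
    """Approximate growth in computation: (alpha * beta^2 * gamma^2) ** phi."""
    return (ALPHA * BETA ** 2 * GAMMA ** 2) ** phi


baseline = compound_scale(0)  # all multipliers equal 1
scaled = compound_scale(2)    # a larger variant
```

Since α·β²·γ² is close to 2, each unit increase of φ roughly doubles the computation while keeping the three dimensions in balance.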

7) SEFL-RESNET
Squeeze-and-Excitation-Focal Loss-ResNet (SEFL-ResNet) is an improved ResNet-50 network with a Squeeze-and-Excitation block and a focal loss function. We constructed SEFL-ResNet to pursue optimal accuracy in the image aesthetic assessment task while considering model performance and efficiency. SEFL-ResNet benefits from the Squeeze-and-Excitation block inserted as a unit in the network structure, which adaptively recalibrates channel-wise feature responses and improves accuracy at little additional computational cost.
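The channel recalibration performed by a Squeeze-and-Excitation block can be sketched as follows. This is the standard block with two fully connected layers, not the extended variant proposed in this work, and the tiny feature map, the hand-picked weights and the function name `se_block` are assumptions for illustration only.

```python
import math


def se_block(feature_map, w_reduce, w_expand):
    """Standard Squeeze-and-Excitation on a list of 2-D channels.

    feature_map: list of channels, each a list of rows.
    w_reduce:    weights of the bottleneck layer (hidden x channels).
    w_expand:    weights of the expansion layer (channels x hidden).
    """
    # Squeeze: global average pooling gives one descriptor per channel.
    squeezed = [
        sum(sum(row) for row in channel) / (len(channel) * len(channel[0]))
        for channel in feature_map
    ]
    # Excitation: bottleneck layer with ReLU, then expansion with sigmoid.
    hidden = [max(0.0, sum(w * s for w, s in zip(row, squeezed))) for row in w_reduce]
    gates = [
        1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden))))
        for row in w_expand
    ]
    # Scale: multiply every value of a channel by that channel's gate.
    scaled = [
        [[gate * value for value in row] for row in channel]
        for channel, gate in zip(feature_map, gates)
    ]
    return scaled, gates


# Two 2 x 2 channels and a bottleneck of size 1 (hypothetical values).
feature_map = [[[1.0, 2.0], [3.0, 4.0]],
               [[0.0, 0.0], [0.0, 0.0]]]
w_reduce = [[0.5, 0.5]]
w_expand = [[1.0], [-1.0]]
scaled, gates = se_block(feature_map, w_reduce, w_expand)
```

Each gate lies between 0 and 1, so informative channels can be emphasized and less useful ones suppressed at the cost of only two small fully connected layers.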
The focal loss function provides the network with an improved cross-entropy formulation in order to address the data imbalance problem [79]. The standard cross-entropy method can control the weight ($w_{pn}$) of positive and negative samples. However, it cannot control the weight ($w_{ed}$) of samples that are easy or difficult to classify. Consequently, focal loss was proposed by T. Lin et al., which can adjust both $w_{pn}$ and $w_{ed}$. The focal loss $F_{loss}$ is specified as follows:
$$F_{loss} = -(1 - k_t)^{\alpha} \log(k_t) \tag{1}$$
in which $k_t$ is the predicted probability of the true class and $\alpha$ is the focusing parameter, $\alpha \geq 0$. The term $(1 - k_t)^{\alpha}$ is the modulating factor, which reduces the weight of easy negatives and makes the model concentrate on the samples that are difficult to classify.
In the proposed SEFL-ResNet structure, an extended SE block with double fully connected networks is inserted between the layers conv2_x, conv3_x, conv4_x, conv5_x and Dense1_x, in order to exploit channel dependencies with the squeeze operation and fully capture channel-wise dependencies with the excitation operation [80]. Here we use global average pooling as the squeeze operation. Then three fully connected layers are adopted as a bottleneck structure to model the correlation between channels and output the weights. In the improved Squeeze-and-Excitation block structure, a fully connected layer was added to extend the SE block processing, which enhances the network's ability to discriminate features. We reduce the number of feature dimensions to 1/16 of the input dimensions, apply ReLU activation, and then restore the dimensions to match the input. After that, a sigmoid is used to obtain normalized weights in the range 0∼1. Finally, a scale operation applies the normalized weights to the features of each channel. The output feature map is then passed to a three-layer fully connected network, in which average pooling, dropout and softmax are adopted between layers to prevent overfitting.
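As a minimal sketch of how the modulating factor behaves, the following computes the per-sample focal loss in the form -(1 - k_t)^α log(k_t) given by Lin et al. and compares it with plain cross entropy. The value α = 2 and the example probabilities are assumptions chosen for illustration; the setting used in the experiments is not stated here.

```python
import math


def focal_loss(probs, true_index, alpha=2.0):
    """Per-sample focal loss: -(1 - k_t)^alpha * log(k_t).

    probs      -- predicted class probabilities for one sample
    true_index -- index of the true class
    alpha      -- focusing parameter (ASSUMED value, for illustration only)
    """
    k_t = min(max(probs[true_index], 1e-12), 1.0)  # clip to avoid log(0)
    return -((1.0 - k_t) ** alpha) * math.log(k_t)


def cross_entropy(probs, true_index):
    """Standard cross entropy is the special case alpha = 0."""
    return focal_loss(probs, true_index, alpha=0.0)


easy = [0.90, 0.05, 0.05]  # true class predicted with high confidence
hard = [0.20, 0.50, 0.30]  # true class predicted with low confidence

for name, probs in (("easy", easy), ("hard", hard)):
    ce = cross_entropy(probs, 0)
    fl = focal_loss(probs, 0)
    print(f"{name}: cross entropy={ce:.4f}, focal loss={fl:.4f}, "
          f"ratio={fl / ce:.2f}")
```

The ratio printed for each sample equals the modulating factor, so the well-classified sample is down-weighted far more strongly than the difficult one.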
The specific network structure of SEFL-ResNet is presented in Table 10 and Fig. 2.
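The squeeze, excitation and scale steps can be sketched in plain Python as follows. This is the standard two-layer SE block of Hu et al., not the extended three-layer variant described above; the channel count, feature-map size and random weights are hypothetical values for illustration, and only the reduction ratio of 16 comes from the text.

```python
import math
import random


def se_block(feature_maps, w_reduce, w_expand):
    """Standard squeeze-and-excitation over C channels, each an H x W grid."""
    # Squeeze: global average pooling gives one scalar per channel.
    squeezed = [sum(sum(row) for row in ch) / (len(ch) * len(ch[0]))
                for ch in feature_maps]
    # Excitation, step 1: reduce to C // r dimensions, then ReLU.
    hidden = [max(0.0, sum(w * s for w, s in zip(row, squeezed)))
              for row in w_reduce]
    # Excitation, step 2: expand back to C dimensions, then sigmoid.
    gates = [1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden))))
             for row in w_expand]
    # Scale: multiply every value in a channel by that channel's gate.
    scaled = [[[v * g for v in row] for row in ch]
              for ch, g in zip(feature_maps, gates)]
    return scaled, gates


random.seed(0)
C, H, W, R = 32, 4, 4, 16  # C, H, W are hypothetical; R = 16 follows the text
x = [[[random.random() for _ in range(W)] for _ in range(H)] for _ in range(C)]
w_reduce = [[random.uniform(-0.1, 0.1) for _ in range(C)] for _ in range(C // R)]
w_expand = [[random.uniform(-0.1, 0.1) for _ in range(C // R)] for _ in range(C)]

y, gates = se_block(x, w_reduce, w_expand)
print(len(y), len(y[0]), len(y[0][0]))  # output keeps the input shape
```

The block leaves the feature-map shape unchanged, which is what allows it to be inserted as a unit between existing layers.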

IV. EXPERIMENTS
The aim of this research is to build the optimal computational model of assessing the design aesthetic quality and find the relationship between image features and image aesthetic level. We evaluate the selected DCNNs algorithms that have potential in aesthetic modeling based on the Hardware Product Design Competition image dataset.
In the first experimental section, we divided the dataset randomly into two parts, so that 75% of the images formed the training set and 25% formed the testing set. We adopted ACC and loss as the performance indices to evaluate the results of the different methods. The experimental procedure is described in Fig. 3.

A. DATASETS
1) PRODUCT DESIGN IMAGE COLLECTION
We collected a total of 10,003 design submission proposals from the design competition called ''Hardware Product Design Competition,'' which is held once a year in China. The submissions from the years 2008∼2011, 2017, and 2018 were gathered to form the product aesthetic database. The image samples are shown in Fig. 4. The submission proposals were annotated as three classes, namely awarded designs, qualified designs and eliminated designs, according to the competition result of the year. The design proposal images form a real-world dataset, all of which were evaluated by expert judges from industrial design companies and renowned design colleges in China. The award results serve as an implicit, though relatively subjective, index of product visual aesthetic quality.
It is worth mentioning that the aesthetic style of product design changes over time. There is a notable improvement in the overall product design quality of the submissions year by year, owing to the development of computer-aided design tools and the improvement of design ability in the industry. Consequently, this might influence the classification modeling.
With SEFL-ResNet, 802,816 features were obtained. Of the dataset, 75% were used for model training and 25% were applied for model testing.

B. PRODUCT AESTHETIC CLASSIFICATION MODELING
We explored the aesthetic classification model based on the product design aesthetic dataset. Specifically, we split the dataset into two parts, so that 7,502 submitted images were used for training and 2,501 submitted images were used for testing.
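A random 75/25 split of 10,003 samples reproduces the training and testing set sizes stated above. The sketch below works on sample indices only; the function name and the seed are hypothetical.

```python
import random


def split_indices(n_samples, train_fraction=0.75, seed=0):
    """Shuffle sample indices and split them into training and testing parts."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # seed is an arbitrary choice
    n_train = int(n_samples * train_fraction)
    return indices[:n_train], indices[n_train:]


train_idx, test_idx = split_indices(10003)
print(len(train_idx), len(test_idx))  # 7502 2501
```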

1) CLASSIFICATION MODEL CONSTRUCTION
The submitted images were resized to a standard size for each network. The DCNNs were trained on the datasets. The network training parameters of each method were as follows. In the training experiment of InceptionNet-V3, MobileNet, ResNet-50, VGG-19, DenseNet, EfficientNet and SEFL-ResNet, each model was trained for 100 epochs with a batch size of 64. The initial learning rate was 1e-5, and it was decreased step-wise by a factor of 0.5, with a patience of 9. The min_lr was set to 1e-12. The factor value determines the step size of each learning-rate decrease, and the patience value is the number of epochs without accuracy improvement that is tolerated before the learning rate is reduced. In the training process, a fully connected network was applied for classification. Each model was evaluated with 10-fold cross validation.
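The interaction of factor, patience and min_lr can be sketched with a small stand-alone class. It mimics the behavior of a reduce-on-plateau schedule such as the Keras ReduceLROnPlateau callback, under the assumption that validation accuracy is the monitored quantity; the class itself is hypothetical and is not the training code of this study.

```python
class PlateauSchedule:
    """Halve the learning rate when accuracy stops improving.

    Defaults follow the settings in the text: initial rate 1e-5,
    factor 0.5, patience 9, min_lr 1e-12.
    """

    def __init__(self, lr=1e-5, factor=0.5, patience=9, min_lr=1e-12):
        self.lr = lr
        self.factor = factor
        self.patience = patience
        self.min_lr = min_lr
        self.best = float("-inf")
        self.wait = 0  # epochs since the last improvement

    def step(self, accuracy):
        """Record one epoch's accuracy and return the next learning rate."""
        if accuracy > self.best:
            self.best = accuracy
            self.wait = 0
        else:
            self.wait += 1
            if self.wait >= self.patience:
                # Reduce by the factor, but never below min_lr.
                self.lr = max(self.lr * self.factor, self.min_lr)
                self.wait = 0
        return self.lr


schedule = PlateauSchedule()
schedule.step(0.60)  # first epoch sets the best accuracy
rates = [schedule.step(0.60) for _ in range(9)]  # nine epochs of no improvement
print(rates[7], rates[8])  # the rate drops only at the ninth stagnant epoch
```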

2) CLASSIFICATION BASED ON IMBALANCED DATASET
It is worth noting that this product aesthetic dataset is highly skewed: the ratio of awarded/qualified/eliminated designs is approximately 1:6:5. This data imbalance is a direct result of the preferential mechanism of the design competition, in which only a small proportion of submissions are selected as winners. Learning from an imbalanced dataset is quite challenging. According to the theoretical literature on this issue, several approaches have proved effective in model training, including imbalanced learning, down-sampling, up-sampling and weighted loss. With the imbalanced learning method, the original dataset is not modified. With the down-sampling method, the larger classes are reduced with reference to the awarded class, which contains 835 images, whereas with the up-sampling approach the images of the awarded class were enriched to be as many as those of the qualified class. Finally, we applied a re-sampling method that combines the up-sampling and down-sampling approaches for imbalanced classification.

3) IMAGE PREPROCESSING AND DATA ENHANCEMENT
Specifically, the optimized up-sampling method adopts heuristic techniques. We used the Keras image generator [81] to generate enhanced and standardized data for the awarded class. The image sample enhancement generator creates modified samples after each iteration in order to achieve data enhancement. The generated image samples after enhancement are presented in Fig. 5.
In the meantime, we applied down-sampling to the qualified and eliminated classes. We then removed redundant samples and noise samples to make a more balanced dataset for modeling. Finally, the proportion of each class in the obtained balanced dataset is close to 1:1:1.
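The combined re-sampling step can be sketched on sample indices as follows. Only the awarded count of 835 comes from the text; the other two class counts are hypothetical values chosen to follow the approximate 1:6:5 ratio, and the target size of 2,500 per class is likewise an assumption. In practice every extra awarded index would be replaced by an augmented variant of the image rather than a plain copy.

```python
import random


def rebalance(labels, target_per_class, seed=0):
    """Return sample indices with every class resampled to target_per_class."""
    rng = random.Random(seed)
    by_class = {}
    for index, label in enumerate(labels):
        by_class.setdefault(label, []).append(index)

    balanced = []
    for label, indices in by_class.items():
        if len(indices) >= target_per_class:
            # Down-sampling: keep a random subset without replacement.
            chosen = rng.sample(indices, target_per_class)
        else:
            # Up-sampling: keep all originals and draw extras with replacement.
            n_extra = target_per_class - len(indices)
            chosen = indices + [rng.choice(indices) for _ in range(n_extra)]
        balanced.extend(chosen)
    return balanced


# HYPOTHETICAL class counts (only 835 is from the text); they sum to 10,003.
labels = ["awarded"] * 835 + ["qualified"] * 5000 + ["eliminated"] * 4168
balanced = rebalance(labels, target_per_class=2500)

counts = {}
for i in balanced:
    counts[labels[i]] = counts.get(labels[i], 0) + 1
print(counts)  # each class now has the same number of samples
```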

4) MODEL PERFORMANCE EVALUATION METHOD
Finally, several metrics were reported to evaluate the model performance, including classification accuracy and loss value. Classification accuracy (ACC) [82] and loss are widely used indices in classification performance evaluation. Specifically, ACC refers to the proportion of samples that are correctly predicted by the model. In this experiment, ACC is calculated as:
$$\mathrm{accuracy} = \frac{P_{awarded} + P_{qualified} + P_{eliminated}}{N_{total}} \tag{2}$$
in which $P_{awarded}$ is the number of correctly classified awarded samples, $P_{qualified}$ is the number of correctly classified qualified samples, $P_{eliminated}$ is the number of correctly classified eliminated samples, and $N_{total}$ is the total number of samples. Model accuracy is the measurement used to determine which method is best at identifying patterns in variables or features based on the training data. The better a method generalizes to 'unseen' data, the better the insights it can produce. When a model is trained with gradient descent, learning can be very slow at the beginning of training. Consequently, neither the classification error rate nor the mean square error is the most suitable loss function for a classification problem. Here we used categorical cross-entropy as the loss value for model evaluation. One-hot encoding was applied to record the labels. For a dataset with M categories, the label set is labels = (1, 2, ..., M). Therefore, if the label of the i-th sample is m, it is set as $y_{i,m} = 1$.
Categorical cross-entropy is used to evaluate the difference between the real distribution and the probability distribution obtained by the trained model [83]. It describes the distance between the actual output (probability) and the expected output (probability). The smaller the value of categorical cross-entropy is, the closer the two probability distributions are. The results indicate that the SEFL-ResNet model performed better than the other fine-tuned DCNNs. The comparison of the modeling results is shown in Table 11.
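The two evaluation indices can be sketched for the three-class case as follows. The example labels and probabilities are invented for illustration and are not results from the experiments.

```python
import math

CLASSES = ["awarded", "qualified", "eliminated"]


def one_hot(label):
    """Encode a class label as a one-hot vector over CLASSES."""
    return [1.0 if c == label else 0.0 for c in CLASSES]


def accuracy(true_labels, predicted_labels):
    """Correctly classified samples of all classes divided by the total."""
    correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return correct / len(true_labels)


def categorical_crossentropy(y_true, y_pred, eps=1e-12):
    """Distance between a one-hot target and a predicted distribution."""
    return -sum(t * math.log(max(p, eps)) for t, p in zip(y_true, y_pred))


# Invented example: four samples, three of them classified correctly.
true_labels = ["awarded", "qualified", "eliminated", "qualified"]
predicted = ["awarded", "qualified", "qualified", "qualified"]
print(accuracy(true_labels, predicted))  # 0.75

confident = categorical_crossentropy(one_hot("qualified"), [0.1, 0.7, 0.2])
uncertain = categorical_crossentropy(one_hot("qualified"), [0.4, 0.3, 0.3])
print(confident < uncertain)  # True: a closer distribution gives a smaller loss
```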

V. RESULTS AND DISCUSSION
The experiment conducted on the product aesthetic dataset indicates that the proposed method is competitive in classification performance. Researchers have sought a computational approach to aesthetic judgment in many previous studies; however, user preference and subjective perception increase the difficulty of aesthetic modeling. In this study, we treated aesthetic modeling as a classification problem of predicting the image aesthetic level. An overall classification accuracy of 70.79% was obtained in the model validation experiment, which is consistent with the results of existing works. The accuracy, parameter settings and FLOPs of the models are presented in Table 11. Moreover, Fig. 6 illustrates the loss during the training of the applied networks, and Fig. 7 presents the accuracy during training.
The classification model was built on 75% of the submission dataset and then tested on the remaining 25% of the samples. We compared the performance of the DCNNs: InceptionNet and EfficientNet obtained the lowest classification accuracy of 67.51% in the verification experiment, and SEFL-ResNet achieved the highest accuracy of 70.79%. Recall that the optimal binary (High/Low) aesthetic classification accuracy in the existing literature is in the range of 70%∼80%, and only a few studies achieve an accuracy of 90% for the binary classification task (see Table 2). The three-class classification of the proposed method achieves a satisfactory result that is in line with the reported level of accuracy.
In the training experiment, the optimal classification performance becomes stable after 100 epochs of training. In the model performance evaluation, the confusion matrix provides additional evidence to measure the performance, see Fig. 8. It is often used as the final evaluation method, and it presents the detailed classification performance of each category, which indicates the balance between precision and recall.
Specifically, in the confusion matrix, the labels on the abscissa are the predicted labels of awarded/qualified/eliminated, and the labels on the ordinate are the true labels. For example, in Fig. 8 the value of (awarded, awarded) in the matrix for SEFL-ResNet is 0.67, which indicates that the testing images of the awarded class (true label) are predicted to be awarded (predicted label) by SEFL-ResNet with a probability of 0.67. Similarly, in the block (eliminated, awarded), the value of 0.04 means that the testing images of the awarded class are predicted to be in the eliminated class with a probability of 0.04. The higher the value on the diagonal is, the higher the corresponding classification accuracy is. According to the confusion matrices of the seven networks, SEFL-ResNet shows a relatively balanced prediction probability for each class, with an accuracy of 0.67 for the awarded class, 0.6 for the eliminated class and 0.74 for the qualified class.
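A row-normalized confusion matrix of the kind described above, with true labels as rows and predicted labels as columns, can be computed as in this sketch. The label lists are invented for illustration and do not reproduce the values in Fig. 8.

```python
def confusion_matrix(true_labels, predicted_labels, classes):
    """Count matrix with true labels as rows and predicted labels as columns."""
    index = {c: i for i, c in enumerate(classes)}
    counts = [[0] * len(classes) for _ in classes]
    for t, p in zip(true_labels, predicted_labels):
        counts[index[t]][index[p]] += 1
    return counts


def normalize_rows(counts):
    """Divide each row by its total so entries are per-class proportions."""
    return [[v / sum(row) if sum(row) else 0.0 for v in row] for row in counts]


classes = ["awarded", "qualified", "eliminated"]
# Invented example labels.
y_true = ["awarded", "awarded", "qualified", "qualified", "qualified", "eliminated"]
y_pred = ["awarded", "qualified", "qualified", "qualified", "eliminated", "eliminated"]

matrix = normalize_rows(confusion_matrix(y_true, y_pred, classes))
for name, row in zip(classes, matrix):
    print(name, [round(v, 2) for v in row])
```

Each diagonal entry is the proportion of a class that is predicted correctly, and each row sums to 1, so off-diagonal entries show where the remaining samples of that class were assigned.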

B. IMBALANCED CLASSIFICATION METHOD
In fact, a hybrid method is likely to be the best way to handle the data imbalance problem. Adjustment and exploration are needed in both the dataset construction and the network structure.
Here, we used the re-sampling method to balance the dataset. By increasing the number of training samples in the awarded class and reducing the number of samples in the qualified class, the sample distribution becomes more balanced. Consequently, the recognition rate of the awarded class is improved and the generalization ability of the model is enhanced. The most direct up-sampling method is to copy the samples in the rare class, but this often leads to overfitting and poor classification performance on the rare class.
The optimized up-sampling method adopts heuristic techniques. We used an image generator to enrich the data source of the awarded class. In the meantime, we applied down-sampling to the qualified and eliminated classes. We then removed redundant samples and noise samples to make a more balanced dataset for modeling. Finally, the proportion of each class in the obtained balanced dataset is close to 1:1:1.
In summary, the existing image aesthetic datasets were rated by public annotators, whose personal preferences influence the perception of aesthetics. The dataset collected from design competitions uses award results as the ground-truth annotation, which was rated by design experts from a professional perspective, providing a promising dataset for the related research field. In addition, the proposed scheme can be applied in the intelligent design evaluation process as a useful design assistant. Product design aesthetic evaluation with the proposed method can obtain an effective result that is consistent with human judgment.

VI. CONCLUSION
Currently, it is still challenging to predict the visual success of a product scheme before a new design concept is put on the market. Designers and marketing professionals still carry a heavy workload of evaluation and analysis in pre-design and development [84], which may be influenced by empirical and subjective factors. An increasing number of intelligent design methods have been applied in the design industry [85], and deep learning approaches have proved effective in understanding product perception patterns and the social manufacturing paradigm [86], [87]. It is not yet widely recognized that visual appearance can be assessed with an efficient aesthetic computational method, based on a large database of design schemes, as an auxiliary approach. There is much discussion and debate around the rationality of deep learning methods for automatic evaluation problems. However, they have also proved effective in many aesthetic level evaluation studies. Here we set the product aesthetic level evaluation as a classification problem for computation. In the exploration of giving computers an intelligent visual aesthetic cognition ability, deep learning methods provide a possible avenue for product visual aesthetic quality evaluation.
Therefore, the current study explored the computation of product visual aesthetic quality using DCNNs. Our methodology proved effective for product aesthetic classification prediction based on the novel aesthetic database of product design proposals. We mainly included deep image features as the cue for aesthetic evaluation. According to the experiments, the results of the proposed model suggest that the aesthetic quality of a design proposal is highly related to the overall design standard, and that it can be a determinant in predicting the level of the design scheme.
Specifically, a real-world design aesthetic dataset collected from the works of design competitions with ground-truth annotations was built. The experiment conducted on the product design aesthetic dataset demonstrates the effectiveness of SEFL-ResNet compared with InceptionNet, MobileNet, DenseNet-201, ResNet-50, EfficientNet and VGG-19. A final aesthetic prediction is achieved with an accuracy of 70.79%. This study builds the connection between product design visual aesthetic quality and design scheme image features.
This is an interdisciplinary study related to design art and computer science. There are several avenues for further study. Firstly, we plan to separate the content of a design proposal into, for instance, design notes, the figure of the product main view, and figures of product details. A comprehensive measurement of the aesthetic quantification of text and figures can be studied. Secondly, an efficient design evaluation system for design competitions can be developed to make a primary screening among the submissions, which can provide ranking suggestions for reviewers. Thirdly, a fusion of multi-channel physiological signals in response to aesthetic stimuli and image features can be used to study personalized aesthetic preference modeling. The method is promising for evaluating the design scheme in the conceptual design stage and for assessing product market positioning from the visual appearance aspect.
BAIXI XING received the bachelor's degree from the Nanjing University of Aeronautics and Astronautics, and the Ph.D. degree in digital art and design from Zhejiang University, Hangzhou, China, in 2014. She worked as a Postdoctoral Researcher with the College of Computer Science, Zhejiang University, from 2015 to 2018. She is currently an Assistant Professor with the Institute of Industrial Design, Zhejiang University of Technology. Her research interests include affective computing and multimedia retrieval. At present, she is focusing on multimodal emotion recognition and cross-media retrieval. She is also interested in the research of human-computer interaction and user experience design.
HUAHAO SI is currently a Graduate Student with the School of Media and Design, Hangzhou Dianzi University. His research interests include affective computing, multimedia information analysis, and music information retrieval.