Gabor CNN Based Intelligent System for Visual Sentiment Analysis of Social Media Data on Cloud Environment

Social media contains a plethora of information in the form of text, images, videos, and other data. Users across the globe are increasingly sharing their data on various social media platforms. Sentiment analysis of data such as text, images, and videos is widely used to understand the feelings of users. In recent years, the convolutional neural network (CNN) has been extensively applied in various applications. The cloud computing environment is a popular service due to its reliability, availability, and easy software integration. However, CNN models are deep neural networks that have a high computational cost. There is a need for CNN models that use fewer computational resources, especially when these models are deployed in a cloud environment, due to the remote physicality of servers, resource optimization, and infrastructure cost reduction. In this research, Gabor filters are integrated with CNN models to improve image sentiment analysis in a cloud environment, with advantages such as a reduction in computation energy and time, the elimination of the need for pre-trained models, and a perceived accuracy improvement. Two variants of Gabor-CNN (G-CNN) models with different numbers of pooling and normalization layers are developed. The proposed G-CNN is trained and tested using five standard databases, namely SentiBank, Twitter, MVSO, MultiView_I, and MultiView_II. Maximum classification accuracies of 91.71%, 92.52%, 97.39%, 90.88%, and 91.31% are obtained on the SentiBank, Twitter, MVSO, MultiView_I, and MultiView_II databases, respectively, using the developed models. The proposed G-CNN model has provided an accuracy of 92.76% on average.


I. INTRODUCTION
Cloud computing has become more popular due to its accessibility, flexibility, and reduced time to bring a model to the market [1], [2]. The sentiment analysis of data collected from social media is a significant step in social data analysis [3], [4], [5], and it provides an understanding of people's opinions and behaviors [6]. Social media data analysis using cloud computing is vital to bring the analysis into real-world applications. Sentiment analysis has widespread applications in marketing, service sectors, opinion polls during elections, investments, and product analysis [7], [8], [9], [10]. It has been applied to uncover hidden customer behavior from social media data [11]. Along with textual data, social media has a huge collection of both images and videos. Visual sentiment analysis is concerned with understanding the sentiment of images and videos. In Borth et al. [12], visual sentiment analysis was carried out and the Visual Sentiment Ontology (VSO) was developed with Adjective Noun Pairs (ANP). Image sentiment classification was performed in [13] using deep networks. Liang et al. [14] have developed cross-domain semi-supervised deep metric learning for image sentiment analysis. Preethi G. et al. [15] have developed recursive neural networks (RNN) along with deep learning (DL) to perform sentiment analysis of reviews on a cloud platform. In Arulmurugan et al. [16], an ML-based intelligent system was developed, and segmentation ranking followed by sentiment classification was performed on the cloud platform. Moreover, issues related to working with secured data and data sharing in the cloud have been addressed in [17] and [18]. A survey on applications of DL for image sentiment analysis can be found in [19]. Cloud computing services provide high reliability and mobility software integration, and their services require effective utilization of resources. Even though CNN-based models have been effective for various image- and video-based applications, they demand high computational cost [20], [21].
The computational cost is critical when a CNN model is deployed in the cloud computing environment due to the remote physicality of servers [22], resource optimization [23], and infrastructure cost reduction [24]. Hence, there is a requirement for CNN models that are more efficient in terms of computational cost while improving performance. In recent works, Gabor filter integration with CNN models has shown many advantages. Chang et al. [25] initialized the first layer of a CNN with Gabor filters and fine-tuned it during training. In [26], Gabor filters were introduced in two CNN layers. Both these studies showed a decrease in computation energy and time consumption during the training of the CNN. Chen et al. [27] have used a Gabor-based CNN model for hyperspectral image classification and showed that Gabor filters can mitigate the overfitting problem in DL networks. Meng et al. [28] have developed Gabor-based CNN models and shown a reduction in computational energy and time of 17-19%. Steerable Gabor filters extract low-level features from images; thus, utilizing them eliminates the need for transfer learning. Moreover, experiments conducted in [29] emphasize that the use of Gabor filters helps in fast convergence.
The associate editor coordinating the review of this manuscript and approving it for publication was Wei Wei.
VOLUME 10, 2022. This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
In this research, we integrate Gabor filters with CNN and develop two new Gabor-CNN (G-CNN) models. Our research aim is to bring the benefits of Gabor-integrated CNN to the cloud environment and develop high-performing Gabor-integrated CNN models. In this study, G-CNN models are developed for sentiment image analysis in the cloud environment. Moreover, there are few, if any, cloud platform-based intelligent systems for image sentiment prediction. The objectives of the work are:
(i) To develop a modified 3D information diagram based method to design the Gabor filters.
(ii) To integrate Gabor filters with CNN for visual sentiment recognition.
(iii) To develop two new G-CNN models on the cloud platform.
In our research, two new G-CNN architectures are developed. The first network is G-CNN with six max-pooling layers (G-CNN-6M), which applies max-pooling after every convolutional layer. The second network, G-CNN-6MN, uses six max-pooling and normalization layers, applying max-pooling and normalization after each convolutional layer. Five benchmark databases, namely SentiBank [12], Twitter [13], MVSO [12], MultiView_I [30], and MultiView_II [30], are used. The cross-validation (CV) results of the proposed models are compared with those of a model without Gabor filters (CNN-6MN). Section 2 covers the literature review, the databases used in this study are described in section 3, and the Gabor filter design using the 3D information diagram is outlined in section 4. The two proposed G-CNN architectures are presented in section 5. The experiment setup and results are given in sections 6 and 7, respectively, followed by the discussion in section 8. Finally, the paper concludes in section 9 by providing a summary of the main findings of the work.

Algorithm 1 Algorithm for Dataset Preparation
Input: ..., T = {t_1, ..., t_n} text messages obtained from Twitter, AMT = {A_1, ..., A_n} AMT workers for the task of assigning labels.
Output: Refined dataset of images
1: Select the images: D = {x_i, t_i | x_i ∈ X and t_i ∈ T selected images or text samples}
2: Annotate the images: D = {... and A_j ∈ AMT}, where y_i is the label for the image and z_i the label for the text assigned by an AMT worker A_j
3: Refine the images: D = {x_i, y_i | (x_i, y_j, A_k) ∈ D and ((Case a) or (Case b) or (Case c) or (Case d))}
   Case a: SentiBank: ∀A_k (y_i = y_j), 1 ≤ k ≤ 3 and |AMT| = 3
   Case b: Twitter: ∀A_k (y_i = y_j), 1 ≤ k ≤ 5 and |AMT| = 5
   Case c: MultiView_I and MultiView_II: ∀A_k (y_i = y_j and y_j = t_j)
   Case d: MVSO
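The refinement step of Algorithm 1 can be illustrated with a minimal Python sketch. It assumes the annotations are available as a dictionary mapping an image identifier to the list of labels given by the AMT workers; the function names, the data layout, and the toy labels are hypothetical and not taken from the paper. The majority rule shown for MVSO follows the "at least two turkers" criterion given in the database description, since Case d is not spelled out in the algorithm.

```python
from collections import Counter


def refine_unanimous(annotations, n_workers):
    """Cases a and b: keep images on which all n_workers annotators agree."""
    refined = {}
    for image_id, labels in annotations.items():
        if len(labels) == n_workers and len(set(labels)) == 1:
            refined[image_id] = labels[0]
    return refined


def refine_matching_text(image_labels, text_labels):
    """Case c: keep images whose annotators all assign the label of the text."""
    refined = {}
    for image_id, labels in image_labels.items():
        text_label = text_labels.get(image_id)
        if labels and all(label == text_label for label in labels):
            refined[image_id] = text_label
    return refined


def refine_majority(annotations, min_agree=2):
    """Assumed rule for MVSO: keep images where min_agree workers agree."""
    refined = {}
    for image_id, labels in annotations.items():
        if not labels:
            continue
        label, count = Counter(labels).most_common(1)[0]
        if count >= min_agree:
            refined[image_id] = label
    return refined


# Toy annotations (hypothetical): three workers per image, as for SentiBank.
votes = {
    "img1": ["pos", "pos", "pos"],
    "img2": ["pos", "neg", "pos"],
    "img3": ["neg", "neg", "neg"],
}
unanimous = refine_unanimous(votes, n_workers=3)
majority = refine_majority(votes, min_agree=2)
matching = refine_matching_text(votes, {"img1": "pos", "img2": "pos", "img3": "pos"})
```

In this toy example the unanimous rule discards img2, while the majority rule keeps it with the label agreed on by two of the three workers.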

II. LITERATURE SURVEY
Machine learning (ML) and natural language processing (NLP) have been used to analyze the text shared by users to understand its emotional content [10], [31], [32], and sentiment analysis was enhanced using DL in [33] and [34]. In Roy et al. [35], a support vector machine (SVM) classifier was used to perform the sentiment classification of tweets. An aspect-level sentiment classification using a mutual attention neural network was developed by Jiang et al. in [36]. Aspect-based sentiment identification is a complex task as it is more implicit. Ishaq et al. have tuned a CNN model using GA in [37] on semantic and word2vec-transformed features. A hybrid DL model was developed in [38] using various word embedding techniques, and sentiment classification was performed based on the extracted features.
CNN has been a widely used DL model for computer vision, image recognition, and pattern recognition. It has the advantage that it can learn both feature representation and classification without human intervention. Therefore, DL architectures such as CNN have been extensively utilized for complex data in areas such as healthcare systems [39], object detection [40], and time series analysis [41]. Szegedy et al. have developed a CNN architecture, 'Inception', in [42] for the classification of images from ImageNet. A detailed discussion of various practical measurements of DL models, such as parameters, power consumption, memory utilization, and inference time, was presented in [43]. In Biswas et al. [44], a dilated CNN model was utilized for the segmentation of retinal fundus images. An incremental dilation CNN model that effectively performs the classification of MRI images was proposed in [45]. Samui et al. [46] have discussed many applications related to image analysis, speech, NLP, and risk analysis. A review of CNN for image classification tasks was conducted by Al-Saffar et al. in [47]. CNN was used to classify encrypted images of vehicles on the cloud [48]. CNN along with SVM has been used by Hossain et al. in [49] for facial emotion recognition on cloud computing. The work also focused on fine-tuning the architecture and provided an analysis of the visualization of patterns learned by CNN. Md Zahangir Alom et al. [50] have presented various advancements in DL along with state-of-the-art architectures.
Siersdorfer et al. [51] have utilized the SentiWordNet thesaurus to gather sentiment values from the metadata of photos shared on Flickr. They then developed ML-based techniques to perform sentiment prediction of images using visual features, and reported 70.00% accuracy on the Flickr dataset. Visual sentiment analysis of social images was presented by Borth et al. in [12]. They developed an image library, SentiBank, for recognizing adjective-noun pairs. Their contribution was the development of ANPs and VSO to represent the sentiment of images. Using a linear SVM, a classification accuracy of 70.00% was achieved on SentiBank. Classical feature-based sentiment analysis [52], [53], [54] continued until ML-based analysis became widely applied. The mid-level features of an image have been utilized by Yuan et al. [55] for sentiment prediction. They also used eigenface-based facial expression detection as an additional mid-level feature. They observed 68.00% accuracy using SVM and logistic regression (LR). In Borth et al. [56], a deep CNN was trained based on the Caffe framework. In their experiments, an improvement in accuracy of 62.3% with respect to the binary SVM classifier for Sentibank 2.0 was observed. Wang et al. [57] proposed an Unsupervised SEntiment Analysis (USEA) framework for sentiment analysis of social media images. Using USEA, they achieved an accuracy of 56.18% on the Flickr dataset. The Flickr and Twitter datasets of sentiment images were prepared by You et al. in [13]. A progressively trained CNN and transfer learning were developed for image sentiment analysis on these datasets. Several manually labeled Twitter images were incorporated, and an accuracy of 78.60% was noted on the Twitter dataset. Baecchi et al. [58] have used multimodal features such as text and images and developed a sentiment analysis system for social network data. Neural network based models were tested on the Twitter dataset and achieved 80.00% accuracy. Kumar et al. [59] have developed sentiment analysis for Flickr and Twitter images in two phases. In the first phase, ANPs were generated using the CNN model, and in the second phase, sentiment prediction was carried out using the SVM classifier. Because the analysis was carried out in two phases, their prediction results were reported specifically per ANP. Various CNN-based models were designed in [60] for automated visual content analysis. Several variations of the CNN models, such as CaffeNet and oversampling, were evaluated and a maximum accuracy of 84.40% was obtained. Kumar et al. in [61] designed a multimodal sentiment analysis model. A region CNN was used for image sentiment classification, which gave 76.04% accuracy on SentiBank. Most of the works in recent years have focused on using CNN models or ML models for sentiment prediction of images. The lowest accuracy of 56.18% was obtained in [57] using the unsupervised method. The highest accuracy of 84.40% was observed in [60] utilizing CNN CaffeNet. A summary of recent works on sentiment image classification is presented in Table 1.
A detailed survey of visual sentiment analysis methods has been carried out by Ji et al. in [3], and a review of the use of DL is provided by Ain et al. in [4]. Luan et al. [62] have used Gabor filters in deep CNN for object recognition with rotation and scale variations. In their experiments, Gabor-based CNN outperformed regular CNN. A handwriting recognition model using Gabor filters based on CNN was presented in [63], and a speech recognition model was presented in [25]. Liu et al. have incorporated a hybrid Gabor filter binarization technique to improve the memory efficiency of CNN in [64]. Sarwar et al. [26] developed a CNN model with two Gabor-based convolutional layers. A Gabor feature identifier and CNN were utilized in [65] to detect the parts of face images and extract their intrinsic features. In our research work, Gabor filters are integrated, and two new CNN models are developed on a cloud environment for visual sentiment recognition. CNN integration with Gabor filters has two main benefits. The first is a decrease in computation energy [26], [28] and time [28], [29]. The second is an improvement in performance [27], [29]. We also developed a modified 3D information diagram method to design Gabor filters that are suitable for integration with CNN.

III. DATABASES FOR SENTIMENT IMAGES
Five databases of sentiment images mentioned in [19] are used in this work. Table 2 shows, for each database, the total number of images and the number of positive and negative images. SentiBank [12] is a benchmark database used for visual sentiment analysis which is prepared using Twitter images. This database includes 603 tweets, built using 21 hashtags on various topics such as human, social, event, location, technology, and people. The ground truth for 2000 tweet images is prepared from Amazon Mechanical Turk (AMT) annotation. Three independent turkers labeled image only, text only, and both image and text as positive, negative, or neutral. This database consists of the 603 images whose labels are unanimously agreed upon by all three turkers. The Twitter database presented in [13] is gathered from Twitter messages that contain images. This database consists of 1269 images, all of which are labeled by five AMT workers. The images considered for this database are those on which all five AMT workers agree; thus, there are 581 positive and 301 negative sentiment images in the Twitter database. The MultiView_I and MultiView_II databases are prepared from [30] and collected from tweets containing 406 emotional words. These emotional words cover most human feelings. The text and images are labeled by annotators separately as positive, negative, or neutral. Both MultiView_I and MultiView_II include images for which all annotators have assigned the same label as that of the text. The total number of images that received the same label as the text is 4109. We prepared two databases, MultiView_I and MultiView_II, with a total number of images in both databases of 702. There are 351 negative images after the annotators assigned the labels, and they were included in MultiView_I and MultiView_II. The remaining 351 positive images were randomly selected from the 4109 images to be included in MultiView_I and MultiView_II.
Thus, we have an equal number of positive and negative images in MultiView_I and MultiView_II.
Multilingual Visual Sentiment Ontology (MVSO) database images are collected from Flickr [12]. Around 3316 ANPs are prepared based on various human emotions. For each ANP, 1000 images are gathered from Flickr, and overall there are more than one million images. ANP-associated Flickr images are given to three AMT workers, who independently assigned values between -2 and 2. The turker-evaluated database consists of 11733 images, and in their work the images for which at least two turkers assigned the same polarity value are selected. In our research, the database consists of 4911 images with 4504 positive and 407 negative images. SentiBank, Twitter, and MVSO contain more positive samples than negative ones and thus have class imbalance, as shown in Table 2, whereas MultiView_I and MultiView_II have an equal number of positive and negative samples for training and testing. Image augmentation [66] is one of the techniques to address class imbalance; however, the main focus of this work is to develop a Gabor filter integrated CNN model and show its performance on the sentiment image datasets. Also, the existing state-of-the-art methods were developed on similar sentiment image datasets. In our experiments, the databases with class imbalance are therefore used as they are to allow comparison with existing methods. The database preparation is presented in Algorithm 1 and details of the five databases are given in Table 2.

IV. A MODIFIED METHOD TO DESIGN GABOR FILTER SUITABLE FOR CNN
In this section, a modified method based on the information diagram [67] is presented for the Gabor filter design. In our approach, 3D information diagrams are generated from the Gabor responses and are then utilized to select suitable parameters. A Gabor filter is a sinusoidal plane wave of a specific frequency modulated by a Gaussian envelope. Gabor filters have properties such as optimal localization in the spatial domain, tunability to a specific frequency, and orientation sensitivity. A Gabor filter is represented using equation (1). Its parameters are as follows: σ is the standard deviation of the Gaussian function, θ is the orientation of the normal to the parallel stripes of the Gabor function, λ is the wavelength of the sinusoidal factor, γ is the spatial aspect ratio, ψ is the phase offset, and κ is the size of the kernel.
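As an illustration of the parameters listed above, the following minimal Python sketch builds a κ × κ Gabor kernel. It assumes the common real-valued form g(x, y) = exp(−(x'² + γ²y'²) / (2σ²)) · cos(2πx'/λ + ψ), with x' = x cos θ + y sin θ and y' = −x sin θ + y cos θ; the exact form of equation (1) may differ (for example, a complex-valued variant), and the function name gabor_kernel is illustrative. The example values are those quoted in the text for the filter of Figure 4(a), with κ = 11 inferred from λ = κ − 2.

```python
import math


def gabor_kernel(kappa, sigma, theta, lam, gamma, psi):
    """Return a kappa x kappa real-valued Gabor kernel as a list of rows.

    Assumed form: Gaussian envelope times a cosine carrier, evaluated on an
    integer grid centred on the middle of the kernel (kappa should be odd).
    """
    half = kappa // 2
    kernel = []
    for y in range(-half, half + 1):
        row = []
        for x in range(-half, half + 1):
            # Rotate the coordinates by the orientation theta.
            x_r = x * math.cos(theta) + y * math.sin(theta)
            y_r = -x * math.sin(theta) + y * math.cos(theta)
            envelope = math.exp(-(x_r ** 2 + (gamma * y_r) ** 2) / (2 * sigma ** 2))
            carrier = math.cos(2 * math.pi * x_r / lam + psi)
            row.append(envelope * carrier)
        kernel.append(row)
    return kernel


# Parameters quoted for Figure 4(a): lambda = 9, theta = 0, sigma = 7.8,
# gamma = 1, psi = 2.36; kappa = lambda + 2 = 11.
kernel = gabor_kernel(kappa=11, sigma=7.8, theta=0.0, lam=9, gamma=1, psi=2.36)
```

At the kernel centre the envelope equals 1, so the centre value is simply cos(ψ); with θ = 0 the stripes run parallel to the vertical axis and the kernel is symmetric about its middle row.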
Gabor filters have to be designed by selecting their parameters before they can be used in the convolutional layers of a CNN. Many researchers have fixed the Gabor parameters [68], and a few authors have selected the parameters manually [69]. However, selecting parameters manually has the disadvantage that the Gabor responses may not capture features from the input space. Effective parameters must be selected such that important features are represented during the convolution operation in G-CNN. Gabor parameter selection is discussed in [67], which introduced the concept of an information diagram representing the magnitude of Gabor responses. The magnitude of the Gabor response is obtained at a particular point on the image after the convolution of the filters with the image. The authors of [67] also proposed an extended information diagram, in which several parameters are varied to construct the Gabor filters. The magnitudes of the Gabor responses for the various parameters are then obtained to form the information diagram. In this study, the 3D information diagrams are prepared by varying the standard deviation of the Gaussian function σ, the spatial aspect ratio γ, and the phase offset ψ. The sinusoidal wavelength λ is set relative to the size of the filters as λ = κ − 2. In our method, if n is the number of Gabor filters to be designed, the orientations θ_i of the Gabor filters are chosen using (2). For each θ_i, a 3D information diagram is generated with the standard deviation varying over κ/2 ≤ σ_i ≤ 3κ/2 in steps of (3κ/2 − κ/2)/n, the spatial aspect ratio over 0 ≤ γ_i ≤ 150 in steps of 150/n, and the phase offset over 0 ≤ ψ_i ≤ 2π in steps of 2π/n, and the Gabor response for an image ξ is obtained. The Gabor response r_ξ is produced by the convolution operation in (3), and the 3D information diagram for θ_i is formed as given in (4). Thus, a 3D information diagram is generated for each θ_i, 0 ≤ θ_i ≤ 2(n − 1)π/n, and the n 3D information diagrams obtained are given in (5).
For each θ_i, the parameters (σ, γ, ψ), namely the standard deviation, the spatial aspect ratio, and the phase offset, are selected using the information diagrams (5) such that ID_θi attains its maximum value, as given in (6).
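The selection procedure can be sketched as a grid search. The following Python sketch is illustrative only and rests on several assumptions: the orientations are taken as θ_i = 2πi/n for i = 0, ..., n − 1 (consistent with the stated range of θ_i), the Gabor filter has the common real-valued form, the response is evaluated at the centre of a single κ × κ image patch, and the names gabor_kernel, response_magnitude, and select_gabor_parameters are hypothetical. The grid ranges and step sizes follow the text.

```python
import itertools
import math


def gabor_kernel(kappa, sigma, theta, lam, gamma, psi):
    """Real-valued Gabor kernel (assumed form) on a kappa x kappa grid."""
    half = kappa // 2
    c, s = math.cos(theta), math.sin(theta)
    return [
        [
            math.exp(-((x * c + y * s) ** 2 + (gamma * (-x * s + y * c)) ** 2)
                     / (2 * sigma ** 2))
            * math.cos(2 * math.pi * (x * c + y * s) / lam + psi)
            for x in range(-half, half + 1)
        ]
        for y in range(-half, half + 1)
    ]


def response_magnitude(patch, kernel):
    """Magnitude of the convolution of patch and kernel at the patch centre."""
    k = len(kernel)
    total = sum(
        kernel[i][j] * patch[k - 1 - i][k - 1 - j]  # kernel flip = convolution
        for i in range(k)
        for j in range(k)
    )
    return abs(total)


def select_gabor_parameters(patch, n):
    """For each of n orientations, pick (sigma, gamma, psi) maximising the response."""
    kappa = len(patch)
    lam = kappa - 2  # wavelength tied to the kernel size, as in the text
    sigmas = [kappa / 2 + i * kappa / n for i in range(n + 1)]  # kappa/2 .. 3*kappa/2
    gammas = [i * 150 / n for i in range(n + 1)]                # 0 .. 150
    psis = [i * 2 * math.pi / n for i in range(n + 1)]          # 0 .. 2*pi
    selected = []
    for i in range(n):
        theta = 2 * math.pi * i / n

        def magnitude(candidate, theta=theta):
            sigma, gamma, psi = candidate
            kernel = gabor_kernel(kappa, sigma, theta, lam, gamma, psi)
            return response_magnitude(patch, kernel)

        # The set of magnitudes over the grid plays the role of the 3D diagram.
        sigma, gamma, psi = max(itertools.product(sigmas, gammas, psis), key=magnitude)
        selected.append((theta, sigma, gamma, psi))
    return selected


# Toy 5 x 5 patch with vertical stripes (hypothetical input), n = 4 filters.
patch = [[1.0 if col % 2 == 0 else 0.0 for col in range(5)] for _ in range(5)]
params = select_gabor_parameters(patch, n=4)
```

Each entry of params is one (θ, σ, γ, ψ) tuple, analogous to one row of a parameter table. For full-size images and n = 16 the same search would typically be vectorized; the pure-Python loops here are kept for readability.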

V. NEW G-CNN ARCHITECTURES AND CLOUD ENVIRONMENT
After designing suitable Gabor filters as discussed in section 4, they are integrated with CNN. Two new Gabor-integrated G-CNN architectures are developed for this study, based on the CNNs discussed in [70], [71], [72], and [73]. The linear stack of the layered architecture of CNN [29], [74] is considered while developing the G-CNN models. Both proposed G-CNN architectures have max-pooling and normalization layers in different configurations. The max-pooling layers downsample their input and thus help to reduce the number of parameters to learn during the training of G-CNN. The use of max-pooling layers can also reduce overfitting [75]. The normalization layers are used to reduce the internal covariate shift [76]. The first model has six max-pooling layers, one after every convolution layer, as shown in Figure 1. This model is denoted G-CNN-6M. Figure 2 shows the second architecture, which employs both max-pooling and normalization layers between the convolution layers; this model is denoted G-CNN-6MN. The CNN is a DL model that has been successfully applied to spatial data. A CNN is essentially a multilayer neural network inspired by the visual perception of animals. One of the earlier successes of CNN models is AlexNet [77]; thereafter, many different CNN models were developed for real-world problems [78]. Usually, a CNN model has multiple layers such as convolutional, pooling, and normalization layers, with a few fully connected layers at the end. Typically, in a CNN architecture, convolutional and pooling layers are arranged one after another to extract feature representations from the images. A convolutional layer contains different kernels or filters whose primary task is to compute the feature maps [50]. Normalization standardizes the input to a layer, which helps to accelerate the learning process [76].
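The downsampling effect of max-pooling and the standardizing effect of normalization can be illustrated with a short Python sketch. It rests on assumptions not stated in the text: a 2 × 2 pooling window with stride 2, a 64 × 64 single-channel feature map, and a simplified per-map standardization standing in for the normalization layer of [76]. The function names are illustrative.

```python
def max_pool_2x2(feature_map):
    """Downsample a 2D feature map by taking the maximum of each 2 x 2 block."""
    rows, cols = len(feature_map) // 2, len(feature_map[0]) // 2
    return [
        [
            max(
                feature_map[2 * r][2 * c],
                feature_map[2 * r][2 * c + 1],
                feature_map[2 * r + 1][2 * c],
                feature_map[2 * r + 1][2 * c + 1],
            )
            for c in range(cols)
        ]
        for r in range(rows)
    ]


def normalize(feature_map, eps=1e-5):
    """Standardize a feature map to zero mean and (approximately) unit variance."""
    values = [v for row in feature_map for v in row]
    mean = sum(values) / len(values)
    variance = sum((v - mean) ** 2 for v in values) / len(values)
    scale = (variance + eps) ** 0.5
    return [[(v - mean) / scale for v in row] for row in feature_map]


# A hypothetical 64 x 64 feature map; six pooling stages halve each side six times.
pooled = [[float(r * 64 + c) for c in range(64)] for r in range(64)]
sizes = [len(pooled)]
for _ in range(6):
    pooled = max_pool_2x2(pooled)
    sizes.append(len(pooled))

normalized = normalize([[1.0, 2.0], [3.0, 4.0]])
```

Under these assumptions the side length shrinks as 64, 32, 16, 8, 4, 2, 1, which shows why stacking six pooling layers sharply reduces the number of values passed to the fully connected layers.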
The flattening and fully connected layers are the last layers of a CNN. The fully connected layers consist of multiple layers of neurons connected in succession. In the flattening layer, a single-dimensional feature vector is created, and the final classification task is carried out by the fully connected layers. On the input sentiment image, the feature representation at layer l is computed using the convolution operation as given in equation (7),
where the feature representation C^l for layer l is computed using the convolution operation ⊗ on the input feature map C^(l−1) with N filters Q^l, and Ch is the number of channels. A typical convolution layer can be defined using equation (8), similar to that given in [27].
The i-th feature representation matrix c_i^l for the convolutional layer l is computed using (8) from n input feature representations, where c_j^(l−1) is the j-th feature representation of layer l−1, q_ji^l are the filters, b_i^l is the bias, initialized to zero, and f(·) is a nonlinear function. In our G-CNN, the filters Q^l = [q_ji^l] are initialized by Gabor filters for the first layer and by Glorot initialization for the remaining layers. During the CNN learning phase, the weights of the filters Q^l are updated by backpropagation using the partial derivatives δ of the loss function and the learning rate η, as given in (9).
Cloud computing offers several advantages such as quick deployment, flexibility, and reliability [79]. Moreover, deploying a CNN architecture on the cloud has the advantage of high availability and usability [79]. However, most CNN architectures have high computational complexity and require substantial computational resources [80]. A CNN architecture that uses fewer computational resources is therefore highly desirable. In recent years, Gabor filter integration with CNN has shown many advantages. Firstly, Gabor filters integrated with CNN layers have been found to reduce computation energy and time [25], [26]. The study in [28] showed a decrease in computation energy and time of 17-19% when Gabor filters are used with CNN. Secondly, it is shown in [77], [81] that the filters of a CNN trained on real-life images tend to resemble Gabor-like filters. Hence, introducing the selected Gabor filters in the first layer of the CNN will make the model learn more effectively and eliminate the need for transfer learning [29]. Lastly, the integration of Gabor filters with CNN can yield better image classification performance, as reported in [29] using the MNIST [82], CIFAR-10, and CIFAR-100 [83] databases. In this research, Gabor filters are integrated with CNN and two architectures, G-CNN-6M and G-CNN-6MN, are developed.
The integration of Gabor filters with CNN for image sentiment classification in the cloud environment is depicted in Figure 3.
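The convolution layer of equation (8) and the weight update of equation (9) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the nonlinearity f is taken to be ReLU, the sliding-window product is computed as a cross-correlation (as most DL frameworks do), the gradients are assumed to be supplied by backpropagation, and all function names are hypothetical. In a G-CNN, the kernels of the first layer would be filled with the designed Gabor filters before training.

```python
def correlate_valid(feature_map, kernel):
    """Slide a k x k kernel over a 2D map without padding (valid mode)."""
    k = len(kernel)
    out_rows = len(feature_map) - k + 1
    out_cols = len(feature_map[0]) - k + 1
    return [
        [
            sum(feature_map[r + i][c + j] * kernel[i][j]
                for i in range(k) for j in range(k))
            for c in range(out_cols)
        ]
        for r in range(out_rows)
    ]


def conv_layer(inputs, filters, biases):
    """Compute c_i = f(sum_j c_j (x) q_ji + b_i) for every output map i.

    inputs:  list of n input feature maps c_j
    filters: filters[j][i] is the kernel q_ji linking input j to output i
    biases:  one bias b_i per output map
    """
    outputs = []
    for i, bias in enumerate(biases):
        accumulated = None
        for j, feature_map in enumerate(inputs):
            part = correlate_valid(feature_map, filters[j][i])
            if accumulated is None:
                accumulated = part
            else:
                accumulated = [[a + b for a, b in zip(row_a, row_b)]
                               for row_a, row_b in zip(accumulated, part)]
        # ReLU is an assumed choice for the nonlinear function f.
        outputs.append([[max(0.0, v + bias) for v in row] for row in accumulated])
    return outputs


def sgd_update(kernel, gradient, eta):
    """One gradient-descent step: q <- q - eta * dLoss/dq."""
    return [[w - eta * g for w, g in zip(row_w, row_g)]
            for row_w, row_g in zip(kernel, gradient)]


# Toy example: one 3 x 3 input map, one 2 x 2 kernel, zero-initialized bias.
image = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]
kernel = [[1.0, 0.0], [0.0, 1.0]]
feature_maps = conv_layer([image], [[kernel]], [0.0])
updated = sgd_update(kernel, [[1.0, 1.0], [1.0, 1.0]], eta=0.5)
```

The gradient passed to sgd_update here is a placeholder; in training it would come from the partial derivatives of the loss with respect to each filter weight.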
The Gabor filter parameters θ, σ, γ, and ψ are chosen based on the 3D information diagram. Table 3 shows the parameters selected for n = 16 filters. Figure 4 shows all 16 Gabor filters for the selected parameters. Figure 4(a) is the Gabor filter for λ = 9, θ = 0, σ = 7.8, γ = 1, and ψ = 2.36. Figure 4(b) to Figure 4(p) show the Gabor filters for the parameters in rows 2 to 16 of Table 3. It can be observed from Table 3 and Figure 4 that, for filter size κ, wavelength λ = κ − 2, and θ between 0 and 15π/8, different values of σ, γ, and ψ are chosen based on equation (6). Moreover, the shape of the Gabor filters varies according to the respective parameters.
An image from SentiBank is convolved with each filter of Figure 4 and the responses are generated. The Gabor responses are depicted in Figure 5. In this figure, different frequencies and spatial content of the image appear in the filtered responses. Figure 6 shows the 3D representation of the Gabor responses for the 16 filters of Table 3. The Gabor response is shown along the z-axis. It is evident from Figure 6 that the varying spectral and spatial image content is highlighted in the Gabor responses corresponding to the parameters of Table 3.
The two proposed G-CNN models, G-CNN-6M and G-CNN-6MN, are trained and evaluated on the five databases using 10-fold CV. Before the training process, each image of the database undergoes several pre-processing steps. Each image is resized to match the input shape of the first convolutional layer. Also, the pixel values of the images are normalized to the range 0 to 1. Performance metrics such as accuracy, sensitivity, precision, specificity, and F-score are obtained using the 10-fold CV.
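The pixel normalization and the 10-fold split can be sketched in a few lines of Python. The sketch assumes 8-bit pixel values (maximum 255) and a shuffled, non-stratified split with a fixed seed; neither detail is stated in the text, and the function names are illustrative. The sample count of 603 corresponds to the size of the SentiBank database.

```python
import random


def scale_pixels(image, max_value=255):
    """Map integer pixel values to the range 0 to 1 (8-bit input assumed)."""
    return [[pixel / max_value for pixel in row] for row in image]


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_indices, test_indices) for each of the k folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # k nearly equal-sized folds
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


scaled = scale_pixels([[0, 51, 255]])
splits = list(k_fold_indices(603, k=10))
```

Each sample appears in exactly one test fold, so averaging a metric over the ten folds uses every image once for testing and nine times for training.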

VII. EXPERIMENT RESULTS
In this section, we elaborate on the experimental results obtained on the two proposed G-CNN architectures for sentiment image classification. The results obtained for G-CNN-6M and G-CNN-6MN are compared with the CNN model without Gabor filters (CNN-6MN) that is trained and evaluated on the same databases. Table 4 shows the results of the 10-fold CV obtained for models CNN-6MN, G-CNN-6M, and G-CNN-6MN on the SentiBank database.
Each G-CNN model is combined with 16, 32, 48, and 96 Gabor filters at the first convolutional layer, and the performance is tabulated in Table 4. During both the training and testing phases, the G-CNN-6MN model has yielded better accuracies than the G-CNN-6M models due to the normalization layer of G-CNN-6MN. Lower accuracies of 84.08% and 77.12% are obtained on the training and test datasets, respectively, with the G-CNN-6M model. In most [...] and 16.09% respectively. As the number of filters increased from 16 to 96, the overall performance of the model increased. Because G-CNN-6MN with 96 filters has the most filters, it could reduce the error during the learning phase to the lowest value compared to the other models. Table 5 shows the results of the 10-fold CV for the CNN-6MN, G-CNN-6M, and G-CNN-6MN models using the Twitter database. A minimum accuracy of 66.67% for G-CNN-6M with 32 filters and a maximum accuracy of 92.52% for G-CNN-6MN with 48 filters are obtained using the test dataset. Specificity varied from 26.91% for G-CNN-6M with 16 filters to 87.38% for G-CNN-6MN with 32 filters. An increase in the number of true positives together with a reduction in false positives and false negatives improves the F-score. For the G-CNN-6MN model using 48 Gabor filters, the highest F-score of 94.41% is obtained. The CNN-6MN model without Gabor filters has achieved test accuracies of 83.45%, 82.43%, 82.43%, and 83.22% with 16, 32, 48, and 96 filters, respectively, which are lower than those of G-CNN-6MN.
Using the Twitter database, the performance of the G-CNN-6MN model with different filters is presented in terms of the confusion matrix in Figure 9. The CV results for the two G-CNN models using the MVSO database are given in Table 6. The G-CNN-6M with 16 filters has provided accuracies of 92.04% and 91.61% on the training and testing datasets, respectively. For the same model, a sensitivity of 99.67%, a specificity of 2.46%, and an F-score of 95.61% are obtained. The G-CNN-6MN models have yielded better results than G-CNN-6M. The CNN-6MN has obtained accuracies between 81.43% and 82.37% on the test dataset. Among all G-CNN-6M models, the maximum accuracy of 94.26% on the test dataset is obtained for the G-CNN-6M model with 96 filters. A maximum accuracy of 99.96% is obtained on the training dataset and 97.39% on the testing dataset for the G-CNN-6MN model with 48 filters. Figure 11 shows the confusion matrix obtained for the G-CNN-6MN model using the MVSO database. True positives of 91.19%, false positives of 0.53%, false negatives of 2.08%, and true negatives of 6.22%, expressed as percentages of all samples, are obtained for G-CNN-6MN with 48 filters. Figure 12 shows the training and validation losses obtained for the model during the 25 epochs of the training phase. [...] the MultiView_I database. These results are tabulated in Table 7. G-CNN-6M with 32 filters obtained an accuracy of 57.41% on the test dataset. As the number of filters increased from 16 to 96, the classification accuracy gradually increased. The maximum classification accuracy of 76.07% is obtained using CNN-6MN with 96 filters. All the G-CNN-6MN models have obtained train accuracies above 99%. A maximum accuracy of 90.88% is obtained on the test dataset for G-CNN-6MN with 32 filters. The confusion matrix for the developed G-CNN-6MN model using the MultiView_I dataset is shown in Figure 13.
For the MultiView_I database, two models with different numbers of filters are trained, and the losses obtained during training and validation are depicted in Figure 14. Table 8 shows the results obtained using the G-CNN models with a 10-fold CV on the MultiView_II database. The CNN-6MN has given classification accuracies between 75.08% and 75.79% on the test dataset. Compared to G-CNN-6M, the G-CNN-6MN has provided better accuracies on the train and test datasets. The G-CNN-6MN model with 96 filters has obtained a maximum accuracy of 91.31% and an F-score of 91.22% on the test dataset of the MultiView_II database. The confusion matrix obtained for the G-CNN-6MN model is shown in Figure 15. The training and validation losses for the G-CNN model with the MultiView_II database are shown in Figure 16.
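The relation between the confusion-matrix entries and the reported metrics can be checked with a short worked example. The Python sketch below uses the standard definitions of accuracy, sensitivity, specificity, precision, and F-score, and applies them to the MVSO confusion-matrix percentages quoted above for G-CNN-6MN with 48 filters; the function name binary_metrics is illustrative, and the derived sensitivity, specificity, precision, and F-score are computed here rather than taken from the paper's tables.

```python
def binary_metrics(tp, fp, fn, tn):
    """Standard binary-classification metrics from confusion-matrix entries."""
    sensitivity = tp / (tp + fn)  # recall of the positive class
    specificity = tn / (tn + fp)  # recall of the negative class
    precision = tp / (tp + fp)
    return {
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f_score": 2 * precision * sensitivity / (precision + sensitivity),
    }


# Confusion-matrix entries (in percent of all samples) quoted for MVSO.
mvso = binary_metrics(tp=91.19, fp=0.53, fn=2.08, tn=6.22)
accuracy_percent = round(100 * mvso["accuracy"], 2)  # 97.39
```

The resulting accuracy of 97.39% agrees with the maximum test accuracy reported for MVSO. Because the metrics are ratios, they can be computed from percentages as well as from raw counts.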

VIII. DISCUSSION
Two G-CNN models, G-CNN-6M and G-CNN-6MN, are developed in this research. The G-CNN architectures are depicted in Fig. 1 and Fig. 2. Visualization of the feature maps helps to understand the internal representation of the convolutional layers in a CNN model [84]. The feature maps of convolutional layers provide insights into the learning and functioning of the layer [85]. We carried out the visualization of feature maps of the G-CNN-6MN to explore the functions of the layers. The second model, G-CNN-6MN, has seven convolutional layers. The feature maps for the G-CNN-6MN are shown in Fig. 17. These feature maps are produced at the end of 25 epochs of training of the G-CNN-6MN model on the SentiBank database. Each layer in the G-CNN-6MN receives its input from the previous layer and performs convolution, max-pooling, and normalization operations to detect features. The feature representation for convolutional layer 1 is shown in Fig. 17(a), and that for convolutional layer 4 in Fig. 17(b). Overall, the feature representations for the intermediate and higher layers from Convolution Layer-4 to Convolution Layer-16 are given in Fig. 17(b) to Fig. 17(f). As these figures show, the lower layers represent simple features while the higher layers represent complex features [86].
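The layer operations described above (convolution followed by max-pooling) can be illustrated with a minimal, dependency-free sketch. It is not the authors' implementation: the Gabor parameters (`sigma`, `theta`, `lambd`, `gamma`, `psi`), the 5x5 kernel size, the ReLU activation, and the 8x8 striped toy image are all assumptions chosen for illustration, and the normalization step is omitted. The kernel uses the standard real-valued Gabor form, a Gaussian envelope multiplied by a cosine carrier.

```python
import math

def gabor_kernel(size, sigma, theta, lambd, gamma=0.5, psi=0.0):
    """Real part of a 2D Gabor filter sampled on a size x size grid (size odd)."""
    half = size // 2
    kernel = []
    for y in range(-half, half + 1):
        row = []
        for x in range(-half, half + 1):
            # Rotate the coordinates by the filter orientation theta.
            xr = x * math.cos(theta) + y * math.sin(theta)
            yr = -x * math.sin(theta) + y * math.cos(theta)
            envelope = math.exp(-(xr ** 2 + (gamma * yr) ** 2) / (2 * sigma ** 2))
            carrier = math.cos(2 * math.pi * xr / lambd + psi)
            row.append(envelope * carrier)
        kernel.append(row)
    return kernel

def conv2d_valid(image, kernel):
    """Sliding-window correlation without padding, as used in CNN layers."""
    k = len(kernel)
    out_h = len(image) - k + 1
    out_w = len(image[0]) - k + 1
    return [[sum(image[i + a][j + b] * kernel[a][b]
                 for a in range(k) for b in range(k))
             for j in range(out_w)]
            for i in range(out_h)]

def relu(fmap):
    """Element-wise rectifier (an assumed activation, not stated in the text)."""
    return [[max(0.0, v) for v in row] for row in fmap]

def max_pool2x2(fmap):
    """Non-overlapping 2x2 max-pooling."""
    return [[max(fmap[i][j], fmap[i][j + 1], fmap[i + 1][j], fmap[i + 1][j + 1])
             for j in range(0, len(fmap[0]) - 1, 2)]
            for i in range(0, len(fmap) - 1, 2)]

# Hypothetical 8x8 input with vertical stripes (two pixels on, two pixels off).
image = [[1.0 if j % 4 < 2 else 0.0 for j in range(8)] for _ in range(8)]

kernel = gabor_kernel(size=5, sigma=2.0, theta=0.0, lambd=4.0)
feature_map = relu(conv2d_valid(image, kernel))  # 4x4 feature map
pooled = max_pool2x2(feature_map)                # 2x2 after pooling
```

With `theta=0.0` the cosine carrier varies along the horizontal axis, so this particular kernel responds to vertical structure; other orientations would be obtained by changing `theta`.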
The classification results obtained with the two developed Gabor CNN architectures, G-CNN-6M and G-CNN-6MN, along with CNN-6MN are presented in Section VII. G-CNN-6M, G-CNN-6MN, and CNN-6MN are compared in terms of both computational cost and classification accuracy. Regarding computational cost, convolutional operations with fewer filters are cheaper than those with more filters. The number of filters used in CNN-6MN, G-CNN-6M, and G-CNN-6MN varied from 16 to 96. For the SentiBank results shown in Table 4, CNN-6MN with 96 non-Gabor filters has given a maximum accuracy of 84.08% on the test set. However, G-CNN-6MN has achieved a higher classification accuracy of 90.05% with only 16 filters in the convolutional layer. The only difference between CNN-6MN and G-CNN-6MN is the application of Gabor filters in the convolutional layer; the rest of the architecture is the same in both CNNs. The use of Gabor filters in the G-CNN models has thus improved the classification accuracy even with fewer filters. Moreover, as the number of filters increased through 16, 32, 48, and 96, the classification accuracy of G-CNN-6MN on the test set improved to 90.05%, 90.55%, 91.05%, and 91.71% respectively, as shown in Table 4 for SentiBank. For the Twitter database in Table 5, G-CNN-6MN produced classification accuracies of 90.70%, 92.06%, 92.52%, and 89.00% with 16, 32, 48, and 96 filters respectively, while CNN-6MN has given accuracies of 83.45%, 82.43%, 82.43%, and 83.22% on the test set. Similar observations are made on the results obtained for the MVSO database in Table 6. The maximum classification accuracy of 82.37% on the test set is obtained for CNN-6MN with 96 filters, while G-CNN-6MN gave 96.72% classification accuracy on the test set using 16 filters. The G-CNN-6M has 6 max-pooling layers, whereas CNN-6MN includes both 6 max-pooling and normalization layers in its architecture.
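To make the link between filter count and computational cost concrete, the following sketch counts multiply-accumulate operations (MACs) for one convolutional layer. The 224x224 output size, the 3x3 kernel, and the RGB input are hypothetical values picked for illustration; they are not taken from the paper's architecture.

```python
def conv_macs(out_h, out_w, kernel_size, in_channels, out_channels):
    """Multiply-accumulate count of one standard 2D convolutional layer."""
    return out_h * out_w * kernel_size * kernel_size * in_channels * out_channels

# Hypothetical first layer: 224x224 output, 3x3 kernels, 3-channel (RGB) input.
first_16 = conv_macs(224, 224, 3, in_channels=3, out_channels=16)
first_96 = conv_macs(224, 224, 3, in_channels=3, out_channels=96)

# Hypothetical hidden layer whose input and output both use n filters.
hidden_16 = conv_macs(224, 224, 3, in_channels=16, out_channels=16)
hidden_96 = conv_macs(224, 224, 3, in_channels=96, out_channels=96)

print(first_96 / first_16)    # cost ratio of the first layer
print(hidden_96 / hidden_16)  # cost ratio of a hidden layer
```

Under these assumptions the first layer scales linearly with the number of filters, while a hidden layer whose input and output widths both grow scales quadratically, which is why a model that reaches its accuracy with 16 filters rather than 96 can be substantially cheaper.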
It is observed that the classification accuracy on the test set is in some instances better for CNN-6MN than for G-CNN-6M. In Table 4, CNN-6MN with 16 filters has given 80.10% accuracy, but G-CNN-6M with 16 filters gave only 77.12% accuracy on the test set. A maximum of 85.41% test accuracy is obtained for G-CNN-6M with 96 filters on the SentiBank database. On the MultiView_I database, G-CNN-6M with 16, 32, 48, and 96 filters has achieved test accuracies of only 57.55%, 57.41%, 65.24%, and 64.39% respectively, as presented in Table 7. For the same dataset, G-CNN-6MN has produced test accuracies of 89.60%, 90.88%, 87.75%, and 90.74% with 16, 32, 48, and 96 filters respectively. For the MultiView_II database, as shown in Table 8, G-CNN-6MN with 96 filters has reached a maximum classification accuracy of 91.31%, whereas CNN-6MN and G-CNN-6M obtained 75.79% and 73.93% respectively on the test set. The utilization of the max-pooling layer together with the normalization layer has a higher impact on the accuracy results in both G-CNN-6MN and CNN-6MN. The training and validation losses during the learning process of G-CNN-6M and G-CNN-6MN are depicted in Fig. 8, Fig. 10, Fig. 12, Fig. 14, and Fig. 16 for the SentiBank, Twitter, MVSO, MultiView_I, and MultiView_II databases respectively. Higher training and validation losses are observed for G-CNN-6M compared to G-CNN-6MN in these figures. The combination of max-pooling and normalization layers evidently reduces the training losses in G-CNN-6MN.
Metrics such as sensitivity, precision, specificity, and F-score indicate the performance of the G-CNN models. These measurements on the test set are required to gauge and compare the effectiveness of the prediction models. The sensitivity metric for image sentiment classification signifies the performance of the model on the positive sentiment instances.
Specificity, in turn, measures how well a model predicts the negative sentiment instances. The G-CNN-6MN has achieved a better sensitivity rate, between 96.39% and 97.88%, while its specificity rate lies between 64.67% and 72.94% on the SentiBank database, as shown in Table 4. For the SentiBank database, the model CNN-6MN has a maximum specificity of 54.89% and G-CNN-6M has given a maximum specificity of 39.85%. For the Twitter database, as shown in Table 5, the specificity rates are better than those of the SentiBank database. A maximum specificity rate of 87.38% is obtained for G-CNN-6MN with 32 filters and a sensitivity rate of 95.87% is given by G-CNN-6MN with 48 filters. The number of samples for the positive sentiment is 470 while the negative samples are 133 for the SentiBank database. For the Twitter database, positive samples are 581 and negative samples are 301. The specificity rate on the Twitter database has improved due to the availability of more negative samples as compared to the SentiBank database. The CNN models have shown better performance on an individual class when more samples of that class are used during training. There are 351 positive and negative samples available in both MultiView_I and MultiView_II databases. In Table 7 and Table 8 the sensitivity and specificity rates for the various CNN models are presented. In Table 7 for MultiView_I, sensitivity rates in the range of 87.18% to 90.60% and specificity rates in the range of 88.32% to 91.74% are obtained using G-CNN-6MN. The similar results presented in Table 8 indicate that an equal number of samples in both classes provides comparable sensitivity and specificity rates. The F-score represents the harmonic mean of the precision and sensitivity values, and it measures the overall accuracy of the models on the database. For the SentiBank database, the model CNN-6MN with 96 filters has given a 90.09% F-score on the test set.
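The metrics discussed above can be computed from the four cells of a binary confusion matrix, as in the following minimal sketch. The class sizes (470 positive, 133 negative) follow the SentiBank figures quoted in the text, but the split into correct and incorrect predictions is hypothetical and does not reproduce any row of Table 4.

```python
def binary_metrics(tp, fp, fn, tn):
    """Sensitivity, specificity, precision, accuracy and F-score from counts."""
    sensitivity = tp / (tp + fn)   # recall on the positive class
    specificity = tn / (tn + fp)   # recall on the negative class
    precision = tp / (tp + fp)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    # F-score: harmonic mean of precision and sensitivity.
    f_score = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "accuracy": accuracy,
        "f_score": f_score,
    }

# Hypothetical predictions on 470 positive and 133 negative samples.
metrics = binary_metrics(tp=455, fp=43, fn=15, tn=90)
for name, value in metrics.items():
    print(f"{name}: {100 * value:.2f}%")
```

In this made-up example the sensitivity is far higher than the specificity although the accuracy remains high, which mirrors the effect of class imbalance described in the text: with few negative samples, errors on the negative class have little influence on accuracy and F-score.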
The F-score has improved with the use of Gabor filters: G-CNN-6MN with 96 filters obtained an F-score of 94.81%. Similar improvements in the F-score are observed for G-CNN-6MN with 96 filters on the Twitter database (91.90%), the MVSO database (98.42%), the MultiView_I database (90.65%), and the MultiView_II database (91.22%).
Further experiments are conducted to compare the image sentiment classification results with two dilated CNN models. The CNN-Dilated models are constructed with a structure similar to that of G-CNN-6M and G-CNN-6MN, but with a few modifications. The max-pooling and normalization layers are removed and dilations are introduced. For the first dilated CNN model (CNN-Dilated_2), we have introduced a constant dilation rate of 2 in all the convolutional layers. For the second dilated CNN model (CNN-Dilated_P), the dilation rate progressively increases through 1, 2, and 4 and then decreases through 2 and 1, as in [87] and [45]. To achieve a fair comparison with our proposed model (G-CNN-6MN), the same number of filters is used in both CNN-Dilated models. The results are collected on SentiBank with 10-fold cross-validation and are summarized in Table 9. For CNN-Dilated_2 with 16 filters, a training accuracy of 92.21% and a test accuracy of 81.43% are obtained. The CNN-Dilated_P with 48 filters has given an accuracy of 96.32% for the training set and 83.26% for the test set. A maximum accuracy on the test set of 85.08% is obtained for CNN-Dilated_P with 16 filters, but our proposed G-CNN-6MN has achieved 91.71% for the same database.
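The two dilation schedules described above can be compared through the receptive field they produce. The sketch below uses the standard formula for a stack of stride-1 convolutions without pooling; the 3x3 kernel size and the five-layer depth are assumptions made for illustration, since only the dilation rates are given in the text.

```python
def receptive_field(dilations, kernel_size=3):
    """Receptive field (in pixels, per axis) of stacked stride-1 convolutions."""
    field = 1
    for rate in dilations:
        # Each layer widens the field by (kernel_size - 1) * dilation rate.
        field += (kernel_size - 1) * rate
    return field

constant_schedule = [2, 2, 2, 2, 2]      # CNN-Dilated_2 style, 5 layers assumed
progressive_schedule = [1, 2, 4, 2, 1]   # CNN-Dilated_P style
undilated_schedule = [1, 1, 1, 1, 1]     # ordinary convolutions for reference

for name, schedule in [("constant", constant_schedule),
                       ("progressive", progressive_schedule),
                       ("undilated", undilated_schedule)]:
    print(name, receptive_field(schedule))
```

Under these assumptions both dilated schedules cover the same 21-pixel field, compared with 11 pixels for undilated convolutions, so dilation enlarges the context seen by each output unit without pooling and without adding filters.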
A summary of the best accuracies obtained using the proposed G-CNN-6MN model on the various databases is shown in Table 10. A maximum accuracy of 97.39% is obtained on the MVSO database, while a minimum accuracy of 90.88% is observed on MultiView_I. Recent state-of-the-art works on sentiment image recognition using ML and CNN models are presented in Table 11. In [12], the methods used for the classification task are Linear SVM and LR, which have given classification accuracies between 67.0% and 70.0%. CNN models were employed in [13], where two models were used: PCNN produced a classification accuracy of 77.3% and CNN an accuracy of 78.3%. In [60], several deep learning models were developed. Fine-tuning was carried out for the CaffeNet-fc9 model in [60], and this model has produced a classification accuracy of 79.50%. The fine-tuned CaffeNet with oversampling has achieved a classification accuracy of 83.00%. The MVSO [EN] model with oversampling has given 84.4% accuracy in their work. The image sentiment classification was carried out in [61] using region CNN, with a reported classification accuracy of 76.04% on the SentiBank database. In our work, we integrated Gabor filters with CNN and developed the G-CNN-6M and G-CNN-6MN models. Using our developed G-CNN-6MN model, an average accuracy of 92.76% is achieved, as given in Table 10.
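The reported average can be checked directly from the five best test accuracies quoted in the text. The short sketch below only reproduces that arithmetic; the dictionary name is chosen for illustration.

```python
# Best G-CNN-6MN test accuracies (%) per database, as quoted in the text.
best_accuracy = {
    "SentiBank": 91.71,
    "Twitter": 92.52,
    "MVSO": 97.39,
    "MultiView_I": 90.88,
    "MultiView_II": 91.31,
}

mean_accuracy = sum(best_accuracy.values()) / len(best_accuracy)
best_db = max(best_accuracy, key=best_accuracy.get)
worst_db = min(best_accuracy, key=best_accuracy.get)

print(f"mean = {mean_accuracy:.2f}%")  # unweighted mean over the five databases
print(f"highest: {best_db}, lowest: {worst_db}")
```

The mean is unweighted, so each database contributes equally regardless of how many images it contains.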

IX. CONCLUSION
Understanding users' emotions through sentiment analysis of text, images, and videos from social media has wide applications in marketing research. The sentiment recognition of images is a difficult task because emotional features are abstract in nature. CNN models have been applied successfully to many image recognition and analysis tasks due to their capability to learn prominent features. Visual data analysis and recognition using CNN models involves deep layers and demands high computational costs [20], [21]. In recent years, the cloud computing environment has become popular due to its high reliability, mobility, and software integration. While developing CNN applications in the cloud environment, the computational cost is important due to the remote server location, the need for resource optimization, and infrastructure cost reduction. Hence, CNN models in the cloud environment need to be efficient in both performance and computational cost. Integrating Gabor filters with CNN models has improved the performance [27], [29] and decreased both the time required [28], [29] and the computation energy [26], [28].
In this research, we developed two new CNN models by integrating Gabor filters with them. The Gabor integrated models G-CNN-6M and G-CNN-6MN are developed for image sentiment analysis in the cloud environment. A modified method to design Gabor filters using 3D information diagrams is devised. This method involved selecting the Gabor filters that produced the maximum responses in the 3D information diagram. The selected filters are integrated with the convolutional layers of G-CNN-6M and G-CNN-6MN. A different number of max-pooling and normalization layers are included in the G-CNN architectures to improve their performance. Our developed G-CNN-6M has six max-pooling layers and G-CNN-6MN has six max-pooling and normalization layers. The image sentiment experiments are carried out on five databases: SentiBank, Twitter, MVSO, MultiView_I, and MultiView_II. In order to have a fair comparison, a CNN model (CNN-6MN) with the same number of layers but without Gabor filters is developed. The experimental results of G-CNN-6M and G-CNN-6MN are compared with those of CNN-6MN. On the SentiBank database, CNN-6MN with 96 filters has given a maximum accuracy of 84.08%, whereas G-CNN-6MN with 16 filters has achieved 90.05% accuracy on the test set. Similar results are obtained in the other experiments, with G-CNN-6MN outperforming CNN-6MN even with fewer filters, thus reducing the computational cost of the G-CNN-6MN model. The image classification accuracy has been improved in our developed G-CNN models. On SentiBank, maximum classification accuracies on the test set of 85.41% and 91.71% are obtained for the models G-CNN-6M and G-CNN-6MN respectively. A maximum accuracy on the test set of 97.39% is obtained on the MVSO database for the G-CNN-6MN model with 48 filters. The developed G-CNN-6MN model obtained an average classification accuracy of 92.76%, which is better than the previous works.
As future work, optimization techniques such as genetic algorithms or gravitational search algorithms could be applied to select the parameters of the Gabor filters. Another possible improvement would be the semantic tagging of images during sentiment image classification.

DATA AVAILABILITY
Previously reported data sets of sentiment images were used to support this study: SentiBank [12], Twitter [13], MultiView_I [30], MultiView_II [30], and MVSO [12]. A detailed description of the data sets is given in Section IV.

FUNDING STATEMENT
No research grant was received to carry out this research.
Declarations: This paper presents original work not previously published in a similar form and not currently under consideration by another Journal. This research work has been carried out at the above-mentioned institutes.

Conflicts of interest/Competing interests:
There is no conflict of interest in this work.
Highlights
• Cloud based intelligent system is developed for image sentiment analysis
Professor with the BITS-Pilani, Dubai, United Arab Emirates. He published around 27 international journal articles, 59 international conference papers, and five book chapters. His research interests include machine learning, data science, deep learning, NLP, text mining, and image recognition. His research has been recognized as the second best research at Manipal University. He has been granted the AICTE fund for an application of document image analysis to translate a scanned document to Braille equivalent during his Ph.D. He has been awarded the Ph.D. degree in computer science and engineering by the President of India Hon'ble Smt. Prathibha Patil, in 2010. He has given invited talks in various forums. Currently, he is a recognized guide in BITS-Pilani and Vishwesharya Technical University, Belgaum, and guiding five Ph.D. students.
JAGADISH NAYAK (Senior Member, IEEE) received the bachelor's degree in electronics and communication engineering from Karnataka University, the Master of Technology degree in digital electronics and communication from the Manipal Academy of Higher Education, Manipal, and the Ph.D. degree from the National Institute of Technology Karnataka Surathkal for his thesis entitled ''Automated Detection of Eye Abnormalities and Patient Data Handling.'' Prior to his present position, he worked as an Associate Professor at the Manipal Academy of Higher Education and as a Faculty Member at NITK Surathkal. He also worked at Bradma of India Ltd., Bengaluru, as a Customer Support Engineer. He is currently working as an Associate Professor with the Birla Institute of Technology and Science-Pilani, Dubai Campus. He has guided one Ph.D. student. He has a total of 27 years of experience across industry, teaching, and research. He has handled projects in the field of medical image processing and, more recently, a couple of projects in the Internet of Things. He has published around 18 research articles in reputed international journals, three book chapters, 20 papers in international conferences, and ten papers in national conferences. His research interests include signal processing and its application, machine learning methods for medical signals and images, microelectronics, VLSI design, embedded systems design, and the Internet of Things. He is a member of several professional bodies. He is also a Reviewer for many international journals, specifically Journal of Medical Systems.
U. RAJENDRA ACHARYA received the Ph.D., D.Eng., and D.Sc. degrees. He is currently a Senior Faculty Member with Ngee Ann Polytechnic, Singapore; a Distinguished Professor with the International Research Organization for Advanced Science and Technology, Kumamoto University, Japan; an Adjunct Professor with the University of Malaya, Malaysia; Asia University, Taiwan; and University of Southern Queensland, Australia; and an Associate Faculty with the Singapore University of Social Sciences, Singapore. He has authored over 500 publications, including 345 in refereed international journals, 42 in international conference proceedings, and 17 books. His research interests include biomedical imaging and signal processing, data mining, and visualization, as well as applications of biophysics for better healthcare design and delivery. His funded research has accrued cumulative grants exceeding six million Singapore dollars. He has received over 60,000 citations on Google Scholar, with an H-index of 123. According to the Essential Science Indicators by Thomson, he consistently ranked among the top 1% of Highly Cited Researchers in Computer Science for the last seven years, from 2016 to 2022. He sits on the editorial boards of multiple journals and has served as a guest editor on several AI-related issues. VOLUME 10, 2022