A WeChat Official Account Reading Quantity Prediction Model Based on Text and Image Feature Extraction

This paper describes a study that built a neural network prediction model based on feature extraction, focusing on text analysis and image analysis of WeChat official accounts reading quantity. Based on the embedding method of the deep learning model, we extracted the text features in the title and the image features in the cover picture, explored the relationship between these features and the reading quantity, and built a neural network model based on these features to predict the reading quantity. The results show that there is a phenomenon of sentiment fusion in the text, and a sentence vector model based on Doc2Vec and a neural network model both had a good performance. This paper proposes a tool that can predict the reading quantity in advance and help administrators adjust the titles and images according to the predicted results.

this research gap. Specifically, this study uses crawler technology to capture data from pet-type WeChat official accounts. We extracted the text features in the title and the image features in the cover picture through the embedding method based on the deep learning model and explored the relationship between these features and the reading quantity. We also built a neural network model based on these features to predict the reading quantity of articles.
The study makes several contributions. Firstly, this paper is one of the first works to study the field of WeChat official accounts and effectively combine social media user engagement and multimedia elements. Secondly, a method is proposed to combine the text and image features of the title and cover image which achieves a breakthrough in text and image analysis in the field of WeChat official accounts. Moreover, the significance of our study is in the development of a tool that can predict the reading quantity in advance and help administrators adjust the titles and images according to the prediction. The results of this study can provide administrators with feasible suggestions, help administrators better manage and improve WeChat official accounts, and expand the influence and communication power of accounts.
The remainder of this article is structured as follows: Section 2 provides a literature review that outlines some of the related previous studies. Section 3 introduces the research methods used in this study, followed by an analysis and discussion in Sections 4 and 5. Finally, we present the conclusion and future research directions in Section 6.

II. LITERATURE REVIEW
This section briefly introduces current studies on social media engagement, multimedia elements, mainstream social media analysis methods, and WeChat official accounts.

A. SOCIAL MEDIA ENGAGEMENT
Twitter, Facebook, and Instagram are typical social media platforms on which users can browse for information, express opinions, and follow or like accounts and posts. These user behaviors are forms of user engagement. Previous research has studied three aspects of user engagement from the perspective of social media, namely, cognition, affect, and behavior [12][13][14]. Khan [15] grouped the forms of user engagement on social media platforms into two categories, participation and consumption, which include liking, commenting, sharing, viewing, and so on. For example, Devereux et al. [16] used the number of likes and comments as attributes of social media posts to study the engagement of small business consumers and better integrate marketing strategies. Moran et al. [17] measured the brand engagement of online consumers through Facebook participation indicators, namely clicks, likes, shares, and comments. Lim et al. [18] proposed a framework for studying social media engagement that focused on the influence of time and media type on user engagement. The research framework of Harrigan et al. [19] confirmed the multi-dimensionality of user engagement and found that three of these dimensions are of great significance to tourism social media engagement. Kim and Dennis [20] found that the credibility of an article affects user engagement in the form of reads, likes, shares, and comments.
The importance of social media engagement has been repeatedly confirmed and discussed by previous research. However, current research concentrates on foreign social media platforms, such as Facebook and Twitter, and little research has examined in detail the social media engagement of WeChat official accounts [14][21][22][23]. Therefore, this study focuses on the social media user engagement of WeChat official accounts, taking reading as a form of user engagement, and we measure social media engagement through the reading quantity.

B. MULTIMEDIA ELEMENTS
Multimedia elements include text, video, sound, graphics, animation, and interactivity, all of which play an important role in social media [24]. A post on a social media platform consists of one or more multimedia elements, of which text elements are the most used for social media analysis. For example, text elements in Twitter posts have been analyzed to identify eyewitness information when a disaster occurred, and text elements in microblogs have been used to predict the stock market [25][26]. Moreover, previous research has studied the influence of text elements on social media engagement and the close relationship between them [27][28][29]. Some studies have used forms of user engagement in social media, such as likes and comments, as variables to explore strategies or mechanisms. At the same time, scholars have gradually begun to study the relationship between image and video elements in social media and user engagement. Kordzadeh and Young [30] argued that the vividness of posts, conveyed through multimedia elements such as images and videos, can help increase user engagement. Moran et al. [17] emphasized that the richness of media has a strong influence on all participative behaviors. They also found that visual images attract the most user participation.

C. SOCIAL MEDIA ANALYTICAL METHODS
Social media analytics aims to collect, monitor, analyze, summarize, and visualize social media data, and it is usually driven by specific requirements from a target application [31]. In recent years, social media analytics has been applied to politics [32][33][34], the economy [35][36], culture [37][38], the natural environment [39], health, and so on [40]. Commonly used social media analysis methods include topic modeling, semantic analysis, text classification, and sentiment analysis. Other machine learning methods, such as natural language processing, neural networks, and computer vision, are gradually being applied to social media and are advancing the available research methods. The existing body of research on social media analysis methods is extensive. Qiang et al. [41] proposed a geographical topic modeling method based on the Latent Dirichlet Allocation (LDA) model to find topics that start at a specific time. Munuswamy et al. [42] presented a new rating prediction method based on sentiment analysis and used it to build a new recommendation system. In the field of natural language processing, Kim et al. [43] used the Doc2Vec method to extract feature vectors representing the technical meaning of the document text of an acquired company and estimated the company's technical similarity score with a start-up company. Sanz [44] used Doc2Vec to convert the report of each region into a numerical vector used as a logistic regression feature to evaluate the sovereign credit risk of different regions. Furthermore, some research has combined social media and a neural network to achieve geographic location prediction [45].
Previous studies on WeChat have focused either on user behaviors and attitudes or on the influence of WeChat as a social media platform; text analysis and image analysis of WeChat official accounts are very rare. Little research has explored the text of WeChat official accounts in depth through natural language processing, and, as far as we know, there is also very little research on the cover images of WeChat official accounts. Before clicking, the title (text) and the cover image (picture) are all the information about an article that users see, so their attractiveness directly affects users' interest in clicking to read the article. How to adjust the title and cover image to achieve a higher reading quantity and improve the operation and reach of a WeChat official account is therefore a question worth studying. Taking this as an entry point, this study starts from the title and cover image of the article and focuses on text analysis and image analysis in relation to the reading quantity of WeChat official accounts, which contributes to research in this field and addresses this research gap.

III. WECHAT OFFICIAL ACCOUNTS
This section introduces the characteristics and importance of WeChat official accounts in detail.
There is a difference between WeChat and WeChat official accounts. WeChat is a cross-platform communication tool that was launched by Tencent in January of 2011. It lets individuals and groups send voice messages, pictures, videos, and text over a mobile phone network. However, WeChat is not only a chat tool but also a social media platform with strong communication and influence abilities. It provides functions such as WeChat official accounts, WeChat Moments, and WeChat Pay. Users can publish posts through WeChat Moments, add friends, and follow official accounts by searching for their number, scanning QR codes, and so on, and information can also be spread and shared through official accounts. By the second quarter of 2016, WeChat had covered more than 94% of China's smartphones, with 806 million monthly active users, and was available in more than 200 countries and more than 20 languages.
As a function of WeChat, a WeChat official account is a service platform provided by WeChat for individuals and businesses. An individual or company can apply for an official account on the WeChat platform. After identity information has been verified, the administrator of the account has the right to use the WeChat official account. Administrators can send, communicate, and share information through the official account, while users who follow the account can view information, participate in interactions, and give feedback through the official account. By November of 2017, WeChat had gathered more than 10 million official accounts, including 3.5 million monthly active official accounts [1]. The types of official accounts are diverse, including information inquiry, sharing, professional answers, interactive communication, and so on, and they cover education, tourism, security, news, governmental affairs, life, economy, and many other fields. Article content can be original or a compilation of existing information.

FIGURE 1. The two interfaces of a WeChat official account
A WeChat official account has an internal interface and an external interface. In the internal interface, users can send messages to the official account, and the account provides users with functional modules in the menu bar at the bottom to select and view. The external interface displays the historical articles of the official account: all the articles that have been published are displayed here, and the latest articles appear first. To illustrate the difference between the two interfaces more intuitively, Figure 1 shows a simple example of a tourist official account. The picture on the left is the internal interface of the official account. Only users who follow the account can see this interface. The WeChat official account automatically sends a welcome message to the user and gives a brief introduction. At the bottom of the interface is a toolbar, and the button on the far left switches to keyboard input. After clicking it, the user can input information, which then appears in this interface. The three modules on the right are the three functions provided by the official account for users: "Hot topic," "Learn about," and "Contact us." Among them, the second function includes four subfunctions, which further refine the needs of users. Each function is a hyperlink that leads the user to a new page or to new information. The picture on the right is the external interface of the WeChat official account, in which the historical articles are displayed in vertical order. Each article is presented to users as a title and a cover image. Users can view the full text only by clicking on the link of the article they choose to read. The number of clicks on the article link is the page views of the article, which is the reading quantity referred to in this study. Furthermore, one official account can send multiple articles at a time, in which case the remaining articles are folded.

IV. RESEARCH METHODOLOGY
This section discusses the research framework used in this study.
This study focused on text analysis and image analysis on the reading quantity of WeChat official accounts. The research object selected in this study was the pet type, which is one of the different kinds of account types. Most of the "protagonists" in the pet-type articles are cats and dogs, and the WeChat official accounts selected in this study are of these two pets. The articles of the pet type have a wide range of content, including fun facts about pets, popularization of pet knowledge, etc. The data collected from the accounts was used to extract features through text analysis and image analysis. These data are described in the Data subsection. The study included the following steps:
• Acquire the title, cover picture, and the reading quantity of each article on the WeChat official accounts.
• Extract text features through natural language processing, such as sentiment analysis, text embedding, and other algorithms.
• Extract image features from the perspective of color and image recognition.
• Build a neural network based on these features and optimize the model to effectively predict the reading quantity.

A. DATA
The pet-type official account was chosen as the research object because official accounts of this type have many users, and these users view articles mainly out of curiosity and interest rather than because the account is an information source they must consult, which was in line with our research purpose. More than 15 thousand records were collected from eight popular pet-type WeChat official accounts, mainly about dogs and cats. The average reading quantity of these official accounts' articles ranged from 10 thousand to 70 thousand. We collected the title, cover picture, and link of each article as the original data, in which the title was represented as a text string, and the cover picture was represented as a link to the picture.

FIGURE 2. Detailed information of the text features and image features
This study cleaned the original data with Python, checking for missing values under the three attributes of title, cover picture, and reading quantity, and for duplicate values under the title attribute. Inspection found one duplicate item and no missing values among the 15,225 original records. After data cleaning, 15,224 records were available for this study. In addition, because subsequent text feature extraction required it, the titles were segmented into words and stop words were removed during pre-processing. These 15,224 records were used to extract text and image features.
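The cleaning checks described above can be sketched in a few lines of Python. This is a minimal illustration, not the study's own script: the record layout `(title, cover_url, reads)`, the function name `clean_records`, and the sample rows are all hypothetical.

```python
# Hypothetical record layout: (title, cover_picture_url, reading_quantity).
def clean_records(records):
    """Drop rows with a missing field, then drop repeated titles (keep the first)."""
    seen_titles = set()
    cleaned = []
    for title, cover_url, reads in records:
        if not title or not cover_url or reads is None:
            continue  # missing value under one of the three attributes
        if title in seen_titles:
            continue  # duplicate value under the title attribute
        seen_titles.add(title)
        cleaned.append((title, cover_url, reads))
    return cleaned


# Made-up rows for illustration only.
sample = [
    ("Why cats knead", "http://example.com/a.jpg", 12000),
    ("Why cats knead", "http://example.com/b.jpg", 9000),
    ("Dog training basics", "http://example.com/c.jpg", 30000),
]
cleaned = clean_records(sample)
print(len(sample), "->", len(cleaned))
```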

B. TEXT FEATURE EXTRACTION
A total of 512 text features were extracted in this study: the length of the title, whether special symbols and numbers appeared in the title, seven sentiment categories based on sentiment analysis, and a 500-dimensional sentence vector extracted through text embedding. The detailed information of the text features is described in Figure 2. The special symbols were "!," "?," and "…". The presence of each special symbol and of a number in the title was encoded as a Boolean value: 1 if present and 0 otherwise. The seven sentiment categories were also Boolean values, each indicating whether a word of that sentiment category appeared in the title. The 500-dimensional sentence vector is a distributed representation of the title text; unlike a one-hot encoding, its individual dimensions do not correspond to single dictionary words. Sentiment analysis and text embedding are discussed below.
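As an illustration of the five hand-crafted title features (length, three symbol flags, and a number flag), the following sketch shows one possible encoding. It assumes that length is counted in characters and that full-width Chinese punctuation should also match; neither detail is stated in the text, and the function name is hypothetical.

```python
def basic_title_features(title):
    """Return the title length and four 0/1 flags for symbols and digits."""
    return {
        # Assumption: length is counted in characters, not in segmented words.
        "length": len(title),
        # Assumption: full-width variants are matched as well as ASCII ones.
        "has_exclamation": int("!" in title or "！" in title),
        "has_question": int("?" in title or "？" in title),
        "has_ellipsis": int("…" in title or "..." in title),
        "has_number": int(any(ch.isdigit() for ch in title)),
    }


example_title = "3 reasons your cat ignores you…"
features = basic_title_features(example_title)
print(features)
```

Together with the seven sentiment flags and the 500 embedding dimensions, these five values make up the 512 text features.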

1) SENTIMENT ANALYSIS
Sentiment analysis is a common task in natural language processing, with applications including product evaluation, public opinion analysis, and sentiment classification, and it plays an important role in guiding sentiment mining [46][47]. Traditional sentiment analysis methods based on a sentiment dictionary are mostly used for public opinion analysis, and the sentiment tendencies they distinguish are mainly positive and negative. However, the sentiment color of an article title is rich, and the sentiments contained in a sentence cannot be fully explored from only the positive and negative aspects.
The sentiment dictionary used in this study was the Chinese Sentiment Vocabulary Ontology Database of Dalian University of Technology, which is a Chinese Ontology resource organized and annotated by all the members of the Information Retrieval Research Office of Dalian University of Technology under the guidance of Professor Hongfei Lin. The dictionary describes Chinese vocabulary in terms of parts of speech, sentiment category, and sentiment intensity and polarity. Most importantly, the dictionary divides the sentiment color of vocabulary into seven categories: anger, disgust, fear, sadness, surprise, goodness and happiness. Compared with positive and negative sentiments, it greatly expands the range of sentiments. The above seven sentiment categories were taken as the seven variables of text features.
This study considered sentiment fusion, which refers to the presence of words with different sentiment colors in the same title. Specifically, the dictionary lists the sentiment category to which each word belongs, and we used the sentiment dictionary to find all the words with sentiment colors in the title of each article. A title may contain several words with different sentiments, or none. However, the absence of a match does not mean that the title had no sentiment color but rather that the words used were not in the dictionary. The result for each sentiment category is expressed as a Boolean value: 1 if a word of that category appeared in the title and 0 otherwise.
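The dictionary lookup can be illustrated with a small sketch. The lexicon below is a toy stand-in with a few English words, not the Dalian University of Technology resource, and the function and variable names are hypothetical; the seven category names follow the text.

```python
CATEGORIES = ["anger", "disgust", "fear", "sadness", "surprise", "good", "happiness"]

# Toy lexicon for illustration only; the real dictionary is far larger.
TOY_LEXICON = {
    "cute": "good",
    "smart": "good",
    "harmful": "fear",
    "careful": "fear",
}


def sentiment_flags(tokens, lexicon=TOY_LEXICON):
    """Return seven 0/1 flags, one per category, for a segmented title."""
    found = {lexicon[token] for token in tokens if token in lexicon}
    return [int(category in found) for category in CATEGORIES]


# A title with words from two categories shows sentiment fusion.
fused = sentiment_flags(["cute", "but", "harmful"])
# A title with no dictionary words yields all zeros.
empty = sentiment_flags(["hello", "world"])
print(fused, empty)
```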

2) TEXT EMBEDDING
Unprocessed text cannot be used to quantitatively express the relationships and semantic information between texts. To achieve this, the text must be converted into a vector. The vector representation of words has evolved from one-hot encoding to distributed representation. The traditional one-hot encoding represents each word as an n-dimensional vector, where n is the size of the entire dictionary, and each dimension represents a word in the dictionary. In the one-hot encoding of a word, all dimensions are 0 except for one, which is 1. The dimension with the value 1 represents the word itself. However, any two words in this encoding are isolated from each other, so the encoding cannot reflect semantic information. Moreover, it can easily exhaust memory because the vocabulary, and therefore the dimensionality, is usually very large.
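A short sketch makes the isolation problem concrete: under one-hot encoding, the dot product of any two different words is zero, so no similarity can be read from the vectors. The four-word vocabulary is a made-up example.

```python
def one_hot(word, vocabulary):
    """Return a vector of zeros with a single 1 at the word's dictionary index."""
    vector = [0] * len(vocabulary)
    vector[vocabulary.index(word)] = 1
    return vector


vocab = ["cat", "dog", "cute", "smart"]  # toy dictionary
cat_vec = one_hot("cat", vocab)
dog_vec = one_hot("dog", vocab)

# Distinct words never share a non-zero dimension.
dot = sum(a * b for a, b in zip(cat_vec, dog_vec))
print(cat_vec, dog_vec, dot)
```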
To solve this problem, words are given a distributed representation. Each word is mapped to a shorter word vector, and all the word vectors form a vector space. This process of embedding high-dimensional word vectors into a low-dimensional space is called word embedding. Word embedding is a type of text embedding, and text embeddings have gradually become an important part of natural language processing systems based on deep learning. They encode words and sentences in fixed-length vectors to greatly improve the processing performance of text data. The most used word embedding methods are Word2Vec and GloVe [48][49]. The Word2Vec embedding method is a word vector computing tool launched by Google in 2013. It can vectorize all the words so that the relationships between words can be mined and measured quantitatively. Through training, Word2Vec reduces the processing of text content to vector operations in a k-dimensional vector space, and similarity in the vector space can be used to express the semantic similarity of text [50]. There are two Word2Vec models, Continuous Bag of Words (CBOW) and Skip-gram. In the CBOW model, the word vectors of the context are concatenated or summed as features to predict the probability of the target word. The objective function is shown in Formula (1). Since the prediction task is a multi-class classification problem, the output probability is computed with a softmax function, which is shown in Formula (2).
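For reference, the CBOW objective and the softmax output are commonly written as below. This is the standard form from the Word2Vec and paragraph vector literature; it is assumed, not confirmed, to correspond to Formulas (1) and (2). Here $T$ is the number of words in the training sequence, $k$ is the context window size, $w_t$ is the target word, and $y_i$ is the unnormalized score of word $i$ in the vocabulary.

```latex
% Average log probability of each target word given its context (maximized)
\frac{1}{T} \sum_{t=k}^{T-k} \log p\left(w_t \mid w_{t-k}, \ldots, w_{t+k}\right)

% Softmax over the vocabulary
p\left(w_t \mid w_{t-k}, \ldots, w_{t+k}\right) = \frac{e^{y_{w_t}}}{\sum_{i} e^{y_i}}
```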
Sentence and document embedding are the other forms of text embedding. Although Word2Vec can be used to represent a sentence vector, it ignores the influence of word order on the information carried by the sentence. Doc2Vec is an improvement of Word2Vec, which considers not only the semantics between words but also word order by adding a paragraph vector [48]. There are two Doc2Vec models, the Distributed Memory version of Paragraph Vector (PV-DM) and the Distributed Bag of Words version of Paragraph Vector (PV-DBOW) [51]. In this study, we used the PV-DM model to extract sentence vectors. The model slides a fixed-length window over the sentence, taking one word in the window as the word to predict and the others as the input words. The word vectors of the input words and the sentence-id vector serve as the input layer, from which the probability of the predicted word in the window is computed. Window size refers to the maximum distance between an input word and the predicted word. For example, the current sentence could be "I played basketball last weekend" and the window size could be 2. In a certain slide, the predicted word is "basketball," and thus the input words are "I," "played," "last," and "weekend." The PV-DM model is shown in Figure 3.
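The window example above can be reproduced with a small helper that collects the input words around a predicted word. The function name `context_words` is hypothetical, and the sketch covers only the selection of input words, not the training of the PV-DM model.

```python
def context_words(tokens, target_index, window):
    """Return the words within `window` positions of the target, excluding it."""
    start = max(0, target_index - window)
    end = min(len(tokens), target_index + window + 1)
    return [tokens[i] for i in range(start, end) if i != target_index]


tokens = ["I", "played", "basketball", "last", "weekend"]
inputs = context_words(tokens, tokens.index("basketball"), window=2)
print(inputs)
```

In PV-DM, the vectors of these input words are combined with the sentence-id vector to predict the target word.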
It should be noted that, although the PV-DM model outputs the probability of the predicted word, the purpose of our study was to use the parameters learned during model training as the distributed sentence vector of each text. In this study, the window size of the model was 3 and the sentence vector dimension was 500. To evaluate the performance of the model, we randomly selected a sample, constructed several sentences similar to it, and measured the similarity between these sentences and the sample. This process of randomly selecting samples was repeated many times. Then, we obtained the sentence vectors of all the samples through the model. Each one was a 500-dimensional vector, namely the hidden layer vector in the model, which served as the unique representation of the sentence's semantics.
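The similarity check can be sketched as ranking candidate sentences by the similarity of their vectors to the sample vector. Cosine similarity is assumed here as the measure, since the text does not name one, and the three-dimensional vectors are toy values standing in for the 500-dimensional sentence vectors.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def rank_by_similarity(sample_vec, candidates):
    """Return (name, similarity) pairs sorted from most to least similar."""
    scored = [(name, cosine_similarity(sample_vec, vec))
              for name, vec in candidates.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy vectors; real sentence vectors would come from the trained model.
sample_vec = [1.0, 0.0, 0.0]
candidates = {
    "near": [0.9, 0.1, 0.0],
    "mid": [0.5, 0.5, 0.0],
    "far": [0.0, 0.0, 1.0],
}
ranking = rank_by_similarity(sample_vec, candidates)
print(ranking)
```

If the ranking produced by the model matches the order in which the sentences were made less similar to the sample, the sentence vectors can be considered to behave as intended.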

C. IMAGE FEATURE EXTRACTION
A total of six image features were extracted in this study: the number of colors, whether text and faces existed in the cover picture, and the three red, green, and blue (RGB) values of the main color in the picture. The detailed information of the image features is described in Figure 2. The number of colors describes the richness of the colors in the picture. The higher the number, the richer the colors. The existence of text in the picture was a Boolean value: 1 if text existed and 0 otherwise. The existence of a face in the picture was also a Boolean value. Both values were the result of image recognition. Due to the limited accuracy of the recognition method, our variables did not capture the content of the recognized text or the number of faces but only whether text or a face existed. The main color refers to the color with the largest proportion in the picture, which indirectly describes the dominant tone of the picture. This study used the RGB values of the main color.
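The four color features can be sketched on a list of pixels. The sketch assumes that the number of colors means the number of distinct RGB triples and that the main color is the most frequent triple; the text does not specify how either was computed. The pixel list is a made-up example, and in practice the pixels would be read from the cover picture with an image library.

```python
from collections import Counter


def color_features(pixels):
    """Return the number of distinct colors and the RGB values of the main color."""
    counts = Counter(pixels)
    # Assumption: the main color is the single most frequent RGB triple.
    main_color, _ = counts.most_common(1)[0]
    red, green, blue = main_color
    return {"num_colors": len(counts), "rgb_r": red, "rgb_g": green, "rgb_b": blue}


# Toy image: five white pixels, three red pixels, two black pixels.
pixels = [(255, 255, 255)] * 5 + [(200, 30, 30)] * 3 + [(0, 0, 0)] * 2
feats = color_features(pixels)
print(feats)
```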

D. CONSTRUCTION OF THE NEURAL NETWORK
The previous steps produced the text features and image features, which were used as the input variables of the neural network model, with the reading quantity as the output variable. It should be noted that, before building the neural network, the features needed to be normalized because their orders of magnitude differed greatly. The variables to be normalized were the length of the title, the number of colors, and the red (RGB-R), green (RGB-G), and blue (RGB-B) values. Moreover, the reading quantity was scaled down by a factor of 100,000. The neural network model used in this study was a regression prediction model, as shown in Figure 4. Because the reading quantity is a continuous value, predicting it from the features is a regression problem, which is why we chose a regression prediction model. There were 518 input variables in the model, namely the 512 text features and the six image features mentioned above, and the output variable was the reading quantity of the article. The purpose of this study was to learn the impact of text and image features on the reading quantity, in other words, the impact of text and image elements on user engagement. The learned pattern reflects how the title and cover picture attract users to read the article. The result of the model can help administrators predict the reading quantity in advance and then make appropriate adjustments to the title and cover picture based on the predicted results to achieve a higher reading quantity.
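The scaling step can be sketched as follows. Min-max normalization to the range [0, 1] is an assumption, since the text does not name the normalization method; dividing the reading quantity by 100,000 follows the text. The function names and sample values are hypothetical.

```python
def min_max_normalize(values):
    """Rescale values linearly to [0, 1] (assumed normalization method)."""
    low, high = min(values), max(values)
    if high == low:
        return [0.0 for _ in values]  # constant feature carries no information
    return [(value - low) / (high - low) for value in values]


def scale_reads(reads, factor=100_000):
    """Scale the reading quantity down by a fixed factor, as described above."""
    return [count / factor for count in reads]


title_lengths = [10, 20, 30]   # made-up title lengths
normalized = min_max_normalize(title_lengths)
targets = scale_reads([50_000, 25_000])
print(normalized, targets)
```

Bringing the inputs and the target to comparable magnitudes keeps large-valued features, such as the number of colors, from dominating the gradient updates.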
We divided the data set into a training set, validation set, and test set. Specifically, the model was trained on the training set, the parameters were adjusted on the validation set, and then the model performance was evaluated on the test set. Although the eight WeChat official accounts were of the same type, each had its own unique attributes that affected the accuracy of the neural network model results. To better realize the prediction of the reading quantity, we selected one of the WeChat official accounts as the data source of the neural network model. The official account had a total of 3,057 pieces of data. The final data set division was 2,000 pieces of data in the training set, 500 pieces of data in the test set, and 557 pieces of data in the validation set. Among them, the ratio of the training set to the test set was 4:1, and the ratio of the test set to the validation set was approximately 1:1. Finally, the mean square error (MSE), mean absolute error (MAE), and relative error were selected as the evaluation indicators to measure the performance of the regression neural network. The MAE and MSE are commonly used evaluation indicators in regression problems. The relative error refers to the ratio of the absolute value of the error to the actual value, as shown in Formula (3). The smaller the relative error, the better the fitting effect. The parameters to be adjusted in the model include the learning rate η, batch size, regularization parameter λ, number of layers, number of epochs, activation function, and so on. The selection of parameters affects the training speed and fitting result of the neural network. For example, too high a learning rate will lead to loss oscillation and a failure to converge, while too low a learning rate will reduce the learning speed of the model. Moreover, to ensure that the learning results on the training data can be better applied to the test data, it is necessary to consider over-fitting and under-fitting and to add measures that suppress over-fitting, such as early stopping, regularization, and dropout. In short, the selection and adjustment of parameters is a process of continuous testing. We needed to determine the appropriate parameters through the loss change during the training process and the results of the evaluation indicators.
$$\text{relative error} = \frac{\left| \hat{y} - y \right|}{y} \qquad (3)$$
where $y$ is the actual reading quantity and $\hat{y}$ is the predicted reading quantity.
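The three evaluation indicators can be computed as in the following sketch. Averaging the relative error over all samples is an assumption; the text defines the relative error only for a single prediction. The values are toy numbers on the scaled reading quantity.

```python
def mse(y_true, y_pred):
    """Mean square error."""
    return sum((p - t) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(p - t) for t, p in zip(y_true, y_pred)) / len(y_true)


def mean_relative_error(y_true, y_pred):
    """Average of |prediction - actual| / actual; actual values must be non-zero."""
    return sum(abs(p - t) / t for t, p in zip(y_true, y_pred)) / len(y_true)


# Toy values: actual and predicted reading quantities after scaling.
y_true = [0.5, 0.2, 0.4]
y_pred = [0.4, 0.3, 0.4]
print(mse(y_true, y_pred), mae(y_true, y_pred), mean_relative_error(y_true, y_pred))
```

The relative error complements the MSE and MAE because the same absolute error matters more for an article with few reads than for one with many.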

V. ANALYSIS AND RESULTS
This section discusses the data analysis results of the above methods.

A. SENTIMENT ANALYSIS RESULTS
The sentiment analysis results are shown in Table 1. Sentiment words of both disgust and sadness existed simultaneously in the 32nd sample, which is an instance of sentiment fusion as defined in this study. The 30th, 31st, and 34th samples had only one sentiment word, while no sentiment words were detected in the 33rd sample. However, as previously discussed, this does not mean that the text had no sentiment color but rather that the words in the sample were not among the sentiment words in the dictionary. In addition to the sentiment distribution of each sample, we also counted the occurrences of the representative sentiment words in all the samples. Table 2 lists the number of occurrences of each sentiment category and the top five representative sentiment words in each category. The results show that words with the sentiment colors of good and disgust occurred most often, 4,106 and 3,392 times, respectively. Sentiment words of anger occurred least often, only 133 times. Additionally, the top five sentiment words reflected the focus of the articles under the seven sentiment categories. For example, in the good category, "cute," "smart," and "health" were keywords that often appeared in articles. "Harmful" and "careful" were found in the fear category, which indicates that these articles aimed to remind readers of what to watch out for with pets and what is bad for pets.

B. THE TEXT EMBEDDING RESULTS
Table 3 shows the Doc2Vec text embedding results as the similarity between a sample and a set of similar sentences. The six similar sentences in Table 3 were constructed to be increasingly less semantically relevant to the sample, and the similarity predicted by the model decreased in the same order. Since the order of the results conforms to the actual situation, the performance of the sentence vector model was good.
Table 4 shows the first, second, third, 499th, and 500th dimensions of the sentence vectors.
After feature extraction, the study proceeded in two steps. The first step studied the relationship between the independent variables and the dependent variable through correlation analysis and cluster analysis. Specifically, the relationship between the text features, the image features, and the reading quantity is presented as a heatmap obtained through correlation analysis. Then, the reading quantity was divided into five categories through cluster analysis to find the characteristics of each category and the differences among categories. The second step learned the features through a neural network to predict the reading quantity. Since there were 518 input variables, they cannot all be displayed here; however, some of the normalized data is shown in Table 5.

C. THE RESULTS OF CORRELATION ANALYSIS AND CLUSTERING ANALYSIS
The variable correlation results are shown in Figure 5. The length of the title, the existence of "…" in the title, and the existence of a number in the title are positively correlated with the reading quantity, while the existence of "?" and "!" in the title is not significantly correlated with it. Furthermore, text in the picture, RGB-R, and RGB-G are negatively correlated with the reading quantity, while the number of colors, a face in the picture, and RGB-B are positively correlated. The cluster categories and their sizes are shown in Table 6. Among them, Category 3 had the highest reading quantity at 96,950 with 777 pieces of data, and it should be used as a key case to study the characteristics of the articles. Category 1 had the second-highest reading quantity at 61,078 with 1,388 pieces of data, while Category 4 had a medium reading quantity at 39,012 with 2,272 pieces of data. Categories 0 and 2 had the second-lowest and lowest reading quantities at 22,803 and 7,701, respectively. There were 3,093 pieces of data in Category 0 and 7,764 pieces of data in Category 2, the largest category, which means that attention should be paid to the attributes of articles in these two categories to improve the reading quantity. The clustering results show that the category with the highest reading quantity accounted for only 5% of the data, while the two categories with the lowest reading quantities together accounted for as much as 71%. How to reduce the proportion of these two categories, improve the reading quantity, and give reasonable suggestions is one of the purposes of this study. To study the characteristics of each category and the differences among categories, we calculated the mean values of the different features in each category. For example, the mean value of the number of colors in Category 3 was 43,701.27, the mean value of sentiment-disgust in Category 1 was 0.22, and the mean value of the length of title in Category 2 was 21.07.
Category 3 was excluded from the comparison among categories because it accounted for only 5% of the data. Figure 6 compares the variables across the remaining categories. As the mean length of title decreased (Figure 6 (a)) and the existence of text in pictures increased (Figure 6 (b)), the reading quantity decreased (Category 1 > 4 > 0 > 2). These results are consistent with the correlation results in Figure 5: as the length of title increased and the existence of text in pictures decreased, the reading quantity showed an increasing trend.
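The category proportions quoted above follow directly from the counts reported for Table 6. The short Python sketch below reproduces that arithmetic; the dictionary layout and the names (clusters, share, ranking) are illustrative assumptions rather than part of the original analysis, and only the counts and reading quantities are taken from the text.

```python
# Counts and reading quantities per category, as reported for Table 6.
# The data structure and names are illustrative, not the authors' code.
clusters = {
    0: {"articles": 3093, "reading_quantity": 22803},
    1: {"articles": 1388, "reading_quantity": 61078},
    2: {"articles": 7764, "reading_quantity": 7701},
    3: {"articles": 777, "reading_quantity": 96950},
    4: {"articles": 2272, "reading_quantity": 39012},
}

total_articles = sum(c["articles"] for c in clusters.values())


def share(category_ids):
    """Fraction of all articles that fall in the given categories."""
    return sum(clusters[k]["articles"] for k in category_ids) / total_articles


highest_share = share([3])   # category with the highest reading quantity
low_share = share([0, 2])    # categories with the lower and lowest quantity

# Categories ordered from highest to lowest reading quantity.
ranking = sorted(
    clusters, key=lambda k: clusters[k]["reading_quantity"], reverse=True
)

print(f"total articles: {total_articles}")
print(f"Category 3 share: {highest_share:.1%}")
print(f"Categories 0 and 2 share: {low_share:.1%}")
print("ranking by reading quantity:", ranking)
```

Rounded to whole percentages, the two shares match the 5% and 71% stated in the text, and the ranking matches the order Category 3 > 1 > 4 > 0 > 2.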

D. THE RESULTS OF THE NEURAL NETWORK MODEL
The results of the neural network model are shown in Figure 7, which visualizes part of the parameter selection process, the final parameters, and the results of the evaluation indicators. Figure 7 (a) shows the change in training loss over 100 epochs under different learning rates with the other parameters fixed. Because a gradient explosion occurred when the learning rate was greater than 0.5, only some learning rates below 0.5 are drawn in the figure. As the learning rate increased, the training loss declined gradually and became increasingly smaller. When the learning rate reached 0.25, the training loss dropped sharply in the first few epochs, then declined slowly, and later reached saturation. The learning rate was finally fixed at 0.1 according to the threshold of the learning rate and the actual situation.
Figure 7 (b) shows the change in training loss for different batch sizes over 200 epochs. The choice of batch size also affected the speed and performance of the model. Too small a batch size may cause the loss to oscillate and fail to converge, as shown by the green curve in the figure; too large a batch size increases the time cost. The other two curves were then compared: the cyan curve represents a batch size of 32 and the blue curve a batch size of 64. Their training losses eventually dropped to the same level, and both curves oscillated slightly throughout training. Finally, the batch size was fixed at 64 in this study.
Figure 7 (c) lists all the parameters that were finally determined. The standard deviation of the normal distribution used for weight initialization was 0.1, the learning rate η was 0.1, and the batch size was 64, as mentioned above. The regularization parameter λ was 0.0001, the drop probability was 0.5, and the number of epochs was 200. In addition, the optimizer used in the model was a gradient descent optimizer.
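The parameter settings listed for Figure 7 (c) can be made concrete with a small training loop. The following Python sketch is illustrative only and is not the model used in this study: it fits a linear regressor to synthetic data, so the data, the model, and the function names (make_toy_data, mse, train) are all assumptions. It reuses the reported standard deviation for weight initialization, learning rate, batch size, regularization parameter, and number of epochs. Dropout is omitted because the toy model has no hidden layer.

```python
import random

# Values reported for Figure 7 (c). Dropout (0.5) is not used here because
# this toy linear model has no hidden layer to drop units from.
INIT_STD = 0.1
LEARNING_RATE = 0.1
BATCH_SIZE = 64
L2_LAMBDA = 0.0001
EPOCHS = 200


def make_toy_data(n_samples, n_features, rng):
    """Synthetic regression data (an assumption, NOT the study's data set)."""
    true_w = [rng.uniform(-1.0, 1.0) for _ in range(n_features)]
    xs, ys = [], []
    for _ in range(n_samples):
        x = [rng.uniform(-1.0, 1.0) for _ in range(n_features)]
        y = sum(w * v for w, v in zip(true_w, x)) + rng.gauss(0.0, 0.05)
        xs.append(x)
        ys.append(y)
    return xs, ys


def mse(weights, bias, xs, ys):
    """Mean squared error of the linear model on the given data."""
    errors = [
        (sum(w * v for w, v in zip(weights, x)) + bias - y) ** 2
        for x, y in zip(xs, ys)
    ]
    return sum(errors) / len(errors)


def train(xs, ys, rng):
    """Mini-batch gradient descent on MSE plus an L2 penalty on the weights."""
    n_features = len(xs[0])
    # Weights drawn from a normal distribution with the reported std.
    weights = [rng.gauss(0.0, INIT_STD) for _ in range(n_features)]
    bias = 0.0
    history = []
    indices = list(range(len(xs)))
    for _ in range(EPOCHS):
        rng.shuffle(indices)
        for start in range(0, len(indices), BATCH_SIZE):
            batch = indices[start:start + BATCH_SIZE]
            grad_w = [0.0] * n_features
            grad_b = 0.0
            for i in batch:
                pred = sum(w * v for w, v in zip(weights, xs[i])) + bias
                residual = pred - ys[i]
                for j in range(n_features):
                    grad_w[j] += 2.0 * residual * xs[i][j]
                grad_b += 2.0 * residual
            for j in range(n_features):
                # Batch-mean data gradient plus gradient of lambda * w^2.
                g = grad_w[j] / len(batch) + 2.0 * L2_LAMBDA * weights[j]
                weights[j] -= LEARNING_RATE * g
            bias -= LEARNING_RATE * grad_b / len(batch)
        # Training loss recorded once per epoch, as in a loss curve.
        history.append(mse(weights, bias, xs, ys))
    return weights, bias, history


rng = random.Random(0)
xs, ys = make_toy_data(256, 5, rng)
weights, bias, history = train(xs, ys, rng)
print(f"loss after first epoch: {history[0]:.4f}")
print(f"loss after last epoch:  {history[-1]:.4f}")
```

Changing LEARNING_RATE or BATCH_SIZE in this sketch and plotting history gives loss curves of the same kind as those compared in Figure 7 (a) and (b), although the values at which a toy model diverges or oscillates need not match those observed for the actual network.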
Figure 7 (d) reports the results of the evaluation indicators under the above parameters. The MAE was 0.207, the MSE was 0.059, and the relative error was 34.9%, which indicates that the model performed well.
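The three evaluation indicators can be computed as in the following minimal Python sketch. The function names and the example values are hypothetical, and the relative error is assumed here to be the mean of |y − ŷ| / |y|, because its exact definition is not restated in this section.

```python
def mean_absolute_error(y_true, y_pred):
    """MAE: average absolute difference between target and prediction."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mean_squared_error(y_true, y_pred):
    """MSE: average squared difference between target and prediction."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def mean_relative_error(y_true, y_pred):
    """Assumed definition: mean of |t - p| / |t|; targets must be non-zero."""
    return sum(abs(t - p) / abs(t) for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical targets and predictions, for illustration only.
y_true = [1.0, 2.0, 4.0]
y_pred = [1.5, 2.0, 3.0]

print("MAE:", mean_absolute_error(y_true, y_pred))
print("MSE:", mean_squared_error(y_true, y_pred))
print("relative error:", mean_relative_error(y_true, y_pred))
```

Because the relative error divides by the target, it is sensitive to articles with small reading quantities, which is one reason it can remain high even when the MAE and MSE are small.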

VI. DISCUSSION
The results reveal a phenomenon of sentiment fusion in the titles of some samples, in which words with the sentiment colors of good and disgust occurred most often, while sentiment words of anger occurred least often. To further explore the expression forms of sentiment colors, we counted the top five representative sentiment words that occurred most frequently in each sentiment category. The results show that "cute," "smart," "healthy," and other words associated with the good character and health of pets were more popular with the administrators, who published a very high number of articles in this category. Surprisingly, disgust, a negative sentiment category, ranked second only to good in the number of sentiment words. The findings reveal that "breakdown," "abandon," and "doubt" were the most common words in this category, which indicates that administrators pay more attention to stories with this kind of sentiment and share them with readers. This study also explored the sentence vector, which can be used as a unique textual semantic representation. In this study, each text was extracted as a 500-dimensional vector, and each dimension represented a lexical word in the dictionary. The result of the text embedding based on Doc2Vec shows that our sentence vector model performed well in relative similarity and could predict the semantic information of new texts. This is probably because the official accounts studied were restricted to the pet type: a single type is more concentrated in its use of words, which makes it easier to build a dictionary for that type.
The results of the correlation and cluster analysis indicate that a longer title and the existence of "…" in the title were more likely to promote user engagement and thereby increase the reading quantity of pet articles, whereas the existence of text in a picture may inhibit user engagement and reduce users' interest in reading, thereby decreasing the reading quantity of articles. However, other features did not show a significant correlation, which may be due to the differences in style among WeChat official accounts. Moreover, the study found that the proportion of articles with the highest reading quantity was only 5%, while the proportion with the lower and lowest reading quantity was as high as 71%, which reveals that there was still a large difference in reading quantity among samples. We hope to help administrators reduce the proportion of these two categories and increase the reading quantity.
Therefore, we proposed a neural network prediction model based on text and image features. The model makes up for the shortcomings of basic data analysis: it can help administrators predict the reading quantity in advance through machine learning and adjust the title and cover picture according to the predicted results to achieve a higher reading quantity. This study adjusted and optimized different parameters and added some approaches to suppress over-fitting. The more appropriate parameters were eventually determined from the change in training loss during the training process and the results of the evaluation indicators. However, the results of the regression neural network must be viewed cautiously given the high level of relative error (34.9%).

A. LIMITATIONS AND FUTURE RESEARCH
This study has certain limitations that provide opportunities for future research. First, we only applied the proposed model to the pet type, whereas each type of WeChat official account may have its own style and development in the process of article sharing and publishing. Future research can extend the investigation to other types of WeChat official accounts and further explore the characteristics of articles in different fields. Second, there are many forms of user engagement in social media, of which only one, reading quantity, was discussed in this study. However, different forms of user engagement have different communication power and influence on social media platforms. Future research can explore the influence of multimedia elements on other user engagement behaviors, such as sharing and commenting. Third, we proposed a neural network model based on text and image feature extraction, which is a preliminary attempt to provide a prediction approach for administrators. However, both the image feature extraction and the model algorithms have room for improvement. Future research can apply more advanced methods and extract further image features to enrich the independent variables and improve the data diversity and the accuracy of the model itself. In the selection of prediction models, algorithms such as convolutional neural networks can be included to compare performance among algorithms.

VII. CONCLUSION
This study effectively combined social media user engagement and multimedia elements, explicated the factors that may affect the reading quantity of WeChat official account articles through text analysis and image analysis, and proposed a neural network prediction model based on text features and image features. By collecting the text and image information from eight pet WeChat official accounts, we extracted 518 features through sentiment analysis, sentence vectors, and image recognition, and finally embedded them into the machine learning model to achieve this goal.
The results show that both the sentence vector model based on Doc2Vec and the neural network model performed well. This article proposed a comprehensive tool that can help administrators predict the reading quantity in advance and make appropriate adjustments to the title and cover picture according to the predicted results. The tool can guide administrators to better manage and improve WeChat official accounts, attract the attention of readers, and expand the communication power and influence of the accounts. Moreover, the results of this study provide a strong basis for future research.