Evaluation of the Development Value of Emotional Data Mining in Mass Media Using the RBTM Model

With the proliferation of the internet, social platforms continue to emerge, producing a surge in user-generated content. The abundant textual information circulating online exerts a measurable influence on public opinion. The prevailing sentiment in society can be discerned by analyzing the latent emotional tendencies embedded in large volumes of news text. This article presents the RBTM model, which extracts and analyzes emotional information from mass media and offers insights into the development of the intelligent news industry. The experimental results show that the RBTM model outperforms comparable models in both training time and prediction performance. In practical applications, the RBTM model effectively identifies emotional tendencies in news content, eliminating the need for extensive manual inspection while reducing analysis time by 49% in overall departmental work. Finally, the paper discusses the prospects of intelligent news analysis methods based on emotional information extraction.


I. INTRODUCTION
Before the popularity of the internet, most of the media information people could access came from newspapers and news agencies. These institutions had absolute control over the dissemination of social information, and this form of dissemination was a one-way transmission from media to readers. With the development of the internet, various social networking platforms have emerged, making it very convenient for everyone to communicate. As internet information platforms have developed, today's mass media differ from those of the past. Online news services have become a significant carrier of network information, and more and more readers are abandoning traditional media in favor of online news to keep up with real-time events [1]. Compared with other forms, such as newspapers, this type of communication is more timely and interactive, as users can express their opinions by commenting or sharing emotions on these sites. Through this interaction, users produce more text content in blogs and comment sections, forming multiple types of textual information available via websites. Internet news has become an effective way for people to obtain necessary information and is often their first choice [2]. The shift in news consumption from traditional media to online news services has increased the text content produced by users on social networking platforms, and the resulting textual information can significantly influence public opinion. However, the general opinion tendency of news in a specific field can also affect the emotions and intentions of the public, and news content without regulation and review may lead the public to adopt the author's personal views to a certain extent.

(The associate editor coordinating the review of this manuscript and approving it for publication was Muhammad Asif.)
This highlights the importance of monitoring and guiding online public opinion, especially for young internet users at a critical stage in developing their values and worldview [3]. In addition, according to statistics, there are 158 million internet users aged 6 to 19 in China. Most of these young netizens are at a critical stage in developing their values and worldview and have not yet established the ability to discern right from wrong. When making decisions or taking action, they are influenced to some degree by the external environment and by emotional tendencies, and malicious actors may incite them into doing things that violate morality and the law [4]. Young people are the future and hope of our country; therefore, we should correctly guide and protect young internet users, deliver positive online content to them more effectively, and guide them toward healthy internet habits to create a green and healthy online environment for young people.
Text is an essential medium of information exchange, and one of its primary functions is to convey emotion. News text, as a genre that records and disseminates information, is generally longer because the author aims to convey complete information about an event. In addition, news texts mostly contain objective descriptions and little content that strongly expresses emotion. Sometimes an author's evaluation of an event also involves more than one emotion. These characteristics pose significant challenges for extracting emotional information from news content [5].
In today's era of information explosion, analyzing countless news items manually would consume enormous amounts of labor and time. Therefore, automatically and accurately identifying the emotional tendencies hidden in internet news has great theoretical significance and practical value for the effective monitoring, early warning, and guidance of online public opinion and for the sustainable development of the public opinion ecosystem. Internet news sentiment analysis can be abstracted as a three-class text classification problem in natural language processing: determining the emotional polarity of a news article from its title and content [6]. There are three types of sentiment analysis methods: methods based on sentiment dictionaries, traditional machine learning methods, and deep learning methods. In recent years, with the development and widespread use of deep learning techniques, new solutions have emerged for sentiment analysis of news articles. The latest deep learning approaches mostly use pre-trained models, which are widely applied to natural language processing (NLP) tasks, including text sentiment analysis, where they outperform traditional machine learning methods [7]. This article presents a sentiment analysis model for internet news and proposes a method to analyze the sentiment and semantic orientation of news content. Building on the pre-trained RoBERTa model, this study uses a Bi-LSTM network to construct the RBTM deep learning model for effective internet news sentiment analysis. A series of comparative experiments was conducted to verify the model's performance. The experimental results show that the Macro-F1 value reaches 0.8172; in practical applications, the model saves 32% of personnel allocation compared with manual inspection and reduces news content analysis time by 49% in overall departmental work.
The main contributions of this article are summarized as follows: • A new model for analyzing the emotional tendency of news content, based on the RoBERTa pretraining model, is proposed. The model analyzes the emotional trend of a news item by combining its original text with readers' comments.
• An intelligent news and public opinion management system was implemented using the new sentiment analysis model; by analyzing news and comment content with the model, it helps managers detect potential public opinion risks in news content in a timely manner.
• The present study delved into the potential of the intelligent news industry by examining the utilization of sentiment information extracted from mass media. We contend that the news industry, particularly in the Internet era, should advocate for the widespread adoption of artificial intelligence technology to comprehend public opinion and readers' preferences effectively. Our firm belief is that the intelligent news industry has a bright future ahead.

II. LITERATURE REVIEW
Many researchers at home and abroad have focused on text sentiment analysis. With the emergence of related theories and technologies in deep learning, more and more researchers choose deep learning models for text sentiment classification. Compared with traditional methods, the most prominent feature of deep learning is the use of vectors to represent elements at different levels: words, phrases, logical expressions, and sentence patterns are all represented as vectors, and multi-level neural networks are constructed to achieve autonomous learning [8]. Li et al. [9] used recursive neural networks and recursive tensor neural network models for text sentiment analysis without needing any pre-defined sentiment lexicon or polarity conversion rules. Recursive neural networks recursively apply the same neural network function to input data with a recursive structure, and recursive tensor neural network (RTNN) models extend this idea to handle higher-order interactions among the input data. Jang et al. [10] built their model using basic convolutional neural networks. Jeffrey [11] proposed a recurrent neural network (RNN) model that can thoroughly learn contextual information from text but suffers from the vanishing gradient problem. To solve this problem, Lin [12] introduced an effective gradient-based method called long short-term memory (LSTM). LSTM is an RNN architecture designed to capture long-term dependencies in sequential data and has been successfully applied to tasks such as speech recognition, natural language processing, and sequence prediction. LSTM networks capture long-term dependencies and are less prone to the vanishing gradient problem than traditional RNNs.
However, they can be computationally expensive and require careful parameter tuning and initialization. Deep learning methods can automatically extract and learn text features, and the learned features are more complex, which can improve the accuracy of text classification. Li et al. [13] improved the LSTM model, proposing a deep learning-based CLSTM model for sentiment prediction on context-level word vector sequences and further enhancing the accuracy of sentiment polarity judgment. However, news text differs from general text in that it is longer and contains a large amount of neutral expression without emotion, so these models perform only moderately on news text sentiment analysis. BERT and its variants are popular large-scale pre-trained language models of recent years and are widely used in short text tasks [14]. However, as text length increases, the number of model parameters and the computational cost rise, so they are rarely applied to long text tasks. These methods require substantial computation, and it is difficult to achieve good results in both training speed and prediction accuracy when computing resources are limited. These studies indicate that text sentiment analysis has great application value, but problems remain to be solved. Researchers can propose improvements to these large-scale pre-trained models to achieve performance gains with lower computing power. This article proposes a new news text sentiment analysis method based on the RoBERTa model, which achieves good results in both training speed and prediction accuracy.

III. METHODOLOGY
News is a literary form that records society, disseminates information and reflects the times. Unlike general texts, most news texts are structured with complete elements and rigorous logic, possessing unique norms and structures [15]. The emotions conveyed in news text are usually implicit and restrained. By analyzing the emotions in news text, different emotions can be distinguished more precisely and comprehensively, thus improving the accuracy of sentiment analysis in other texts [16]. The RoBERTa pretraining model used in this article is based on the BERT model with some adjustments. BERT stands for Bidirectional Encoder Representation from Transformers, which is a pre-trained language model. The pretraining architecture of BERT is shown in Figure 1.
BERT uses a bidirectional Transformer, which can consider contextual information from both the left and right sides of each word. BERT has two pretraining stages: Masked Language Model (MLM) and Next Sentence Prediction (NSP).
Compared to BERT, RoBERTa uses an improved masking strategy during the MLM phase. Instead of applying a single static mask chosen once during preprocessing, RoBERTa masks tokens dynamically, regenerating the mask every time a sequence is fed to the model. This allows the model to learn relationships between tokens more thoroughly during pretraining. RoBERTa also uses larger, richer, and more diverse datasets for pretraining and trains for more iterations. In place of BERT's character-level WordPiece vocabulary, RoBERTa uses a byte-level BPE vocabulary, which enables it to model low-frequency words encountered in downstream tasks effectively. RoBERTa removes the NSP pretraining task from BERT's pretraining phase because the NSP task does not directly improve the model's performance and, in some cases, may even reduce it. Removing NSP allows the model to focus on learning the MLM task, improves performance on downstream tasks, and reduces the training time and computational resources required. Additionally, removing NSP makes the pretrained model more generalizable because it no longer needs to train across sentence pairs, making it more effective at processing individual sequences. These improvements enable RoBERTa to capture relevant information in text data better and to be more effective and robust across natural language processing tasks.
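The static-versus-dynamic masking difference can be illustrated with a toy sketch. The masking function, the 15% rate, and the mask-token id below are illustrative stand-ins, not RoBERTa's actual implementation:

```python
import random

MASK_ID = 103  # illustrative [MASK] token id, not tied to any real vocabulary

def mask_tokens(token_ids, rng, rate=0.15):
    """Replace ~15% of token ids with MASK_ID -- a toy stand-in for MLM masking."""
    return [MASK_ID if rng.random() < rate else t for t in token_ids]

tokens = list(range(1000, 1020))  # a toy 20-token sentence

# Static masking (original BERT preprocessing): the mask is chosen once,
# so every training epoch sees the identical masked sequence.
static = mask_tokens(tokens, random.Random(0))
static_epochs = [list(static) for _ in range(3)]

# Dynamic masking (RoBERTa): the mask is regenerated each time the sequence
# is fed to the model, so different epochs mask different positions.
dynamic_epochs = [mask_tokens(tokens, random.Random(epoch)) for epoch in range(3)]

print(all(e == static_epochs[0] for e in static_epochs))  # static epochs identical
print(dynamic_epochs[0] == dynamic_epochs[1])             # dynamic epochs differ
```

Under dynamic masking the model sees a different prediction target for the same sentence on each pass, which is why it can learn token relationships more thoroughly over many epochs.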
This article proposes a deep learning model called RBTM for effective sentiment analysis in internet news, based on a pre-trained RoBERTa model and fusion with a bidirectional long short-term memory network (Bi-LSTM). The performance of the model is validated through a series of comparative experiments. As news text has an organized structure consisting of a title, summary, main body, and user comments, this article processes these four parts separately to reduce hardware requirements during training. The processed features are then combined into a feature matrix and inputted into the Bi-LSTM layer to extract text features. Finally, using max pooling and softmax activation functions, the text information is classified into three categories: negative (−1), neutral (0), or positive (1). Figure 2 shows the processing flow of news information using our proposed RBTM model.
After being processed by RoBERTa, the word vectors are combined into a feature matrix. At this point, the feature matrix contains all the information from a news article's title, summary, main text, and comments, obtained by converting the text into word vectors during data processing. To analyze the emotional information of each sentence in a news article, the emotional tendency of the word vectors is used as the emotional attribute of each sentence. In this paper, we define the emo function as the emotional attribute of individual words in news articles, as given in formula (1).
Here, word_{i,k} represents the k-th word in sentence s_i, and m represents the number of words in s_i. When word_{i,k} is a sentiment word, estimate(word_{i,k}) = 1; otherwise, it is 0. After applying the emo function, each sentence yields a series of word-level emotional attributes. Summing these emotional attributes according to formula (2) gives a quantitative result for the sentiment tendency of a single sentence:

E_i = Σ_{k=1}^{m} emo(word_{i,k}). (2)
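A minimal sketch of the word-level estimate/emo functions and the sentence-level sum follows. The paper does not publish its sentiment lexicon, so the three-word lexicon and the signed polarities below are purely illustrative assumptions:

```python
# Illustrative sentiment lexicon with signed polarities (an assumption;
# the paper only specifies the 0/1 indicator via its estimate function).
SENTIMENT_WORDS = {"good": 1, "great": 1, "terrible": -1}

def estimate(word):
    """1 if the word is a sentiment word, 0 otherwise (per formula (1))."""
    return 1 if word in SENTIMENT_WORDS else 0

def emo(word):
    """Emotional attribute of a single word: its lexicon polarity when it
    is a sentiment word, 0 otherwise."""
    return SENTIMENT_WORDS.get(word, 0) * estimate(word)

def sentence_sentiment(sentence):
    """E_i: sum of word-level emotional attributes (per formula (2))."""
    return sum(emo(w) for w in sentence.lower().split())

print(sentence_sentiment("the report was great but the response was terrible"))  # 0
print(sentence_sentiment("a good and great outcome"))                            # 2
```

Opposing sentiment words cancel in the sum, which is why a sentence mixing praise and criticism scores near zero.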
E_i represents the sentiment of the i-th sentence. The calculated sentence sentiment is then input, together with the feature matrix composed of word vectors, into a Bi-LSTM network for feature extraction. In a unidirectional recurrent neural network, the model only utilizes information from the preceding context and ignores the subsequent context. In practical applications where sequences are predicted or classified, information from the entire sequence must be considered. Therefore, bidirectional recurrent neural networks have gradually come into use and are now widely applied. As an extension of recurrent neural networks, long short-term memory networks can naturally be combined with a reversed sequence to form a bidirectional long short-term memory network. News content also has contextual factors, and each sentence's emotional inclination must be determined in the context of the entire article. For these reasons, we selected the bidirectional long short-term memory network as our classification model to mine text context effectively. A Bi-LSTM consists of two independent LSTM units that process the data simultaneously in the forward and backward directions; the combined output of both directions serves as input for subsequent layers, so contextual information is considered comprehensively. The structure of the Bi-LSTM is shown in Figure 3.
In the figure, the first row of LSTM blocks is the forward LSTM, whose output h_t^f depends on the current input x_t and the previous forward output h_{t−1}^f. The second row is the backward LSTM, whose output h_t^b depends on the current input x_t and the next backward output h_{t+1}^b. The current output h_t is therefore jointly determined by x_t, h_t^f, and h_t^b. The calculation of h_t is shown in equation (5):

h_t = α · h_t^f + β · h_t^b. (5)
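The weighted combination of forward and backward hidden states can be sketched in a few lines of NumPy. The random vectors stand in for the outputs a real Bi-LSTM would compute from the input sequence, and the α, β values are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
hidden = 4  # toy hidden size

# Stand-ins for the forward and backward LSTM outputs at time step t;
# a real Bi-LSTM would produce these from the input sequence.
h_forward = rng.standard_normal(hidden)
h_backward = rng.standard_normal(hidden)

alpha, beta = 0.6, 0.4  # illustrative weight coefficients

# The output at step t is a weighted combination of the forward and
# backward hidden states.
h_t = alpha * h_forward + beta * h_backward
print(h_t.shape)  # (4,)
```

Each component of h_t thus blends what the model has read up to step t with what it has read after step t.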
The α and β in the formula are weight coefficients. The pre-trained RoBERTa model is already a mature neural network model that can be applied directly to sentiment analysis tasks. In this article, however, the output of RoBERTa is used as the input to the Bi-LSTM so that high-dimensional text features can be extracted through a deeper network and further contextualized. One characteristic of pooling is that it outputs a fixed-size matrix, which reduces the dimensionality of the output while retaining important features and fusing the output features, effectively mitigating model overfitting. In this article, the max pooling operation takes the maximum value along the news length and embedding dimensions, as shown in equation (6):

maxpooling = max(X_hidden, d_i = seq_len) ∈ R^(batch_size × dim), (6)

where maxpooling denotes the maximum pooling result, X_hidden is the hidden-layer sequence, seq_len is the length of the news text, d_i is the vector dimension, dim is the embedding dimension, and batch_size is the number of news texts. The max pooling layer takes the output of the last hidden layer of the Bi-LSTM and performs max pooling on it. This highlights the key information contained in each hidden-state output vector, focusing on the essential details of the text, because the sentiment polarity expressed through the key information primarily reflects the overall sentiment tendency of the entire text.

The text data used in this article come from Twitter. Because Twitter provides an API, a large amount of tweet and comment content related to news texts can be obtained quickly. Considering factors such as dissemination timeliness and comment quantity, we automatically collected 6,124 news-related tweets posted between one month and one week before collection.
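The pooling-plus-classification head can be sketched as follows. The random tensor stands in for the Bi-LSTM hidden-state sequence X_hidden, the sizes are toy values, and the linear layer W is an untrained, illustrative classification head:

```python
import numpy as np

rng = np.random.default_rng(1)
batch_size, seq_len, dim = 2, 5, 8  # toy sizes

# Stand-in for the Bi-LSTM hidden-state sequence X_hidden
# (a real model would compute this from the news text).
X_hidden = rng.standard_normal((batch_size, seq_len, dim))

# Max pooling along the sequence (news-length) axis keeps the strongest
# activation of each hidden dimension, giving a fixed-size matrix.
pooled = X_hidden.max(axis=1)          # shape: (batch_size, dim)

# Illustrative classification head: linear layer + softmax over the
# three sentiment classes (negative, neutral, positive).
W = rng.standard_normal((dim, 3))
logits = pooled @ W
probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)

print(pooled.shape)   # (2, 8)
print(probs.shape)    # (2, 3); each row sums to 1
```

Because the pooled matrix has a fixed size regardless of seq_len, news articles of different lengths can share the same downstream classifier.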
We divided the data into a training set and a test set at a ratio of 7:3, with 4,287 annotated samples in the training set and 1,837 in the test set. During dataset acquisition and annotation, we selected three categories with relatively balanced sample counts to avoid imbalanced data affecting the training results. The loss function used in this article is the cross-entropy loss, as shown in formula (7):

L = −(1/N) Σ_{i=1}^{N} Σ_{c=1}^{3} y_{i,c} log(p_{i,c}), (7)

where N is the number of samples, y_{i,c} is 1 if sample i belongs to class c and 0 otherwise, and p_{i,c} is the predicted probability that sample i belongs to class c.
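A minimal sketch of the three-class cross-entropy loss follows; the two-sample batch and its probabilities are made-up values for illustration:

```python
import numpy as np

def cross_entropy(y_true, y_pred, eps=1e-12):
    """Mean cross-entropy over a batch; y_true is one-hot,
    y_pred holds predicted class probabilities."""
    y_pred = np.clip(y_pred, eps, 1.0)  # guard against log(0)
    return -np.mean(np.sum(y_true * np.log(y_pred), axis=1))

# Toy batch: two news items, three classes (negative, neutral, positive).
y_true = np.array([[0, 1, 0],
                   [1, 0, 0]])
y_pred = np.array([[0.1, 0.8, 0.1],
                   [0.7, 0.2, 0.1]])

loss = cross_entropy(y_true, y_pred)
print(round(loss, 4))  # 0.2899, i.e. -(ln 0.8 + ln 0.7) / 2
```

Only the probability assigned to the true class enters the loss, so confident correct predictions drive it toward zero while confident mistakes are penalized heavily.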

IV. EXPERIMENTS
Following the method introduced in the previous section, this article uses news content from Twitter as the dataset. First, the annotated text in the dataset is converted into word vectors for subsequent input into RoBERTa for information extraction. The transformed word vectors include news headlines, summaries, body text, and comment content. This vector information is mapped onto a two-dimensional plane using the t-SNE algorithm and displayed in Figure 4. The figure shows that the data points are denser in the lower part, the points in the upper left corner form small clusters, and the points in the upper right corner are dense and isolated. There are gaps in the overall distribution of the data, which can provisionally be used as partition boundaries for classification. We use 70% of the data as the training set, which includes category labels for the news articles. Word vectors belonging to the different parts of a news article are input into RoBERTa separately and then combined as inputs to the Bi-LSTM. In this way, contextual correlation information can be obtained, and classification accuracy can be improved by combining contextual content throughout an article. We also compared several other models for sentiment analysis on news texts; their training loss curves over epochs are shown in Figure 5. Figure 5 shows that the loss values of all models decrease gradually as training progresses, but at different speeds. RoBERTa-GRU optimizes more slowly, reaching a stable value after about 55 iterations; optimization then slows further, and good results are achieved only after 119 iterations. BERT-LSTM optimizes faster than RoBERTa-GRU.
However, its loss value stops decreasing after reaching a good result at around 43 iterations, possibly because it is stuck in a local optimum; it starts optimizing again at about 51 iterations and achieves good results at 64 iterations. RBTM has the fastest optimization speed during training: its loss decreases immediately from the start without any apparent signs of getting stuck in local optima, and it continues to decrease until reaching results similar to the other two models' optima at approximately 49 iterations. After training, the model is evaluated on the test set. The confusion matrix of the test results is shown in Figure 6.
As the figure shows, the text sentiment classification problem in this article is a three-class problem, so the confusion matrix is 3×3. The confusion matrix in Figure 6 can be converted into three binary confusion matrices to calculate Precision, Recall, and F1 for each class separately; averaging the evaluation values of the three binary confusion matrices then serves as the evaluation standard for the three-class problem. Table 1 shows the classification performance of the RBTM model on the test set based on these three binary confusion matrices. The test set consists of 1,837 samples, and the evaluation metrics for each of the three categories are presented in the table. The results show that the model's accuracy on negative sentiment is slightly lower than on neutral and positive sentiment, because language expressing negative sentiment tends to be more subtle and indirect, as explicitly negative vocabulary is often inappropriate in formal writing. Table 2 shows the results of using Macro-F1 as the evaluation metric for the three methods. The results show that our RBTM model achieves better training speed and prediction performance than the other two models; its training time is only 41.18% of RoBERTa-GRU's.
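The conversion from a 3×3 confusion matrix to macro-averaged metrics can be sketched as follows. The matrix below is an illustrative example with 1,837 samples, not the paper's actual counts (those are in Figure 6):

```python
import numpy as np

def macro_scores(cm):
    """Per-class precision/recall/F1 from a KxK confusion matrix
    (rows = true class, columns = predicted class), macro-averaged."""
    cm = np.asarray(cm, dtype=float)
    tp = np.diag(cm)
    precision = tp / cm.sum(axis=0)   # TP / (TP + FP), per class
    recall = tp / cm.sum(axis=1)      # TP / (TP + FN), per class
    f1 = 2 * precision * recall / (precision + recall)
    return precision.mean(), recall.mean(), f1.mean()

# Illustrative 3x3 confusion matrix (negative / neutral / positive);
# the counts are invented and only sum to the test-set size of 1,837.
cm = [[450,  60,  40],
      [ 50, 520,  55],
      [ 45,  50, 567]]

p, r, f1 = macro_scores(cm)
print(round(f1, 4))
```

Treating each class as its own binary problem and averaging the three F1 values is exactly the Macro-F1 computation the evaluation uses, so every class contributes equally regardless of its sample count.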
Because there is a certain degree of randomness in the model's optimization process, practical applications of intelligent news analysis tend to focus on improving the recall of the model's predictions. As shown in Figure 7, this method can effectively analyze the emotional tendency of news content in practice: compared with manual inspection, it saves 32% of personnel allocation and reduces news content analysis time by 49% in overall departmental work.

V. DISCUSSION
The digitization of the information environment has revolutionized the news industry, transforming the traditional value chain of content production, distribution, and consumption.
With the advancement of internet technology, artificial intelligence (AI) has emerged as a new paradigm, continuously reshaping the news industry's value chain. Previously, news articles were crafted by journalists who reported on events through field interviews and information gathering, work that required professional training and corresponding occupational qualities. These personal experiences and craftsmanship led to industry norms, such as journalistic professionalism, to regulate journalists' practices. However, these standards are not mandatory, and authors' differing perspectives lead to varying content quality. Therefore, analyzing and supervising news content is necessary to prevent adverse effects on institutions or society. The integration of AI technology and journalism has demonstrated AI's growing influence on the field, especially as technological innovation iterates over time. This article's experimental results show that the RBTM model achieves better predictive performance, with higher accuracy and shorter training time, making it an effective method for monitoring public opinion trends related to news issues. As AI technology iterates, its influence on journalism deepens along the chain of news production, distribution, and consumption, in that order rather than the reverse, because AI naturally enters at the underlying stage of information production and its impact then propagates to the later stages.

VI. CONCLUSION
This article focuses on a sentiment information extraction method for mass media and discusses the development value of the intelligent news industry. Specifically, we constructed a deep learning-based internet news sentiment analysis model and verified its effectiveness through comparative experiments. Our experiments demonstrated the superiority of the proposed RBTM model in terms of training time and prediction performance, with a Macro-F1 value of 0.8172. The method can effectively analyze the emotional tendency of news content in practical applications, yielding a 32% reduction in personnel allocation compared to manual inspection and a 49% reduction in news content analysis time in overall departmental work. Furthermore, the method can help identify the emotional tendencies hidden in news and provide a theoretical basis for public opinion monitoring and early warning by relevant institutions. However, in our three-class study of news sentiment, we found that the classification performance on neutral-emotion headlines was not outstanding, mainly due to inadequate classification of short sentences with implicit emotion. We also noted that data size and class ratio are crucial to model performance. Our next steps will therefore focus on optimizing long and short text processing, adjusting model parameters for sample imbalance, and exploring better internet news sentiment analysis models.

ACKNOWLEDGMENT
The author would like to thank the anonymous reviewers whose comments and suggestions helped to improve the manuscript.