Exploiting Long-Term Dependency for Topic Sentiment Analysis

Most existing unsupervised approaches to detecting topic sentiment in social texts consider only the text sequences in a corpus and ignore social dynamics, which prevents these algorithms from discovering the true sentiment of social users. To address this issue, we propose LDTSM (Long-term Dependence Topic-Sentiment Mixture), a probabilistic graphical model that introduces a dependency distance and exploits the dynamics of social media to combine topic sentiment inherited from historical time windows with the topic sentiment distribution underlying the current social texts. Extensive experiments on real-world SinaWeibo datasets show that LDTSM significantly outperforms JST, TUS-LDA and dNJST in terms of sentiment classification accuracy and shows better inference convergence, and topic and sentiment evolution analysis results demonstrate that our approach is promising.


I. INTRODUCTION
With the rapid development of social media and intelligent mobile devices, the public can conveniently express their opinions and share their feelings regarding social, economic or political issues through online social platforms such as Twitter and Facebook. The various types of user-generated content (UGC), such as microblogs and online product reviews, are usually opinionated and topic-oriented, and are a valuable source of information for human decision-making. For instance, government departments may wish to manage public opinion on important political events by mining text content in social media systems, and manufacturers may improve their production strategies by analyzing valuable user feedback in online product reviews. Consequently, sentiment analysis in social media has received significant attention from governments, enterprises and academic institutions.
Techniques for automatically identifying sentiment in diverse text data have flourished in recent years [1]- [3]. Most existing works treat text sentiment identification as a text classification problem and use supervised machine learning techniques to estimate the sentiment distribution of texts. These supervised methods usually do not take text topics into account when analyzing text sentiment, ignoring the fact that the sentiment polarities of texts in social media are closely related to their topics. For instance, the adjective ''complicated'' may be negative when it occurs in a review of a movie character, whereas it conveys positive sentiment in a message commenting on the movie plot. Ignoring topics tends to degrade the performance of text sentiment analysis. To alleviate this drawback, generative models have been proposed to model the joint distribution of sentiment and topic, with parameters that can be interpreted as latent associations between topics and sentiments in the data. These generative models are robust and require little or no human effort to estimate topic sentiment distributions, and are therefore widely used in sentiment analysis tasks.
The associate editor coordinating the review of this manuscript and approving it for publication was Feng Xia.
In online social networks, public sentiment towards entities such as events, services and organizations usually varies over time. Social media data shares some characteristics of time series, which makes it appealing for researchers to leverage time series analysis tools for sentiment tracking. Various sliding time window strategies have been proposed: they obtain the sentiment distribution of each time period by applying an off-the-shelf sentiment analysis method to the UGC in each window, and then extract sentiment evolution patterns with time series analysis techniques. Most sliding-window-based sentiment analysis approaches assume that the sentiment patterns in different time windows are independent and identically distributed. However, this assumption often does not hold; for example, the sentiments of Twitter posts in adjacent sliding time windows are to some extent dependent. When a typical generative model, such as JST [4] or TSM [5], is chosen to estimate the sentiment distribution of Twitter posts in a given time window, it may not achieve satisfactory results. Specifically, the true topic sentiment patterns of social media content in the current time window may not be discovered, because most existing generative models do not take into account how the sentiments of UGC in previous time windows affect those in the current time window. On the other hand, prior Dirichlet distributions initialized with random hyperparameters may not be good starting points in the search for the optimal parameters of the generative models, and can therefore slow down the convergence of model inference.
VOLUME 8, 2020 This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
Consider the example shown in Fig 1, in which three tweets are posted by the same Twitter user on different days. Given the sliding window size, tweets M1, M2 and M3 belong to windows W1, W2 and W3, respectively. If we use a traditional generative model to determine the sentiment polarities of the three tweets, we may conclude that M2 is negative and M3 is positive, since the word ''pitiful'' in M2 is negative, the words ''lovely'' and ''sunny'' in M3 are positive, and M1 contains no explicit sentiment word. However, if we take the context of tweet M3 into consideration, i.e., the sentiment polarity of its preceding tweet M2 is inherited when inferring the sentiment polarity of M3, it is not difficult to determine that M3 is negative.
This example shows that assuming topic sentiment independence across UGC in different time windows may lead to unsatisfactory sentiment analysis results.
To address this issue, in this paper we propose a weakly-supervised generative approach based on the sentiment consistency theory in sociology [6]. In particular, we are motivated by the natural immediacy of social media and the subtle dependency between text topics and text sentiments, and we assume that 1) the topics of texts are generated according to the sentiment distributions of the texts, and 2) the sentiment and topics conveyed in social media content posted by the same user over adjacent periods of time are roughly similar. Under these assumptions, we devise a novel topic sentiment model dubbed LDTSM (Long-term time-Dependent Topic Sentiment Model), which embeds an additional sentiment layer into LDA (Latent Dirichlet Allocation) [7] and takes the topic and sentiment dependency across adjacent periods of time into consideration when inferring text sentiment. We estimate the model parameters using Gibbs sampling. For performance evaluation, we apply LDTSM to four real-life datasets from SinaWeibo (http://weibo.com/) and compare its performance with the state-of-the-art sentiment analysis models JST, TUS-LDA and dNJST [38]. The experimental results indicate that LDTSM outperforms the other models in terms of sentiment classification accuracy and can efficiently discover topics with good interpretability. Moreover, LDTSM can effectively support evolution analysis of topics and sentiment.
Our main contributions are as follows:
• Based on sentiment consistency theory and the natural immediacy of social text streams, we propose a unified weakly-supervised generative learning framework for sentiment analysis in social media.
• Based on the long-distance dependency of sentiment in social texts, we devise a novel generative model dubbed LDTSM under the proposed framework and perform inference for LDTSM with Gibbs sampling.
• Through an extensive set of experiments on real-world datasets, we verify that our method gives very promising results for text sentiment analysis in social media.
The rest of the paper is organized as follows. We review related work in Section II. Related definitions are briefly introduced in Section III. Sections IV and V describe the proposed framework and the LDTSM model, respectively. Experimental results are discussed in Section VI. Finally, we conclude the paper in Section VII.

II. RELATED WORK
Our approach is closely related to sentiment evolution analysis and generative approaches for text sentiment analysis, and we review some of the most relevant work here.

A. SENTIMENT EVOLUTION ANALYSIS
Sentiment evolution analysis has received significant attention due to its increasing importance, and numerous methods for sentiment evolution analysis have emerged recently. Existing methods for sentiment evolution analysis are roughly classified as online sentiment analysis and offline sentiment analysis.
Due to the demands of real-time analysis, most online sentiment analysis methods are lexicon-based. The core idea of lexicon-based sentiment analysis is to look up each word of the documents in an available sentiment lexicon, tag it with a discrete label such as positive, negative or neutral, or assign it a continuous sentiment intensity value, and then compute a final sentiment label or intensity for each document with a fusion strategy such as majority voting or linear averaging. Constructing a high-quality sentiment lexicon is a formidable challenge. Taboada et al. [8] proposed a lexicon-based approach called SO-CAL, which takes valence shifters (intensifiers, downtoners, negation, and irrealis markers) into account when calculating the semantic orientation of the extracted sentiment-bearing words. Ortega et al. [9] used WordNet and SentiWordNet for sentiment polarity detection and achieved good results on the SemEval-2013 dataset. Al-Ayyoub et al. [10] constructed a sentiment lexicon of about 120,000 Arabic terms and built a lexicon-based sentiment analysis tool. Trinh et al. [11] proposed a lexicon-based method for sentiment analysis of Facebook comments in Vietnamese. Hazimeh et al. [12] applied a lexicon-based approach with essential and auxiliary features to track the sentiment of large-scale social events. When lexicon-based approaches determine the sentiment scores of social text sequences, each word is assigned a fixed sentiment. However, the actual sentiment of a word may depend on the context of the complete sentence and may change over time.
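As a concrete illustration of the lexicon-based pipeline described above (word-level lookup followed by a fusion step), a minimal Python sketch follows. The lexicon entries, the one-token negation rule (a crude stand-in for the valence shifters that SO-CAL [8] handles far more carefully) and all function names are hypothetical and chosen only for illustration.

```python
# Hypothetical toy lexicon: word -> sentiment score (illustrative values only).
TOY_LEXICON = {"lovely": 1.0, "sunny": 1.0, "pitiful": -1.0, "complicated": -0.5}
# Hypothetical negators; each one flips the polarity of the next token only.
NEGATORS = {"not", "never", "no"}


def word_scores(tokens, lexicon=TOY_LEXICON, negators=NEGATORS):
    """Look up each token in the lexicon and return the scores of sentiment-bearing tokens."""
    scores = []
    negate = False
    for tok in tokens:
        if tok in negators:
            negate = True
            continue
        if tok in lexicon:
            scores.append(-lexicon[tok] if negate else lexicon[tok])
        negate = False  # the negation scope ends after one token
    return scores


def majority_vote(scores):
    """Fuse word scores by counting positive vs. negative words."""
    pos = sum(1 for s in scores if s > 0)
    neg = sum(1 for s in scores if s < 0)
    if pos > neg:
        return "positive"
    if neg > pos:
        return "negative"
    return "neutral"


def average_score(scores):
    """Fuse word scores by linear averaging; 0.0 when no sentiment word is found."""
    return sum(scores) / len(scores) if scores else 0.0


if __name__ == "__main__":
    s = word_scores("what a lovely sunny day".split())
    print(s, majority_vote(s), average_score(s))
```

Because every word carries one fixed score here, the sketch also exhibits the limitation noted above: ''complicated'' is negative whether it describes a character or a plot.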
Considering that social media data is usually time-ordered, researchers have attempted to make the most of the timestamp information in social media for offline sentiment analysis. Specifically, such analysis mainly consists of the following steps. First, the original social corpus is partitioned into several text subsets at a particular time granularity; second, a suitable machine learning method is chosen to compute the sentiment score of each text subset; finally, a sentiment time series is constructed from the computed sentiment scores and used for subsequent sentiment analysis tasks such as prediction and clustering. To explore quantifiable relationships between overall public mood and macroscopic social and economic indicators, Bollen et al. [13] constructed two sentiment time series to analyze long-term sentiment drifts and short-term sentiment fluctuations. Based on the different propensities users have for disclosing positive and extreme feelings, Guerra et al. [14] proposed a feature representation strategy for sentiment evolution analysis that focuses on terms appearing at spikes in the social stream. Bifet and Frank [15] developed a sliding-window Kappa statistic for quantifying the predictive performance of Twitter text stream sentiment classifiers. Si et al. [16] applied a continuous Dirichlet Process Mixture model and a moving prediction process over sliding windows to perform regression on the stock index and the Twitter sentiment time series. In [17], a sentiment time series is constructed from the sentiment scores of sentiment words over a sliding time window to achieve multi-document sentiment prediction. Ranco et al. [18] explored the dependency between the sentiment polarities of Twitter event windows and stock price returns. Giachanou et al. [19] attempted to use conventional time series analysis techniques, such as frequency analysis, outlier detection and time series decomposition, to track sentiment in social media. Yang et al. [20] designed a location-based dynamic sentiment-topic model that jointly models topic, sentiment, time and geolocation information in an attempt to track sentiment shifts in different geographical regions. Giachanou et al. [21] proposed leveraging a time series outlier detection technique for sentiment spike identification, combining LDA and relative entropy to extract topics and compute their contributions to the sentiment spikes. Wang et al. [22] proposed an iterative algorithm, SentiDiff, to improve Twitter sentiment analysis using sentiment diffusion patterns. All the above-mentioned offline sentiment analysis approaches are efficient and intuitive, but they rest on the questionable assumption that social data in different time windows are independent and identically distributed.
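The three-step offline procedure described above (partition by time granularity, score each subset, assemble a time series) can be sketched in Python as follows. A daily granularity and the keyword scorer `toy_score` are assumptions for illustration; in practice one of the cited sentiment classifiers would take its place.

```python
from collections import defaultdict
from datetime import datetime


def toy_score(text):
    """Hypothetical stand-in for a real sentiment classifier: +1 or -1 per post."""
    return 1.0 if "good" in text.split() else -1.0


def partition_by_day(posts):
    """Step 1: group (timestamp, text) pairs into daily subsets, ordered by date."""
    buckets = defaultdict(list)
    for ts, text in posts:
        buckets[ts.date()].append(text)
    return dict(sorted(buckets.items()))


def sentiment_series(posts, score_fn=toy_score):
    """Steps 2-3: score each subset (mean post score) and return an ordered time series."""
    return [
        (day, sum(score_fn(t) for t in texts) / len(texts))
        for day, texts in partition_by_day(posts).items()
    ]


if __name__ == "__main__":
    posts = [
        (datetime(2020, 1, 1, 9), "good day"),
        (datetime(2020, 1, 1, 20), "bad day"),
        (datetime(2020, 1, 2, 8), "good news"),
    ]
    for day, score in sentiment_series(posts):
        print(day, score)
```

Each window is scored independently of the others, which is exactly the i.i.d. assumption questioned at the end of the paragraph above.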

B. GENERATIVE MODELS FOR SENTIMENT ANALYSIS
As a statistical learning paradigm, generative model excels at expressing directly observable and even indirect relationships between the observed and target variables. This property is of particular importance in sentiment analysis, since there exists subtle dependency between sentiment and text sequence content.
Various generative models have been proposed for sentiment analysis at different granularity levels, including the document, topic, aspect and term levels, based on characteristics of social media such as the small-world effect of social networks, the preferences of social content users and the multi-modality of social media, to name a few. Zhao et al. [23] proposed a MaxEnt-LDA hybrid model to jointly discover both aspects and aspect-specific opinion words. Lin et al. [4] designed a joint sentiment-topic model, JST, by adding a sentiment layer to LDA to reflect the dependency between word topic and word sentiment. Aiming to discover users' interests in different sentiment topics in texts, Almars et al. [24] presented a hierarchical user sentiment topic model, HUSTM, in which each word in a document is associated with three latent variables: a user, a topic, and a sentiment. He et al. [25] proposed a weakly-supervised latent sentiment model, which replaces the topic layer in LDA with a sentiment layer and acquires sentiment prior information from English sentiment lexicons via machine translation. Hai et al. [26] presented a supervised joint topic model, SJASM, which leverages the inter-dependency between aspect-based sentiment and overall sentiment, and estimates the overall sentiments of reviews via a normal linear model. Liang et al. [27] designed a universal affective model, consisting of topic-level and term-level sub-models, to identify readers' emotions hidden in short texts. Motivated by item response theory, Lin et al. [28] proposed a topical user item response model, which assumes that a user's review of a specific item is jointly determined by the individuality of the user and the attributes of the item. Taking user preferences and thought patterns into account, Poddar et al. [29] built an author-aware aspect topic sentiment model, Author-ATS, which treats a word as a combination of a hierarchy of aspect, topic and sentiment. Rahman et al. [30] proposed a hidden topic sentiment model, HTSM, to extract latent aspects and their corresponding sentiment polarities by imposing topic coherence and sentiment consistency constraints. Tang et al. [31] proposed a joint aspect-based sentiment topic model, JABST, to simultaneously extract multi-grained aspects and their corresponding sentiment polarities. Borrowing the idea of lifelong machine learning, Wang et al. [32] developed a joint aspect sentiment model, LAST (Lifelong Aspect-based Sentiment Topic), which can simultaneously identify aspects, opinions, opinion polarity and opinion generality, and achieve knowledge transfer across domains. To address the data sparsity problem in social short texts, Xiong et al. [33] constructed a word-pair sentiment-topic model, WSTM, whose generation elements are word pairs from the corpus rather than the single words used in traditional models. Inspired by the observation that users in the same community usually travel in nearby regions and share common activities and topics, Zhang and Chow [34] proposed an LDA-based model called CRATS to jointly mine latent communities, regions, activities, topics and sentiments based on the important dependencies among these latent variables. Wang et al. [35] provided a probabilistic model, PSEM (Public Sentiment Evolution Model), to simultaneously model the evolution of opinion on social incidents. Huang et al. [36] devised a multimodal joint sentiment topic model that uses emoticons to perform unsupervised sentiment analysis. Gui et al. [37] proposed a multi-task learning framework that jointly learns a sentiment classifier and a topic model by making the word-level latent topic distributions in the topic model similar to the word-level attention vectors in the sentiment classifier through mutual learning.
The aforementioned generative approaches do not take the natural dynamics of social media into consideration, and thus cannot reveal the true sentiment patterns hidden in the social media data being analyzed. Little effort has been devoted to using the time information associated with social media content to detect text sentiment polarities. Fu et al. [38] proposed a dynamic non-parametric joint sentiment topic mixture model, dNJST, which adds a sentiment layer and time-decay dependencies of historical epochs on the current epoch to the hierarchical Dirichlet process topic model. Unlike dNJST, which only takes into account the historical topic distribution and the stick-breaking construction of sentiment parameters when inferring the topic sentiment distribution of social media data in the current time window, our model LDTSM considers the sentiment distributions, topic-sentiment distributions and topic-word distributions from historical time windows.

III. DEFINITION OF TERMINOLOGIES
To facilitate presentation, we define the basic terminologies we will use in this paper.
Definition (Timeslice): Each user-generated post has a timeslice, e.g., an hour or a day, that specifies when the user published the post. The size of the timeslice is chosen by knowledge engineers according to the task at hand.

Definition (Topic): A topic is a subject of conversation or discussion, which is mathematically formalized as a set of words. It can also be represented as a discrete distribution over the words in a fixed vocabulary.
Definition (Sentiment): Sentiment represents a user's emotional feelings about a particular entity. It can be denoted by a discrete random variable taking values from $S = \{l_1, l_2, \cdots, l_{|S|}\}$, e.g., positive or negative, or by a continuous random variable in some value range, e.g., [0,1].
Definition (Sentiment-aware topic): A sentiment-aware topic is a topic labeled with a sentiment polarity. For example, the overall sentiment of the topic ''riots in Hong Kong'' is negative, hence ''riots in Hong Kong'' is a negative topic.
Based on the above definitions, we strive to develop a probabilistic generative framework (Section IV) and design a long-term time-dependent topic sentiment model (Section V) based on weakly-supervised learning for automatic detection of sentiment-aware topics in social media.

IV. THE PROPOSED FRAMEWORK
In this section, we illustrate our sentiment analysis framework in Figure 2.
For a sentiment analysis task, for instance investigating public opinion on the event ''Ethiopian Airlines Boeing 737 MAX crash'', we first collect the relevant posts and their timestamps from various sources such as Twitter and Facebook, using the API (Application Programming Interface) provided by the social media service or a third-party web crawler. Next, noisy posts are removed from the collected posts with preprocessing techniques such as keyword-based matching. The preprocessed posts and the sentiment priors (sentiment dictionaries) are then saved in a database system. Afterwards, a sliding window technique is applied to divide the collected data into several small datasets according to the timeslices of the posts. Finally, a probabilistic generative approach is chosen to learn the topic sentiment hidden in each small dataset. It is worth highlighting that the parameters of historical generative models influence the corresponding parameters of the current model. Specifically, the parameters of model $M_t$ are initialized with the parameters of the previous $L$ models $\{M_{t-1}, M_{t-2}, \ldots, M_{t-L}\}$.
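The framework states that the parameters of $M_t$ are initialized from the previous $L$ models but does not specify how they are combined. The Python sketch below assumes one simple choice, an exponentially decaying weighted average of the historical parameter arrays; the function name `init_from_history` and the `decay` parameter are hypothetical.

```python
import numpy as np


def init_from_history(history, L, decay=0.5):
    """Combine the parameters of M_{t-1}, ..., M_{t-L} into an initialization for M_t.

    history: list of equally shaped arrays, oldest first (e.g. per-window phi).
    Assumption: weights decay geometrically with distance from t and are normalized,
    so a weighted average of probability distributions is again a distribution.
    Requires at least one historical model.
    """
    recent = history[-L:][::-1]                # [M_{t-1}, M_{t-2}, ..., M_{t-L}]
    weights = decay ** np.arange(len(recent))  # 1, decay, decay^2, ...
    weights = weights / weights.sum()
    # Contract the weight vector with the leading (model) axis of the stack.
    return np.tensordot(weights, np.stack(recent), axes=1)


if __name__ == "__main__":
    history = [np.array([1.0, 0.0]), np.array([0.0, 1.0]), np.array([0.5, 0.5])]
    print(init_from_history(history, L=2))
```

Weighting recent windows more heavily mirrors the assumption that a user's sentiment is most similar over adjacent periods; a uniform average would be an equally plausible alternative.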

V. THE PROPOSED MODEL
In this section, we first introduce the notation and formally formulate our model. Then, we give the method for learning the parameters. Finally, we present the method for conducting topic sentiment analysis with the learned distributions. For detailed illustration, we list the relevant symbols and notations in Table 1.

A. MODEL DESCRIPTION
The key assumption of the JST model is that a document is a compound distribution generated by different member probability distributions over words, each corresponding to a sentiment. JST consists of the following hierarchical layers: a document layer, a sentiment layer, a topic layer and a word layer. Through these four layers, documents are assigned sentiment labels, topics are associated with sentiment labels, and words are associated with both sentiment labels and topics. Although JST has achieved great success in text sentiment analysis, estimating its parameters is very time-consuming. Moreover, JST does not take time information into consideration when learning topic sentiment patterns in texts.
With the aim of better modeling document sentiments, a long-term time-dependent topic sentiment model (LDTSM) (Figure 3) is proposed and integrated into the above framework. From Figure 3 … the sentiment distribution $\pi_t$ and the sentiment-topic distribution $\theta_t$, respectively. We assume a social corpus $D$ consisting of timestamped posts, denoted as $D = \{d_1, d_2, \cdots, d_{|D|}\}$, with vocabulary $V$. Each post $d_i$ in the corpus is generated by a user within a timeslice and contains a bag of words. After sliding-window preprocessing, $D$ is represented as $D = \{D_1, \cdots, D_t, \cdots, D_W\}$, where $D_t$ denotes the posts published in timeslice $t$ and $W$ denotes the index of the last time window. Formally, the generative process for each post in time window $t$ is given as a procedure named Post_Generator.
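The Post_Generator procedure itself is not reproduced here. As a hedged illustration only, the Python sketch below follows the JST-style generative story implied by the text (a sentiment, then a topic conditioned on the sentiment, then a word conditioned on both) for one time window $t$. The Dirichlet hyperparameters `gamma`, `alpha` and `beta` are simply passed in; in LDTSM they would be derived from historical windows, but exactly how is an assumption left open here, and `generate_window` is a hypothetical name.

```python
import numpy as np


def generate_window(n_posts, post_len, gamma, alpha, beta, seed=0):
    """Sample synthetic posts for one time window t (JST-style sketch).

    gamma: shape (S,)      Dirichlet prior of each post's sentiment distribution pi_{t,d}
    alpha: shape (S, T)    Dirichlet priors of each post's sentiment-topic distributions
    beta:  shape (S, T, V) Dirichlet priors of the sentiment-topic word distributions phi_{t,l,z}
    Returns phi and a list of posts, each a list of (word, sentiment, topic) triples.
    """
    rng = np.random.default_rng(seed)
    S, T, V = beta.shape
    # Word distributions are shared by all posts in the window.
    phi = np.array([[rng.dirichlet(beta[l, z]) for z in range(T)] for l in range(S)])
    posts = []
    for _ in range(n_posts):
        pi = rng.dirichlet(gamma)                                      # sentiment mix of the post
        theta = np.array([rng.dirichlet(alpha[l]) for l in range(S)])  # topic mix per sentiment
        words = []
        for _ in range(post_len):
            l = rng.choice(S, p=pi)          # draw a sentiment label
            z = rng.choice(T, p=theta[l])    # draw a topic given the sentiment
            w = rng.choice(V, p=phi[l, z])   # draw a word given sentiment and topic
            words.append((int(w), int(l), int(z)))
        posts.append(words)
    return phi, posts
```

Sampling synthetic windows like this can be handy for checking that an inference routine recovers known parameters before running it on real posts.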
From the graphical representation and generative process of LDTSM, we can see that LDTSM has the following characteristics: (1) in terms of topic detection and sentiment identification, unlike traditional probabilistic generative models, LDTSM takes the social inheritance of sentiment and topic into account when learning the sentiment topic patterns of posts in the current time window; (2) in terms of the efficiency of model parameter estimation, LDTSM may find the optimal parameter configuration more efficiently, since parameters initialized with historical information are likely a better starting point than a completely random guess.

7. Draw $\pi_{t,d,l} \sim \mathrm{Dir}(\cdot)$
As with LDA and JST, exact posterior inference for LDTSM is intractable, so approximate inference is required. Approximate inference methods mainly include sampling (Monte Carlo) methods, variational methods and loopy belief propagation. Here, Gibbs sampling is adopted to estimate the parameters of LDTSM because of its intuitive simplicity and high efficiency. According to the probabilistic graphical model depicted in Figure 3 and the chain rule of probability, the joint probability of words, topics and sentiments can be factored as in Eq. (1). For the three terms on the right-hand side of Eq. (1), we can obtain the three probabilities by integrating out $\pi_t$, $\phi_t$ and $\theta_t$, as formalized in equations (2), (3) and (4).
where $n_{t,l,z,v}$ denotes the number of times word $v$ is assigned to both sentiment $l$ and topic $z$ in timeslice $t$, and $n_{t,l,z}$ denotes the total number of words assigned to both sentiment $l$ and topic $z$ in timeslice $t$.
where $n_{t,l}$ denotes the total number of words with sentiment label $l$ in timeslice $t$.
where $n_t$ denotes the length of document $d$ in words.
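With the joint distribution, the posterior distribution of the sentiment and topic assignment of each word, conditioned on all other assignments, can be derived as Eq. (5), where the subscript $-i$ denotes a count that excludes the current assignment.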
With formula (5), the document-sentiment distribution $\pi_{t,d,l}$, the document sentiment-topic distribution $\theta_{t,d,l,z}$ and the sentiment-topic word distribution $\phi_{t,l,z,v}$ in timeslice $t$ (formulae (6), (7) and (8)) can be approximated using samples obtained from the Markov chain.
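Formulae (6)-(8) are not shown in this section. For orientation, the Python sketch below computes the standard JST-style posterior-mean estimates from the Gibbs count tables, written with asymmetric prior arrays so that priors derived from historical windows could be plugged in. Treating LDTSM's estimates as having exactly this form is an assumption, and `point_estimates` is a hypothetical name.

```python
import numpy as np


def point_estimates(n_dl, n_dlz, n_lzv, gamma, alpha, beta):
    """Posterior-mean estimates of pi, theta and phi from Gibbs counts (JST form).

    n_dl:  (D, S)    words in document d assigned sentiment l
    n_dlz: (D, S, T) words in document d assigned sentiment l and topic z
    n_lzv: (S, T, V) occurrences of word v assigned sentiment l and topic z
    gamma (S,), alpha (S, T), beta (S, T, V): Dirichlet priors for timeslice t
    """
    # Document-sentiment distribution: normalize over sentiments.
    pi = (n_dl + gamma) / (n_dl.sum(axis=1, keepdims=True) + gamma.sum())
    # Document sentiment-topic distribution: normalize over topics.
    theta = (n_dlz + alpha) / (
        n_dlz.sum(axis=2, keepdims=True) + alpha.sum(axis=1)[None, :, None]
    )
    # Sentiment-topic word distribution: normalize over the vocabulary.
    phi = (n_lzv + beta) / (
        n_lzv.sum(axis=2, keepdims=True) + beta.sum(axis=2, keepdims=True)
    )
    return pi, theta, phi
```

Each estimate is a smoothed relative frequency: every denominator is the sum of its numerator over the normalized index, so the rows of each returned array sum to one.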
Gibbs sampling is one of the simplest Monte Carlo sampling procedures. It starts with a random setting of hidden states and then updates each hidden state, according to the probability distribution conditioned on all the other states and the fixed parameters. Similar to JST_Gibbs [4], a complete overview of Gibbs sampling procedure corresponding to LDTSM is given in Algorithm 1.

Algorithm 1: LDTSM_Gibbs
Input: Corpus $D_t$, historical parameters $(\pi_{t-m}, \theta_{t-m}, \phi_{t-m})$
Output: Current model parameters $\pi_t$, $\theta_{t,l}$ and $\phi_{t,l,z}$
1 Iter = 1;
2 while Iter < MaxIterations do
3   for each document d ∈ $D_t$ do
4     for each word w in d do
5       Remove the current sentiment and topic assignment of w, and decrement the counts $n_{t,l}$, $n_{t,l,z}$ and $n_{t,l,z,w}$ by 1;
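Algorithm 1 breaks off after step 5, and the exact form of Eq. (5) is not given here. The Python sketch below shows a collapsed Gibbs sweep of the kind Algorithm 1 describes, using the full conditional of JST [4] with asymmetric priors `gamma`, `alpha` and `beta`; feeding these priors from historical parameters is how LDTSM would differ, which is assumed rather than taken from the paper. The class name `GibbsState` and the optional `lexicon` seeding (mirroring step 4 of Algorithm 2) are hypothetical.

```python
import numpy as np


class GibbsState:
    """Count tables and (sentiment, topic) assignments for one timeslice."""

    def __init__(self, docs, S, T, V, rng, lexicon=None):
        self.docs = docs  # list of documents, each a list of word ids in [0, V)
        self.n_dl = np.zeros((len(docs), S))
        self.n_dlz = np.zeros((len(docs), S, T))
        self.n_lzv = np.zeros((S, T, V))
        self.n_lz = np.zeros((S, T))
        self.assign = []  # per document: list of (l, z) per token
        for d, doc in enumerate(docs):
            row = []
            for v in doc:
                # Seed the sentiment from the lexicon {word_id: label} when possible.
                l = lexicon[v] if lexicon and v in lexicon else int(rng.integers(S))
                z = int(rng.integers(T))
                self._add(d, v, l, z, +1)
                row.append((l, z))
            self.assign.append(row)

    def _add(self, d, v, l, z, delta):
        self.n_dl[d, l] += delta
        self.n_dlz[d, l, z] += delta
        self.n_lzv[l, z, v] += delta
        self.n_lz[l, z] += delta

    def sweep(self, gamma, alpha, beta, rng):
        """Resample (l, z) for every token once (one iteration of the while loop)."""
        beta_sum = beta.sum(axis=2)    # (S, T)
        alpha_sum = alpha.sum(axis=1)  # (S,)
        for d, doc in enumerate(self.docs):
            for i, v in enumerate(doc):
                l, z = self.assign[d][i]
                self._add(d, v, l, z, -1)  # counts now exclude token i ("-i" counts)
                # p(l, z | rest) up to a constant; the document-length denominator
                # n_d + sum(gamma) is identical for all (l, z) and is dropped.
                p = ((self.n_lzv[:, :, v] + beta[:, :, v]) / (self.n_lz + beta_sum)
                     * (self.n_dlz[d] + alpha) / (self.n_dl[d] + alpha_sum)[:, None]
                     * (self.n_dl[d] + gamma)[:, None])
                flat = (p / p.sum()).ravel()
                k = int(rng.choice(flat.size, p=flat))
                l, z = divmod(k, p.shape[1])  # row-major index -> (sentiment, topic)
                self._add(d, v, l, z, +1)
                self.assign[d][i] = (l, z)
```

Calling `sweep` MaxIterations times and then applying posterior-mean formulas to the count tables corresponds to the loop of Algorithm 1.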

C. SENTIMENT TOPIC ANALYSIS
Based on the proposed framework and model, we formalize the procedure of sentiment topic analysis as an algorithm, named LDTSM_Analyzer (Algorithm 2). LDTSM_Analyzer mainly consists of three subprocedures: 1) text preprocessing and corpus partition (step1-step2), 2) estimating LDTSM parameters via Gibbs sampling (step4-step9), and 3) topic detection, sentiment classification and evolution analysis of topic-sentiment with the estimated sentiment distribution and sentiment-topic distribution (step10-step11).

Algorithm 2: LDTSM_Analyzer
Input: Corpus $D$, timeslice granularity $g$, dependency distance $M$
Output: topic label and sentiment polarity of each document in $D$
1 Construct the vocabulary of D with NLP preprocessing techniques such as word segmentation and stemming;
2 Partition D into $D = \{D_1, D_2, \cdots, D_p\}$ with granularity g;
3 for t = 1 to p do
4   Use the sentiment dictionary to initialize the sentiment distribution of the words in $D_t$;
5   if t < M then
6     Apply JST_Gibbs to estimate parameters $\pi_t$, $\theta_{t,l}$ and $\phi_{t,l,z}$;
7   else
8     Apply LDTSM_Gibbs to estimate parameters $\pi_t$, $\theta_{t,l}$ and $\phi_{t,l,z}$;
9   end
10  Use the estimated sentiment distribution π to determine the sentiment polarity of each document d in $D_t$: if $\pi_{d,1} > \pi_{d,2}$, then d is positive; otherwise d is negative;
11  Use the estimated φ and θ to extract topics and conduct evolution analysis of topics and sentiment;
12 end
The time complexity of LDTSM_Analyzer can be computed as follows. Vocabulary construction (step 1) and corpus partition (step 2) are one-off preprocessing steps and, for simplicity, are excluded from the time complexity analysis. The time complexity of document sentiment polarity determination (step 10) and topic extraction (step 11) is $O(|D|)$. Parameter estimation (steps 5-9) is the most computation-intensive part. Comparing the computation of JST_Gibbs and LDTSM_Gibbs, it is easy to see that LDTSM_Gibbs is more expensive. With formulae (6), (7) and (8), the sizes of the tensors π, θ and φ are $L \cdot |D| \cdot S$, $L \cdot S \cdot T$ and $L \cdot S \cdot T \cdot V$, respectively. Since $L \cdot S \cdot T < L \cdot |D| \cdot S < L \cdot S \cdot T \cdot V$ typically holds, the time complexity of parameter estimation (steps 5-9) is $O(L \cdot S \cdot T \cdot V)$. From the above analysis, we conclude that the time complexity of LDTSM_Analyzer is $O(L \cdot S \cdot T \cdot V)$.
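Two pieces of Algorithm 2 are easy to express in code: the switch between JST_Gibbs and LDTSM_Gibbs depending on how many historical windows exist (steps 5-8), and the polarity rule of step 10. The Python sketch below assumes 0-indexed timeslices, that column 0 of π holds the positive label, and that the historical prior is a plain mean of the last M windows; the function names are hypothetical.

```python
import numpy as np


def choose_prior(t, M, history, symmetric_prior):
    """Steps 5-8 (sketch): use JST's symmetric prior until M previous windows exist,
    then build the prior from the parameters of windows t-M, ..., t-1.

    t is 0-indexed; history[k] holds the estimated parameters of window k.
    """
    if t < M:
        return symmetric_prior                           # JST_Gibbs branch
    return np.mean(np.stack(history[t - M:t]), axis=0)   # LDTSM_Gibbs branch


def polarity(pi):
    """Step 10: a document is positive if pi[d, 0] > pi[d, 1], otherwise negative."""
    return np.where(pi[:, 0] > pi[:, 1], "positive", "negative").tolist()
```

Ties ($\pi_{d,1} = \pi_{d,2}$) fall into the negative class, matching the ''otherwise'' branch of step 10.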

VI. EXPERIMENTAL STUDY
A. DATASETS
Since few manually labeled corpora with temporal information are available, we constructed four real-world datasets to evaluate the proposed model. Specifically, message content, update time and message author were collected automatically using the search API of SinaWeibo.
For the collected microblogs, we perform preprocessing steps such as removing duplicate microblogs and zombie users and filtering out overly short microblogs. To obtain reliable sentiment labels for the collected microblogs, we asked three volunteers to independently assign a sentiment polarity tag to each message, and checked the consistency of the labeling results with a Kappa test (Table 3). Table 3 shows that the sentiment distribution of data1 is fuzzier than those of the others, and that, compared with the other two datasets, the consistency of sentiment annotation is higher in the hot-topic-focused datasets lindan and thaad.
For annotations on which the volunteers disagreed, we determine the final sentiment polarity labels by majority voting. The four datasets are detailed in Table 2, where the PNRatio column denotes the ratio of the number of positive instances to the number of negative instances, and the column ''Focused'' denotes the collection method used to construct each dataset. In particular, datasets lindan and thaad are topic-driven, and the other two are not topic-focused. Dataset lindan focuses on the event ''Extramarital affair of badminton great Lin Dan'', and dataset thaad focuses on the event ''THAAD Deployment in South Korea''.
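The text does not say which Kappa statistic was used. With three annotators, Fleiss' kappa is a common choice, so the Python sketch below assumes it, together with the majority-voting rule used to resolve disagreements; both function names are hypothetical.

```python
from collections import Counter


def fleiss_kappa(labels):
    """Fleiss' kappa for items each rated by the same number (>= 2) of annotators.

    labels: list of per-item lists, e.g. [["pos", "pos", "neg"], ...].
    """
    N, n = len(labels), len(labels[0])
    counts = [Counter(item) for item in labels]
    categories = sorted({c for item in labels for c in item})
    # Overall proportion of annotations falling into each category.
    p_j = [sum(c[cat] for c in counts) / (N * n) for cat in categories]
    # Per-item agreement: fraction of annotator pairs that agree.
    P_i = [(sum(v * v for v in c.values()) - n) / (n * (n - 1)) for c in counts]
    P_bar = sum(P_i) / N
    P_e = sum(p * p for p in p_j)  # agreement expected by chance
    return (P_bar - P_e) / (1 - P_e)


def majority_label(item):
    """Resolve disagreements by majority voting over one item's annotations."""
    return Counter(item).most_common(1)[0][0]
```

With three annotators and two polarity labels, majority voting can never tie. The kappa is undefined (division by zero) if every annotation in the corpus carries the same label.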

B. SENTIMENT CLASSIFICATION
In this section, we evaluate the sentiment classification performance of our approach through 1) a comparative analysis of LDTSM and other state-of-the-art methods in terms of sentiment classification accuracy, and 2) an analysis of the sensitivity of LDTSM's classification performance to the number of topics and the dependency distance.

1) METRIC
To evaluate the performance of methods for text sentiment classification, we adopt the widely used accuracy metric:
$$\mathrm{Accuracy} = \frac{M}{N} \qquad (9)$$
where M is the number of correctly classified samples and N is the size of the corpus being analyzed.

2) SENTIMENT CLASSIFICATION ACCURACY ANALYSIS
In view of the unsupervised nature of LDTSM, we select three state-of-the-art probabilistic graphical models (JST [4], dNJST [38] and TUS-LDA [39]) and a typical libSVM-based supervised learning model (features: unigrams and bigrams) as competitors. Experimental results on the four datasets are listed in Table 4, from which we draw the following findings: 1) Among the four probabilistic graphical models, the sentiment classification accuracy of JST is lower than those of the other three, namely dNJST, TUS-LDA and LDTSM. This suggests that suitable use of time information helps to improve sentiment detection performance.
2) LDTSM outperforms all the other unsupervised competitors in terms of sentiment classification accuracy on the four datasets, including dNJST and TUS-LDA, which indicates that the sentiment inheritance mechanism in LDTSM makes more reasonable use of time information than dNJST and TUS-LDA do. 3) There is a small gap in classification accuracy between LDTSM and SVM. Considering the high cost of acquiring training corpora labeled with sentiment polarity, this gap is acceptable in most cases. It is worth noting that this observation agrees with Pang's conclusion [40], i.e., SVMs based on bag-of-unigram features can achieve excellent results for sentiment classification in short-text corpora.

3) SENTIMENT CLASSIFICATION ACCURACY VS. NUMBER OF TOPICS
Considering that text sentiment polarities are intertwined with topics and that the number of topics is predetermined in LDTSM, we explore how this parameter influences the sentiment classification performance of LDTSM. A group of experiments with topic number T ∈ {1, 5, 10, 20, 30, 40} is conducted, and the resulting sentiment classification performance is shown in Figure 4. As Figure 4 shows, the topic number affects the sentiment classification performance of LDTSM differently on different datasets. Specifically, the classification accuracy of LDTSM reaches its maximum when the topic number is set to 5 on datasets lindan and thaad, and to 30 and 40 on datasets data1 and data2, respectively. These observations may be explained as follows: a too small T pays little attention to the correlation between topic and sentiment and may degrade LDTSM into an LDA focused on sentiment detection, leading to low sentiment classification accuracy; a too large T may arbitrarily break intact topics into fragments of noisy topics, preventing LDTSM from effectively recognizing the underlying sentiment patterns in text sequences. Figure 4 further indicates that the accuracy of LDTSM on datasets lindan and thaad first rises and then falls as the number of topics increases, whereas on datasets data1 and data2 the classification accuracy grows slowly with larger T, except for a few cases: the change from T = 20 to T = 30 on data2, and a sharp increase from T = 1 to T = 5 on data1 and data2. This can also be explained as follows: datasets lindan and thaad are topic-focused and may contain fewer latent subtopics than datasets data1 and data2. It is difficult to set T accurately for a social text sentiment analysis task, but the above experimental results suggest that T should be set to a smaller value for a topic-focused corpus and to a larger value otherwise.

4) SENTIMENT CLASSIFICATION ACCURACY VS. DEPENDENCY DISTANCE
The long-term dependency mechanism is one of the key characteristics of LDTSM. This section explores the impact of the dependency distance on sentiment classification accuracy. The experimental results for the first 6 timeslices of the four datasets are shown in Figure 5.
As shown in Figure 5, we observe the following. 1) Classification accuracy is highest when the dependency distance is set to 3 or 4. Too large a dependency distance may prevent LDTSM from effectively inheriting historical sentiment knowledge, whereas too small a distance may introduce noisy sentiment; both cases evidently incur a loss in sentiment classification performance. 2) On datasets lindan and thaad, classification performance remains basically stable after LDTSM reaches its maximal accuracy, whereas it decreases significantly on datasets data1 and data2. This may be because topic-focused datasets such as lindan and thaad often exhibit strong emotional contagion.
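The dependency distance controls how many preceding timeslices contribute to the current one. Since the inheritance equations of LDTSM are not restated in this section, the following Python sketch only illustrates the general idea under stated assumptions: it blends the sentiment-topic-word distributions φ of the last D timeslices into a distribution for the current timeslice using geometrically decaying weights. The function name `inherited_prior`, the `decay` parameter, and the weighting scheme are hypothetical and may differ from LDTSM's actual formulation.

```python
from typing import List, Sequence

# phi[l][z][w]: probability of word w under sentiment label l and topic z.
Phi = List[List[List[float]]]


def inherited_prior(history: Sequence[Phi], distance: int, decay: float = 0.5) -> Phi:
    """Blend the sentiment-topic-word distributions of the last `distance`
    timeslices into one distribution for the current timeslice.

    Hypothetical scheme (not necessarily LDTSM's): the most recent timeslice
    gets raw weight 1, each older one is multiplied by `decay`, and the
    weights are normalized to sum to 1.
    """
    if distance < 1 or not history:
        raise ValueError("need distance >= 1 and at least one past timeslice")
    window = list(history[-distance:])  # ordered oldest ... newest
    n = len(window)
    raw = [decay ** (n - 1 - i) for i in range(n)]  # newest (i = n - 1) -> 1.0
    total = sum(raw)
    weights = [r / total for r in raw]
    n_labels = len(window[0])
    n_topics = len(window[0][0])
    n_words = len(window[0][0][0])
    # Convex combination: each (l, z) row still sums to 1 if the inputs do.
    return [
        [
            [sum(weights[i] * window[i][l][z][w] for i in range(n)) for w in range(n_words)]
            for z in range(n_topics)
        ]
        for l in range(n_labels)
    ]
```

Setting `distance=1` inherits only the previous timeslice, while larger values blend in progressively older ones; if fewer than `distance` past timeslices exist, the sketch uses all that are available.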

C. TOPIC DETECTION
In this section, we examine the quality of the topics with different sentiment polarities detected by LDTSM. We run LDTSM on the topic-focused dataset lindan and the topic-diversified dataset data2, and select the top 10 topic words for each sentiment polarity category. The results are listed in Table 5. As Table 5 shows, the extracted topics appear fairly informative and coherent, and reflect the underlying concerns of microbloggers.
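Top-word lists such as those in Table 5 are typically obtained by ranking words within each sentiment-topic distribution. A minimal Python sketch, assuming the estimated distribution is available as a nested list `phi[l][z][w]` (sentiment label l, topic z, word index w) together with a vocabulary list; the input layout and the function name `top_words` are hypothetical:

```python
from typing import Dict, List, Sequence, Tuple


def top_words(phi: Sequence[Sequence[Sequence[float]]],
              vocab: Sequence[str],
              k: int = 10) -> Dict[Tuple[int, int], List[str]]:
    """For every (sentiment label l, topic z) pair, return the k words with
    the highest probability phi[l][z][w], most probable first."""
    result: Dict[Tuple[int, int], List[str]] = {}
    for l, topics in enumerate(phi):
        for z, dist in enumerate(topics):
            # Rank word indices by their probability under (l, z), descending.
            ranked = sorted(range(len(dist)), key=dist.__getitem__, reverse=True)
            result[(l, z)] = [vocab[w] for w in ranked[:k]]
    return result
```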

D. EVOLUTION ANALYSIS OF TOPIC AND SENTIMENT
1) TOPIC EVOLUTION
Tracking topic evolution helps decision makers understand how hot topics emerge and develop, and make reasonable predictions. Taking dataset lindan as an example, we analyze how the extracted topics evolve using the Jaccard similarity (formula (10)). The similarity evolution and content evolution of an example topic (topic 4, randomly chosen from all topics extracted from dataset lindan) are shown in Figure 6 and Table 6, respectively. Figure 6 shows that topic similarity is high and stable from day 2 to day 5, with sharp fluctuations in time buckets 1-2 and 5-6. The shift of topic words in Table 6 offers a plausible explanation: the topic words ''having an affair'', ''man'', ''wife'', ''apology'' and ''Lin Dan'' on day 1 indicate that microbloggers first discussed the event ''Lin Dan apologized for betraying his wife''; ''Apology from Lin Dan'' remained a continuous focus over the next 4 days, whereas on day 6 the focus shifted from ''Lin Dan's betrayal'' to ''hating the mistress Zhao Yaqi''.
$$\mathrm{Sim}(t, t+1) = \frac{|A_t \cap A_{t+1}|}{|A_t \cup A_{t+1}|} \qquad (10)$$
where $A_t$ denotes the set of the 100 most probable keywords of the topics extracted from timeslice $t$.
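The Jaccard similarity in formula (10) can be computed directly from the top-ranked words of each timeslice. The Python sketch below assumes each timeslice's topic is given as a dictionary mapping words to probabilities and compares adjacent timeslices; the function names and input format are hypothetical, and the value returned for two empty sets is an arbitrary convention.

```python
from typing import Dict, List, Sequence, Set


def top_word_set(word_probs: Dict[str, float], n: int = 100) -> Set[str]:
    """A_t: the n most probable words of a topic in one timeslice."""
    return set(sorted(word_probs, key=word_probs.get, reverse=True)[:n])


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity |A ∩ B| / |A ∪ B|."""
    if not a and not b:
        return 1.0  # convention for two empty sets (an assumption)
    return len(a & b) / len(a | b)


def similarity_series(slices: Sequence[Dict[str, float]], n: int = 100) -> List[float]:
    """Jaccard similarity between the top-n word sets of adjacent timeslices."""
    sets = [top_word_set(s, n) for s in slices]
    return [jaccard(sets[t], sets[t + 1]) for t in range(len(sets) - 1)]
```

A low value in this series corresponds to a shift in topic content between two adjacent timeslices, such as the fluctuations discussed for time buckets 1-2 and 5-6.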

2) SENTIMENT EVOLUTION
Generally, microbloggers' attitudes toward different topics develop along different trajectories. Understanding the evolution of topic content alone is insufficient for a decision maker; grasping the state of public sentiment hidden in hot topics is also indispensable. Similar to the topic evolution analysis, we analyze how microbloggers' sentiment evolves in the hot event ''Extramarital affair of badminton great Lin Dan'' by discovering sentiment knowledge in dataset lindan. The results are depicted in Figure 7. Figure 7 shows that, on the whole, the sentiment polarity of microbloggers is negative; negative intensity gradually increases over the event lifecycle while positive intensity gradually decreases, with an exception at timeslice 6, where the intensity of negative sentiment is reduced and that of positive sentiment is enlarged. These fluctuations in sentiment intensity can be attributed to the event under discussion. Having an extramarital affair is widely regarded as immoral, so most microbloggers held negative attitudes toward Lin Dan throughout the episode. From timeslice 1 to timeslice 3, the intensity of negative sentiment climbs slowly, possibly because netizens' anger was gradually fermenting; from timeslice 3 to timeslice 5, microbloggers gradually reconciled themselves to the fact of Lin Dan's extramarital affair; and the keywords in Row 6 of Table 6 indicate that the apology from the mistress Zhao Yaqi stung netizens again, so negative sentiment intensity increases sharply on day 6.
VOLUME 8, 2020
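This section does not spell out how the sentiment intensity plotted in Figure 7 is computed. One plausible aggregation, assumed here purely for illustration, averages the per-document sentiment proportions (denoted here π_{d,l}, an assumed notation) over all documents in a timeslice. The Python sketch below implements that assumption; the function name and input layout are hypothetical, and LDTSM's actual measure may differ.

```python
from typing import List, Sequence


def sentiment_intensity(pi_by_slice: Sequence[Sequence[Sequence[float]]]) -> List[List[float]]:
    """pi_by_slice[t][d][l] is the (assumed) proportion of sentiment label l
    in document d of timeslice t. Returns intensity[t][l], the mean of
    pi[t][d][l] over all documents d in timeslice t."""
    intensity: List[List[float]] = []
    for docs in pi_by_slice:
        if not docs:
            raise ValueError("every timeslice needs at least one document")
        n_labels = len(docs[0])
        # Average each label's proportion over the documents of this timeslice.
        intensity.append([sum(doc[l] for doc in docs) / len(docs) for l in range(n_labels)])
    return intensity
```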

E. EFFICIENCY ANALYSIS OF LDTSM
In this section, we investigate the efficiency of LDTSM by comparing its running time with that of the other three probabilistic graphical models, i.e., JST, dNJST and TUS-LDA. The number of iterations cannot be used as a time measure, since the parameter estimation procedures of different models require different amounts of work in their inner loops. For fairness, we use elapsed CPU time instead of the number of iterations, and record the running time of the four algorithms according to the following rule: if the classification accuracy of an algorithm reaches the corresponding value in Table 4 before the user-defined maximum number of iterations is met, we break the loop and record the time spent in it.
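The stopping rule above can be sketched as a small timing harness. In this Python sketch, `step` stands for one sampling iteration of any of the four models and `accuracy` for an evaluation against the labeled data; both are hypothetical callables supplied by the caller, and `time_to_target` is an illustrative name. CPU time is measured with `time.process_time()`, and, as a simplification, the time spent on accuracy checks is included in the measurement.

```python
import time
from typing import Callable, Tuple


def time_to_target(step: Callable[[], None],
                   accuracy: Callable[[], float],
                   target: float,
                   max_iters: int) -> Tuple[float, int, bool]:
    """Run `step` until `accuracy()` reaches `target` or `max_iters`
    iterations are done. Returns (cpu_seconds, iterations, reached_target)."""
    start = time.process_time()  # CPU time of this process, not wall-clock time
    for it in range(1, max_iters + 1):
        step()
        # Note: the accuracy check itself is counted in the measured CPU time.
        if accuracy() >= target:
            return time.process_time() - start, it, True
    return time.process_time() - start, max_iters, False
```

Using CPU time rather than iteration counts keeps the comparison fair when one model's inner loop does more work per iteration than another's.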
The experimental results are listed in Table 7. From the table, we can see that 1) compared with JST and TUS-LDA, LDTSM achieves better inference efficiency on all four datasets; 2) the CPU time of LDTSM is comparable to that of dNJST on datasets with easily discernible sentiment polarity patterns, such as lindan and thaad, whereas LDTSM exhibits a remarkable advantage on data1, whose sentiment polarity boundaries are hard to distinguish; and 3) LDTSM and dNJST surpass JST and TUS-LDA in the time efficiency of estimating model parameters. This observation can be explained as follows. Both dNJST and LDTSM take sentiment inheritance into account when learning the sentiment-topic distribution of social texts posted in the current timeslice, which provides a better initialization of model parameters and in turn yields faster, higher-quality sampling convergence; this optimization advantage is especially beneficial on datasets with complex and ambiguous sentiment structure. The advantage is more significant for LDTSM, since it considers both sentiment inheritance and topic inheritance, whereas dNJST considers only sentiment inheritance.

VII. CONCLUSION
In this paper, we have presented a new topic-sentiment modelling framework with long-term dependency for both sentiment classification and topic extraction. We evaluated our model on four real-world microblog datasets and showed that it achieves superior sentiment classification performance compared with several strong baselines. Evolution analysis results on the experimental datasets demonstrate that our approach is effective and promising. In future work, we will explore extending this model to the analysis of sentiment community dynamics, and will apply heterogeneous scheduling algorithms to speed up text sentiment analysis in distributed computing environments.