Challenges and Recommended Solutions in Multi-Source and Multi-Domain Sentiment Analysis

The massive availability of online reviews and social media postings offers invaluable feedback that businesses can use to make better-informed decisions and steer their marketing strategies towards users' interests and preferences. Sentiment analysis is therefore essential for determining the public's opinion of a particular topic, product or service. Traditionally, sentiment analysis is performed on a single data source, for instance, online product reviews or tweets. However, the need for more precise and more comprehensive results has steered the field towards performing sentiment analysis on multiple data sources. Using multiple data sources for a particular domain of interest can increase the amount of data available for training a sentiment classifier. Until now, the problem of insufficient training data has only been addressed by multi-domain sentiment analysis. To equip researchers with a thorough understanding of both multi-source and multi-domain sentiment analysis, this paper identifies the underlying challenges of the two settings and discusses the solutions applied by the researchers concerned. It also discusses the findings derived from past studies and, based on these, proposes suggestions for the future direction of this research area. The findings of our review should help guide researchers in the future progress and advancement of multi-source and multi-domain sentiment analysis.


I. INTRODUCTION
The rapid rise of social media has populated the Internet with online reviews and user-generated content. Internet users post their content or views on social media or websites, either to share information or to express themselves on various topics, from political issues to products and services. Among the popular social media platforms are Facebook, Twitter, and Instagram. Since user-generated content is usually based on the actual experiences of the users, the opinions expressed are often perceived by the general public to be genuine and reliable. This is in accordance with [1], who pointed out that consumers rely on other consumers' experiences to validate the performance of certain products and services. Previous consumers' experiences, recommendations, ratings, and comments about those products or services can influence new consumers' purchase decisions. According to [2], online consumers' reviews are more influential than reviews generated by professionals. This is in line with [3], who stated that 90% of customers' purchase decisions depend on online consumers' reviews and comments.

The associate editor coordinating the review of this manuscript and approving it for publication was Bin Liu.
Based on the above, knowing the sentiment polarity of online reviews and comments about a product is important for businesses. The information can be used to make better-informed decisions so as to steer their marketing strategies towards users' interests and preferences, thereby generating more profits [4]. This need has given rise to a new branch of information processing, typically known as sentiment analysis [5]. Reference [6] defines sentiment analysis as ''the computational process of recognizing and classifying various opinions (thoughts or judgements) expressed in texts''. In accordance with this definition, the purpose of sentiment analysis is to analyse online reviews and comments and to examine their sentiment polarity so as to obtain the public's opinion on an issue or a product, whether positive, negative or neutral [7]. However, most existing research on sentiment analysis relies on a single data source, such as online review sites or Twitter. Such datasets can be biased, and inadequate to represent the public's view as a whole on a topic, product, or service [4], [7], [8]. This has led to the recent transition in sentiment analysis from the use of single data sources to multiple data sources. The use of multiple data sources for a particular domain of interest can increase the amount of labelled data available for training a sentiment classifier. An insufficient amount of labelled data is a common problem in sentiment analysis. Currently, the problem is addressed by conducting multi-domain sentiment analysis, in which a resource-rich domain is used to transfer sentiment knowledge to a resource-limited domain.

VOLUME 7, 2019. This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see http://creativecommons.org/licenses/by/4.0/
However, the involvement of different domains in the sentiment analysis process often incurs problems such as feature mismatch, polarity divergence, polysemy, and sparsity, as demonstrated in previous works [9]-[11], [13], [14]. Addressing these problems requires better techniques and solutions for managing sentiment analysis. It is worth noting that using multiple sources for sentiment analysis may also help to tackle the issue of inadequate labelled datasets for training a classifier, a commonly known problem in sentiment analysis when using the machine learning approach [9], [15]-[19].
Until today, review articles that focus on the challenges faced in multi-source and multi-domain sentiment analysis have been scant. Most currently available reviews tend to survey the tasks, approaches, and applications of sentiment analysis, as can be seen in [20]. Thus far, only [21] has focussed on approaches in cross-domain sentiment analysis. To address this gap, this paper reviews and examines past studies involving multi-source and multi-domain sentiment analysis. Specifically, it focuses on the challenges faced by past researchers and the solutions they recommended to circumvent those challenges. The aim of this review is therefore to equip researchers with a reference or benchmark for future development in multi-source and multi-domain sentiment analysis.
The remainder of this paper is organised as follows. Section 2 gives the background to sentiment analysis, multi-source, and multi-domain sentiment analysis. Section 3 reports on the challenges and recommended solutions noted to ease the problems while Section 4 discusses the findings. Section 5 suggests some future research directions in this area. Section 6 concludes the article.

A. SENTIMENT ANALYSIS
Sentiment analysis is the process of analysing text from online reviews or social media postings, with the aim of extracting the public's opinions on various topics [5], and then classifying them into a sentiment polarity which can be positive, negative, or neutral [18], [22]. There are various techniques for performing sentiment analysis. Generally, the techniques can be categorised into the following approaches: lexicon-based, machine learning-based, and a hybrid of both [5].
The lexicon-based approach obtains the sentiment of the analysed text by examining the frequencies and the polarities of the negative and positive words used in the text [9], [18], [22], [23]. It requires a predefined dictionary of words, such as SentiWordNet, which is annotated with three sentiment scores: positivity, negativity, and neutrality. The text captured from social media and review sites is first pre-cleaned so as to filter the data and to extract the relevant stem features. The semantic polarity of each stem is then calculated and classified based on the sentiment scores defined in the sentiment dictionary. The approach behaves well when the texts are well-formed and grammatically correct [4]. The lexicon-based approach has several benefits. First, it does not require a labelled training set for classifying the text [24]. Second, it defines the lexicons independently of the data, thereby preventing overfitting. Third, it is often used to perform sentiment analysis on multiple datasets [22]. The approach also has some disadvantages. For instance, the lexicons need to be predefined; otherwise, the approach is unable to adapt the lexicon to specific forms of expressions which are associated with formal language. This approach is also unable to detect non-standard abbreviations, which are commonly used in posts published on social media platforms, such as Twitter [4].
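The scoring step described above can be illustrated with a minimal sketch. The toy lexicon, its scores, the threshold and the function name `lexicon_polarity` are hypothetical and chosen only for illustration; a real system would load a resource such as SentiWordNet and apply proper stemming.

```python
import re

# Hypothetical toy lexicon: word -> (positivity, negativity) scores.
# The values are illustrative only and are not taken from SentiWordNet.
TOY_LEXICON = {
    "good": (0.8, 0.0),
    "great": (0.9, 0.0),
    "bad": (0.0, 0.8),
    "terrible": (0.0, 0.9),
}


def lexicon_polarity(text, lexicon=TOY_LEXICON, threshold=0.1):
    """Classify text by comparing summed positive and negative scores."""
    # Crude cleaning and tokenising: lowercase alphabetic tokens only.
    tokens = re.findall(r"[a-z']+", text.lower())
    pos = sum(lexicon[t][0] for t in tokens if t in lexicon)
    neg = sum(lexicon[t][1] for t in tokens if t in lexicon)
    score = pos - neg
    if score > threshold:
        return "positive"
    if score < -threshold:
        return "negative"
    return "neutral"
```

Because "gr8" is not in the lexicon, a call such as `lexicon_polarity("gr8 phone")` falls back to "neutral", which mirrors the weakness with non-standard abbreviations noted above.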
The machine learning-based approach treats sentiment analysis as a standard classification problem [22]. Therefore, it is more suitable for extracting sentiments from unstructured content and less formal texts, such as tweets from Twitter. It also eliminates the need for pre-defined lexicons, thereby allowing greater flexibility for application in any domain [22]. However, as this approach entails the construction of a text classification model to train a sentiment classifier, large labelled training datasets are necessary for the sentiment analysis to be effective [18], [19]. Unfortunately, labelled data are often insufficient, and manual annotation is laborious, costly, and time consuming [11], [19]. Moreover, a sentiment classifier trained for a specific domain may not produce accurate results when it is directly applied to different domains [17].
The third approach is the hybrid approach, which uses a lexicon database and machine learning together. Reference [7] introduced a hybrid sentiment analysis which employed machine learning theory and a method based on a polarity lexicon for analysing Chinese sentiment phrases. Another study [24] combined the lexicon-based approach with a machine learning technique and applied it to Facebook data. The resulting sentiment analysis was reported to be highly accurate, at 83.2% precision.
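To make the dependence on labelled data concrete, the following is a minimal sketch of a supervised sentiment classifier. Multinomial Naive Bayes with Laplace smoothing is used here only as a familiar example and is not a method prescribed by the works cited above; the four training sentences and all function names are hypothetical.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(labelled_docs):
    """Fit word and label counts from a list of (text, label) pairs."""
    word_counts = defaultdict(Counter)  # label -> word frequencies
    label_counts = Counter()            # label -> number of documents
    vocab = set()
    for text, label in labelled_docs:
        tokens = text.lower().split()
        word_counts[label].update(tokens)
        label_counts[label] += 1
        vocab.update(tokens)
    return word_counts, label_counts, vocab


def predict(model, text):
    """Return the label with the highest log posterior."""
    word_counts, label_counts, vocab = model
    total_docs = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label in label_counts:
        score = math.log(label_counts[label] / total_docs)
        denom = sum(word_counts[label].values()) + len(vocab)
        for tok in text.lower().split():
            if tok in vocab:  # words never seen in training are ignored
                score += math.log((word_counts[label][tok] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical toy training set, for illustration only.
TRAIN = [
    ("love this phone", "positive"),
    ("great battery life", "positive"),
    ("hate the screen", "negative"),
    ("poor battery life", "negative"),
]
MODEL = train_naive_bayes(TRAIN)
```

With so few labelled examples the model only recognises words it has already seen, which is the data-sufficiency limitation discussed above.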

B. MULTI-SOURCE SENTIMENT ANALYSIS
Among the most common data sources used for sentiment analysis are online reviews and tweets [22]. In the context of this paper, multiple data sources mean data extracted from more than one source. Common data sources include online product reviews, blogs, e-news, Twitter, Facebook, Instagram, and Google data. Most approaches used in sentiment analysis rely on a single data source, and few have ventured into using multiple data sources [25]. Dependency on a single data source is undesirable because it could lead to biased conclusions. The process of performing sentiment analysis on datasets obtained from multiple data sources is known as multi-source sentiment analysis or cross-source sentiment analysis [4].
Multi-source or cross-source sentiment analysis has been commercially used for the monitoring of products and brands in social media channels [4]. In [4], a decision support system was developed so as to manage the promotion of products on multiple social media channels (Facebook, Twitter, and Instagram). The support system embodies a social media listener that contains a sentiment analysis engine which monitors multiple social channels. The engine gathers data from the social media users' postings that are related to the promotions and marketing campaigns. It then performs the sentiment analysis of the collected data so as to determine the users' opinions about the campaign.
The importance of performing sentiment analysis on multiple data sources has been emphasised in previous studies [7], [8]. For instance, [7] performed sentiment analysis on a combined dataset comprising online reviews, and tweets so as to develop an improved performance of sentiment predictions. Reference [8] conducted a sentiment analysis of multiple data sources comprising online news, Google search, and Twitter. The aim was to obtain a better event prediction of the stock market. Similarly, [27] also performed a sentiment analysis on online news, tweets from Twitter, and quantitative historical data so as to better predict the stock market. In their study, [28] noted that event detections using multiple data sources were more meaningful, and they also bore valuable outcomes. All these aforementioned studies have provided concrete evidence which proved that using multi-source or cross-source sentiment analysis provides a better prediction outcome than using just a single data source.

C. MULTI-DOMAIN SENTIMENT ANALYSIS
Sentiment analysis is a domain-dependent process [17] because different domains express sentiments differently [2], [9], [15], [18], [19]. For instance, the word 'easy' in the Electronics domain is normally positive, but in the Books domain, it carries a negative sentiment. The easiest way to address this problem is thus, to prepare enough labelled datasets for each domain for training a classifier [2]. The disadvantage of this method is that it is costly to manually label sufficient samples in every possible domain of interest [2], [19]. Moreover, a domain dependent classifier also requires a large amount of labelled datasets in order to be accurate [7], [15], [17], [22].
A better solution is to exploit resource-rich labelled data from other domains and to design a more robust sentiment classifier which can work across different domains [15], [17], [19], [22]. This scenario is called multi-domain or cross-domain sentiment analysis [19], [21], [22]. In [22], cross-domain sentiment analysis was described as beneficial when labelled datasets were difficult to obtain.
A multi-domain or cross-domain sentiment analysis is normally performed when there are similarities in features between the two domains [7]. In a typical domain adaptation process, the sentiment knowledge from a domain with sufficient annotated data (i.e., source domain) is transferred to a new domain with limited, or no labelled data (i.e., target domain) [15], [17]. Prior to the knowledge transfer, analysis of the similarity between the source domain, and the target domain must be performed so as to reduce the differences in the feature distributions between the domains [17].
The following section elaborates on the findings of the review on the challenges in multi-source and multi-domain sentiment analysis, followed by the recommended solutions.

III. CHALLENGES AND SOLUTIONS
With knowledge of the public's sentiment towards a product or service, one can decide whether or not to purchase it. The polarity of the sentiment can be obtained by processing and analysing the public's sentiments gathered from online reviews and social media. The literature [4], [7], [8], [26], [27] has shown that sentiment analysis based on multiple data sources tends to produce a more concrete prediction. Reference [4] highlighted that the use of multiple data sources generates an overwhelming amount of data for analysis, hence it is quite demanding to separate the relevant opinions from the irrelevant ones. They also pointed out that data coming from various sources display different characteristics. This is another challenge that needs to be properly addressed [4], [28]. Multi-domain sentiment analysis, on the other hand, focuses more on the challenges of adapting to different domains [15], [17]-[19], [26], and of enabling computers to process issues related to the natural language embodied in those different domains, as demonstrated in [9]-[15]. These challenges linked to natural language processing (NLP), as stated by Al-Moslmi et al. [21], encompass feature divergence, polarity divergence, sparsity, and polysemy. Based on this understanding, this review paper presents a classification of the basic challenges.

When multiple data sources are used in sentiment analysis, the amount of data that needs to be processed and analysed also increases. This can be an overwhelming and difficult process [4]. Therefore, it is crucial to extract only the relevant opinions from the massive volume of online reviews or social media posts. This means that the extracted reviews or posts coming from multiple data sources must be syntactically bound by a common subject of interest.
The traditional way to detect a common subject of interest from a pool of information is by describing a topic as a set of keywords. These are mostly verbs or nouns. A common way to detect a topic in a tweet, for example, is via hashtags. Reference [29] used tweets with hashtags and emoticons, together with a non-parametric supervised model consisting of multiple features, such as the tweet volume feature and sentiment variation features, to detect trending topics on Twitter. They used MapReduce to accelerate hashtag extraction and analysis. The correct interpretation of a hashtag makes it possible to link related content gathered from other sources, such as online product review websites or online news articles. On the other hand, [40] evaluated the data mining techniques used to detect topics or events from the input data streams of several online review websites. The techniques were divided into supervised learning and unsupervised learning approaches. The results indicated that keyword clustering provided the best accuracy for detecting topics when compared to the other approaches.
In another study [28], a microblog theme crawler named Sina was used to find the associated microblogs. The crawler extracts event-related microblogs by searching for the events' descriptions from online news. The text similarity between the microblogs and the events' descriptions is then calculated. This helps to remove irrelevant microblog data. Reference [8] used three different data sources to predict the public's opinions of the stock market. Their data sources were Bloomberg's online news articles, Google search data, and Twitter. The researchers first determined the relevant economic topics by modelling the topics in the news articles with the Latent Dirichlet Allocation (LDA) algorithm. Upon the execution of the LDA, the 30 most interesting topics were identified. Each topic was then manually analysed so as to select the most relevant ones, which belonged to the finance and economic domains. News articles which contained any of the selected topics were then considered relevant. A natural language processing tool and a customised economics feature dictionary were then used to derive the sentiment of the selected news articles. Data derived from the Google search volumes were processed with Lasso regression. The aim was to select the most informative features that represented the public's sentiment towards certain stock indices. The next step was to detect burst events in the Twitter data so as to select the features that represent any dramatic movement in society with regards to financial market concerns.
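The relevance filtering step attributed to [28] can be sketched as follows. The source does not state which similarity measure was used, so cosine similarity over bag-of-words counts is an assumption made here for illustration, as are the function names and the `min_similarity` threshold.

```python
import math
from collections import Counter


def cosine_similarity(text_a, text_b):
    """Cosine similarity between bag-of-words term-frequency vectors."""
    a = Counter(text_a.lower().split())
    b = Counter(text_b.lower().split())
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    norm = norm_a * norm_b
    return dot / norm if norm else 0.0


def filter_relevant(posts, event_description, min_similarity=0.2):
    """Keep only the posts that are similar enough to the event description."""
    return [p for p in posts
            if cosine_similarity(p, event_description) >= min_similarity]
```

A post that shares no words with the event description scores 0.0 and is discarded, while posts that reuse the description's key terms are retained.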

2) DIFFERENT DATA CHARACTERISTICS
Multi-source sentiment analysis involves data with varying characteristics, since the data come from different sources, such as online reviews, online news, and social media. For instance, Twitter messages are short and noisy, and they use casual language that often contains many user-invented acronyms, emoji expressions, users' opinions, and sarcasm [18], [28]. Each tweet is limited to 140 characters, which is short when compared to posts on other media, such as Facebook or online reviews. In contrast, online reviews are lengthy, use more formal language, and appear more authoritative [28]. This was verified by [30], who explained that data from different sources have different feature distributions and that simply merging the data from these sources may therefore not yield optimal results. This was reiterated by [28], who stressed that the differences in data characteristics must be properly addressed when merging users' opinions or comments gathered from multiple data sources.
The literature reviewed for this paper shows several ways to address this challenge. For instance, in order to obtain warning information for an event, [28] used the Analytic Hierarchy Process (AHP) to analyse information and sentiments extracted from news articles and microblogs on food safety, and to weight each indicator. These indicators were used to represent or describe the target event, such as a food safety event. The AHP is a technique for structuring related information so as to analyse complex problems prior to making decisions [31]. The weighted sum of these indicators would imply the event's level of importance and its sentiment. If the event was identified as very important and its sentiment value was negative, an alert on the possible food safety issue would be delivered to the target users.
In [8], a data integration and prediction model based on Delta Naïve Bayes was proposed for predicting the financial market. The model combined features from Google search, Bloomberg news, and Twitter. The Google search volumes were used to obtain the public's sentiment towards a particular stock index and the futures of the financial market. The online financial news articles were used to determine the public's opinions on the stock market indices in various countries. Analysis was performed through natural language processing, using a customized economic feature dictionary. Following this, recent dramatic societal movements related to the stock market were determined through the analysis of Twitter bursts and content. The features were assumed to be independent of one another, and each feature was processed separately and normalized before being combined to produce the prediction. In another study that also aimed to predict stock markets, [27] proposed the Multi-source Multiple Instance framework as a means to integrate historical trading data and investors' sentiment with the economic news events that were extracted from three separate sources. The framework extended the Multiple Instance Learning paradigm, which is based on supervised learning over groups of instances.
A simpler approach to processing and merging data from multiple sources was taken by [4] and [7]. Reference [4] built a uniform data structure to represent users' posts and comments extracted from multiple social media networks. The data structure was made up of a string of text and a set of features that represent the users and their interactions with the posted comments. Data collected through the uniform data structure were then pre-processed according to the NLP procedure. Meanwhile, [7] created a training dataset of instances obtained from a combination of tweets (the sentiment140 tweet corpus) and online product reviews (i.e., the Amazon reviews datasets). A classifier was then trained on the combined datasets.
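The weighting and alerting logic described above can be sketched as follows. This is a minimal illustration rather than the procedure of [28]: the three indicators, the pairwise comparison matrix, the row geometric-mean approximation of the AHP priority weights, and the `importance_threshold` value are all assumptions introduced here.

```python
import math


def ahp_weights(pairwise):
    """Approximate AHP priority weights with the row geometric-mean method."""
    n = len(pairwise)
    geo_means = [math.prod(row) ** (1.0 / n) for row in pairwise]
    total = sum(geo_means)
    return [g / total for g in geo_means]


def weighted_sum(values, weights):
    """Combine indicator values using their weights."""
    return sum(v * w for v, w in zip(values, weights))


# Hypothetical indicators: news coverage, microblog volume, repost count.
# PAIRWISE[i][j] states how much more important indicator i is than j.
PAIRWISE = [
    [1.0, 2.0, 4.0],
    [0.5, 1.0, 2.0],
    [0.25, 0.5, 1.0],
]
WEIGHTS = ahp_weights(PAIRWISE)


def should_alert(indicator_values, sentiment, importance_threshold=0.6):
    """Alert only when the event is important AND its sentiment is negative."""
    importance = weighted_sum(indicator_values, WEIGHTS)
    return importance >= importance_threshold and sentiment < 0
```

For this perfectly consistent comparison matrix the weights come out as 4/7, 2/7 and 1/7, so the first indicator dominates the importance score.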
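The idea of normalizing each source separately before combining the features can be sketched as below. The source does not specify the normalization scheme used in [8], so min-max scaling is an assumption, and the function names and the source labels used in the test data are hypothetical.

```python
def min_max_normalise(values):
    """Scale a list of numbers to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # constant series carries no signal
    return [(v - lo) / (hi - lo) for v in values]


def combine_sources(feature_series):
    """Normalise each source on its own, then build one vector per time step.

    feature_series maps a source name to a list of raw values, one per
    time step; all lists are assumed to have the same length.
    """
    normalised = {name: min_max_normalise(vals)
                  for name, vals in feature_series.items()}
    names = sorted(normalised)  # fixed column order
    n_steps = len(normalised[names[0]])
    return [[normalised[name][t] for name in names] for t in range(n_steps)]
```

Normalizing per source keeps a high-volume source, such as search counts, from dominating a low-volume one purely because of its scale.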
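A uniform data structure of the kind described for [4] might look like the following sketch. The class name, the field names, and the layout of the raw input dictionaries are hypothetical; they do not reproduce the schema used in [4] or the response format of any real social media API.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class UnifiedPost:
    """Hypothetical source-independent record for a post or comment."""
    source: str   # e.g. "twitter" or "facebook"
    text: str     # the string of text to be analysed
    user_features: Dict[str, float] = field(default_factory=dict)


def from_tweet(raw):
    """Map a hypothetical raw tweet dictionary to the uniform structure."""
    return UnifiedPost(
        source="twitter",
        text=raw["full_text"],
        user_features={
            "followers": float(raw["followers"]),
            "interactions": float(raw["retweets"] + raw["likes"]),
        },
    )


def from_facebook_comment(raw):
    """Map a hypothetical raw Facebook comment to the uniform structure."""
    return UnifiedPost(
        source="facebook",
        text=raw["message"],
        user_features={
            "followers": float(raw["friends"]),
            "interactions": float(raw["reactions"] + raw["replies"]),
        },
    )
```

Once every source is mapped to the same record type, a single pre-processing and classification pipeline can be applied regardless of where a post originated.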

B. MULTI-DOMAIN SENTIMENT ANALYSIS
1) DOMAIN ADAPTATION
Domain adaptation is the process of training a sentiment classifier for one domain by using information taken from another, similar domain [11]. The main purpose is to provide sufficient data for the trained sentiment classifier to deliver accurate analysis. Even though the domains involved in the adaptation process share some similar features, discrepancies among them may still occur due to the variety of expressions used to convey opinions. Therefore, adapting sentiment knowledge between two different domains is a non-trivial and challenging task. Failure to correctly adapt the knowledge would result in a negative transfer [19]. In that regard, domain adaptation is presented as one of the challenges in multi-domain sentiment analysis. Accordingly, our literature review focuses on the existing solutions used in adapting the sentiment knowledge obtained from different domains. To aid understanding of these solutions, we categorised them based on the techniques that they used.

a: TRANSFER LEARNING
In the process of domain adaptation, there must exist at least one source domain, and one target domain [9], [15], [17]. The source domain is assumed to contain sufficient amounts of labelled datasets containing the sentiment knowledge as compared to the target domain. With this assumption, a transfer learning model can be developed. This outlines a way for the sentiment knowledge to be transferred from the source domain to the target domain. Therefore, a domain adaptation method which is based on the transfer learning approach must seek to identify features which are shared between the domains. This helps to bridge the source domain with the target domain.
Among the earliest domain adaptation methods is structural correspondence learning (SCL) by [15]. The SCL uses pivot features on unlabelled datasets from both domains to represent words that occur frequently and behave similarly in the two domains. By modelling their correlations with the non-pivot features from both domains, the correspondences among the pivot and non-pivot features can be deduced for training a classifier. For their experiment, Blitzer et al. [15] produced a benchmark dataset derived from online product reviews extracted from Amazon in the following domains: Books, DVDs, Electronics, and Kitchen appliances. Their results showed that the SCL outperformed other methods which were based on supervised and semi-supervised learning. Another notable work in this area was that of [16], who proposed the spectral feature alignment (SFA) algorithm. The SFA aims to reduce the discrepancy between a source domain and a target domain by aligning the domain-specific and domain-independent features of both domains. The co-occurrence between these features was modelled through a bipartite graph so as to produce clusters of connected features for training a classifier. The proposed SFA outperformed other baseline methods in experiments involving real-world datasets: the datasets by [15], the Amazon datasets on video games, electronics and software, the Yelp dataset (hotel reviews), and the Citysearch dataset (hotel reviews).
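The notion of a pivot feature can be illustrated with a simplified sketch that covers only the frequency criterion, namely words that occur often in both domains. The full SCL procedure in [15] additionally models how the remaining features correlate with the pivots, which is not shown here. The function names and the `min_df` cut-off are hypothetical.

```python
from collections import Counter


def document_frequency(docs):
    """Count in how many documents each word appears."""
    df = Counter()
    for doc in docs:
        df.update(set(doc.lower().split()))
    return df


def select_pivots(source_docs, target_docs, min_df=2):
    """Return words found in at least min_df documents of BOTH domains."""
    src = document_frequency(source_docs)
    tgt = document_frequency(target_docs)
    return sorted(w for w in src.keys() & tgt.keys()
                  if src[w] >= min_df and tgt[w] >= min_df)
```

On small unlabelled samples from a Books and a Kitchen domain, a word such as "excellent" would be selected, whereas domain-specific words such as "plot" or "blender" would not.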
In [12], an algorithm called topical correspondence transfer (TCT) was proposed. This approach worked on two fundamental assumptions: a) some shared topics exist between the source domain and the target domain, and b) every domain must have domain-specific topics. Therefore, each review document must be represented by one matrix of shared topics, and another matrix of domain-specific topics. This makes it easier to cross reference the matrices when matching similar sentiment features from the source domain and the target domain. The TCT is different from the SCL, and the SFA in that it utilizes the correspondence between shared topics, and domain-specific topics for both domains. Reference [12] tested their method on the Amazon reviews datasets which were produced by [15]. They found that their method outperformed other similar methods used in cross domain sentiment analysis.
Most of the earlier methods used in domain adaptation transferred sentiment knowledge from one source domain to a target domain [9], [17]. Even though the methods proved successful, their adaptation performance tended to decline when there was a significant difference in the distribution of features between the two domains [9], [17]. To reduce this problem, multiple source domains were used in multi-domain sentiment analysis. For instance, [9] used labelled datasets from several source domains, and unlabelled datasets from both the source domains and the target domain, to create a sentiment sensitive thesaurus. The thesaurus contains different words which express the same sentiment in different domains. To reduce the differences in the feature distributions between the source and target domains, related words from the thesaurus were used as additional features to represent both domains. The proposed method yielded an average accuracy of 80.9% when evaluated on the datasets produced by [15]. It outperformed both the SCL and the SFA in many application domains.

b: MULTI-TASK LEARNING
Most multi-domain adaptation methods tend to build a transfer learning model so as to adapt the sentiment polarity of terms derived from different domains. Even though these methods are able to adapt the relevant sentiment features between different domains, the transfer learning approach requires a new transfer model to be built each time a new domain needs to be analysed. This limits its generalization capability [14], [36].
Therefore, instead of using the transfer learning approach to adapt a source domain to a target domain, [17], [33] applied the multi-task learning approach so that both the general and the domain-specific sentiment knowledge from several source domains can be learned simultaneously, without the need for a transfer learning model. They proposed a classifier which comprised a general classifier and a domain-specific classifier. The former captures the sentiment knowledge that is common to the different domains, while the latter captures the domain-specific sentiment knowledge of every domain. A domain similarity graph was then used by the domain-specific classifier to determine the relatedness among the domains. The graph was built based on the terms' distribution (similarity in textual content) and the sentiment words' distribution (similarity in sentiment words) between each pair of domains. The case study in [17], [33] indicated that the proposed classifier can accurately capture global sentiment words as well as maintain the consistency of their sentiment polarities in different domains. The experimental results on the Amazon datasets [15] and Sanders' Twitter Sentiment datasets showed that the approach by [17], [33] significantly outperformed other baseline multi-task learning methods.
In [19], a domain attention model was proposed, based on a multi-task learning model. The domain attention model uses a neural network to simultaneously compute individual parts of a sentence so as to produce an output that can determine the most discriminative feature in a review text. The model has a domain module and a sentiment module. The domain module predicts which domain a word belongs to. As it works on multiple domains, it can also identify the common features and the domain-specific features for each domain. The sentiment relatedness of the features between the different domains is then determined by the sentiment module. Their experimental results on each of the Amazon datasets produced by [15] showed that their approach outperformed all the other baseline methods.
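The split between a general classifier and domain-specific classifiers can be sketched as the sum of two sets of word weights. This is an illustration of the idea only: the weights below are hand-set rather than jointly learned, the domain similarity graph of [17], [33] is omitted, and all names are hypothetical.

```python
def score(tokens, general_weights, domain_weights):
    """Sum the general and the domain-specific weight of every token."""
    return sum(general_weights.get(t, 0.0) + domain_weights.get(t, 0.0)
               for t in tokens)


def classify(text, domain, general_weights, per_domain_weights):
    """Classify text using general weights plus those of the given domain."""
    tokens = text.lower().split()
    s = score(tokens, general_weights, per_domain_weights.get(domain, {}))
    if s > 0:
        return "positive"
    if s < 0:
        return "negative"
    return "neutral"


# Hypothetical hand-set weights; in a multi-task setting these would be
# learned simultaneously from several labelled source domains.
GENERAL = {"excellent": 1.0, "terrible": -1.0}
PER_DOMAIN = {
    "electronics": {"easy": 0.8},
    "books": {"easy": -0.6},
}
```

A globally positive word such as "excellent" is scored the same way in every domain, while "easy" is positive for Electronics and negative for Books, following the example given earlier in this paper.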

c: WORD EMBEDDINGS
Another attempt to address the limitation of the transfer learning approach can be traced to the work of [14]. Instead of using a transfer learning model, they proposed a neural word embeddings approach which exploits information overlaps between domains and in which words are mapped to vectors. A deep learning approach was then used to classify the word vectors into two outputs. The first output represents the degree of overlap between the word vector sequence of a review document and the domain itself. The second output represents the polarity value of a review document with regard to each domain. The proposed approach was implemented in NeuroSent. The tool used the Dranziera datasets of online reviews obtained from twenty different domains, known as in-model domains (IMD), for training and testing the classifier. Another set of online datasets was derived from seven different domains called the out-model domains (OMD). These were mainly deployed for testing. The experimental evaluation of NeuroSent on both the IMD and OMD datasets showed that it outperformed most of the baseline methods, including the Support Vector Machine, Naive Bayes, and Maximum Entropy.
Further, [36] used fuzzy logic to address uncertainties when assigning polarity values to concepts belonging to different domains. Reference [34], likewise, addressed the domain adaptation challenge by using word embeddings. They implemented canonical correlation analysis (CCA) so as to establish the correlations between the domain-independent words and the domain-dependent words. These were derived from the source domain and the target domain. A learning method was then applied on the combination of word embedding features, the CCA features, and the raw features so as to produce a classification model. Their proposed classification model yielded an average accuracy of 77.8% when evaluated on the datasets produced by [15].

d: SENTIMENT LEXICONS
Other interesting efforts which employed the lexicon-based approach to address the challenge of adapting to different domains came from [35], [23], and [2]. Reference [35] supported domain adaptation by creating an ontological lexicon to achieve contextualized, cross-domain lexicons. The datasets used were online product reviews obtained from Amazon, hotel reviews obtained from TripAdvisor.com, and movie reviews obtained from the Internet Movie Database. Their results showed that a contextualized lexicon trained on three different domains yielded better results than one trained on a single domain [35]. Their results also verified that contextual lexicons enhanced the performance of lexicon-based sentiment analysis.
Additionally, the combination of the contextualized lexicon and the terms' knowledge from WordNet was also able to classify ambiguous terms in online reviews into positive or negative polarities. In another study, [23] adapted to the differences among the domains by integrating several sentiment dictionaries, such as WordNet, SentiWordNet, SentiNet, SentiSense and Opinion Lexicon. They also proposed an algorithm to remove and/or shift the polarity of a sentiment word. The datasets used for the experiment were online product reviews from Amazon.com, covering the Smartphones, Movies, and Books domains. The experiment yielded accuracies of 82.6%, 80.1%, and 81.8% for Smartphones, Movies, and Books, respectively. In another study, [2] addressed domain differences in online product reviews in the Hindi language by creating a sentiment-aware dictionary for the language. The datasets by [15] were also used in their experiment. Their results showed that the dictionary can accurately classify unlabelled and unseen reviews into positive and negative polarities. In the work of [18], a general-purpose sentiment lexicon was proposed for domain adaptation. The justification for using the general-purpose sentiment lexicon was to reduce the dependency on labelled datasets taken from multiple source domains.
The general-purpose sentiment lexicon also has a better generalization capability when compared to a sentiment classifier trained in a source domain [18]. This general-purpose sentiment lexicon was developed based on the multi-level contextual sentiment relations that came from unlabelled datasets of a target domain. The sentiment knowledge and polarity relations were derived from the target, phrase, sentence, and document levels. The extracted sentiment knowledge and relations were then used to train the classifier. The Amazon datasets produced by [15] were likewise used in their experiment. Their results showed an improvement in sentiment domain adaptation performance as compared to other baseline methods.
Similarly, [26] proposed a domain adaptation method that combined the sentiment knowledge obtained from the general-purpose sentiment lexicon, the labelled data of the target domain, the sentiment knowledge of several source domains, and the domain-specific sentiment relations. This sentiment knowledge was able to produce an improved domain adaptation result when evaluated using the datasets of [15].
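To make the lexicon-based idea more concrete, the following is a minimal sketch of how several sentiment dictionaries might be integrated and how the polarity of a sentiment word might be shifted by a preceding negator. It is not the algorithm of [23]; the averaging rule, the negator list, the lexicon entries, and the function names `merge_lexicons` and `score_review` are all assumptions made for illustration.

```python
# Illustrative sketch only: entries, scores, and the negation rule are
# assumptions and are not taken from any of the reviewed studies.
from typing import Dict, List

NEGATORS = {"not", "never", "no"}  # hypothetical polarity shifters


def merge_lexicons(lexicons: List[Dict[str, float]]) -> Dict[str, float]:
    """Average the scores a word receives across the lexicons that list it."""
    collected: Dict[str, List[float]] = {}
    for lexicon in lexicons:
        for word, score in lexicon.items():
            collected.setdefault(word, []).append(score)
    return {word: sum(scores) / len(scores) for word, scores in collected.items()}


def score_review(tokens: List[str], lexicon: Dict[str, float]) -> float:
    """Sum word scores, flipping the sign of the word right after a negator."""
    total = 0.0
    negate = False
    for token in tokens:
        word = token.lower()
        if word in NEGATORS:
            negate = True
            continue
        score = lexicon.get(word, 0.0)
        total += -score if negate else score
        negate = False  # the shift applies to the next word only
    return total


general = {"good": 1.0, "bad": -1.0}       # hypothetical general lexicon
smartphones = {"good": 0.5, "laggy": -0.8}  # hypothetical domain lexicon
merged = merge_lexicons([general, smartphones])
```

With these toy entries, `score_review("not good".split(), merged)` yields a negative score, because the negator flips the averaged score of "good". A realistic polarity shifter would need a wider scope than the single following word.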

2) SPECIFIC NATURAL LANGUAGE PROCESSING CHALLENGE
In performing sentiment analysis across multiple domains, challenges related to specific natural language processing (NLP) such as feature divergence, polarity divergence, sparsity, and polysemy, can influence the analysis [21]. Some of these challenges may also be encountered in multi-source sentiment analysis. The following sub-sections present the approaches observed in the literature review which specifically address each of the challenges noted.

a: FEATURE DIVERGENCE
Feature divergence refers to the mismatch in sentiment polarity values between the domain-specific features of different domains [21]. In the Electronics domain, for example, words such as "compact" and "sharp" were used to signify positive sentiments while the word "blurry" signifies a negative sentiment. However, in the Video Games domain, words like "hooked" and "realistic" were used to signify positive sentiments whereas "boring" was used to signify a negative sentiment [16]. When a comparison of these domain-specific features was made, there was no similarity between both domains, hence the data distribution between both domains was very different [11]. This means that words which appeared in the reviews of the target domain do not always appear in the trained model [9], thereby resulting in the poor performance of the sentiment classifier when it was applied to a domain that was unlike the one it was trained in. Therefore, most research looking at cross-domain sentiment analysis tended to focus on addressing problems which were associated with feature divergence [9], [11], [15], [16]. Reference [9] tackled this problem by building a sentiment sensitive thesaurus that can identify the connections among the words in different domains. Another approach used new feature representations where the independent features were regarded as a bridge between the source domain and the target domain. This has been noted in the work by [11], [15], and [16].
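The vocabulary mismatch behind feature divergence can be illustrated with a small sketch that splits two domains' vocabularies into shared (domain-independent) and domain-specific words, and measures how many target tokens were never seen in the source domain. The whitespace tokenisation, the toy reviews, and the function names are assumptions for illustration; none of the reviewed studies is implemented here.

```python
# Illustrative sketch: toy reviews and whitespace tokenisation are assumptions.
from typing import Iterable, List, Set, Tuple


def vocabulary(reviews: Iterable[str]) -> Set[str]:
    """Collect the lower-cased tokens that occur in a set of reviews."""
    return {token for review in reviews for token in review.lower().split()}


def split_features(
    source_reviews: List[str], target_reviews: List[str]
) -> Tuple[Set[str], Set[str], Set[str]]:
    """Return (shared, source-only, target-only) vocabularies."""
    source_vocab = vocabulary(source_reviews)
    target_vocab = vocabulary(target_reviews)
    shared = source_vocab & target_vocab
    return shared, source_vocab - shared, target_vocab - shared


def unseen_rate(source_reviews: List[str], target_reviews: List[str]) -> float:
    """Fraction of target tokens that never occur in the source domain."""
    source_vocab = vocabulary(source_reviews)
    target_tokens = [t for r in target_reviews for t in r.lower().split()]
    if not target_tokens:
        return 0.0
    unseen = sum(1 for t in target_tokens if t not in source_vocab)
    return unseen / len(target_tokens)


electronics = ["compact and sharp", "blurry screen"]
video_games = ["hooked and realistic", "boring story"]
shared, electronics_only, games_only = split_features(electronics, video_games)
```

In this toy case the only shared word is the function word "and", so a classifier trained on the Electronics reviews would find almost none of its learned features in the Video Games reviews.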

b: POLARITY DIVERGENCE
Polarity divergence represents a situation in which some features are positive in one domain, but are negative in another domain [11]. As an example, "easy" was a common word used in the Electronics domain to signify a positive sentiment, as in "this smartphone is easy to use." However, "easy" was frequently used in the Movies domain to represent a negative sentiment, as in "the ending of this movie is easy to guess" [11]. It is, therefore, a challenge to infer the polarity of texts belonging to domains which are different from those used for building the classification model [36].
To address polarity divergence, [10] proposed an ensemble model which combines sentiment analysis outputs from various algorithms, such as the combination of hybrid machine-learning classification approaches with the lexicon-based approach, in a weighted scheme, so as to classify tweets. The proposed ensemble model achieved an average accuracy increase of 10.22% over the main baseline model. Reference [11] then proposed an algorithm that transfers the polarity of features from a source domain to a target domain by using the independent features as a bridge. Independent features are features that are predictive in both domains even though they may have different polarities. Reference [13] addressed polarity divergence by considering the polarity strengths of the features rather than their absolute positivity or negativity. Their method consisted of two parts. First, they combined the scores of the features taken from the sentiment lexicons. Then they used these scores to predict the domain-specific lexicon score. Similarly, [7] also considered the polarity strengths of the features. However, the strength was measured through a probability value instead.
On the other hand, [36] tackled polarity divergence with an approach that computed the polarity of texts which belonged to domains that were different from the one used to train the classifier. The approach first captured the linguistic overlaps that occurred between the domains by using word embeddings and deep learning. It then resolved any uncertainties in the sentiment polarities between different domains through fuzzy logic.
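A weighted scheme for combining the outputs of several sentiment models, as described above for ensemble approaches, might look like the following minimal sketch. The model names, scores, weights, and the zero threshold are hypothetical; the actual weighting used in the reviewed work is not reproduced here.

```python
# Illustrative sketch: model names, weights, and thresholds are assumptions.
from typing import Dict


def weighted_vote(scores: Dict[str, float], weights: Dict[str, float]) -> str:
    """Combine per-model polarity scores in [-1, 1] by a weighted average."""
    total_weight = sum(weights[name] for name in scores)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    combined = sum(scores[name] * weights[name] for name in scores) / total_weight
    if combined > 0:
        return "positive"
    if combined < 0:
        return "negative"
    return "neutral"


# Hypothetical outputs of two machine-learning classifiers and one lexicon.
scores = {"svm": 0.6, "naive_bayes": -0.2, "lexicon": -0.9}
trusting_svm = {"svm": 0.5, "naive_bayes": 0.3, "lexicon": 0.2}
equal = {"svm": 1.0, "naive_bayes": 1.0, "lexicon": 1.0}
```

The same set of model outputs can lead to different labels depending on the weights: here `weighted_vote(scores, trusting_svm)` is positive while `weighted_vote(scores, equal)` is negative, which is why the choice of weights matters when the component models disagree across domains.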

c: SPARSITY
Sparsity refers to a situation where the target domain contains words or phrases that do not, or rarely appear in the source domain [21]. Sparsity can reduce the performance of the domain dependent sentiment classifier [9].
One way to overcome sparsity is to use a feature expansion method, whereby a feature vector is augmented with additional related features taken from a sentiment-sensitive thesaurus [9]. In [37], the effect of data sparsity in the Hindi language was minimised by using bilingual word embeddings and a deep learning approach. By applying the bilingual word embeddings to the English-Hindi and English-French language pairs, the language barrier between a resource-rich and a resource-poor language is bridged in the shared vector space. Any hidden feature between the two languages was then learned through the deep learning approach. Their experimental results on datasets from the Restaurant and Laptop domains showed that their approach outperformed other approaches for multi-linguality and cross-linguality sentiment analysis. It is noteworthy that the work by [37] is an example of multi-linguality and cross-linguality in sentiment analysis.
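Feature expansion with a thesaurus can be sketched as follows: each token contributes its own feature, and related terms listed in the thesaurus are appended with a smaller weight, so that a review containing words unseen in the source domain can still activate features the classifier knows. The thesaurus entries, the weight of 0.5, and the function name `expand_features` are assumptions; the construction of the thesaurus in [9] is not shown.

```python
# Illustrative sketch: thesaurus entries and the expansion weight are assumptions.
from typing import Dict, List


def expand_features(
    tokens: List[str],
    thesaurus: Dict[str, List[str]],
    related_weight: float = 0.5,
) -> Dict[str, float]:
    """Build a bag-of-words vector and append down-weighted related terms."""
    features: Dict[str, float] = {}
    for token in tokens:
        features[token] = features.get(token, 0.0) + 1.0
    for token in tokens:
        for related in thesaurus.get(token, []):
            features[related] = features.get(related, 0.0) + related_weight
    return features


# Hypothetical thesaurus linking domain-specific words to related terms.
thesaurus = {"hooked": ["addictive", "excellent"], "blurry": ["poor"]}
expanded = expand_features(["hooked", "gameplay"], thesaurus)
```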

d: POLYSEMY
Polysemy is the coexistence of many possible meanings for a word, both in the source and the target domains, due to the context of the respective domains [21]. For instance, the word "lie" means to put oneself in a resting position in the Exercise domain, whereas "lie" means to make an untrue statement in the Politics domain. To the best of our knowledge, not many studies have addressed the issue of polysemy in cross-domain sentiment analysis. The only work we found was performed by [38], who attempted to tackle polysemy by using a meta-learning approach which combines different classical classifiers with knowledge-based classifiers. The knowledge-based classifiers were a word sense disambiguation (WSD) based classifier and a vocabulary expansion-based classifier. The WSD-based classifiers were trained by using knowledge graphs which were built when implementing the WSD on BabelNet, a multilingual semantic network [39]. This network classified a review document into sets of disambiguated words (nouns, adjectives, verbs and adverbs). These disambiguated words were then used by the vocabulary expansion-based classifier to identify the semantically-related concepts, so as to expand the vocabulary. It also used the BabelNet knowledge graph to facilitate the process.
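As a rough illustration of word sense disambiguation, the sketch below picks the sense whose gloss shares the most words with the surrounding context. This is a simplified gloss-overlap heuristic substituted for illustration; it is not the BabelNet-based method of [38], and the sense inventory and glosses are invented for the example.

```python
# Illustrative sketch: a simplified gloss-overlap heuristic, not the
# BabelNet-based approach of [38]. The sense inventory is hypothetical.
from typing import Dict, List, Optional

SENSES: Dict[str, Dict[str, str]] = {
    "lie": {
        "rest": "put the body in a flat resting position on a mat or floor",
        "deceive": "make an untrue statement to voters or the public",
    }
}


def disambiguate(
    word: str, context: List[str], senses: Dict[str, Dict[str, str]]
) -> Optional[str]:
    """Return the sense whose gloss overlaps most with the context words."""
    context_words = {token.lower() for token in context}
    best_sense: Optional[str] = None
    best_overlap = -1
    for sense, gloss in senses.get(word, {}).items():
        overlap = len(context_words & set(gloss.lower().split()))
        if overlap > best_overlap:
            best_sense, best_overlap = sense, overlap
    return best_sense


exercise = "lie on the mat after each stretch".split()
politics = "politicians lie to voters about the budget".split()
```

With these toy glosses, the Exercise sentence resolves to the "rest" sense and the Politics sentence to the "deceive" sense. Real systems would rely on a semantic network rather than hand-written glosses.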

IV. DISCUSSION
There is a growing interest among researchers in embarking on multi-source or multi-domain sentiment analysis for an improved conclusion and sentiment outcome. In this study, we reviewed and analysed 26 studies which were related to multi-source and multi-domain sentiment analysis. A quick identification of these studies is presented in Table 1, while Table 3 and Table 4 provide a summary of the reviewed studies, highlighting their approaches in recommending solutions for the identified challenges. To aid understanding of the recommended solutions, the approaches were further categorised according to the techniques used. These findings are further elaborated below so as to feature the contributions of the reviewed studies.

A. PROGRESS IN MULTI-SOURCE AND MULTI-DOMAIN SENTIMENT ANALYSIS
We reviewed 26 recent studies related to multi-source and multi-domain sentiment analysis, published from 2013 until the present, with the exception of notable earlier studies in cross-domain sentiment analysis, such as [14], [10] and [9]. Table 1 shows that more research work has been done in multi-domain sentiment analysis (20 studies) than in multi-source sentiment analysis (6 studies). The results also showed that there were consistent research efforts in multi-domain sentiment analysis throughout the years. However, studies in sentiment analysis which used multiple data sources had only started to grow in recent years. This may be attributed to the widespread use of social media among the public.
An in-depth look into those studies (refer to Tables 3 and 4) showed that most of the data sources were online reviews and Tweets. Noteworthy among the 26 studies that we reviewed was the fact that only one study, by [4], had included Facebook and Instagram posts for sentiment analysis. The majority of the studies in multi-domain sentiment analysis had used the Amazon product reviews datasets produced by [14]. The recent progress in multi-source sentiment analysis shows a growing trend among researchers in using other sources of datasets, such as online economics news, Google search volume data, and historical quantitative data.
As the datasets of [14] had been commonly used in most studies, we can infer that the most analysed domains used for evaluating public sentiments were the Books, DVDs, Electronics, and Kitchen appliances domains. During the domain adaptation process, multi-domain sentiment analysis which used a single data source (e.g. Amazon online product reviews) can use either one domain (e.g. Books) or more than one domain (e.g. Books and Movies) as its source domain. Initial works in multi-domain sentiment analysis, such as [14] and [10], had used a single domain as the source domain. However, from 2013 onwards, it was observed that the majority of the researchers in this area were using multiple domains as their source domain. This effort was initiated by [9].
A further analysis of the reviewed studies, as shown in Table 3, provided insights into the evolution of the research done in sentiment analysis, i.e., from just focusing on finding the best technique for producing accurate sentiment results, to focusing on finding the best approach to implement multi-source sentiment analysis for predicting future events. This trend was also noted in previous works [8] and [27], which had exploited sentiment analysis for predicting future trends in financial stock markets. Additionally, in [28], sentiment knowledge was derived from the public. This was used to monitor issues related to food safety, in which alert messages would be automatically disseminated to the public in case of any violations. Current progress in sentiment analysis is believed to be imperative for a more impactful outcome. This progress may not be possible if the process does not take into account datasets from various sources.

B. CHALLENGES AND RECOMMENDED SOLUTIONS IN MULTI-SOURCE SENTIMENT ANALYSIS
The use of multiple data sources allows for a thorough coverage of opinions on a particular topic or product, hence providing a comprehensive and inclusive conclusion. The inclusion of multiple data sources, however, also poses several challenges to the process. First, with the overwhelming amount of data and opinions coming from numerous sources [4], a situation termed information overload can occur. When this happens, the analysis process is slowed down, thereby producing delayed or inaccurate sentiment outcomes. Therefore, the right filtration and selection mechanisms are necessary for separating the relevant data or opinions from the irrelevant ones, and for selecting the most relevant ones for further analysis. Most of the studies that we reviewed had adopted a two-layer mechanism. The first layer was to determine the topic, product, or event of interest by using topic detection or event detection methods. The second layer was to fetch data related to the identified topic, product, or event of interest from all the data sources involved, and then to classify the relevant opinions based on the identified topics, products or events. For instance, [28] first identified the events from the online news; then, based on the identified events, they built a microblog theme crawler to find the associated microblogs. Reference [8] also initially used a topic modelling algorithm to identify the interesting topics about the stock market from economic news articles. Subsequently, for each topic identified, they used burst detection and burst event grouping algorithms to classify the related tweet messages into useful patterns. These were then used to represent the public's social movements and sentiments regarding the stock market.
The second challenge that has to be addressed by multi-source sentiment analysis is the presence of various characteristics of the data in the data sources. For instance, tweet messages are shorter than online reviews, and the language used is less formal. Our review noted that the divergence between these data had not been thoroughly studied or discussed. Data can also be in the form of Google search volumes, or bursts of tweets over a duration of time [8], or in the form of news articles, or quantitative data such as historical economic data [27]. With these differences, it is impossible to simply merge them for analysis. Our review also found that almost all the studies processed and analysed data from the different sources independently, and the resulting outputs were then combined in a data integration model for further analysis so as to determine the final conclusion or prediction [8], [27], [28]. Reference [4] addressed the differences in data characteristics by creating a uniform data structure to represent the data obtained from Twitter, Facebook and Instagram. Unfortunately, the work did not further describe the design of the uniform data structure.
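The two-layer mechanism can be sketched in a few lines. In this minimal illustration, the first layer takes the most frequent non-stopword terms in a set of news headlines as topics, and the second layer keeps only the posts that mention a detected topic. Frequency counting stands in for the topic modelling and event detection methods used in the reviewed studies, and the stopword list, headlines, and posts are invented for the example.

```python
# Illustrative sketch: term frequency stands in for real topic or event
# detection; the stopwords, headlines, and posts are assumptions.
from collections import Counter
from typing import Dict, List

STOPWORDS = {"and", "the", "is", "a", "my", "too", "as"}


def detect_topics(news: List[str], top_n: int = 2) -> List[str]:
    """Layer 1: treat the most frequent non-stopword terms as topics."""
    counts = Counter(
        token
        for article in news
        for token in article.lower().split()
        if token not in STOPWORDS
    )
    return [term for term, _ in counts.most_common(top_n)]


def group_posts(posts: List[str], topics: List[str]) -> Dict[str, List[str]]:
    """Layer 2: keep only posts that mention a topic, grouped by topic."""
    grouped: Dict[str, List[str]] = {topic: [] for topic in topics}
    for post in posts:
        tokens = set(post.lower().split())
        for topic in topics:
            if topic in tokens:
                grouped[topic].append(post)
    return grouped


news = [
    "oil prices climb",
    "oil exports fall",
    "oil demand and bank profits",
    "bank rates rise",
]
posts = ["Oil is too expensive", "my bank app is down", "great weather today"]
topics = detect_topics(news)
grouped = group_posts(posts, topics)
```

Posts that match no topic, such as the weather remark above, are dropped before any sentiment classification, which is how the mechanism limits information overload.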
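Since [4] did not describe its uniform data structure, the following is only a hypothetical illustration of what such a structure could look like: a single record type into which posts from different platforms are mapped. The `Post` fields, the per-platform key names in `FIELD_MAPS`, and the `normalise` function are all invented for this sketch and do not correspond to any real platform API.

```python
# Hypothetical sketch: the record fields and per-platform key names are
# invented for illustration and do not reflect [4] or any real API payload.
from dataclasses import dataclass
from typing import Any, Dict


@dataclass(frozen=True)
class Post:
    source: str
    author: str
    text: str
    timestamp: str


# Assumed mapping from each platform's raw keys to the uniform fields.
FIELD_MAPS: Dict[str, Dict[str, str]] = {
    "twitter": {"author": "user", "text": "tweet", "timestamp": "created"},
    "facebook": {"author": "profile", "text": "message", "timestamp": "time"},
    "instagram": {"author": "account", "text": "caption", "timestamp": "taken"},
}


def normalise(source: str, raw: Dict[str, Any]) -> Post:
    """Map one raw record from a known source onto the uniform Post type."""
    if source not in FIELD_MAPS:
        raise ValueError(f"unknown source: {source}")
    mapping = FIELD_MAPS[source]
    return Post(
        source=source,
        author=str(raw[mapping["author"]]),
        text=str(raw[mapping["text"]]),
        timestamp=str(raw[mapping["timestamp"]]),
    )


tweet = normalise("twitter", {"user": "ana", "tweet": "love it", "created": "2019-01-01"})
photo = normalise("instagram", {"account": "ben", "caption": "meh", "taken": "2019-01-02"})
```

Once every source is mapped to the same record type, downstream sentiment classification can treat the posts uniformly, although differences in length and formality between sources would still need to be handled.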

C. CHALLENGES AND RECOMMENDED SOLUTIONS IN MULTI-DOMAIN SENTIMENT ANALYSIS
Generally, the recommended solutions for adapting to different domains found in our review could be divided into four categories, based on their techniques. They are: transfer learning, multi-task learning, word embeddings, and sentiment lexicons. The transfer learning approach had been widely used for domain adaptation [9], [14], [17]. The SCL [15] and the SFA [10] were among the earliest and most notable solutions used for domain adaptation. Both were based on the transfer learning approach, and both demonstrated the transfer of sentiment knowledge from a source domain to a target domain by using a new feature representation termed a bridge. Subsequently, [9] and [17] recommended the use of multiple source domains to improve the adaptation performance of the sentiment classifier across different domains. Despite the success of the transfer learning approach in domain adaptation, the approach requires a new transfer learning model to be developed each time a need to analyse a new domain arises. To tackle this limitation, several alternative solutions had been proposed, including those that were based on multi-task learning [17], [19], [33], word embeddings [14], [34], [36], and sentiment lexicons [9], [18], [22], [23]. The advantages and disadvantages of these approaches are summarised in Table 2.
In the context of sentiment analysis, the vocabulary of each domain contains words that can be categorised into domain-specific words or domain-independent words. The domain-specific words are predictive in one domain, but not in another domain, as exemplified by the word 'read' in the Books domain and 'play' in the Video Games domain. This situation leads to a feature divergence problem, in which the predictive features of a target domain do not match those used to build the classifier in the trained domain.
On the other hand, the domain-independent words are predictive features in both the source and target domains. They can carry either the same polarity or different polarities in the two domains. For instance, 'good' is used to represent a positive polarity in many domains, including the Books and Electronics domains, whereas 'easy' signifies a positive polarity in the Books domain, but a negative polarity in the Movies domain. This condition is known as polarity divergence. Besides feature divergence and polarity divergence, other components of the NLP challenge which can influence multi-domain sentiment analysis are sparsity and polysemy. The effectiveness of a sentiment classifier often depends on its ability to address these components of the NLP challenge [21]. Our review showed that, among the components of the NLP challenge, polysemy and sparsity were the two least explored areas. Our review also showed that most of the methods used to address feature divergence and polarity divergence were based on transfer learning or a sentiment sensitive thesaurus.

V. FUTURE DIRECTION
Our review has shown that there is a growing need to implement sentiment analysis by using data from multiple data sources. Additionally, there has also been a continuous effort to improve the accuracy of sentiment analysis across multiple domains. Nonetheless, a comparative empirical study on multi-source sentiment analysis with or without multi-domain is yet to be implemented. The availability of such a study would pave the way for a better evaluation of the following: 1) the advantages and disadvantages of implementing multi-source sentiment analysis with multi-domain, 2) the advantages and disadvantages of implementing multi-source sentiment analysis without multi-domain, and 3) the impact of multi-domain on the accuracy and efficiency of sentiment analysis when implementing 1) and 2), and the reasons. It appears that even though benchmark datasets like the Amazon product reviews produced by [15] were used, the evaluation procedures were still proprietary. Consequently, it is difficult to perform benchmarking among the recommended approaches for multi-source and multi-domain sentiment analysis. Moreover, no work has yet been done on a unified performance evaluation model for benchmarking the accuracy of the recommended solutions, even though this has been suggested by [21]. In this regard, it is important to perform a comparative empirical study on the existing multi-source data integration models so as to understand the factors that influenced their designs and performance.
Also worthy of mention is that even though research on multi-domain sentiment analysis has been carried out for over a decade, the datasets used were still confined to online reviews, particularly the Amazon product reviews. Therefore, research in multi-domain sentiment analysis should consider using reviews from multiple data sources, such as online news, blogs, and social media channels. These can serve as an alternative approach for addressing the problem of insufficient datasets for training a sentiment classifier. Finally, more research work should be carried out so as to address the polysemy and sparsity issues noted in multi-domain sentiment analysis. This could provide a more accurate and humanized outcome for sentiment analysis.

VI. CONCLUSION
Sentiment analysis which relies on a single data source is prone to biased conclusions and inaccurate prediction outcomes. Realising this, recent studies in sentiment analysis have slowly transitioned from the use of one data source to multiple data sources. In this paper, we have reviewed and analysed the challenges associated with this new progress. For each of the challenges found in both multi-source and multi-domain sentiment analysis, we also discussed the approaches taken by the researchers in their attempts to provide solutions to the challenges, which encompass the types of data sources and datasets used. This review has classified the challenges found in multi-source and multi-domain sentiment analysis into four main categories: opinion overload, different data characteristics, domain adaptations, and the NLP challenge. For ease of understanding, the recommended solutions derived from the literature for each of the challenges were then classified according to the techniques used. We believe that the findings derived from our review of 26 studies can serve as a useful guide for future research to facilitate further progress and advancement in multi-source and multi-domain sentiment analysis.
NOR ANIZA ABDULLAH received the master's degree in interactive multimedia from Westminster University, London, and the Ph.D. degree in computer science from Southampton University, U.K. She is currently an Associate Professor with the Faculty of Computer Science and Information Technology, University of Malaya, Malaysia. She has published articles in both international and local journals, book chapters, and conferences. Her research interests include personalized information retrieval, recommender systems, adaptive learning, sentiment analysis, multimedia content-based retrieval, and big data. She serves as a Reviewer for several ISI-indexed journals.
He is currently an Associate Professor with the Faculty of Computer Science and Information Technology, University of Malaya. He has published a number of conference and journal papers locally and internationally. His research interests include information security (i.e., intrusion detection systems), data sciences, artificial intelligence, and library information systems.
VOLUME 7, 2019