CD-SemMF: Cross-Domain Semantic Relatedness Based Matrix Factorization Model Enabled with Linked Open Data for User Cold Start Issue

Providing personalized recommendations to cold start users is one of the significant challenges in information filtering systems. Most existing systems inherit the idea of collaborative filtering (CF) and ignore item metadata; in the book domain, for instance, attributes such as author, publisher, and language. Item metadata is discarded because item attributes are highly diverse in a cross-domain environment. As a result, it is hard for filtering systems to provide good recommendations to cold start users, i.e., users who are new to a domain and whose preferences are unknown. This research work proposes a novel model, the Cross-Domain Semantic Relatedness based Matrix Factorization model (CD-SemMF), which resolves the user cold start issue in collaborative filtering recommender systems by exploiting Linked Open Data (LOD). DBpedia, a widely used LOD resource that contains semantic information about many domains, is used here to address this problem. The metadata available in LOD connects the items preferred by the target user across domains: the underlying knowledge graph links items from different domains and thus benefits from cross-domain information. A new measure, the LOD Semantic Relatedness Measure, is proposed to calculate the closeness of items across domains; relatedness rather than similarity is measured because cross-domain item attributes are diverse. The Alternating Least Squares method is applied to learn user preferences. The proposed model provides relevant, personalized recommendations for a new target user by using the preferences gained from the source domain and the semantic relatedness between items. Experimental evaluation is done on Facebook and Amazon datasets.
The results show that the proposed CD-SemMF model gives better recommendations in the target domain for new users than the baseline methods.


I. INTRODUCTION
Providing personalized items to the target user is one of the key challenges in recommender systems. With the emergence of the Cross-Domain Recommender System (CDRS), this major challenge in information filtering is reduced, especially the user cold start problem [1]. Aiming to alleviate data sparseness, cross-domain recommendation exploits information such as item attributes and user preferences from distinct but related domains. Almost all existing cross-domain recommendation models are based on the collaborative filtering (CF) algorithm [2]. The CF approach exploits user preferences to connect the target and source domains but ignores item content; consequently, such models benefit from not having to analyze item content, which is highly heterogeneous across domains and whose inter-domain relationships are hard to create. The Semantic Web initiative [3] helps address this heterogeneity problem. In particular, its reference implementation, Linked Open Data [4], provides a huge number of publicly available inter-linked knowledge repositories on the Web that follow the Semantic Web standards for data access and representation. The Web now contains a broad array of structured data describing items of different domains such as science, history, media, arts, sports, and industry. It contains not only multimedia content and its related metadata, but also the semantic relationships existing between items and their metadata. Owing to the accessibility of item metadata and its semantic relationships in the LOD cloud, the major challenges of cross-domain recommendation addressed here can be solved, not only by focusing on item content and user preferences but also by exploiting the content-based relationships between items of various domains.
The proposed CD-SemMF model uses the features of the LOD semantic web and its relationships as inter-domain links to transfer knowledge among different domains. It also establishes cross-domain semantic relatedness between items, which helps generate relevant recommendations for new users in the target domain. Existing graph-based algorithms [5,6] address the recommendation challenge in heterogeneous datasets: they analyze the topology of the semantic network and jointly exploit item metadata and user preferences for effective recommendations. Unfortunately, they suffer from high computational cost as the semantic network grows. The proposed model resolves this issue with two novel steps: (1) inter-domain semantic relatedness is computed from the semantic network that links items of various domains, and (2) the computed relatedness is used in the matrix factorization method to generate relevant recommendations. Hence, it is not necessary to process the complete semantic network to generate recommendations. Initially, each item in the dataset is mapped to LOD entities, and the semantic metadata of the items is obtained through SPARQL queries; then attributes and their relations are extracted to enrich the item profiles. A novel metric, the LOD semantic relatedness metric, is proposed to find inter-domain item relatedness. The proposed model is effective when the underlying LOD knowledge graph encodes direct or indirect connections between items of different domains; such connections are common for knowledge domains like music, movies, and books, which have some amount of information overlap.
Therefore, the key contribution of this research work is the development of the LOD semantic relatedness metric together with a novel and effective matrix factorization model, CD-SemMF, which exploits both item metadata and user preferences for cross-domain recommendation in a cold start scenario. Further, Pilászy's fast learning algorithm [7] is adopted in this work to train the model more efficiently. The proposed model is evaluated on Facebook and Amazon datasets. Depending on the target and source domains involved in the recommendation process, it is demonstrated that the proposed model generates more accurate recommendations than the baseline approaches in severe cold start scenarios.
Section II discusses related work on cold start issues and cross-domain recommender systems. Section III elaborates the working of the proposed Cross-Domain Semantic Relatedness Based Matrix Factorization Model for cold start users. Section IV illustrates the experimental evaluation of the methods in the cold start scenario. Finally, Section V summarizes the contributions made in this research work and its future scope.

II. RELATED WORKS
"Cross-domain" recommendation has recently become a possible solution to the problems of cold start users and data sparsity in recommender systems. In diverse research fields, cross-domain recommendation has been discussed from different perspectives: it has been considered a feasible way to reduce cold start problems [8][9][10] and bottlenecks in recommender systems [11], as well as a practical application of knowledge transfer in machine learning [12]. Fernandez [1] focuses on how knowledge is used in cross-domain recommender systems and classifies the existing work into two levels, knowledge aggregation and knowledge transfer, which are discussed below.

A. KNOWLEDGE AGGREGATION FOR CROSS-DOMAIN RECOMMENDER SYSTEM
Information from various domains is combined to provide a better recommendation for a new user in the target domain.
Depending on the stage of the recommendation process at which the aggregation is performed, three situations can be distinguished. In the first, these methods merge user preferences such as ratings, transaction logs, and click data; the rating matrices of multiple domains can be used to perform aggregation [13]. Generic representations of user preferences, such as social tags or semantic concepts [14], are linked to multiple domains through graphs [15]; user preferences can also be features that are not domain-related, such as personality traits or assigned user attributes [16]. In the second scenario, user modeling information is transferred from several recommender systems to improve the target recommendations; for example, several researchers import the user neighborhoods and user similarities computed in the source domain into the target domain [17]. Finally, some methods combine the recommendations of various domains by aggregating score estimates and rank probability distributions [18].

B. KNOWLEDGE TRANSFER FOR CROSS-DOMAIN RECOMMENDER SYSTEM
Establishing communication or knowledge transfer among domains supports recommendations. In this scenario, several kinds of techniques have been identified that use shared knowledge to link domains: (1) techniques based on item attributes, association rules, semantic networks, and cross-domain relevance [19];
(2) methods that share latent attributes between the factor models of the source and target domains, either by using the same model parameters in both factorizations [20][21][22] or by introducing new parameters into an extended factorization [23][24][25]; and (3) methods that transfer rating patterns extracted by clustering the rating matrices of the source domain and apply them to the target domain. Once the problem was defined, Pan [26] proposed three distinct knowledge transfer strategies for collaborative recommendation with auxiliary data (TL-CRAD): (1) adaptive, (2) collective, and (3) integrative knowledge transfer. For this work, the authors studied papers related to the different knowledge strategies, focusing on transfer through predictive rules, regularization, and constraints.

C. USER COLD START ISSUE IN RECOMMENDER SYSTEM
Most existing models focus on improving accuracy by reducing data sparsity [27,28]. In many domains, the average number of ratings per user and per item is small, which may negatively affect the quality of recommendations. Data collected outside the target domain can increase the rating density and thus improve recommendation quality. Researchers have sought to improve user models in personalization-oriented ways, such as (1) recognizing new users' preferences in target domains [29], (2) improving the similarity between users and items [30], and (3) measuring vulnerabilities in social networks [31,32]. By importing preferences from sources outside the target domain, cross-domain techniques have also been applied to recommendation systems, and it has been suggested that they improve recommendations by providing a broader view of user preferences. Liu [33] proposed a neighborhood-based algorithm to resolve the user cold start issue, but for huge datasets it struggles to give good recommendations. A cluster-level algorithm was proposed that maintains high-quality relationships in cold start scenarios [34]: first, skew matrix decomposition maps the rating matrix into a smaller hidden space, after which the k-means clustering algorithm classifies users and items. A new approach to cross-system personalization is based on two assumptions: that a personalization model can be shared between platforms, and that a particular system can support the personalization model created by another system [35].
The recommendation model is based on matrix factorization, which is related to latent semantic analysis (LSA), widely used in natural language processing and information retrieval. LSA attempts to derive implicit concepts automatically from text documents by using a truncated singular value decomposition to approximate the term-document matrix. The first matrix factorization methods for recommendation applied the same idea to the user-item matrix in the rating prediction problem [36]. In contrast to the LSA setting, singular value decomposition (SVD) is ill-defined for the sparse matrices commonly found in recommendation systems. Therefore, early methods relied on imputation techniques to fill in the missing matrix entries before applying SVD. Subsequent methods consider only the observed ratings rather than the entire matrix, because imputation leads to poor results; the model proposed by Funk is a popular method in this direction [37]. Hu [38] proposed an alternating least squares (ALS) method that deals effectively with missing values. The key observation behind ALS is that when one set of parameters is fixed and the other is estimated, the optimization problem becomes convex and can be solved analytically using traditional least-squares estimation. Hu introduced an MF model called iMF, which was expanded and improved in subsequent work. Takács [39] extended ALS to a ranking-based matrix factorization method that learns to predict the relative order of items rather than individual ratings. Paquet [40] proposed a graph-based Bayesian model that captures the meaning of missing values, so that users who do not like an item can be distinguished from users who do not know it. Using ALS or gradient descent, such factorization models can easily be trained to reduce prediction errors; once the problem is formulated as a ranking objective, they can also be trained using the Bayesian Personalized Ranking (BPR) criterion [41].
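The ALS idea described here, fixing one factor matrix so that the problem becomes convex in the other and solving it in closed form, can be sketched as follows. This is a minimal illustration on a toy matrix, not Hu's confidence-weighted iMF; the rank, regularization weight, and data are arbitrary assumptions.

```python
import numpy as np

def als_step(R, X, Y, lam):
    """One ALS half-step: with Y fixed, each row of X has a closed-form
    least-squares solution (the subproblem is convex).

    R   : preference matrix (0 is treated as unobserved here, for brevity)
    X   : factor matrix being updated, shape (rows of R, k)
    Y   : fixed factor matrix, shape (cols of R, k)
    lam : L2 regularization weight
    """
    k = Y.shape[1]
    X_new = np.zeros_like(X)
    for u in range(R.shape[0]):
        observed = R[u] != 0                    # mask of observed entries
        Yo = Y[observed]                        # factors of observed columns
        A = Yo.T @ Yo + lam * np.eye(k)         # normal equations
        b = Yo.T @ R[u, observed]
        X_new[u] = np.linalg.solve(A, b)        # analytic solution
    return X_new

# Alternate user and item steps on a tiny toy matrix.
rng = np.random.default_rng(0)
R = np.array([[5.0, 3.0, 0.0], [4.0, 0.0, 1.0], [0.0, 2.0, 4.0]])
P = rng.normal(size=(3, 2))
Q = rng.normal(size=(3, 2))
for _ in range(20):
    P = als_step(R, P, Q, lam=0.1)
    Q = als_step(R.T, Q, P, lam=0.1)
```

Because each half-step solves a convex subproblem exactly, the training loss is non-increasing across iterations, which is what makes ALS attractive next to plain gradient descent.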
Kabbur [42] proposed the FISM model, which learns the item-item similarity matrix as the product of two low-dimensional latent factor matrices. To address the top-N recommendation problem, the authors use a Bayesian personalized ranking loss that optimizes the area under the curve. Many of the models discussed so far cannot provide good recommendations to a new user, because item metadata is ignored in the matrix factorization process and similarity measures are used in a heterogeneous environment.

III. CROSS-DOMAIN SEMANTIC RELATEDNESS BASED MATRIX FACTORIZATION MODEL (CD-SEMMF)
Conventional methodologies use graph-based techniques to exploit the semantic metadata of items, whereas the proposed model first computes the inter-domain, content-based semantic relatedness between items. With this relatedness data, the model regularizes the parameters of the target and source domains through matrix factorization. The proposed semantic relatedness based matrix factorization model makes specific assumptions about the relationship between the target and source domain item latent factors, while concurrently exploiting both item metadata and user preferences. Furthermore, Pilászy's fast Alternating Least Squares (ALS) training algorithm is adapted in the MF step to tackle the increased computational complexity of the proposed model, which not only learns user preferences for recommendation but also exploits source and target domain item metadata to connect the domains. Items from different domains usually have different attributes, so they are not directly associated with one another. For example, a book is described by attributes such as author or book genre, while a movie is characterized by its director, movie genres, or cast. Content-based features differ across domains even when they are conceptually associated, as movie genres and book genres are; moreover, their values are not properly aligned, for example, humor books vs. funny movies. To address this weakness, i.e., the heterogeneity of item features across domains, Linked Open Data is exploited in the proposed model to link entities from different domains. Specifically, each item in the dataset is mapped to its corresponding DBpedia (LOD) entity. DBpedia is a multi-domain data store from which structured, semantic knowledge is acquired for processing. The mapping of item attributes to DBpedia semantic entities is described in Section IV-A. After the mapping is complete, DBpedia is used to calculate the semantic relatedness between entities.
With the available semantic relationships, the structure of the graph and its attributes are mined. Specifically, information from DBpedia is used to compute the semantic relatedness matrix SR between the source domain items (IS) and the target domain items (IT), with SR ∈ R^(|IS| × |IT|), alongside the user preference matrix R. This proposed measure is called the LOD semantic relatedness measure, given in equation (1), where rtn is the relatedness of each pair of item attributes between the source and target domains, taken from LOD as discussed in Section III-A. The resulting inter-domain relatedness between items is then used to relate different domains and generate cross-domain recommendations. When a particular user has only a few rated items in the target domain, or in cold start scenarios, the proposed recommender model suggests items that are semantically analogous to the items previously preferred by the same user in the source domain. The model is thus effective only if there is some user overlap among domains, because even new users of the target domain may have rated a few items (user preferences) in the source domain. The working principle of the proposed recommender model, which exploits semantic relatedness to regularize the item factors in matrix factorization, is explained in detail in the subsequent sections. The intuition is that related items tend to have similar parameters even though they come from different domains [1]. For example, consider the utensil and eatables domains, with spoon and ice cream taken for comparison: even though they belong to different domains, some properties such as serve, eat, and health are common to them. Even when user preferences are unknown in the target domain, the proposed model can suggest items to cold start users based on the same user's preferences in the source domain.
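Once item attributes are mapped to DBpedia entities, the inter-domain relatedness matrix can be assembled pairwise. The sketch below assumes a callable rtn(a, b) returning a value in [0, 1] (the per-attribute relatedness of equation 1) and averages it over attribute pairs; the item structures, attribute names, and the averaging itself are illustrative assumptions, not the paper's exact aggregation.

```python
import numpy as np

def build_sr_matrix(source_items, target_items, rtn):
    """Assemble SR with one row per source-domain item and one column
    per target-domain item, holding pairwise LOD semantic relatedness."""
    SR = np.zeros((len(source_items), len(target_items)))
    for s, src in enumerate(source_items):
        for t, tgt in enumerate(target_items):
            # Average attribute-level relatedness over all attribute pairs
            # (one plausible aggregation choice).
            scores = [rtn(a, b) for a in src["attributes"] for b in tgt["attributes"]]
            SR[s, t] = sum(scores) / len(scores) if scores else 0.0
    return SR

# Toy usage with a dummy relatedness function (book -> movie).
books = [{"name": "The Martian (novel)", "attributes": ["science_fiction", "andy_weir"]}]
movies = [{"name": "The Martian", "attributes": ["science_fiction", "ridley_scott"]}]
dummy_rtn = lambda a, b: 1.0 if a == b else 0.2
SR = build_sr_matrix(books, movies, dummy_rtn)
```

In the full model, dummy_rtn would be replaced by the LOD relatedness measure of Section III-A, and SR would then feed directly into the factorization.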
The big picture of the proposed model is shown in Figure 1. The working of this model is described in the subsequent Sections III-A, III-B, and III-C.

A. LINKED OPEN DATA SEMANTIC RELATEDNESS MEASURE
Various applications such as automatic text correction, information retrieval, word sense disambiguation, and automatic indexing use similarity or relatedness measures. However, the two terms differ. Semantic relatedness is the degree to which two different words are related through any semantic relationship, such as synonymy, hyponymy, hypernymy, meronymy, or associative and functional relations. Semantic similarity, in contrast, is a special kind of relatedness that considers only hypernymy and hyponymy relationships. Moreover, the semantic relatedness between two words is not discovered directly but through a set of well-associated related words, called the set of determinative words; determinative words are words closely related to the given words. Inspired by Mehmet's relatedness measure [43], an improved semantic relatedness measure is proposed here and incorporated into the model to resolve the user cold start issue in CDRS. To calculate semantic relatedness, information such as hierarchical relationships and other types of relationships between words is considered; this information is collected from LOD, and hence the measure is termed the LOD semantic relatedness measure. The measure combines word relations based on their importance and context. To demonstrate the difference between relatedness and similarity, consider spoon and ice cream: though they are not similar, they share common features and, from a functional perspective, are closely related, since people eat ice cream with a spoon. The proposed semantic relatedness measure works as follows.
Let A and B be two words. The relatedness between them is measured in the following steps:

STEP 1
Determine the set of related words of (A, B). This semantic information is taken from the Linked Open Data knowledge base "DBpedia".
Let X = {x1, x2, …, xn} and Y = {y1, y2, …, ym} be the sets of determinative words obtained from LOD for the words A and B respectively. Then the determinative words are combined into a single set X ∪ Y. To avoid complexity, the elements of the combined set are written as Z = {d1, d2, d3, …, dk}, where k ≤ n + m (n determinatives belong to A and m to B).

STEP 2
The normalized relatedness between each determinative and the words A and B is calculated as

n1(di) = hits(di, A) / maxf1    and    n2(di) = hits(di, B) / maxf2

where hits(di, A) is the total number of web pages on which the determinative di and the word A occur together, and hits(di, B) is the number of web pages on which di and B occur together. Likewise, maxf1 = max_j hits(dj, A) and maxf2 = max_j hits(dj, B). If the determinative di is highly associated with the word A or B, then the probability that di occurs on a page together with that word is relatively high. In particular, if di is a synonym or near-synonym of A, then n1(di) = 1; similarly, if di is a synonym or near-synonym of B, then n2(di) = 1.
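Steps 1 and 2 can be sketched as below; the hit counts are illustrative stand-ins for real page co-occurrence counts from the LOD link structure, and the function and variable names are this sketch's own.

```python
def normalized_relatedness(determinatives, hits_a, hits_b):
    """Step 2: normalize each determinative's co-occurrence count with
    word A (resp. B) by the maximum count observed for that word, so a
    (near-)synonym of A gets the value 1.0."""
    maxf1 = max(hits_a.values())
    maxf2 = max(hits_b.values())
    n1 = {d: hits_a.get(d, 0) / maxf1 for d in determinatives}
    n2 = {d: hits_b.get(d, 0) / maxf2 for d in determinatives}
    return n1, n2

# Step 1 would fetch the determinatives and counts from DBpedia;
# the numbers below are made up for the (spoon, ice cream) example.
Z = ["utensil", "eat", "health"]
hits_spoon = {"utensil": 120, "eat": 80, "health": 30}
hits_ice_cream = {"utensil": 20, "eat": 90, "health": 60}
n1, n2 = normalized_relatedness(Z, hits_spoon, hits_ice_cream)
```

With these toy counts, "utensil" (a synonym of spoon) is normalized to 1.0 for spoon, and "eat" to 1.0 for ice cream, matching the synonym property stated above.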

STEP 3
Finally, the relatedness between the two words is calculated by aggregating, over all determinatives di ∈ Z, a contribution Qi that combines the normalized values n1(di) and n2(di), weighted by the co-occurrence factor ci of each determinative:

ci = 2, if di co-occurs with both A and B; 1, otherwise.

Finally, 's' denotes the synonymy factor:

s = 1, if A and B are synonyms or near-synonyms; 0, otherwise.

The resulting term rtn(A, B) is the LOD relatedness measure, which calculates the closeness of the attributes of items in different domains. The bag of words, i.e., the determinative words of the attributes, is taken from LOD and occupies megabytes of memory for storage and processing.

The LOD repository contains words and their relationships as a graph database built from web pages and web links. Given any two words, it traverses the graph and fetches all the associated information; in this way the determinative words (synonyms, hyponyms, hypernyms, and so on) are gathered. Here, utensil is a synonym of spoon, so wherever the word spoon appears, the word utensil also appears on the same web pages; in short, the normalized hit value for (utensil, spoon) equals 1. Likewise, the words eat and health appear with both words, so they are co-occurrence determinative words.
The co-occurrence factor for these determinatives is therefore equal to 2. Table 1 gives the determinative words and their hits for the word pair (spoon, ice cream). In Table 1, * indicates the common determinative words; the remaining values are relative values, not the actual numbers of web pages on which the pair co-occurs. Since spoon and utensil are synonyms, the maximum hit count for spoon is taken as the hit number for utensil. Based on the equations above, the relatedness computed between spoon and ice cream is 0.741; the relatedness value ranges from 0 to 1. In the same way, relatedness is computed for item attributes across domains and included in the MF process, which helps resolve the user cold start problem in CDRS and lets the CD-SemMF model suggest relevant items to new users efficiently.
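One plausible reading of Step 3 can be sketched as follows: each determinative contributes its averaged normalized values weighted by the co-occurrence factor, the result is normalized by the total weight, and the synonymy factor is added, clipped to [0, 1]. The aggregation form and all input numbers are assumptions of this sketch, not the paper's exact equations.

```python
def lod_relatedness(n1, n2, common, synonyms):
    """Step 3 (one plausible reading): combine normalized relatedness
    values with co-occurrence factors and a synonymy bonus.

    n1, n2   : dicts from Step 2 (determinative -> normalized value)
    common   : set of determinatives co-occurring with BOTH words (factor 2)
    synonyms : True if the two words themselves are (near-)synonyms
    """
    total = weight = 0.0
    for d in n1:
        c = 2.0 if d in common else 1.0           # co-occurrence factor
        total += c * (n1[d] + n2[d]) / 2.0        # contribution Q_d
        weight += c
    s = 1.0 if synonyms else 0.0                  # synonymy factor
    return min(1.0, total / weight + s)           # keep the score in [0, 1]

# Toy normalized values for (spoon, ice cream); eat and health co-occur
# with both words, utensil only with spoon.
n1 = {"utensil": 1.0, "eat": 0.67, "health": 0.25}
n2 = {"utensil": 0.22, "eat": 1.0, "health": 0.67}
score = lod_relatedness(n1, n2, common={"eat", "health"}, synonyms=False)
```

Whatever the exact aggregation, the key properties shown here carry over: common determinatives are weighted double, synonymy pushes the score up, and the output stays within [0, 1].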

C. REGULARIZATION THROUGH RELATEDNESS FORECASTING
The cross-domain semantic relatedness based matrix factorization (CD-SemMF) model is built on the hypothesis that the latent vectors of interrelated items capture the semantic relatedness between items along with user preferences. The proposed model therefore not only predicts user preferences but also approximates the inter-domain semantic relatedness of items, SRij ≈ qi · qj, where item i belongs to the source domain, item j belongs to the target domain, and SR denotes semantic relatedness. Accordingly, the model factorizes both the inter-domain item relatedness matrix and the item rating matrix, which links the target and source domains. Let U be the set of all users combined from the source and target domains; in the proposed model the users overlap between domains. Similarly, let I = IS ∪ IT be the set of all items combined from both domains; the items do not overlap. The model learns a latent vector pu for each user, and separate item latent vectors qi for the source domain and qj for the target domain, by minimizing

min over P, QS, QT of  Σ_{u ∈ U} Σ_{a ∈ I} c_ua (r_ua − pu · qa)² + λC Σ_{i ∈ IS} Σ_{j ∈ IT} (SRij − qi · qj)² + λ (Σ_u ||pu||² + Σ_a ||qa||²)    (2)

where QT and QS are the matrices of the target and source domains, respectively, with the item latent vectors as rows. As seen in equation (2), the first summation iterates over all items a ∈ I from both domains, since the user-item preference matrices must be factorized simultaneously. The parameter λC denotes the cross-domain regularization weight, which controls the contribution of inter-domain semantic relatedness: if λC is large, related items are forced to have very similar latent vectors; if it is low, only limited knowledge is transferred between domains. The parameter λ is used to avoid overfitting. Like other standard matrix factorization methods, the proposed model is trained using Alternating Least Squares. Initially, QS and QT are fixed, and each pu is solved analytically by setting the gradient to zero; this is possible because the user factors do not appear in the λC term. The resulting update is the same as for the baseline matrix factorization model:
pu = (Q^T C^u Q + λ I)^(-1) Q^T C^u r_u    (3)

To simplify the notation, Q is defined as the row-wise concatenation of the matrices QS and QT, C^u is the diagonal matrix holding the confidence values c_ua for all a ∈ I, and r_u is the vector of user u's preferences over all items, so that r_ua ≈ pu · qa. Next, with P and QT fixed, the optimal item latent vectors of the source domain are computed. Here r_i is the vector of preferences for item i and SR_i denotes the i-th row of the inter-domain semantic relatedness matrix:

qi = (P^T C^i P + λC QT^T QT + λ I)^(-1) (P^T C^i r_i + λC QT^T SR_i)    (4)

These optimal-value computations can be parallelized at each step, but more items have to be considered, so additional steps are required for the source domain, which increases training time compared with baseline matrix factorization approaches. To overcome this drawback, the fast RR1 training algorithm is adapted for Alternating Least Squares. Since the user-factor computation equals that of the original matrix factorization approach, the P-step remains unchanged. From equations (2) and (4), it is noted that in the Q-step of the source domain, the extra terms originating from inter-domain semantic relatedness are treated just like user preferences, as follows.
For each item i of the source domain:
1. As in the baseline matrix factorization model, a training example is generated for every rating rui.
2. For every item j ∈ IT of the target domain:
• an input example xj := qj is produced;
• relatedness is used as the dependent variable, yj := SRij;
• a constant confidence value is used, cj := λC;
• the factor to optimize is w := qi.
These steps generate the relatedness terms of equation (4), described by the confidence matrix C^i = λC I. The Q-step for the target domain proceeds in the same way as for the source domain. In summary, the model works as follows: when a new user arrives in the target domain, his or her preferences are taken from the source domain; inter-domain relatedness among items is computed with the LOD relatedness measure and added to the matrix factorization process; the regularized factorization then yields personalized items. This is why LOD and MF are integrated through the mapping process, so that items can be recommended to cold start users.
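A minimal numpy sketch of the closed-form source-domain Q-step described above; shapes, confidences, and data are toy assumptions, and this is the direct matrix solution rather than the RR1-accelerated training the model actually uses.

```python
import numpy as np

def source_item_step(P, QT, R_s, SR, C_s, lam, lam_c):
    """Closed-form update for source-domain item factors.

    Each source item's latent vector is fit both to the users'
    preferences (rows of R_s) and to the inter-domain relatedness
    targets SR[i, :] against the fixed target-domain factors QT,
    with constant confidence lam_c on the relatedness terms.
    """
    k = P.shape[1]
    QS = np.zeros((R_s.shape[0], k))
    for i in range(R_s.shape[0]):
        Ci = np.diag(C_s[i])                        # per-user confidence
        A = P.T @ Ci @ P + lam_c * (QT.T @ QT) + lam * np.eye(k)
        b = P.T @ Ci @ R_s[i] + lam_c * (QT.T @ SR[i])
        QS[i] = np.linalg.solve(A, b)               # analytic solution
    return QS

# Toy shapes: 4 users, 3 source items, 2 target items, k = 2 factors.
rng = np.random.default_rng(1)
P = rng.normal(size=(4, 2))
QT = rng.normal(size=(2, 2))
R_s = rng.random(size=(3, 4))      # source-domain preferences
SR = rng.random(size=(3, 2))       # relatedness to target items
C_s = np.ones((3, 4))              # uniform confidence, for simplicity
QS = source_item_step(P, QT, R_s, SR, C_s, lam=0.1, lam_c=0.5)
```

The relatedness targets enter the normal equations exactly like extra "ratings" with confidence lam_c, which is the observation the RR1 adaptation exploits.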

IV. RESULTS AND DISCUSSION
The first step is to compare the proposed semantic relatedness metric with conventional metrics to demonstrate its efficiency before using it for cross-domain recommendation. The experimental analysis shows that the proposed relatedness measure gives better results than conventional measures on the Facebook dataset in the movie, music, and book domains. Next, the ranking precision of the recommendations generated by the proposed model is evaluated. It is observed that, depending on the source and target domains involved, the proposed model generates more accurate recommendations than the baseline approaches in severe cold start scenarios.
The proposed method is also applied to real-world Amazon datasets in the movie, music, and book domains to demonstrate its efficiency. The technique is implemented in the Java language and evaluated on a Core i5 3.1 GHz processor with 8 GB RAM and a 1 TB SSD. The evaluation metrics, the baseline methods used for comparison, the datasets, and the preprocessing steps are discussed in detail in the following subsections.

A. FACEBOOK DATASET
Initially, the dataset consists of a huge set of Facebook user likes for different items. With the help of the Facebook Graph API, each like is retrieved as a 4-tuple: the identifier, the name of the item, the liked item's category, and the timestamp of the like. For instance, in {754778727942843, The Martian, Movie, 2015-10-08T12:35:08+0000}, "754778727942843" is the identifier, "The Martian" is the name of the item, Movie is the item category, and 2015-10-08T12:35:08+0000 is the time when the item was liked. The item name is created by the user who opened the Facebook page for that particular item.
Hence, different names may exist for the same item, for instance, The Martian, Martian: The Movie, and The Martian - Mars Film series, so a user may express likes for the same item on different Facebook pages. To unify and merge the items extracted from Facebook likes, a method is developed that automatically maps each item name to the unique URI of the equivalent DBpedia entity (LOD knowledge base).
For example, the unique URI http://dbpedia.org/resource/The_Martian is mapped to the item "The Martian". To resolve the ambiguity of names that match items of different domains, the item type to retrieve must be stated in each case. Specifically, the mapping query contains a triple clause with the property dbo:type or rdf:type. For example, the subject "The Chronicles of Narnia 2" refers to the item type "film", which belongs to the class dbo:Film in DBpedia. Item types were set based on the category given for each item in Facebook and their equivalent DBpedia entities; their YAGO classes were discovered manually from the rdf:type values of several entities. Table 2 lists the item types and DBpedia/YAGO classes considered for the three domains, namely, music, movies, and books. Moreover, when executing the query template, it is noted that several items are not linked directly to DBpedia entities, because their labels correspond to Wikipedia redirection pages. In such cases, the query uses the property dbo:wikiPageRedirects to reach the appropriate entities. The query executed for the movie "The Chronicles of Narnia 2" returns http://dbpedia.org/resource/The_Chronicles_of_Narnia:_Prince_Caspian, the DBpedia entity of the second movie of the "Narnia" series. The Wikipedia page-redirect mechanism thus links items whose names do not syntactically match the label of a DBpedia entity but do match the label of a redirected entity; for example, the title "The Narnia 2" matches the entity "The Chronicles of Narnia: Prince Caspian". The URIs linking items to DBpedia entities are provided in [51], from which the information is extracted for processing.
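A query template of the kind described above can be sketched as follows; it matches an item name and type constraint against DBpedia labels and follows redirect pages via dbo:wikiPageRedirects. The exact clauses are this sketch's assumptions, not the authors' query.

```python
def build_mapping_query(item_name, dbo_class):
    """Illustrative SPARQL template: find the DBpedia entity whose
    rdfs:label matches the item name and whose rdf:type is the given
    class, following Wikipedia redirect pages when needed."""
    return f"""
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX dbo:  <http://dbpedia.org/ontology/>
SELECT DISTINCT ?entity WHERE {{
  {{ ?entity rdfs:label "{item_name}"@en . }}
  UNION
  {{ ?redirect rdfs:label "{item_name}"@en .
     ?redirect dbo:wikiPageRedirects ?entity . }}
  ?entity rdf:type {dbo_class} .
}}"""

query = build_mapping_query("The Martian (film)", "dbo:Film")
# The resulting string can be sent to the public DBpedia SPARQL endpoint.
```

The type constraint in the last triple pattern is what disambiguates identically named items from different domains, as discussed above.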

2) SEMANTICALLY ANNOTATED FACEBOOK DATASET
DBpedia is finally accessed to obtain the metadata of every linked entity, which is then used as input for the recommender system. A SPARQL query is launched to get the properties and objects of triples that have the target entity as subject; for the "Narnia 2" example, the query returns all property-value pairs from DBpedia for the corresponding dbr entity. To recommend relevant items in a cold-start scenario, the model exploits only the metadata that is relevant to describe the common preferences of dissimilar users. The query returns are therefore filtered in each domain by certain properties that have a high impact on the recommendation process. Testing was done with all the retrieved properties, but the results matched those obtained with the prominent properties alone; the additional properties brought no significant improvement and only consumed more memory and time. These prominent properties were identified by factorizing the matrix. Table 3 presents the selected DBpedia properties for each domain of the Facebook dataset; for example, a movie item has the director, movie genres, and others as metadata. As shown in Table 3, the items and their relationships form a semantic network that is acquired automatically from DBpedia. The statistics of the Facebook dataset for the three preferred domains (music, movies, and books) are presented in Table 4: the number of items and users, the number of likes per item, the sparsity ratio estimated as likes/(users × items), the average number of items per user, and the average number of users per item. It indicates that some users show interest in more than a single domain.
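The per-domain filtering step can be sketched as follows. The property lists below are illustrative placeholders, not the exact contents of Table 3; the point is only that raw property-value pairs are reduced to the prominent properties of each domain.

```python
# Hypothetical per-domain whitelists of prominent DBpedia properties
# (illustrative stand-ins for the Table 3 selections).
SELECTED_PROPERTIES = {
    "movie": {"dbo:director", "dbo:genre", "dbo:starring", "dbo:basedOn"},
    "book":  {"dbo:author", "dbo:publisher", "dbo:literaryGenre"},
    "music": {"dbo:genre", "dbo:bandMember", "dbo:recordLabel"},
}

def filter_metadata(domain, triples):
    """Keep only (property, value) pairs whose property is prominent
    for the given domain; all other pairs are discarded."""
    keep = SELECTED_PROPERTIES[domain]
    return [(p, v) for p, v in triples if p in keep]

raw = [("dbo:director", "dbr:Andrew_Adamson"),
       ("dbo:runtime", "150"),
       ("dbo:genre", "dbr:Fantasy_film")]
filtered = filter_metadata("movie", raw)  # drops the dbo:runtime pair
```

Discarding the non-prominent pairs at this stage mirrors the observation above that the extra properties add memory and time cost without improving accuracy.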
Thus, user overlap exists between domains in this cross-domain recommender system. The number of overlapping users between domains for the Facebook dataset, along with the overlap ratio, is shown in Table 5.

3) SEMANTICALLY ENHANCED ITEM PROFILES FOR FACEBOOK DATASET
The target item types to be recommended are fixed as movies, books, and music. The item metadata obtained from LOD are distinguished into three categories as follows:
- Features correspond to the item-attribute properties of Table 3, which relate an item to attribute entities distinct from the target item entities; for example, the director and genres of a specific movie.
- Associated items correspond to the item-item properties of Table 3, which derive associated entities; for example, what a movie is based on (dbo:basedOn property), which band a musician belongs to (dbo:bandMember property), or what the sequel of a particular movie is (dbo:subsequentWork property).
- Extended features correspond to the attribute-attribute properties, which generate extended item attributes that do not originally appear as metadata; for example, the fusion genre of particular music (dbo:musicFusionGenre property).
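The three-way split above can be sketched as a simple categorization of the retrieved triples. The property-to-category mapping below is an assumption extrapolated from the examples in the text.

```python
# Illustrative mapping from DBpedia property to metadata category,
# based on the examples given for each category.
CATEGORY_BY_PROPERTY = {
    "dbo:director":        "feature",
    "dbo:genre":           "feature",
    "dbo:basedOn":         "associated_item",
    "dbo:bandMember":      "associated_item",
    "dbo:subsequentWork":  "associated_item",
    "dbo:musicFusionGenre": "extended_feature",
}

def build_item_profile(triples):
    """Split (property, value) pairs into the three metadata categories;
    unmapped properties are ignored."""
    profile = {"feature": [], "associated_item": [], "extended_feature": []}
    for prop, value in triples:
        category = CATEGORY_BY_PROPERTY.get(prop)
        if category:
            profile[category].append(value)
    return profile
```

The resulting dictionary is one possible concrete form of a semantically enriched item profile as described next.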
The above-mentioned metadata types form the semantically enriched item profiles used in the proposed recommendation model. They differ from the item profiles widely used in content-based methods, which consist of plain attributes. Experimental analysis shows that the results attained with semantically enriched item profiles are far better than those obtained using only plain item attributes, as reported in Table 8.

B. EVALUATION METHODOLOGY AND METRIC
The performance of the proposed recommendation model is validated using the improved user-based 5-fold cross-validation method [44] for user cold-start scenarios. The main aim is to study how the proposed model performs as the number of likes in the target domain increases. Users are first divided into five roughly equal subsets; in each cross-validation phase, four subsets serve as the training dataset and the remaining subset as the test dataset. Then, for each user u in the test subset, their likes are divided into three parts: 1. training data, initially filled with the user's likes and iteratively down-sampled, discarding one like at a time, to simulate cold-start profiles of various sizes; 2. validation data, a set of likes used to tune the parameters; 3. testing data, used to estimate the performance metrics.
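The splitting protocol above can be sketched as follows; the function names, fold count handling, and seed are illustrative assumptions.

```python
import random

def user_folds(users, k=5, seed=7):
    """Split users into k roughly equal subsets for user-based
    cross-validation."""
    users = list(users)
    random.Random(seed).shuffle(users)
    return [users[i::k] for i in range(k)]

def cold_start_profiles(likes):
    """Iteratively down-sample a user's likes, discarding one at a time,
    to simulate cold-start training profiles of decreasing size."""
    for n in range(len(likes), 0, -1):
        yield likes[:n]
```

Each down-sampled profile is then used as the training portion of that test user, so the same user contributes one evaluation point per cold-start profile size.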
For the cross-domain scenarios, the above steps are modified by expanding the training subset with the full set of likes from the supporting domains to obtain the actual training data. The recommendation model is built for each cold-start profile size using the resulting training data. Then, for every user in the test subset, the top-ranked 10 items of the target domain present in the training subset, and not previously known to the user, are suggested. The performance of the model is then evaluated against the test data using rank-based metrics; it is observed that items suggested beyond the top 10 positions are irrelevant when computing the metric. Regarding the evaluation metric, the ranking accuracy of the generated recommendations is validated using Mean Average Precision (MAP), which computes the mean of the average precision scores of the items at the top of the recommendation list. Another metric, Mean Reciprocal Rank (MRR), was also examined; it provided the same conclusions as MAP but took more time to compute, so MAP is used for evaluation. The MAP metric evaluates the recommendations produced by the proposed CD-SemMF model for new users, and its definition is shown in equation 6.
Here, |X| represents the number of test sets, U represents the set of all users, and AP_u@N represents the average precision of the top-N recommendations for user u.
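A minimal computation of MAP@N, consistent with the definition above, can be sketched as follows (the truncation at N and the normalization by min(|relevant|, N) are common conventions and assumed here):

```python
def average_precision(recommended, relevant, n=10):
    """AP@N: average of precision values at each rank where a
    relevant item appears in the top-n recommended list."""
    hits, score = 0, 0.0
    for rank, item in enumerate(recommended[:n], start=1):
        if item in relevant:
            hits += 1
            score += hits / rank          # precision at this rank
    return score / min(len(relevant), n) if relevant else 0.0

def mean_average_precision(rec_by_user, rel_by_user, n=10):
    """MAP@N: mean of AP@N over all test users."""
    users = list(rec_by_user)
    return sum(average_precision(rec_by_user[u], rel_by_user.get(u, set()), n)
               for u in users) / len(users)
```

For a single user with recommendations [a, b, c] and relevant items {a, c}, AP@10 is (1/1 + 2/3) / 2 = 5/6.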

C. BASELINE METHODS FOR COMPARISON
The performance of the proposed Cross-Domain Semantic Relatedness based Matrix Factorization (CD-SemMF) model is compared with the existing baseline models, namely, CD-iMF and CD-FISM.
• CD-iMF: a matrix factorization (MF) algorithm for positive-only feedback, trained by fast ALS.
• CD-FISM: the variant FISMauc of the Factored Item Similarity Model (FISM), optimized for item-ranking problems; the implementation available in LibRec [45] is used in the recommendation model.
The optimal parameter values for the three domains, namely Music, Movie, and Book, are obtained by applying the Bayesian Optimization technique [46] and are reported in Table 6. This optimization avoids an exhaustive grid search for the best values: Bayesian optimization trains a model to forecast the candidate values that maximize the given objective function while reducing the uncertainty over the unknown parameter values. Here 'k' is the dimensionality of the latent factors, 'λ' is the regularization factor, 'α' is the confidence parameter, and 'λC' is the cross-domain regularization factor. With the optimal values listed for each domain in Table 6, the proposed CD-SemMF model provides better recommendations for cold-start users. Pushing any parameter beyond these limits yields no significant change in accuracy while consuming more time, so the system works with these tuned values. For CD-iMF the parameter settings are k = (12, 29, 20), λ = (10⁻⁶, 0.623, 1), and α = (7, 6, 9) for books, movies, and music respectively. For CD-FISM, λ = 0.001, n = 10⁻⁵, and the rest of the settings are the same as for CD-iMF.

D. INTER-DOMAIN ITEM SEMANTIC RELATEDNESS
The initial processing phase discovers the semantic relatedness among items from both domains; this knowledge is later injected into the MF process. The MAP metric is used to compare existing semantic relatedness measures with the proposed LOD Semantic-Relatedness measure on the Facebook dataset, as shown in Table 7. It is evident from Table 7 that the proposed LOD Semantic-Relatedness measure performs far better than baseline approaches such as Latent Semantic Analysis (LSA) [47], Explicit Semantic Analysis (ESA) [48], and the Milne & Witten method [49]. For instance, in the book-music domain the Milne & Witten relatedness score is 0.141, LSA is 0.067, and ESA is 0.113, whereas the proposed LOD semantic relatedness score is 0.274. The superior LOD Semantic-Relatedness measure is therefore incorporated into MF in the recommendation process; this combined model is called CD-SemMF.
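The proposed measure itself is defined earlier in the paper; as a minimal stand-in, the following sketch scores the relatedness of two cross-domain items by the overlap of their LOD metadata value sets (Jaccard), which captures the underlying intuition of closeness through shared DBpedia entities. It is not the actual LOD Semantic-Relatedness formula.

```python
def lod_relatedness(profile_a, profile_b):
    """Toy relatedness score between two items' LOD metadata value
    sets: size of the shared entities over the size of their union."""
    a, b = set(profile_a), set(profile_b)
    return len(a & b) / len(a | b) if a | b else 0.0

# A book and a movie sharing a genre entity and a country entity
# come out as related even though their attribute schemas differ.
book  = {"dbr:C._S._Lewis", "dbr:Fantasy", "dbr:United_Kingdom"}
movie = {"dbr:Andrew_Adamson", "dbr:Fantasy", "dbr:United_Kingdom"}
score = lod_relatedness(book, movie)  # 2 shared of 4 total -> 0.5
```

This illustrates why relatedness over shared LOD entities, rather than attribute-wise similarity, is usable across domains with diverse attribute sets.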

E. ITEM RANKING ACCURACY FOR FACEBOOK DATASET
The ranking accuracy of items suggested by the proposed recommendation model is evaluated using the MAP metric.
To study the effectiveness of the cross-domain variants over the single-domain variants, and to discover whether the proposed MF model outperforms the existing models in user cold-start scenarios, the accuracy of the item rankings is analyzed. The ranking accuracy for movie recommendation with plain item profiles is presented in Table 8. For the books-movies domain the CD-iMF ranking score is 0.303 and the CD-FISM score is 0.298, whereas the proposed CD-SemMF ranking score is 0.319. Similarly, for the music-movies domain the CD-iMF score is 0.302, the CD-FISM score is 0.289, and the proposed CD-SemMF score is 0.312. Movie recommendations thus improve by 5.01% in the Book-Movie setting and 3.1% in the Music-Movie setting. These recommendations use plain item profiles, which are less accurate than the semantically enriched ones discussed next; the accuracy with plain profiles is likewise lower for the other domains. The comparisons in Tables 9-11 use semantically enriched item profiles, which provide better results than plain item profiles in the recommendation process. Table 9 presents the ranking accuracy (Mean Average Precision) of the recommendations generated for the book domain. The performance of the recommendation model increases with each additional like, and the user overlap between domains also helps to generate more relevant recommendations. For the movies-books domain the CD-FISM ranking score is 0.265 and the CD-iMF score is 0.284, whereas the proposed CD-SemMF ranking score is 0.296. Similarly, for the music-books domain the CD-FISM score is 0.258, the CD-iMF score is 0.279, and the proposed CD-SemMF score is 0.292. From Table 9, book recommendations improve by 4.45% in the Music-Book setting and 4.05% in the Movie-Book setting. The average results reported for cold-start user profiles with 6-10 likes are stable.
Based on the evaluation methodology, the number of test users remains the same regardless of profile size, which is controlled by iteratively down-sampling the profiles in the training subsets. Similarly, Table 10 presents the ranking accuracy estimated for movie recommendations. For the books-movies domain the CD-iMF ranking score is 0.393 and the CD-FISM score is 0.332, whereas the proposed CD-SemMF ranking score is 0.419. Similarly, for the music-movies domain the CD-iMF score is 0.392, the CD-FISM score is 0.234, and the proposed CD-SemMF score is 0.405.
It is evident from Table 10 that movie recommendations improve by 6.205% in the Book-Movie setting and 3.21% in the Music-Movie setting. The existing cross-domain models provide items for a new movie user with less accuracy than the proposed CD-SemMF. When the supporting cross-domain data contains book preferences, the proposed MF model outperforms the best existing single-domain baseline approaches; in that scenario, CD-SemMF generates more relevant recommendations than the other popular approaches for profile sizes from 1-10. This is owing to the high level of user overlap between the movie and book domains, which allows CD-SemMF to calculate accurate item relatedness based on like patterns. In brief, user preferences in the music and book domains are useful for generating movie recommendations for cold-start users, and the proposed model is also more efficient than the others when exploiting supporting music likes. Similarly, the ranking accuracy results for music recommendations are presented in Table 11: the music recommendations are generated for cold-start users, and the proposed model enhances the quality of the item ranking with the help of cross-domain item metadata.
For the books-music domain the CD-iMF ranking score is 0.469 and the CD-FISM score is 0.347, whereas the proposed CD-SemMF ranking score is 0.506. Similarly, for the movies-music domain the CD-iMF score is 0.486, the CD-FISM score is 0.384, and the proposed CD-SemMF score is 0.513. Music recommendations thus improve by 7.31% in the Book-Music setting and 5.26% in the Movie-Music setting.

F. PERFORMANCE EVALUATION USING AMAZON DATASET
The Amazon dataset is chosen for real-world evaluation. Figure 2 presents the format of the Amazon dataset [50], which contains item information such as title, item category, id, ASIN number, and so on. Of these, only a few fields are required for the experiment: id, group, customer, and rating, representing the item id, group type, customer id, and numerical rating value given by that customer for the item. However, some items carry no user rating; for example, in Figure 2 the item with id 9 has no rating. Such entries are irrelevant and are removed from the dataset, which is further filtered for accurate recommendations and converted into four-tuples of id, group, customer, and rating. The overall description of the Amazon dataset taken for the experiment is given in Table 12: the item categories, the ratings for each item, and the numbers of items and users present in the dataset.
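The preprocessing step just described can be sketched as follows; the record field names are assumptions matching the description of Figure 2.

```python
def preprocess(records):
    """Drop records without a user rating and reduce each remaining
    record to the four-tuple (id, group, customer, rating)."""
    tuples = []
    for r in records:
        if r.get("rating") is None:   # e.g. the item with id 9 has no rating
            continue
        tuples.append((r["id"], r["group"], r["customer"], r["rating"]))
    return tuples

records = [
    {"id": 9,  "group": "Book",  "customer": "A1", "rating": None},
    {"id": 12, "group": "Music", "customer": "A2", "rating": 5},
]
clean = preprocess(records)   # the unrated item with id 9 is removed
```

The resulting four-tuples are the only input the experiment needs from the raw Amazon records.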

V. Conclusion and Future Work
The most considered and well-accepted solution to the user cold-start issue in collaborative filtering algorithms is the cross-domain approach. CF mines only the pattern of user-item preferences and ignores item content information when connecting the domains of interest, while some other algorithms exploit content-based relations to connect them effectively. Linked Open Data (LOD) provides large interconnected repositories of structured knowledge that can relate different types of data; this heterogeneity helps to establish content-based links between different types of items, so in cross-domain recommendation LOD can be used to connect different domains. Hence, in the proposed work, Linked Open Data is exploited to extract the item metadata that supports better recommendations across domains. With this additional information it is feasible to discover the relations between items of various domains and, eventually, to compute inter-domain item relatedness. A novel Cross-Domain Semantic Relatedness based Matrix Factorization model (CD-SemMF) is proposed that exploits the computed relatedness between items to link knowledge across different domains. The experimental analysis of severe user cold-start scenarios shows that the cross-domain recommendation algorithm generates accurate suggestions for new users in the target domain by exploiting item metadata from the source and target domains involved in the recommendation process. The priority of a CF recommender system in cold-start scenarios is to engage the target user with relevant suggestions rather than diverse or non-relevant items. The model proposed in this research work belongs to the knowledge-transfer class of cross-domain recommender systems, and the proposed approach is applied to the linked-domain exploitation task that addresses the user cold-start issue.
Moreover, it is also inferred from the experimental results that the item metadata retrieved from DBpedia and the proposed relatedness measure benefit recommendation even with minimal user overlap. The proposed CD-SemMF model provides better recommendations for new users than the existing models. In future work, this research will be extended by incorporating deep learning techniques into cross-domain recommendation and by enlarging the experimental section with more cross-domain datasets linked to DBpedia.

DECLARATION OF COMPETING INTEREST
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
scheme (MTR/2019/000542). The authors also acknowledge SASTRA Deemed University, Thanjavur, for extending infrastructural support to carry out this research work.