A Bayesian Inference Based Hybrid Recommender System

The vast number of products and services accessible on the Internet has motivated the development of recommender systems that refine the selection of items aligned with users' expectations. Recommender systems are crucial tools that quickly target items fitting users' needs, allowing users to easily identify the items that match their tastes and preferences. Following the state of the art, a distinction is made between content-based recommender approaches and collaborative filtering-based recommender approaches. Collaborative filtering-based approaches are the most widely adopted; they are divided into memory-based methods, whose advantage is their ease of interpretation, and model-based methods, which are resilient to data sparsity and highly accurate. In this paper, we propose a hybrid model-based recommendation approach that combines a user-based approach and an item-based approach. Our method estimates the probability with which a user would rate an item. It performs a Bayesian inference of future end-user interests and combines the interpretability of memory-based methods with the effectiveness of model-based methods. Experiments conducted on real-world datasets show that our method outperforms several state-of-the-art recommendation methods in both prediction accuracy and recommendation quality.


I. INTRODUCTION
The rapid growth of the Internet has boosted the production of rich and varied content (videos, films, music, articles, services, etc.). However, the diversity and the large number of items accessible to users complicate the selection of items aligned with users' needs [1], [2]. To address this information overload problem, Recommender Systems (RS) have been developed to refine item targeting; they thus serve as decision support tools for end-users. According to the literature, RS are organized into content-based methods and Collaborative Filtering (CF)-based methods [3]. Content-based methods rely on users' past experiences to predict their future interests. They are strongly focused on the target user's profile and are therefore ineffective when that profile is poor [4]. To overcome the limitations of content-based methods, CF-based approaches rely on the profiles of a group of users who share the same or similar tastes. CF-based methods are built on the assumption that, within a group of users whose preferences are identical or similar to those of a target user, the historical data of some of them can be used to predict the future interests of the target user [5]. CF-based approaches are very popular and are organized into memory-based CF and model-based CF [6]. From a users' rating matrix, memory-based methods compute inter-user similarities (user-based CF) or inter-item similarities (item-based CF) [7]. They are easy to explain and simple to implement but perform poorly when users' data are sparse [8]. Model-based methods are more complex but achieve high precision even on large datasets [9]. They alleviate both the data sparsity problem and the cold start problem [10].

The associate editor coordinating the review of this manuscript and approving it for publication was Tomás F. Pena.
Indeed, they are effective even when users' data are insufficient, or when a new user or a new item is added to the system and no relevant information is available to support the related recommendation.
The matrix factorization technique is widely adopted for the implementation of model-based CFs [11]. This technique performs a reduction of the original data matrix.
Following the matrix factorization technique, the prediction of the target user's interests is carried out on a reduced data matrix. Consequently, relevant data are lost during latent feature modeling, which is a difficult process to master [7]. The data lost during the matrix factorization process affect the prediction accuracy and therefore the item targeting precision. To alleviate the item targeting problem, we propose in this paper a hybrid recommender system that uses Bayesian estimation to predict users' interests. Our proposal combines a user-based approach and an item-based approach to assess the probability with which a user rates an item. In other words, the user's sensitivity to a given item is estimated by maximizing the posterior probability with which the user rates the item. Recall that the user-based part of our model infers the probability with which a user rates an item given that users with similar profiles have rated it, while the item-based part estimates the probability with which an item is rated by a user given that similar items have been rated by that user.
Our contribution is a hybrid Bayesian recommender model that combines the explainability of memory-based CFs with the effectiveness of model-based CFs. Indeed, although highly effective, model-based CFs are hard to interpret, which complicates their implementation. Our method is both easy to explain and highly accurate. Our contributions are as follows:
• A novel Bayesian inference-based model (BIHRS) is proposed to predict users' interests. Our model uses a Dirichlet distribution to accurately model a priori information such as an item's popularity, the usual behavior adopted by a large number of users, consumption habits, and preferences shared by a majority of users.
• An easy-to-explain and highly accurate recommendation approach is developed by combining the assets of memory-based CFs and model-based CFs. The proposed approach benefits from a hybridization process that captures the various aspects raised by interactions between users and items, which increases its resilience to the data sparsity problem. Thanks to this effective hybridization and the resulting resilience to data sparseness, our method achieves valuable recommendation accuracy.
• Extensive experiments are conducted on various real-world datasets to highlight the effectiveness and scalability of our proposal.

The remainder of this paper is organized as follows: Section II surveys state-of-the-art recommendation methods, Section III details our recommendation approach, Section IV describes the conducted experiments and the obtained results, and Section V concludes this paper and presents our perspectives.

II. RELATED WORK
The information overload on the Internet has complicated the selection of items relevant to users' needs.
The challenges in the RS field have motivated researchers to develop new recommendation methods. In this section, we survey state-of-the-art CF-based approaches, as CFs are among the most influential recommendation algorithms [8] and are directly related to our proposal.

A. MEMORY-BASED METHODS
Memory-based CFs exploit users' ratings of items to make recommendations. For this purpose, a user-item matrix is built from the collected ratings. Then, inter-user similarities are computed to assess the proximity of tastes between users [7]. Finally, the recommendation is based on a weighted average of the ratings of users with similar preferences [8]. Another variant of memory-based CF focuses on items and computes inter-item similarities to assess the proximity of users' interests in items [12].
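The user-based pipeline described above (user-item matrix, inter-user similarity, weighted average over neighbors) can be sketched as follows. This is a minimal illustration on a toy matrix, not the paper's configuration: the matrix values, the neighborhood size, and the fallback to the item mean are assumptions.

```python
import numpy as np

# Toy user-item rating matrix (0 = unrated); values are illustrative only.
R = np.array([
    [5, 3, 0, 1],
    [4, 0, 0, 1],
    [1, 1, 0, 5],
    [1, 0, 0, 4],
], dtype=float)

def pearson_sim(a, b):
    """Pearson correlation computed over co-rated items only."""
    mask = (a > 0) & (b > 0)
    if mask.sum() < 2:
        return 0.0
    x, y = a[mask], b[mask]
    xd, yd = x - x.mean(), y - y.mean()
    denom = np.sqrt((xd ** 2).sum() * (yd ** 2).sum())
    return float(xd @ yd / denom) if denom > 0 else 0.0

def predict(R, u, i, k=2):
    """Weighted average of the k most similar users' ratings on item i."""
    sims = [(pearson_sim(R[u], R[v]), v)
            for v in range(len(R)) if v != u and R[v, i] > 0]
    top = sorted(sims, reverse=True)[:k]
    num = sum(s * R[v, i] for s, v in top)
    den = sum(abs(s) for s, v in top)
    # Fallback to the item's mean rating when no weighted evidence exists.
    return num / den if den > 0 else R[R[:, i] > 0, i].mean()

pred = predict(R, u=1, i=1)
```

The similarity restricts itself to co-rated items, which is exactly why the approach degrades under data sparsity: with few co-rated items the correlations become unreliable.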
In [13], the authors propose a recommender system based on inter-user trust relationships. They define a measure of inter-user similarity from a customized Pearson Correlation Coefficient (PCC) and the Jaccard Correlation Coefficient (JCC) [14]. Subsequently, the inter-user trustworthiness is assessed in order to guarantee the recommendation reliability.
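A customized similarity blending rating agreement (PCC) with rated-item overlap (Jaccard), in the spirit of [13], could look like the sketch below. The exact combination formula used in [13] is not reproduced here; the product form is an assumption for illustration.

```python
import numpy as np

def jaccard(a, b):
    """Jaccard coefficient over the sets of items each user has rated."""
    ra, rb = a > 0, b > 0
    union = (ra | rb).sum()
    return float((ra & rb).sum() / union) if union else 0.0

def pcc(a, b):
    """Pearson correlation over co-rated items only."""
    mask = (a > 0) & (b > 0)
    if mask.sum() < 2:
        return 0.0
    x, y = a[mask] - a[mask].mean(), b[mask] - b[mask].mean()
    d = np.sqrt((x ** 2).sum() * (y ** 2).sum())
    return float(x @ y / d) if d > 0 else 0.0

def combined_sim(a, b):
    # Assumed combination: scale rating agreement by rated-item overlap,
    # so users with few common items cannot appear spuriously similar.
    return pcc(a, b) * jaccard(a, b)

u = np.array([5, 3, 0, 1], dtype=float)
v = np.array([4, 0, 2, 1], dtype=float)
s = combined_sim(u, v)
```

The Jaccard factor damps similarities computed from small co-rated sets, which is the usual motivation for combining the two coefficients.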
The authors in [15] propose a cloud service recommendation approach based on inter-user similarity evaluated with the Spearman Rank Correlation Coefficient (SRCC); their proposal offers accurate ranking prediction. In [16], the authors propose a similar recommendation approach but apply a data smoothing technique to improve the recommendation quality. In [17], the authors propose a recommendation model based on a customized K-means algorithm combined with cuckoo search to aggregate users with similar profiles. In [18], the authors propose a measure of inter-user similarity based on the Bayes concordance rate to improve recommendation precision. In [12], the authors propose an efficient privacy-preserving recommendation approach based on item-based CF; to preserve user privacy, they assess inter-item similarity using a customized PCC. The memory-based CFs used in the aforementioned works are easy to understand and easy to implement. However, they are unsuitable for large datasets because of their high computational cost, which makes them non-scalable and reduces their accuracy: the similarity computations are performed on a full high-dimensional rating matrix, which increases the computational cost. In addition, owing to the data sparsity problem, these proposals perform poorly when a large amount of relevant data is missing.

B. MODEL-BASED METHODS
Model-based CFs apply machine learning techniques to build, from users' data, a model that best fits the observations [19]. Model-based methods are highly accurate and resilient to data sparsity, meaning that they remain effective even when significant information relevant to refining the recommendation is missing [20].

VOLUME 8, 2020

The matrix factorization technique is widely used in model-based CFs due to its efficiency in modeling users' preferences from a small number of latent factors [21]. This technique reduces a high-dimensional users' data matrix into low-dimensional matrices through an accurate selection of relevant latent features [22]. Thereafter, an optimization method is applied to minimize the error due to the matrix reduction [11].
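The factorize-then-optimize scheme described above can be illustrated with a minimal stochastic gradient descent factorization on a toy matrix. This is a generic sketch of the technique, not any surveyed paper's method; the rank, learning rate, and regularization constant are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)
R = np.array([[5, 3, 0, 1],
              [4, 0, 0, 1],
              [1, 1, 0, 5],
              [1, 0, 0, 4]], dtype=float)  # 0 = unrated
n_users, n_items, k = R.shape[0], R.shape[1], 2

P = rng.normal(scale=0.1, size=(n_users, k))  # user latent factors
Q = rng.normal(scale=0.1, size=(n_items, k))  # item latent factors
lr, reg = 0.01, 0.02

observed = [(u, i) for u in range(n_users)
            for i in range(n_items) if R[u, i] > 0]
for _ in range(2000):
    for u, i in observed:
        e = R[u, i] - P[u] @ Q[i]             # prediction error
        P[u] += lr * (e * Q[i] - reg * P[u])  # regularized SGD steps
        Q[i] += lr * (e * P[u] - reg * Q[i])

rmse = np.sqrt(np.mean([(R[u, i] - P[u] @ Q[i]) ** 2 for u, i in observed]))
```

The latent factors in P and Q are exactly the "hard to explain" quantities the paper criticizes: they reconstruct the observed ratings well, but individual dimensions carry no directly interpretable meaning.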
Assuming that users' data can be modeled by low-dimensional latent factor matrices, the authors in [23] use the matrix factorization technique to improve recommendation accuracy. To tackle the data sparsity and cold-start problems, the authors in [24] propose a probabilistic matrix factorization framework for online recommendation. The authors in [25] present a non-negative variant of matrix factorization that integrates social trust information into a model addressing data sparsity and cold start issues. With a similar trend, the authors in [26] integrate social information into their matrix factorization-based recommender system. From implicit feedback, the authors in [27] propose a personalized ranking method that uses Bayesian pairwise learning to improve the performance of matrix factorization-based recommendation. The authors in [28] propose a time-aware recommendation approach that uses matrix factorization and a Long Short-Term Memory (LSTM) model to select services from Quality of Service (QoS) predictions. In [29], [30], the authors propose matrix factorization-based methods that use a Bayesian strategy: in [29], they propose a probabilistic matrix factorization recommendation approach, which they extend in [30] with a Gibbs sampling strategy that speeds up convergence and therefore improves the execution time of their method.
To alleviate the data sparseness issue, the authors in [31] propose a hybrid recommendation approach for tourist spots; their proposal assesses the correlation between image features using a Bayesian ranking algorithm. In [32], the authors propose a model-based CF using a parallel naive Bayesian approach implemented on a Hadoop distributed infrastructure, while the researchers in [33] infer the potential risk of disease using Bayesian collaborative filtering. In [34], the authors develop an electricity plan recommender system that uses a probabilistic approach to predict the missing features needed to recommend electricity retailing plans; their approach relies on matrix factorization to reduce the computational cost. The authors in [35] propose a recommendation approach based on a naive Bayesian classifier to predict the probability with which a user rates an item. However, they use a uniform probability distribution that ignores a priori intuitions such as an item's popularity or the consumption habits adopted by the majority of users. The authors in [36], [37] propose recommendation approaches based on Bayesian networks. Both methods rely on the naive Bayesian classifier and benefit from the effectiveness of model-based approaches: in [37], a Laplace estimator is used in the prediction process, while in [36] content-based and CF-based approaches are combined to enhance prediction accuracy. However, both methods employ uniform a priori probabilities that do not reflect practical observations. In practice, users rate only a small subset of items; some items are more popular than others and therefore have a higher a priori probability of being selected. Likewise, users who rate most of the items have a higher a priori probability of selecting an item. A uniform a priori probability cannot capture these aspects.
For this reason, our proposal relies on non-uniform probability distributions, which capture the aforementioned aspects of user-item interactions, and on a hybridization between a user-based approach and an item-based approach, in order to account for all aspects raised by interactions between users and items and thereby enhance prediction accuracy.
The above-surveyed matrix factorization-based studies have the advantage of being highly accurate and efficient on large datasets. However, matrix factorization relies on a decomposition into low-dimensional latent factor matrices, which is a complex process that is hard to explain and hard to master. Meanwhile, the Bayesian works presented above are based on uniform probability distributions that do not model the a priori information realistically given the observed data.

C. HYBRID RECOMMENDATION METHODS
Hybrid recommender systems have been developed to alleviate the limitations shown separately by user-based CF, item-based CF, content-based, and model-based approaches [38]. In hybrid approaches, predictions are first performed by each method involved in the hybridization process and are thereafter combined to overcome common problems such as cold start and data sparseness [39]. Several approaches [40]-[42] have been developed to benefit from hybrid recommender systems; their main advantage is their ability to exploit the multiple aspects raised by interactions between users and items to refine the recommendation as much as possible. In [43], the authors propose a hybrid recommendation approach that combines several explanation styles (user-based, item-based, popularity-based) to refine music recommendations. The authors in [44] merge item-based CF, user-based CF, and factor-based approaches to build a hybrid recommender system for artists. The researchers in [45] exploit user-item interactions and data on purchased items to propose a hybrid system for artist recommendation. Like the above-presented methods, existing hybrid recommender systems combine recommendation approaches that each show limitations of their own; their effectiveness therefore remains bounded by the drawbacks of the methods involved in the hybridization process. To remedy this, the hybridization in our proposal relies on Bayesian user-based and item-based approaches that are individually efficient even before being combined into the hybrid method.
In this paper, we embed the hybridization concept to benefit from the various aspects raised by user-item interactions. We perform a hybrid prediction because each user shows interest in specific items and, conversely, each item raises interest among the set of users who have consumed it. To address the fast item identification problem, we propose an easy-to-explain and highly accurate Bayesian recommendation approach. Driven by Dirichlet distributions, our proposal incorporates a priori information and users' intuitions to accurately model the observed ratings.
The next section presents our BIHRS approach.

III. BIHRS RECOMMENDATION METHOD
The proposed recommendation approach is flexible and configurable, since any kind of action (clicks, likes, purchases, reviews, etc.) that users perform on items can be transformed into scores assimilable to ratings. In this way, the degree of a user's interest in an item is scored just as an explicit rating would be. This configurability contributes to the robustness of the system against the cold-start problem, which arises when a new item or a new user is added to the system and no explicit ratings exist for the new item or from the new user. In the rest of the paper, our model is fed by the ratings users assign to items, knowing that any other type of user action on items can be scored as an explicit rating and used to feed our prediction model. Our approach aims to estimate the probability with which a user $u$ would rate an item $i$ with an unknown rating $\hat{r}_{ui}$. Recall that each rating a user assigns to an item reflects the level of interest expressed by that user in the item. In other words, to predict the future interest of a user $u$, we must infer the probability with which he would rate an item $i$. The Bayesian approach used in our model first identifies the probability distribution that best fits the observed ratings. Thereafter, the probability model of the estimation parameters is built in order to generalize the probability distribution of the observations and of the ratings likely to be assigned to items in the future by the active user. Following the Bayesian approach, Fig. 1 describes the workflow of our BIHRS estimation model as follows:
• The historical users' experiences with items are expressed via a tensor $D$ of users' rating probabilities and a tensor $B$ of probabilities of ratings on items. The $D$ tensor is fed by the dataset of users' ratings, while the $B$ tensor is fed by the dataset of observations on items.
• Subsequently, we estimate the prior probability distribution $h_u(\Theta)$ of the users' cluster membership and of the overall rating probability of each users' cluster. Analogously, for the item-based approach, we calculate the prior probability distribution $h_i(\Lambda)$ of the items' cluster membership and of the overall rating probability of each items' cluster.
• We evaluate the sampling distribution $g_u(D \mid \Theta)$ of users' ratings, considering the users' cluster membership parameter $\alpha$ and the overall rating probability parameter $\gamma$ of each users' cluster. For the item-based approach, we compute the sampling distribution $g_i(B \mid \Lambda)$ of the ratings observed on items, considering the items' cluster membership parameter $\theta$ and the overall rating probability $\omega$ of each items' cluster.
• Based on the prior $h_u(\Theta)$, we infer the posterior probability distribution $g_u(\Theta \mid D)$ of users' ratings given the observations $D$. Thereafter, we infer the Bayesian predictive distribution $g_u(\hat{d} \mid D)$ of an unknown observable $\hat{d}$ given the users' observations $D$; this distribution evaluates the probability $\hat{d}$ with which a user $u$ would rate an item $i$ by assigning it an unknown rating $\hat{r}_{ui}$. For the item-based approach, based on the prior $h_i(\Lambda)$, we estimate the Bayesian predictive distribution $g_i(\hat{b} \mid B)$ of an unknown observable $\hat{b}$ given the observations $B$ on items; this distribution prospectively evaluates the unknown probability $\hat{b}$ with which an item $i$ is rated by a user $u$.
• From the Bayesian predictive distributions $g_u(\hat{d} \mid D)$ and $g_i(\hat{b} \mid B)$, the hybridization is performed to obtain a hybrid Bayesian predictive distribution $\hat{p}_{ui}$. Following the Maximum a Posteriori (MAP) principle, the unknown rating $\hat{r}_{ui}$ is deduced by maximizing $\hat{p}_{ui}$.

To increase the robustness of our model against data sparsity, we build clusters using the popular K-means algorithm [46]. To enhance the clustering performance, we elect the cluster centroids following a specific scheme. At the initialization of the clustering process, the first centroid is the user who has rated the highest number of items, so as to maximize the number of common items between him and the other users. The remaining centroids are then selected among the users so as to maximize their distance to the existing centroids. That distance is characterized by a selection probability defined as follows [47]:

$$P(u) = \frac{Dist(u)^2}{\sum_{u'} Dist(u')^2},$$

where $u$ is a potential centroid, and $Dist$ is computed as

$$Dist(u) = \tau - Sim_{PCC}(u, c),$$

where $Sim_{PCC}$ is the similarity evaluated between $u$ and an existing centroid $c$ according to the Pearson Correlation Coefficient (PCC) [8], and $\tau \geq 1$ is a positive value that avoids negative distances, since PCC values range between $-1$ and $1$. Through an analogous approach, we initialize the items' clustering process by selecting as the first centroid the item that has been rated most often; the other centroids are selected among the items most distant from the existing centroids. Once the centroids are elected, the remainder of the process populates the clusters by adding the users and items most similar to each centroid.

Our estimation model is based on the distribution of all users into users' clusters. In accordance with the principle of collaborative filtering, the idea is to use the historical experiences of certain users of a cluster to infer the future experiences of the other users. Each users' cluster aggregates users with similar preferences, that is, users who have historically assigned similar ratings to the same items. From a probabilistic standpoint, considering users' profiles, a user belongs to a users' cluster $CLU_t$ with an a priori probability $\alpha_t$. By similar reasoning, items are organized into items' clusters: an item is assigned to an items' cluster according to the proximity of the ratings it has received, and belongs to an items' cluster $CLI_l$ with an a priori probability $\theta_l$. Fig. 2 describes the probability distribution of users' and items' cluster membership.
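The centroid election scheme described above (first centroid = most active user, subsequent centroids drawn with probability proportional to their PCC-based distance from existing centroids) can be sketched as follows. The toy matrix, the value of τ, the k-means++-style squared weighting, and the use of the minimum distance to the nearest centroid are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
R = np.array([[5, 4, 1, 1],
              [4, 5, 2, 1],
              [1, 2, 5, 4],
              [1, 1, 4, 5],
              [5, 5, 1, 2]], dtype=float)
TAU = 1.0  # tau >= 1 keeps distances non-negative, since PCC lies in [-1, 1]

def pcc(a, b):
    """Pearson correlation on dense toy profiles."""
    ad, bd = a - a.mean(), b - b.mean()
    d = np.sqrt((ad ** 2).sum() * (bd ** 2).sum())
    return float(ad @ bd / d) if d > 0 else 0.0

def elect_centroids(R, k):
    # First centroid: the user with the most ratings.
    rated = (R > 0).sum(axis=1)
    centroids = [int(np.argmax(rated))]
    while len(centroids) < k:
        # Distance of each user to its nearest already-elected centroid.
        dist = np.array([min(TAU - pcc(R[u], R[c]) for c in centroids)
                         for u in range(len(R))])
        dist[centroids] = 0.0  # never re-elect a centroid
        probs = dist ** 2 / (dist ** 2).sum()
        centroids.append(int(rng.choice(len(R), p=probs)))
    return centroids

cents = elect_centroids(R, k=2)
```

Weighting by distance pushes the second centroid toward users anti-correlated with the first one, so the resulting clusters start from well-separated seeds.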
B. USER-BASED APPROACH

We consider the tensor $D = [d_{uir}] \in [0,1]^{M \times N \times S}$, defined by using the $U$ set of users, the $I$ set of items, and the $R$ set of ratings. This tensor hosts the probabilities $d_{uir}$ of the rating $r$ assigned by a user $u$ to an item $i$. We consider $\gamma = \{\gamma_1, \gamma_2, \ldots, \gamma_t, \ldots, \gamma_C\}$, the vector of the users' rating probabilities of the clusters $CLU_1, CLU_2, \ldots, CLU_t, \ldots, CLU_C$ respectively, and $\alpha = \{\alpha_1, \alpha_2, \ldots, \alpha_t, \ldots, \alpha_C\}$, the vector of membership probabilities of those same clusters. According to the Bayesian theory [48], we set the estimation parameter $\Theta = (\alpha, \gamma)$, made from the probabilities of users' cluster membership and the users' rating probabilities per cluster.
The purpose is to predict the probability with which a user $u$ rates an item $i$, given the estimation parameter $\Theta$ and the set $D$ of historical observations, i.e., the past observed users' ratings. Following Bayesian inference, we estimate the predictive distribution $g_u(\hat{d} \mid D)$ of an unknown observable $\hat{d}$ given the set $D$ of observed users' ratings. For this purpose, we implement in Algorithm 1 a progressive reasoning organized as follows:
• Estimation of the sampling distribution $g_u(D \mid \Theta)$, also known as the likelihood distribution in Bayesian theory [49], defined as the probability distribution of all users' ratings.
• Estimation of the prior distribution $h_u(\Theta)$, defined as the probability distribution of the estimation parameter.
• Estimation of the posterior distribution $g_u(\Theta \mid D)$, the probability distribution of the parameter given the observations.
• Estimation of the Bayesian predictive distribution $g_u(\hat{d} \mid D)$, the probability distribution of an unknown observable $\hat{d}$.
Consider a user $u$ belonging to the cluster $CLU_t$ with a cluster membership probability $\alpha_t$. His rating probability distribution $g_u(d_u \mid \Theta)$ is described by a multinomial distribution defined as follows:

$$g_u(d_u \mid \Theta, CLU_t) = \prod_{i \in I} \prod_{r \in R} d_{uir}^{\,v_{uir}},$$

where $v_{uir}$ is a random variable set to 1 when user $u$ rates item $i$ with rating $r$, and 0 otherwise; $d_u$ is the known observable relative to user $u$; and $d_{uir}$ is the probability with which the active user $u$ rates item $i$ with rating $r$. Given that the user $u$ can belong to any cluster $CLU_t$, $t \in T$, $g_u(d_u \mid \Theta)$ is re-expressed as follows:

$$g_u(d_u \mid \Theta) = \sum_{t \in T} \alpha_t \prod_{i \in I} \prod_{r \in R} d_{uir}^{\,v_{uir}}.$$

The overall users' rating probability distribution $g_u(D \mid \Theta)$ [50], also known as the sampling or likelihood distribution, is described as follows:

$$g_u(D \mid \Theta) = \prod_{u \in U} g_u(d_u \mid \Theta). \tag{5}$$

The next step is to estimate the prior distribution $h_u(\Theta)$ of the estimation parameter, which aims to model a priori user-related information such as the most common users' consumption habits. To estimate $h_u(\Theta)$, it is mathematically convenient to choose a conjugate distribution in order to reduce the complexity of the estimation process [48]. The Dirichlet distribution is the right conjugate prior given that our likelihood distribution $g_u(D \mid \Theta)$ is a multinomial probability function [51].
Because the parameter vectors $\alpha$ and $\gamma$ are independent, the prior distribution $h_u(\Theta)$ factorizes as follows:

$$h_u(\Theta) = h_u(\alpha)\, h_u(\gamma),$$

where $h_u(\alpha)$ and $h_u(\gamma)$ are partial prior distributions defined by Dirichlet functions.
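The reason the Dirichlet prior is convenient is that it is conjugate to the multinomial likelihood: the posterior is again a Dirichlet whose parameters are the prior pseudo-counts plus the observed rating counts. The following sketch illustrates this update on one hypothetical cluster; the pseudo-counts and rating counts are invented for the example, and a popularity-aware prior would simply skew epsilon instead of being uniform.

```python
import numpy as np

# Prior pseudo-counts over S = 5 rating values (uniform here; a popular item
# would get a prior skewed toward high ratings).
epsilon = np.ones(5)

# Observed counts of ratings 1..5 within one cluster (invented toy data).
counts = np.array([0, 1, 2, 7, 10])

# Conjugacy: Dirichlet(epsilon) prior + multinomial counts
#            -> Dirichlet(epsilon + counts) posterior.
posterior = epsilon + counts

# Posterior-mean predictive probability of each rating value.
predictive = posterior / posterior.sum()
```

No integral has to be evaluated: the pseudo-count addition is the whole update, which is what makes the estimation process easy to explain.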
After estimating the likelihood $g_u(D \mid \Theta)$ and the prior $h_u(\Theta)$, we infer the Bayesian predictive distribution $g_u(\hat{d} \mid D)$ for an unknown observable $\hat{d}$. For this purpose, we first estimate the overall posterior distribution $g_u(\Theta \mid D)$ of the estimation parameter given the set $D$ of observed users' ratings; the predictive distribution $g_u(\hat{d} \mid D)$ is then deduced from it. According to the Bayesian theory [49], the posterior distribution is computed as follows:

$$g_u(\Theta \mid D) = \frac{g_u(D \mid \Theta)\, h_u(\alpha)\, h_u(\gamma)}{\int g_u(D \mid \Theta)\, h_u(\Theta)\, d\Theta},$$

where $h_u(\alpha)$ and $h_u(\gamma)$ are the prior distributions and $g_u(D \mid \Theta)$ is the likelihood distribution computed in Equation (5). According to [48], the Bayesian predictive distribution is computed as follows:

$$g_u(\hat{d} \mid D) = \int g_u(\hat{d} \mid \Theta, D)\, g_u(\Theta \mid D)\, d\Theta,$$

where $g_u(\Theta \mid D)$ is the overall posterior distribution of the estimation parameter given the observed users' ratings, and $g_u(\hat{d} \mid \Theta, D)$ is the joint sampling distribution of the unknown observable $\hat{d}$ given the set $D$ of observations. The integral in the expression of $g_u(\hat{d} \mid D)$ is complex to compute; to reduce the computational complexity, we use numerical integration for a direct computation [49]. The distribution $g_u(\hat{d} \mid D)$ is re-expressed as follows:

$$g_u(\hat{d} \mid D) = \sum_{t \in T} g_u(\hat{d} \mid \alpha_t, D)\, g_u(\alpha_t \mid D). \tag{11}$$

The component $g_u(\hat{d} \mid \alpha_t, D)$ of Equation (11) is estimated through an integral [48] that can be approximated by a Beta function; rewriting the Beta function through its original definition in terms of the Gamma function then yields a closed form.
By averaging this closed form, the distribution $g_u(\hat{d} \mid \alpha_t, D)$ is approximated. According to the Bayesian theory, the component $g_u(\alpha_t \mid D)$ of Equation (11) is likewise estimated through an integral that can be approximated by a Beta function.
The next subsection details the Bayesian inference process regarding the item-based approach.

C. ITEM-BASED APPROACH
We consider the tensor $B = [b_{iur}] \in [0,1]^{N \times M \times S}$, defined by using the $I$ set of items, the $U$ set of users, and the $R$ set of ratings. This tensor hosts the probabilities $b_{iur}$ of a rating $r$ on an item $i$ assigned by a user $u$. We consider $\omega = \{\omega_1, \omega_2, \ldots, \omega_l, \ldots, \omega_H\}$, the vector of the rating probabilities of the items' clusters $CLI_1, CLI_2, \ldots, CLI_l, \ldots, CLI_H$ respectively, and $\theta = \{\theta_1, \theta_2, \ldots, \theta_l, \ldots, \theta_H\}$, the vector of membership probabilities of those same clusters. Following the Bayesian theory, we set the estimation parameter $\Lambda = (\theta, \omega)$, made from the probabilities of items' cluster membership and the rating probabilities on items per cluster.
As for the user-based approach, in the item-based approach we adopt in Algorithm 2 a progressive reasoning organized as follows:
• Estimation of the sampling distribution $g_i(B \mid \Lambda)$ of all ratings on items given the parameter $\Lambda$.
• Estimation of the prior distribution $h_i(\Lambda)$ of the estimation parameter.
• Estimation of the posterior distribution $g_i(\Lambda \mid B)$, the probability distribution of the parameter given the observations.
• Estimation of the Bayesian predictive distribution $g_i(\hat{b} \mid B)$ for an unknown observable $\hat{b}$.

Let an item $i$ belong to the cluster $CLI_l$ with a cluster membership probability $\theta_l$. Its rating probability distribution $g_i(b_i \mid \Lambda)$ is described by a multinomial distribution defined as follows:

$$g_i(b_i \mid \Lambda, CLI_l) = \prod_{u \in U} \prod_{r \in R} b_{iur}^{\,w_{iur}},$$

where $w_{iur}$ is a random variable set to 1 when item $i$ receives a rating $r$ assigned by user $u$, and 0 otherwise, and $b_i$ is the known observable relative to item $i$. Knowing that item $i$ can belong to any cluster $CLI_l$, $l \in L$, $g_i(b_i \mid \Lambda)$ is re-expressed as follows:

$$g_i(b_i \mid \Lambda) = \sum_{l \in L} \theta_l \prod_{u \in U} \prod_{r \in R} b_{iur}^{\,w_{iur}}.$$

The overall rating probability distribution $g_i(B \mid \Lambda)$, also known as the likelihood or sampling distribution, is described as follows:

$$g_i(B \mid \Lambda) = \prod_{i \in I} g_i(b_i \mid \Lambda).$$

We now estimate the prior distribution $h_i(\Lambda)$ of the estimation parameter, which aims to model realistic considerations such as items' popularity. To estimate $h_i(\Lambda)$, we employ the Dirichlet distribution as the conjugate prior for the multinomial likelihood $g_i(B \mid \Lambda)$. Because the parameter vectors $\theta$ and $\omega$ are independent, the prior distribution $h_i(\Lambda)$ factorizes as follows:

$$h_i(\Lambda) = h_i(\theta)\, h_i(\omega),$$

where $h_i(\theta)$ and $h_i(\omega)$ are partial prior distributions defined by Dirichlet functions.
The distribution $h_i(\omega)$ is defined by a Dirichlet density with hyperparameters $\varepsilon = (\varepsilon_1, \varepsilon_2, \ldots, \varepsilon_K)$; for a non-informative prior, $K$ and the $\varepsilon_k$ are set to 1. After computing the likelihood $g_i(B \mid \Lambda)$ and the prior $h_i(\Lambda)$, we estimate the Bayesian predictive distribution $g_i(\hat{b} \mid B)$ associated with an unknown observable $\hat{b}$. For this purpose, we first infer the posterior distribution $g_i(\Lambda \mid B)$ of the estimation parameter given the set $B$ of observations; the predictive distribution $g_i(\hat{b} \mid B)$ is then inferred from it. Following the Bayes rule [49], the posterior distribution is obtained as follows:

$$g_i(\Lambda \mid B) = \frac{g_i(B \mid \Lambda)\, h_i(\theta)\, h_i(\omega)}{\int g_i(B \mid \Lambda)\, h_i(\Lambda)\, d\Lambda},$$

where $h_i(\theta)$ and $h_i(\omega)$ are the prior distributions and $g_i(B \mid \Lambda)$ is the likelihood distribution. The Bayesian predictive distribution is computed as follows:

$$g_i(\hat{b} \mid B) = \int g_i(\hat{b} \mid \Lambda, B)\, g_i(\Lambda \mid B)\, d\Lambda,$$

where $g_i(\Lambda \mid B)$ is the posterior distribution of the estimation parameter given the observations, and $g_i(\hat{b} \mid \Lambda, B)$ is the joint sampling distribution of the unknown observable $\hat{b}$ given the set $B$ of observations. For a direct computation, the distribution $g_i(\hat{b} \mid B)$ is approximated as follows:

$$g_i(\hat{b} \mid B) = \sum_{l \in L} g_i(\hat{b} \mid \theta_l, B)\, g_i(\theta_l \mid B). \tag{27}$$

The component $g_i(\hat{b} \mid \theta_l, B)$ of Equation (27) is approximated by a Beta function, and the component $g_i(\theta_l \mid B)$ is computed analogously to the user-based case. Algorithm 2 details the Bayesian inference process of the item-based approach.

Algorithm 2 Item-Based BIHRS Algorithm
Data:
• B: tensor of probabilities of ratings on items
• θ: vector of probabilities of items' cluster membership
Result: the Bayesian predictive distribution $g_i(\hat{b} \mid B)$ for an unknown observable $\hat{b}$
for each items' cluster $CLI_l$ do
  $g_i(\hat{b} \mid B) \mathrel{+}= g_i(\hat{b} \mid \theta_l, B) \cdot g_i(\theta_l \mid B)$;
end
Return $g_i(\hat{b} \mid B)$;

The next subsection describes the approximation of our BIHRS model, which is a combination of the user-based approach and the item-based approach.
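The accumulation loop of Algorithm 2 sums per-cluster predictive components weighted by the cluster posteriors. A minimal numeric illustration, with invented toy values standing in for $g_i(\hat{b} \mid \theta_l, B)$ and $g_i(\theta_l \mid B)$ over H = 3 clusters:

```python
import numpy as np

# Toy stand-ins (invented values): per-cluster predictive components and
# posterior cluster-membership probabilities for one item.
g_b_given_cluster = np.array([0.2, 0.5, 0.9])  # plays g_i(b | theta_l, B)
cluster_post = np.array([0.5, 0.3, 0.2])       # plays g_i(theta_l | B)

g_b = 0.0
for l in range(len(cluster_post)):
    # The "+=" line of Algorithm 2: mixture over items' clusters.
    g_b += g_b_given_cluster[l] * cluster_post[l]
```

The result is a posterior-weighted mixture, so clusters the item is unlikely to belong to contribute little to the final predictive probability.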

D. MARKOV CHAIN MONTE-CARLO SAMPLING
The above-computed Bayesian predictive distributions host a significant number of computationally expensive sums in high dimensions. To avoid computational overflows, we perform a Markov Chain Monte Carlo (MCMC) sampling of our model. MCMC sampling is a numerical integration method that approximates the integral of a continuous function by evaluating the function at a finite number of points [52]. The Metropolis algorithm is one of the most widely adopted MCMC sampling methods [48]. As an adaptation of the random walk [53], it is based on conditioned moves towards the target approximate distribution. Iterations are performed in order to converge towards the target distribution, and at each iteration the move is validated with an acceptance rate x. For this purpose, from g_u(d̃|α_t, D) we draw a sample α_1, α_2, . . . , α_Q, and from g_i(b̃|θ_l, B) we draw a sample θ_1, θ_2, . . . , θ_P.
From the user-based approach, a move α_t → α_t* is validated with a probability equal to min(x_u, 1), where x_u is the acceptance rate. Recall that the variables v_jir and v*_jir are respectively induced by α_t and α_t*. The sample is used to approximate the g_u(d̃|D) Bayesian predictive distribution. Following the item-based approach, a move θ_l → θ_l* is granted with a probability equal to min(x_i, 1), where x_i is the acceptance rate; the random variables w_jur and w*_jur are respectively induced by θ_l and θ_l*. The sample is used to approximate the g_i(b̃|B) Bayesian predictive distribution. The next subsection describes the rating prediction.
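The Metropolis moves described above can be illustrated with a generic sketch. The target below is a standard normal rather than the paper's predictive distributions, and the function and variable names are ours; the point is the accept/reject rule min(x, 1), which only requires an unnormalized target density.

```python
import math
import random

# Random-walk Metropolis sampler (illustrative sketch): a proposed move
# x -> x* is accepted with probability min(target(x*)/target(x), 1).
def metropolis(target, x0, n_samples, step=0.5, seed=0):
    rng = random.Random(seed)
    samples, x = [], x0
    for _ in range(n_samples):
        x_star = x + rng.gauss(0.0, step)            # proposed move
        accept = min(target(x_star) / target(x), 1.0)
        if rng.random() < accept:                    # validate the move
            x = x_star
        samples.append(x)
    return samples

target = lambda x: math.exp(-0.5 * x * x)            # unnormalized N(0, 1)
chain = metropolis(target, x0=0.0, n_samples=20000)
mean = sum(chain) / len(chain)
```

With enough iterations the empirical moments of the chain approach those of the target, which is how the sampled α and θ values approximate the predictive sums in the paper.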

E. HYBRID RATING PREDICTION
The hybridization in our method consists in merging the user-based and item-based approaches. The main advantage of this hybridization is that it reduces the system's sensitivity to the number of items a user has rated and to the number of users who have rated an item. This reduced sensitivity contributes to the resilience of the system with respect to the data sparseness problem.
The rating prediction process is based on both the user-based and item-based approaches. The hybrid probability p̂_ui is calculated by combining g_u(d̃|D) and g_i(b̃|B), the Bayesian predictive distributions obtained respectively from the user-based approach and the item-based approach.
According to the MAP principle [35], the predicted rating r̂_ui is the rating that maximizes the predictive probability p̂_ui, i.e., r̂_ui = argmax_r p̂_ui(r). Algorithm 3 summarizes the hybrid Bayesian inference process.
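Assuming, for illustration, that the hybrid probability is obtained by multiplying and renormalizing the two predictive distributions (the paper's exact combination rule, Equation (32), is not reproduced here, so this is an assumption), the MAP prediction step can be sketched as:

```python
# Sketch of hybrid MAP prediction. The product-and-renormalize combination
# is an assumption for illustration; the input probabilities are made up.
def hybrid_map_rating(p_user, p_item):
    # p_user[r], p_item[r]: predictive probabilities of each rating r
    combined = {r: p_user[r] * p_item[r] for r in p_user}
    total = sum(combined.values())
    p_hat = {r: p / total for r, p in combined.items()}   # normalize
    r_hat = max(p_hat, key=p_hat.get)                     # MAP rating
    return r_hat, p_hat

r_hat, p_hat = hybrid_map_rating(
    {1: 0.1, 2: 0.5, 3: 0.2, 4: 0.1, 5: 0.1},    # user-based predictive
    {1: 0.2, 2: 0.4, 3: 0.3, 4: 0.05, 5: 0.05},  # item-based predictive
)
```

Ranking items by their MAP-predicted ratings then yields the Top-N recommendation list.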

Algorithm 3 BIHRS Algorithm
Data:
• D : tensor of users' ratings probabilities
Compute the g_u(d̃|D) Bayesian predictive distribution for an unknown observable d̃ using Algorithm 1;
Compute the g_i(b̃|B) Bayesian predictive distribution for an unknown observable b̃ using Algorithm 2;
MCMC model approximation;
# Hybrid rating prediction
Compute the predicted rating r̂_ui;

Items are ordered from the most liked by the target user u to the least liked. The Top-N set of items with the highest ratings is the list of items recommended to the target user.

F. DRIVING EXAMPLE
In this section, we run an example to show how our proposal works. Tab. 1, on which we apply our method, presents the data matrix that hosts the ratings assigned by six users to ten items.
We run our method under the following settings: • The number of users' clusters is fixed to one (t = 1), i.e., there is a single users' cluster. The cluster membership probability is the same for all users of the cluster and is evaluated as α = α_t = 1.
• The number of items' clusters is set to one (l = 1), i.e., there is a single items' cluster. The cluster membership probability is the same for all items belonging to the cluster and is estimated as θ = θ_l = 1.
• From the user-based approach, the hyperparameter λ in Equation (7) is set to 1 for a non-informative prior probability distribution. Deductively, we compute the users' preference probability in Equation (7) as h_u(α) = 1.
• From the item-based approach, the hyperparameter ϕ of the h_i(θ) preference probabilities on items in Equation (23) is set to 1 for a non-informative prior probability distribution. Deductively, we find h_i(θ) = 1.
We aim to predict u_5's interest concerning item i_3. Knowing the observed rating assigned by user u_5 to i_3, the purpose is to estimate, using our method, u_5's degree of preference for item i_3. Tab. 2 and Tab. 3 show the likelihood probabilities according to the user-based and item-based approaches respectively. In Tab. 2, the likelihood probability on i_3 is computed using Equation (5); following the item-based approach in Tab. 3, the likelihood probability of u_5's interest is computed using Equation (21). Tab. 4 shows the Bayesian predictive probability with which user u_5 rates item i_3 following the user-based, item-based and hybrid approaches. For instance, the Bayesian predictive probability with which user u_5 assigns the rating r = 1 to item i_3 is estimated using Equation (18) for the user-based approach, Equation (29) for the item-based approach, and Equation (32) for the hybrid approach. Regarding Tab. 4, it can be observed that the rating value r = 2 by user u_5 on item i_3 shows the highest probability; in other words, r = 2 is the rating predicted by our method. This estimation perfectly matches the observation in Tab. 1 concerning u_5's interest in i_3, since u_5 actually assigned the rating r = 2 to item i_3.
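Under the example's settings (a single cluster and all hyperparameters set to 1), the Bayesian predictive distribution reduces to Laplace-smoothed rating frequencies. The sketch below illustrates this reduction with hypothetical counts for item i_3, since the example's rating tables are not reproduced here.

```python
# With one cluster and a non-informative Dirichlet prior (hyperparameters 1),
# the predictive probability of rating r is (n_r + 1) / (n + K), i.e. Laplace
# smoothing. The counts below are hypothetical, chosen so that r = 2 wins,
# as in the driving example.
def laplace_predictive(counts, n_ratings=5):
    total = sum(counts.values())
    return {r: (counts.get(r, 0) + 1) / (total + n_ratings)
            for r in range(1, n_ratings + 1)}

counts_i3 = {1: 1, 2: 3, 3: 1}        # hypothetical ratings observed on i_3
p = laplace_predictive(counts_i3)
predicted = max(p, key=p.get)          # rating with the highest probability
```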

G. BIHRS RECOMMENDATION UNDERSTANDING
Our BIHRS model is a hybridization that combines the user-based approach, hereafter called User-based BIHRS, and the item-based approach, hereafter called Item-based BIHRS.
Unlike baseline model-based CFs, our approach is realistic and easier to understand thanks to the prior splitting of users and items into clusters. Indeed, following a K-Nearest Neighbors (KNN) reasoning [7], users as well as items necessarily belong to a cluster. In addition, our model integrates a priori information such as items' popularity and the most frequent users' needs. Indeed, the prior probability distributions h_u(·) and h_i(·) model a priori intuitions that can improve the quality of the estimate. Faced with a multitude of items, users rate only a small number of them; the prediction is therefore performed to estimate the missing features.
Following the User-based BIHRS approach, the recommendation is understandable as being based on the probability with which a user u would rate an item i given the D set of observations. In other words, a user u is most likely to like items that are popular and positively rated among the other users of the same cluster as u. If, in addition to the D set of observations, a priori information referring to the item's reputation is known, then our model integrates this information in order to refine the estimate.
Regarding the Item-based BIHRS approach, the recommendation is explainable as being based on the probability for an item i to be liked by a user u given the B set of observations. In other words, the item i is most likely to be liked by users who previously liked similar items belonging to the same cluster. If, in addition to the B set of observations, a priori information related to the most recurrent users' needs is known, then our model includes these data in order to improve the prediction quality.
The hybrid BIHRS approach is therefore explainable as being based on the probability with which a user u would like an item i, given the evidence relative to users belonging to the same cluster as u who previously liked the item i, and to items similar to i belonging to the same cluster as i.
The next section presents the experimentations and results.

IV. EXPERIMENTATIONS AND RESULTS DISCUSSION
In this section, we assess our method's performances comparatively to other recommendation methods. Experiments are conducted on open real-world datasets from MovieLens,1 Ciao,2 Epinions2 and Flixster.2 The features of these datasets are specified in Tab. 5.
The next subsection details the experiment process.

A. EXPERIMENTS SETUP
Experiments are conducted on a computer hosting an Intel Core i7 processor (2.4 GHz) with 16 GB of RAM, running the Windows 10 operating system. The freely available CF4J framework [54] has been used to implement our algorithm, and NetBeans version 8.2 served as the Java development environment.
The MovieLens1M dataset hosts 1,000,209 ratings assigned by 6,040 users to 3,900 movies, while the MovieLens10M dataset gathers 27,000,000 ratings assigned by 280,000 users to 58,000 movies. In the Flixster dataset, 147,612 users rate 48,794 items and generate 8,196,077 ratings, while 40,163 users interact with 139,738 items through 664,824 ratings in the Epinions dataset. The Ciao dataset hosts 7,375 users who rate 99,746 items and generate 278,483 ratings. Each rating is a value ranging from 1 (expressing poor user satisfaction) to 5 (expressing a highly satisfying user experience).
For each dataset, a rating matrix is built and split into test data and training data. The matrix sparsity, also known as the matrix porosity, describes the proportion of the observed ratings withheld as test data; the matrix is progressively emptied to sweep sparsity levels ranging from 20% to 80% in steps of 20%. For instance, a matrix sparsity set to 20% means that the dataset is divided into 20% of test data and 80% of training data.
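The split protocol above can be sketched as follows. This is a minimal illustration with synthetic rating triples; the function name and identifiers are ours.

```python
import random

# Hold out a fraction of the observed ratings (the "matrix sparsity")
# as test data and keep the rest as training data.
def split_ratings(ratings, sparsity, seed=0):
    # ratings: list of (user, item, rating) triples
    rng = random.Random(seed)
    shuffled = ratings[:]
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * sparsity)
    return shuffled[n_test:], shuffled[:n_test]   # train, test

# Synthetic 10x10 rating matrix as triples, then a 20% / 80% split.
ratings = [(u, i, 1 + (u + i) % 5) for u in range(10) for i in range(10)]
train, test = split_ratings(ratings, sparsity=0.20)
```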
The next subsection shows the evaluation metrics used to assess our method performances.

B. EVALUATION METRICS
The BIHRS prediction accuracy is assessed using two key indicators, namely the Mean Absolute Error (MAE) and the Root Mean Square Error (RMSE). The Normalized Discounted Cumulative Gain (NDCG) is additionally used to assess the ranking accuracy. Low MAE and RMSE values express a high prediction accuracy. The MAE and RMSE values are computed as follows:

MAE = (1/N) Σ_{(u,i) ∈ S_rec} |r_ui − r̂_ui|,    RMSE = sqrt( (1/N) Σ_{(u,i) ∈ S_rec} (r_ui − r̂_ui)² ),

where r_ui is the original rating and r̂_ui the approximated value; N is the number of recommendations and S_rec is the set of recommendations. NDCG is widely used to assess the ranking accuracy and is computed as

NDCG_N = DCG_N / IDCG_N,

where IDCG_N and DCG_N respectively refer to the Ideal Discounted Cumulative Gain and the Discounted Cumulative Gain of the Top-N recommended items. DCG_N is computed as

DCG_N = Σ_{j=1}^{N} rel_j / log2(j + 1),

where rel_j is the rating associated with the item ranked at position j. A high NDCG_N means a highly accurate ranking. The recommendation quality is assessed using the precision and recall indicators; high precision and recall values express a high recommendation quality. The recall measures the proportion of correctly recommended items relative to the number of items expected to be recommended, while the precision measures the proportion of correctly recommended items relative to the number of recommended items:

Precision = |S ∩ E| / |S|,    Recall = |S ∩ E| / |E|,

where S is the set of recommended items and E the set of items expected to be recommended.

1 https://grouplens.org/datasets/movielens/
2 https://www.librec.net/datasets.html
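The evaluation metrics defined above can be sketched directly. The DCG discount used here is the common rel_j / log2(j + 1) form, which is an assumption about the paper's exact variant; all sample inputs are made up.

```python
import math

# Mean Absolute Error over (true, predicted) rating pairs.
def mae(pairs):
    return sum(abs(r - r_hat) for r, r_hat in pairs) / len(pairs)

# Root Mean Square Error over (true, predicted) rating pairs.
def rmse(pairs):
    return math.sqrt(sum((r - r_hat) ** 2 for r, r_hat in pairs) / len(pairs))

# NDCG@n: DCG of the recommended order divided by the ideal DCG.
def ndcg(rels_ranked, n):
    dcg = sum(rel / math.log2(j + 2) for j, rel in enumerate(rels_ranked[:n]))
    ideal = sorted(rels_ranked, reverse=True)
    idcg = sum(rel / math.log2(j + 2) for j, rel in enumerate(ideal[:n]))
    return dcg / idcg if idcg > 0 else 0.0

# Precision = |S ∩ E| / |S|, Recall = |S ∩ E| / |E|.
def precision_recall(recommended, expected):
    hits = len(set(recommended) & set(expected))
    return hits / len(recommended), hits / len(expected)

m = mae([(4, 3), (5, 5), (2, 4)])              # (1 + 0 + 2) / 3
prec, rec = precision_recall([1, 2, 3, 4], [2, 3, 5])
```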

C. RESULTS AND ANALYSIS
The datasets that we use do not provide a priori information such as movies' popularity or the consumption habits of a majority of users. For this reason, the hyperparameters have been set to 1, yielding non-informative prior probability distributions.
In the following points, BIHRS's performances are evaluated comparatively to other recommendation methods.

1) PERFORMANCES ANALYSIS
Referring to the distribution of users and items in clusters, our model is based on a reasoning inspired by the KNN algorithm. Furthermore, our method is a precise model-based CF that performs an understandable prediction. For this reason, its performances are evaluated comparatively to neighborhood-based methods, also known as memory-based approaches, and to model-based methods. These methods are presented hereafter: • IPCC is a memory-based method that relies on the computation of inter-item similarities [12] using the Pearson Correlation Coefficient (PCC). In this method, a neighborhood, defined as a collection of items rated by the same users, is built. The predicted rating is then computed as a weighted mean of the ratings over the items belonging to the neighborhood.
• UPCC is a memory-based method that relies on the computation of inter-user similarities [8] using PCC. In this approach, a neighborhood, defined as a set of users having rated the same items, is built. The predicted rating is then computed as a weighted mean of the ratings assigned by the users belonging to the neighborhood.
• Spearman-based CF is a memory-based method that performs inter-user and inter-item similarity computations [15] using the Spearman Rank Correlation Coefficient (SRCC). Afterwards, the prediction is made following both the user-based and item-based approaches.
• Bayesian Non-Negative Matrix Factorization (BNMF) is a probabilistic model-based recommendation approach built on a non-negative variant of matrix factorization [20]. This approach decomposes the rating matrix into non-negative latent feature matrices, and users and items are divided into groups in order to increase the efficiency of the factorization. The BNMF method is parametrized as follows:
- The degree of evidence needed for the users' ratings inference is set to 4.
- The number of iterations until the algorithm converges is set to 150.
- The number of latent factors is set to 12.
• Probabilistic Matrix Factorization (PMF) is a model-based recommendation method built on baseline matrix factorization following a probabilistic approach [29]. This method decomposes the high-dimensional rating matrix into low-dimensional latent feature matrices; gradient-based optimization is then performed to determine the optimal low-dimensional matrices. The PMF method is parametrized as follows:
- The number of iterations for the convergence of the algorithm is set to 150.
- The step size of an iteration is set to 0.05.
- The number of latent features is set to 10.
• Naive Bayesian Collaborative Filtering (NBCF) is a recommendation approach based on the Naive Bayes classifier, in which uniform probability distributions are used, thereby simplifying the modeling of interactions between users and items [35].
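As an illustration of the PMF baseline described above (a didactic sketch, not the cited implementation [29] nor the CF4J code; the tiny dataset is synthetic), the latent factors can be fitted by stochastic gradient descent on the squared error with L2 regularization:

```python
import random

# Minimal PMF-style matrix factorization: fit user vectors P and item
# vectors Q so that the dot product P[u]·Q[i] approximates rating r_ui.
def fit_pmf(ratings, n_users, n_items, k=10, lr=0.05, reg=0.02,
            n_iters=150, seed=0):
    rng = random.Random(seed)
    P = [[rng.gauss(0, 0.1) for _ in range(k)] for _ in range(n_users)]
    Q = [[rng.gauss(0, 0.1) for _ in range(k)] for _ in range(n_items)]
    for _ in range(n_iters):
        for u, i, r in ratings:
            pred = sum(P[u][f] * Q[i][f] for f in range(k))
            err = r - pred
            for f in range(k):
                pu, qi = P[u][f], Q[i][f]
                P[u][f] += lr * (err * qi - reg * pu)   # gradient steps
                Q[i][f] += lr * (err * pu - reg * qi)
    return P, Q

ratings = [(0, 0, 4), (0, 1, 2), (1, 0, 5), (1, 1, 1)]
P, Q = fit_pmf(ratings, n_users=2, n_items=2, k=4)
pred = sum(P[0][f] * Q[0][f] for f in range(4))   # reconstruct r_00 = 4
```

The step size (0.05) and iteration count (150) mirror the parametrization stated for the PMF baseline.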

2) ASSESSMENT OF THE PREDICTION ACCURACY
Following Tab. 6, the prediction quality of our method is compared to that of existing model-based and memory-based recommendation methods. It can be observed that the prediction accuracy of our method outperforms that of the competing model-based and memory-based approaches. Indeed, the MAE and RMSE values of our BIHRS model are lower than those of IPCC, UPCC, Spearman-based CF, NBCF, PMF and BNMF. This effectiveness is due to the fact that our approach is built on a model that learns from users' and items' observations to thereafter infer users' behaviors based on realistic considerations such as users' intuitions and items' features. Furthermore, our approach shows a higher NDCG trend than the other recommendation methods, meaning that, regarding the ranking accuracy, our proposal is better than the other methods. In addition, it can be observed that the matrix sparsity globally affects the performances of each method indicated in Tab. 6, because an increasing matrix sparsity rate induces a decrease in the training data that feed each recommendation approach. Despite that negative impact, our approach outperforms the other methods and thus highlights its resilience to data sparsity. Compared to model-based recommendation methods, Fig. 3a depicts the MAE performance of our proposal on the MovieLens1M dataset. The MAE trends of our method and its two variants, namely User-based BIHRS and Item-based BIHRS, are the lowest, meaning a prediction accuracy higher than that of the other model-based methods. This improvement is explained by the fact that, by modeling plausible intuitions or a priori information, our method includes more realistic aspects in the prediction model than the other model-based methods. Fig. 4a depicts the MAE performance evaluation on the MovieLens10M dataset.
It can be observed that the MAE trend of our method is globally improved from 0.7 to 0.692 on average. This is due to the fact that the MovieLens10M dataset hosts more ratings than the MovieLens1M dataset; hence, many more observations are involved in the inference process to refine the prediction. Once more, our method shows a MAE performance better than that of the other model-based approaches. In Fig. 5a, 6a and 7a, our method's performances are shown using the Flixster, Ciao and Epinions datasets. It can be observed that the MAE values of our method are lower, and therefore better, than those of the other methods.
From the usage of the MovieLens1M dataset, Fig. 3b depicts the RMSE performance of our approach compared to model-based recommendation methods. Our method and its two variants show a lower RMSE trend, meaning that our approach's prediction accuracy is better than that of the other model-based methods. Fig. 4b shows the RMSE performance on the MovieLens10M dataset. It can be noted that, thanks to the larger amount of data in this dataset, the RMSE of our method is roughly improved from 0.857 to 0.845 on average. Hence, by its RMSE trend, our method outperforms the other model-based approaches. In Fig. 5b, 6b and 7b, our method's performances are observed using the Flixster, Ciao and Epinions datasets; the RMSE values of our proposal are lower, and therefore better, than those of the other methods. From the usage of the MovieLens1M dataset, Fig. 3c shows the NDCG performance of our proposal compared to model-based recommendation methods. Our method shows a higher NDCG trend, meaning a ranking accuracy higher than that of the other methods.
Exploiting the MovieLens10M dataset, Fig. 4c shows the NDCG improvement of our proposal and its variants compared to other methods. Indeed, an NDCG increase of 2.5% on average can be highlighted compared to the other model-based methods. In Fig. 5c, 6c and 7c, our method's performances are shown using the Flixster, Ciao and Epinions datasets. It can be observed that the NDCG values of our approach are higher, and therefore better, than those of the other methods.

3) EVALUATION OF THE RECOMMENDATION QUALITY
According to Tab. 7, the recommendation quality of our method is evaluated comparatively to that of the competing memory-based recommendation methods. It can be observed that, for each method mentioned in Tab. 7, the precision is globally affected by the increase in the number of recommendations. This is explained by the fact that an increasing number of recommendations induces the addition of items with low prediction accuracy to the recommendation list, and consequently degrades the recommendation precision. The recall trend of all methods increases with the number of recommendations, because a larger number of recommendations increases the possibility of recommending expected items. Despite this fact, the recommendation quality of our method remains better than that of the memory-based approaches (IPCC, UPCC, and Spearman-based CF), since the precision and recall trends of our BIHRS method are higher than those of IPCC, UPCC and Spearman-based CF. We now compare our method to other model-based methods. Fig. 3e, 5e, 6e and 7e depict the recommendation precision of our proposal using the MovieLens1M, Flixster, Ciao and Epinions datasets. The precision trend is affected by the increase in the number of recommendations, due to less accurately predicted items being added to the set of recommendations. However, it can be observed that the precision shown by our method is better than that of the other model-based methods. This precision improvement is justified by the fact that our model integrates more features in the prediction process than the other methods.
From the usage of the MovieLens10M dataset, Fig. 4e highlights the recommendation precision of our proposal. It can be observed that our method is scalable, because the BIHRS precision is higher than that of the other model-based methods even on large datasets. Fig. 3d, 5d, 6d and 7d highlight the recall trends of our proposal compared to other model-based recommendation methods using the MovieLens1M, Flixster, Ciao and Epinions datasets. It can be observed that the recall performance of our method is improved compared to that of the other methods.
Exploiting the MovieLens10M dataset, Fig. 4d shows the recall performance of our proposal, which is slightly enhanced compared to other model-based recommendation methods. In addition, compared to the BIHRS recall trend obtained with the MovieLens1M dataset, the scalability of our method is highlighted, because the BIHRS recall trend obtained with the MovieLens10M dataset is increased by 3% on average.

4) COMPUTATIONAL COST
Fig. 8 shows the computational resource consumption of each model-based method. In Fig. 8a, it can be observed that our method shows a competitive execution time, slightly higher than that of the NBCF and BNMF methods but lower than that of PMF for the large MovieLens10M dataset. The relatively low execution time of the BIHRS method highlights the scalability of our proposal for large datasets. For the Ciao and Epinions datasets, our method's execution time is slightly higher, but it is offset by the significant recommendation performances of BIHRS compared to other model-based methods. Fig. 8b and Fig. 8c show that our proposal consumes more Central Processing Unit (CPU) and memory resources than the other methods. This is due to the complexity of the computed non-uniform probability distributions. However, it has been shown that our method provides an increased prediction accuracy (see Fig. 3 to 7) and an enhanced recommendation quality (see Tab. 6 and Tab. 7) compared to the other methods. For this reason, our method appears as a valuable trade-off between a high-accuracy recommendation and an affordable computational cost.

5) THREATS TO VALIDITY
In our proposal, the recommendation is performed on datasets used to model users' tastes in the training phase. Experiments have been conducted on datasets with a significant matrix density, in which the number of items jointly rated by users is significant. Consequently, the evaluation of users' taste similarity is accurate, since users express their interest in the same items. For datasets in which users do not jointly rate items, the assessment of users' preference proximity could be corrupted, and the prediction accuracy affected, because clusters would host dissimilar users. In other words, the recommendation precision could be threatened for datasets in which users do not jointly rate a significant number of items. In addition, the users' and items' cluster sizes have to be chosen carefully, since the probability distributions are computed on clusters: small clusters could harm the recommendation accuracy, while large clusters could add irrelevant, noisy information to the prediction process. Knowing that the number of clusters is set at the initialization of the clustering process, it should be chosen with caution, since a small number of clusters induces large clusters containing noisy data, while a high number of clusters induces small clusters in which data are sparse. However, our proposal remains general, since the experiments have been conducted on popular and widely adopted datasets in the recommendation field.

V. CONCLUSION AND PERSPECTIVES
In this paper, we have addressed the item targeting problem by proposing a hybrid model-based recommendation approach built on Bayesian inference to predict users' interests. The proposed method combines the easy explainability of memory-based CFs with the high precision of model-based CFs. The Bayesian prediction model developed in our proposal employs non-uniform probability distributions to model intuitions and a priori information related to items' features and users' behaviors. The proposed method merges a user-based approach and an item-based approach to alleviate data sparsity, and therefore enhances the prediction accuracy by exploiting the various aspects raised by user-item interactions. Afterwards, the MAP principle helps to predict a user's interest in a given item by computing the rating that maximizes the Bayesian predictive distribution on the unknown observable. Experiments have been conducted on freely available datasets and show that the proposed method outperforms others (UPCC, IPCC, Spearman-based CF) regarding the MAE and RMSE performances. Besides, the results obtained highlight the improved prediction accuracy of our proposal and the ranking accuracy of the proposed method through its valuable NDCG performances. Moreover, the effectiveness of our approach, expressed by its increased prediction accuracy and its improved recommendation quality, has been shown comparatively to other model-based recommendation methods (PMF, NBCF, BNMF).
Given the rapid growth of Big Data, an extension of our approach could be developed by integrating other sources of information. Indeed, the high intensity of users' activity on social networks offers increased possibilities and rich sources of information to embed in the recommendation process. For this purpose, to enhance the prediction in the future, we plan to mine users' opinions and the contents consumed by users on social networks to implicitly extract their feedback and tastes. In addition, the recommendation quality could be improved by using listwise, pairwise and pointwise ranking techniques to learn to accurately rank the items recommended to users, and therefore refine the recommendation precision, since users would receive a list of relevant and ranked items at the end of the process. We also plan to develop an improved data sampling strategy that selects only the most informative training instances for the prediction process; such a strategy would contribute to reducing the computational cost and speeding up the recommendation process.