Variational Autoencoder-based Hybrid Recommendation with Poisson Factorization for Modeling Implicit Feedback

Hybrid recommendation, which is based on collaborative filtering and supplemented with auxiliary content information, is being actively researched due to its ability to overcome the cold-start problem. Many proposed hybrid methods make recommendations using Gaussian distribution-based collaborative filtering even though they handle variables that tend to be non-Gaussian, such as the number of interactions. We present a method that uses a hybrid recommendation framework combining collaborative filtering that models the number of interactions as Poisson distributed with a variational autoencoder-based content information generation process that shares latent variables with the collaborative filtering. As a prior for the shared latent variables, we use a gamma distribution, which is a conjugate prior of a Poisson distribution. An implicit-derivative-based reparameterization trick enables the use of a gamma distribution in a variational autoencoder. The latent variables in the generative model are inferred using the stochastic gradient variational Bayes approach, taking as input the number of interactions corresponding to users and items and the content information. In accordance with the inference, unobserved interactions between users and items are predicted for recommendation. The use of a neural-network-based generative model for content information enables the framework to handle various types of content information. Experimental results show that the proposed method utilizes content information effectively for predicting the number of interactions and that it should aid in overcoming the cold-start problem.


I. INTRODUCTION
The growth of web services and applications has enabled access to a huge variety of content such as articles, music, movies, and games. Recommendation systems for finding items of interest for users from the huge amount of information have become an essential technology for various services. In addition, the number of interactions between users and items, such as the number of views and play count, is becoming increasingly important as a key performance indicator for not only service providers but also content creators, especially for advertising and music/video streaming services. The number of interactions with an item is strongly related to the interest in the item itself. Hence, predicting the number of interactions with an unknown item by a user is closely related to recommendation problems.
One of the most representative approaches for recommendation systems is based on collaborative filtering (CF) [1]-[6]. CF-based recommendation systems typically use a user-item matrix composed of feedback from each user for each item. The feedback can be explicit feedback (e.g., a rating) or implicit feedback (e.g., the number of interactions) [7]. Explicit feedback requires the user to provide additional input, while implicit feedback requires only that the service be used. In either case, a matrix decomposition-based technique is usually used to fill the matrix elements with low-dimensional variables describing the characteristics of the users and items.
An alternative approach is the content-based (CB) approach. CB-based recommendation systems predict matrix elements from content information such as a description of the item itself or its metadata [8]. Generally, if each item has been interacted with by a certain number of users, the CF approach works well [9]. However, for an item that has not received any user feedback, the CF approach does not work well due to the lack of observations (the "cold-start problem" [10]). Hybrid approaches combining CF and CB, which can overcome this weakness, are thus being actively researched [11]- [15]. By learning the correspondence between the latent expression of items and the content information, the system can acquire a latent representation even for an item that has never been selected.
As noted above, the number of interactions is an example of an implicit feedback variable. It is an element of the matrix used for CF input and is often modeled with a Poisson or multinomial distribution in statistics. However, the major approaches for handling implicit feedback treat interactions as if they were Gaussian distributed; in other words, they fit them by least-squares error approximation [2], [13], [14].
In this paper, we present a hybrid recommendation method based on building a CF model that models the number of interactions as Poisson distributed. The use of a neural network-based generative model for content information enables the method to handle various types of content information.
The main contributions of this paper are summarized as follows:
• Using a content generation model based on a VAE with a gamma distribution, which is a conjugate prior of the Poisson distribution, we extend Poisson-distributed collaborative filtering to a hybrid model.
• Experimental results show the effectiveness of the proposed method for implicit feedback.
• We show that the framework comprising the proposed method is applicable to a broader range of collaborative filtering, including state-of-the-art methods, and can be extended to a hybrid recommendation algorithm that handles the number of interactions.
The rest of this paper is structured as follows. Related work is introduced in section 2, section 3 provides a detailed description of the proposed method and a discussion of its extension, and the experimental results are discussed in section 4. Finally, section 5 provides concluding remarks.

B. IMPLICIT FEEDBACK REPRESENTATION
Weighted matrix factorization (WMF) [2] is often used for CF for implicit feedback. Given a matrix consisting of the number of interactions between each user and each item, rather than decomposing the matrix directly, WMF obtains a decomposition from a binary matrix: the elements corresponding to the non-zero elements in the input matrix are set to 1, and the objective function is weighted on the basis of the number of interactions. This contrasts with the common approach in statistical modeling of describing count variables with a generative model using a Poisson or multinomial distribution. While WMF is widely used for CF of implicit feedback [13], [14], it does not provide an estimate of the number of interactions. Therefore, there has been research focusing on modeling the number of interactions for recommendation purposes. For example, Poisson factorization (PF), deep-learning-based multinomial CF [5], [19], and collaborative competitive filtering [16] have been investigated.
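As an illustration, the WMF objective described above can be sketched in numpy. The confidence weighting c_{i,j} = 1 + αy_{i,j} is one common choice; it, the dimensions, and all values here are illustrative assumptions, not taken verbatim from [2]:

```python
import numpy as np

rng = np.random.default_rng(0)
I, J, K, alpha = 6, 8, 3, 10.0

Y = rng.poisson(1.0, size=(I, J))   # raw interaction counts
P = (Y > 0).astype(float)           # binarized preference matrix (non-zero -> 1)
C = 1.0 + alpha * Y                 # confidence weights derived from the counts

U = 0.1 * rng.standard_normal((I, K))   # user factors
V = 0.1 * rng.standard_normal((J, K))   # item factors

def wmf_loss(U, V, reg=0.01):
    # weighted squared error on the BINARY matrix, plus L2 regularization;
    # note the counts enter only through the weights C, not as targets
    E = P - U @ V.T
    return np.sum(C * E**2) + reg * (np.sum(U**2) + np.sum(V**2))

loss = wmf_loss(U, V)
```

The key point visible in the sketch is that the reconstruction target is the binarized matrix P, so WMF cannot by itself estimate the count values.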

C. CF + CB HYBRID APPROACH
As mentioned above, hybrid approaches combining CF and CB are being actively studied. Several studies have extended CF to hybrid recommendation by incorporating content information [11], [13]-[15], [25]. In particular, collaborative topic Poisson factorization (CTPF) [25], a typical example of using a Poisson distribution for CF, uses a Poisson distribution not only for CF but also for topic modeling of the content information. It shares latent variables generated from a gamma distribution between the CF and topic modeling components, thereby constructing a generative model that can be derived using the standard variational Bayesian method [26] without losing conjugacy.
Also being actively studied is the incorporation of content information by using deep learning to support various types of content such as images and acoustic features (Fig. 1). For example, a collaborative variational autoencoder (CVAE) [12], a generative model for hybrid recommendation, shares latent variables between the content corresponding to each item and the item's components for CF. Similar to the operation of a variational autoencoder (VAE) [27], a CVAE generates content information from latent variables and simultaneously biases the item components used for CF by adding the latent variables.
Except for CTPF, which can only handle simpler models for content information, collaborative filtering in most hybrid recommendations consists of Gaussian-based objective functions. To the best of our knowledge, the proposed method is the first hybrid recommendation method that uses Poisson distribution for implicit feedback and neural networks for handling diverse content information.

III. PROPOSED METHOD
For a more accurate fitting to the number of interactions, we use PF for CF and construct a hybrid recommendation framework with a probabilistic formulation that, like a CVAE, jointly describes a VAE-based content information generation process that shares latent variables with CF. As a prior for the shared latent variables, we use a gamma distribution, which is a conjugate prior of a Poisson distribution. The use of an implicit-derivative-based reparameterization trick [28], [29] enables the use of a gamma distribution. The latent variables in the generative model are inferred using the stochastic gradient variational Bayes (SGVB) method [27], taking the number of interactions corresponding to users and items and content information as input.
An overview of the proposed method is shown in Fig. 2. Next, we describe the problem setup, generative model, and algorithm used for inference and estimation.

A. PROBLEM SETUP
We formulate a recommendation problem for implicit feedback. The user index and item index are denoted as i ∈ {0, ..., I − 1} and j ∈ {0, ..., J − 1}, respectively. The implicit feedback matrix is denoted as Y ∈ N^{I×J}; element y_{i,j} corresponds to the number of interactions between user i and item j. The content information corresponding to item j is denoted as x_j. The recommendation problem is defined as how to acquire the item subset J′_i ⊊ {0, ..., J − 1}, which includes items with which user i is likely to interact, given the implicit feedback matrix Y and the content information X = {x_0, ..., x_{J−1}}. The notation used in the proposed method is shown in Table 1.

B. GENERATIVE MODEL
Table 1 (excerpt of notation):
a^z_shp, b^z_rte — shape and rate parameters of the gamma distribution drawing z_{j,k}
C_{i,j} — parameter for neglecting fitting to unobserved elements (i.e., zero elements)
f_{ϕdec}(·) — nonlinear function parameterized by ϕ_dec that outputs the parameters of p(x_j|z_j) with z_j as input
a_ϕ(·), b_ϕ(·) — nonlinear functions parameterized by ϕ that output the shape and rate parameters of the posterior distribution q(z_j) with x_j as input

Here we explain the processes for generating implicit feedback and generating content information, starting with their latent factors. For each user i, each element of the K-dimensional latent variable u_i is drawn from a gamma distribution, where a^u_shp and b^u_rte are hyperparameters, and Gamma(a, b) is the gamma distribution characterized by shape parameter a and rate parameter b. For each item j, each element of the K-dimensional latent variables v_j and z_j is drawn from a corresponding gamma distribution, where a^v_shp, b^v_rte, a^z_shp, and b^z_rte are hyperparameters. The latent variable z_j maps to the content information and is used to generate it.
In other words, content information x_j is drawn from the probabilistic distribution p(x_j|z_j), which is parameterized by the outputs of the nonlinear function f_{ϕdec}(z_j) (e.g., a neural network), where ϕ_dec is the set of parameters of f_{ϕdec}(·).
Such a setup is often used in VAEs; the form of p(x_j|z_j) depends on the nature of the content information. Here, we discuss a specific case, music piece recommendation, in which the play count serves as implicit feedback and the content features are provided as a fixed-length vector for each piece. Our method is applicable not only to a fixed-length feature vector but also to images, texts, and their combinations by using a VAE as the generative model for x_j. The generative process for the F-dimensional feature x_j and its parameters is represented as a Gaussian distribution whose mean is output by the decoder, where x_j ∈ R^F and µ_j ∈ R^F. VOLUME 4, 2016
Next, we describe the generative process for implicit feedback, namely the count of user-item interactions. As mentioned above, there are two latent variables corresponding to items, z_j and v_j. The item component is represented by their sum, z_j + v_j. The feedback matrix is thus drawn from a Poisson distribution characterized by the inner product between the user and item latent variables, where C_{i,j} is a parameter for neglecting fitting to unobserved elements (i.e., zero elements), similar to that in collaborative topic regression [11] (e.g., C_{i,j} = 1 if y_{i,j} ≠ 0 and C_{i,j} = c otherwise).
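The generative process above can be sketched end to end in numpy. The hyperparameter values, dimensions, and the stand-in decoder (a random linear map with tanh, fed log z_j as in subsection III-D) are illustrative assumptions, not values from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)
I, J, K, F = 100, 80, 5, 16

# gamma hyperparameters (shape, rate); illustrative values
a_u, b_u = 0.3, 0.3
a_v, b_v = 0.3, 0.3
a_z, b_z = 0.3, 0.3

# numpy parameterizes the gamma by shape and SCALE, so scale = 1 / rate
U = rng.gamma(a_u, 1.0 / b_u, size=(I, K))   # user factors u_i
V = rng.gamma(a_v, 1.0 / b_v, size=(J, K))   # item-only factors v_j
Z = rng.gamma(a_z, 1.0 / b_z, size=(J, K))   # shared content factors z_j

# implicit feedback: y_ij ~ Poisson(u_i . (z_j + v_j))
Y = rng.poisson(U @ (Z + V).T)

# content: x_j ~ N(f(z_j), sigma^2 I); f is a stand-in random map on log z_j
W = rng.standard_normal((K, F))
X = np.tanh(np.log(Z + 1e-8) @ W) + 0.1 * rng.standard_normal((J, F))
```

Because z_j appears both in the Poisson rate and in the content mean, items with no observed interactions can still receive a usable latent representation through x_j.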

C. ALGORITHM FOR INFERENCE AND ESTIMATION
The standard variational inference algorithm is intractable for generative models involving nonlinear distributions parameterized by neural networks, such as p(x_j|z_j), so the SGVB method is often used. Maximizing the evidence lower bound (ELBO) is equivalent to minimizing the Kullback-Leibler divergence between the posterior p(u, v, z|X, Y) and the approximate posterior q(u, v, z), where q(u, v, z) is the joint approximate posterior of the latent variables and E_q[·] is the expectation over it.
We applied a mean-field approximation, assuming that the posterior is factorized. We assumed that q(z_j) is represented as a gamma distribution parameterized by the outputs of neural networks with x_j as input, a_ϕ(x_j) and b_ϕ(x_j), where ϕ is the set of neural network parameters. In maximizing the ELBO, the problematic part of the terms affected by z_j is calculating the posterior expectation over q(z_j). By approximating this expectation with sampling, we obtain a Monte Carlo estimate with z^l_j ∼ Gamma(a_ϕ(x_j), b_ϕ(x_j)), where l is the sample index (l = 1, ..., L). In the SGVB method, the derivative of the sampled values must be calculated in order to minimize the objective function with respect to ϕ and ϕ_dec.
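The sampling approximation can be illustrated with a small numpy check: a Monte Carlo estimate of an expectation under the gamma posterior converges to the analytic value, here E_q[z] = a/b (the encoder outputs a_ϕ and b_ϕ are fixed to assumed values for one item):

```python
import numpy as np

rng = np.random.default_rng(0)
a_phi, b_phi = 2.0, 4.0        # assumed encoder outputs (shape, rate) for one item
L = 200_000                    # number of Monte Carlo samples

# numpy's gamma takes shape and SCALE, so scale = 1 / rate
z = rng.gamma(a_phi, 1.0 / b_phi, size=L)

mc_mean = z.mean()             # Monte Carlo estimate of E_q[z]
exact = a_phi / b_phi          # analytic mean of Gamma(a, b) is a / b
```

In the actual method L is small (the estimate only needs to be unbiased for stochastic gradients), and the samples must additionally be differentiable with respect to a_ϕ and b_ϕ, which is the topic of the next paragraph.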
Although a conventional VAE obtains derivatives of samples from Gaussian distributions by using a reparameterization trick, various methods have been developed to calculate derivatives for samples from various distributions in order to accommodate different applications [28], [30]. For a gamma distribution, an implicit reparameterization gradient [28] can be used. Using implicit differentiation with a cumulative distribution function enables an unbiased estimator of the derivative of the samples to be calculated for any continuous distribution by using a cumulative distribution function that can be calculated numerically.
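A minimal scipy illustration of the implicit reparameterization gradient for the shape parameter of a gamma distribution (rate fixed to 1, and the shape and uniform draw are arbitrary example values). The sample is produced through the inverse CDF so that the implicit gradient −(∂F/∂a)/(∂F/∂z) can be checked against a direct finite difference of the quantile function:

```python
import numpy as np
from scipy import stats, special

a, u = 2.5, 0.7                      # shape parameter and a fixed uniform draw
z = stats.gamma.ppf(u, a)            # sample via inverse CDF, F(z; a) = u

# implicit gradient: dz/da = -(dF/da) / (dF/dz), with F the gamma CDF
# (scipy's gammainc is the regularized lower incomplete gamma, i.e., the CDF)
eps = 1e-6
dF_da = (special.gammainc(a + eps, z) - special.gammainc(a - eps, z)) / (2 * eps)
dF_dz = stats.gamma.pdf(z, a)        # dF/dz is just the density
implicit_grad = -dF_da / dF_dz

# check against a finite difference of the quantile function at fixed u
fd_grad = (stats.gamma.ppf(u, a + eps) - stats.gamma.ppf(u, a - eps)) / (2 * eps)
```

The point of the trick is that only the CDF and density are needed, not an explicit differentiable transform from noise to sample, which is why it applies to distributions such as the gamma that lack a simple location-scale reparameterization.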
Maximum a posteriori (MAP) estimation can be performed for u_i and v_j, whereas inference using the SGVB method is necessary only for z.

Algorithm 1: Poisson Hybrid Recommender
Input: a dataset of implicit feedback Y and a dataset of content information X
Output: prediction of implicit feedback Y*
Procedure:
Initialize the neural network parameters ϕ and ϕ_dec and the latent variables corresponding to users and items, u and v.
while not converged do:
    Update ϕ and ϕ_dec by using the SGVB method to minimize (12)
    Estimate the posterior expectation
    Update u in accordance with (13)
    Update v in accordance with (14)
end while
Estimate Y* in accordance with (15)

The objective function for MAP estimation includes prior terms with hyperparameters a^u_shp and b^u_rte, and the update equations for u and v are derived using the auxiliary function method [31]. We note that the hybridization does not increase the computational complexity of the collaborative filtering. Comparing the update equations of the proposed method with those of conventional Poisson factorization, the only difference is the addition of the term E_q[z_{j,k′}]. Therefore, the computational complexity of each update equation is the same as for Poisson factorization, O(JK) for (13) and O(IK) for (14), and since they are applied IK and JK times for u_{i,k} and v_{j,k}, respectively, the computational complexity per iteration is O(IJK^2). Although MAP estimation was used for fair comparison with prior research and for simplification of the model, variational Bayesian iteration equations can also be derived [25], [32].
As z_j, u_i, and v_j are mutually dependent, inference and estimation are performed by alternately applying the SGVB method to decrease (12) and the iteration equations (13) and (14) (Algorithm 1). After convergence, the estimate of y_{i,j} can be calculated using (15). From y*_{i,j}, the count of user-item interactions is predicted, and the Top-N items for each user are presented.
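The alternating procedure can be sketched in numpy. Since the update equations (13) and (14) are not reproduced in this text, the sketch substitutes the standard multiplicative (auxiliary function) updates for I-divergence factorization, with the content term E_q[z_j] held fixed as a stand-in for the encoder's posterior mean; all dimensions and values are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
I, J, K = 40, 30, 4
Y = rng.poisson(2.0, size=(I, J)).astype(float)

U = rng.gamma(1.0, 1.0, size=(I, K))
V = rng.gamma(1.0, 1.0, size=(J, K))
Ez = rng.gamma(1.0, 1.0, size=(J, K))   # stand-in for E_q[z_j] from the encoder

def i_div(Y, R, eps=1e-10):
    # I-divergence (generalized KL), equivalent to Poisson NLL up to constants
    return np.sum(Y * np.log((Y + eps) / (R + eps)) - Y + R)

losses = []
for _ in range(50):
    H = Ez + V                           # item component z_j + v_j
    R = U @ H.T + 1e-10                  # current reconstruction
    U *= ((Y / R) @ H) / H.sum(axis=0)   # multiplicative update for user factors
    R = U @ H.T + 1e-10
    V *= ((Y / R).T @ U) / U.sum(axis=0) # multiplicative update for item factors
    losses.append(i_div(Y, U @ (Ez + V).T))
```

Because the reconstruction is linear in U and in V with nonnegative coefficients, the usual majorize-minimize argument still applies when the fixed offset E_q[z_j] is present, so each half-step decreases the I-divergence.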

D. VAE ARCHITECTURE FOR GAMMA DISTRIBUTION
In the VAE architecture of the proposed method, f_{ϕdec}(·), a_ϕ(·), and b_ϕ(·) are composed of neural networks such as multilayer perceptrons, and the structure can be flexibly selected in accordance with the content information x_j. For example, convolutional networks [33] and recurrent networks [34]-[36] are used for image data and sequential data such as text [37]-[39], respectively.
Since a_ϕ(·) and b_ϕ(·) output the parameters of a gamma distribution, their values must be positive. We meet this requirement by applying an exponential function to the outputs of the neural network corresponding to the two parameters. This is similar to the procedure used for adjusting the variance parameter of a VAE with a Gaussian distribution.
In contrast, there is no restriction on f_{ϕdec}(·). However, its input, a posterior sample of z_j, is a component of the number of interactions and a sample from a gamma distribution; it therefore takes positive values and can have strongly non-Gaussian properties. In general, preprocessing input data to make them more Gaussian is beneficial in terms of stability and performance. Therefore, instead of using z_j, we use log z_j as the input for f_{ϕdec}(·). This can be interpreted as a special case of the Box-Cox transformation [40], transforming z_j into data that are more Gaussian.
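The effect of the log input transformation can be checked numerically: samples from a gamma distribution with a small shape parameter are strongly right-skewed, and taking the log (the λ → 0 limit of the Box-Cox family) reduces the magnitude of the skewness substantially. The shape and scale values below are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(0)
z = rng.gamma(0.5, 2.0, size=100_000)   # heavily right-skewed positive samples

def skew(x):
    # sample skewness: E[(x - mean)^3] / std^3
    c = x - x.mean()
    return np.mean(c**3) / np.std(x)**3

raw_skew = skew(z)            # large positive skew for the raw gamma samples
log_skew = skew(np.log(z))    # log transform moves the samples closer to Gaussian
```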

E. THE GENERAL FORM FOR GAMMA-POISSON HYBRID RECOMMENDATION ARCHITECTURE
We have described the proposed method using a Poisson factorization-based CF up to this point. In this subsection, we discuss its extension to a broader class of CF. For simplicity, we do not consider regularization terms for u i and v j such as prior distributions. We assume that all functions treated in this subsection are differentiable.
First, we consider a general form of the objective function of CF based on matrix factorization. It is generally described as
∑_{i,j} d(y_{i,j}, α_{Ω_{i,j}}(u_i, v_j)),   (16)
where d(·, ·) is a metric function and α_{Ω_{i,j}}(·, ·) is a function that is parameterized by the elements of a parameter set Ω_{i,j} and is linear in each input. In the case of Gaussian-based matrix factorization, for example, d(·, ·) is the squared error, α_{Ω_{i,j}}(·, ·) is the inner product, and Ω_{i,j} = ∅ for all pairs (i, j). In addition, many matrix factorization-based CF methods with techniques such as weight reduction for unobserved elements [11], as used in (6), and removal of exposure bias [17], [20] can be represented in this form. Second, we consider CF using neural networks. Starting from a simple extension, we consider the case in which the latent variables u_i and v_j are the outputs of neural networks with the i-th row and j-th column of Y, denoted y_{i,:} and y_{:,j}, as inputs. Namely, the latent variables u_i and v_j in (16) are replaced with u(y_{i,:}) and v(y_{:,j}), respectively, i.e.,
∑_{i,j} d(y_{i,j}, α_{Ω_{i,j}}(u(y_{i,:}), v(y_{:,j}))),   (17)
where u(·) and v(·) are nonlinear functions (e.g., neural networks). This is the form of deep matrix factorization [4]. Another common architecture for collaborative filtering using neural networks uses only y_{i,:} or y_{:,j} as input. It can be considered a special case of (17) in which α_{Ω_{i,j}}(u(y_{i,:}), v(y_{:,j})) is regarded as a function that takes only u(y_{i,:}) or v(y_{:,j}) as input. Many neural-network-based collaborative filtering methods take this form [3]-[5], [18], [19].
Finally, we extend the CF discussed above to a hybrid version with our framework. We add the latent variable of content, z_j, generated from a gamma distribution as in (3), as follows:
∑_{i,j} d(y_{i,j}, α_{Ω_{i,j}}(u(y_{i,:}), z_j + v(y_{:,j}))).   (18)
For example, let us assume that d(·, ·) is
d(s_1, s_2) = s_1 log(s_1/s_2) − s_1 + s_2,   (20)
where s_1 and s_2 are the input variables for d(·, ·). As (20) is the I-divergence and its minimization is known to be equivalent to Poisson likelihood maximization [41], under these conditions, (18) is equivalent to the proposed method described in this section. As in (4) and (5), and as discussed in subsection III-D, z_j is drawn from a gamma distribution, and the VAE architecture between z_j and the content information x_j can be flexibly configured in accordance with the properties of x_j.
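The equivalence invoked here can be checked numerically: the I-divergence and the Poisson negative log-likelihood (dropping the log y! term, which does not depend on the model) have the same gradient with respect to the rate. The count and rate below are arbitrary example values:

```python
import numpy as np

y, r = 3.0, 2.0   # an observed count and a model rate (example values)
eps = 1e-6

def i_div(y, r):
    # I-divergence between y and r; the y*log(y) - y part is constant in r
    return y * np.log(y / r) - y + r

def poisson_nll(y, r):
    # negative Poisson log-likelihood, dropping the log(y!) constant
    return r - y * np.log(r)

# central finite differences of both objectives with respect to the rate r
g1 = (i_div(y, r + eps) - i_div(y, r - eps)) / (2 * eps)
g2 = (poisson_nll(y, r + eps) - poisson_nll(y, r - eps)) / (2 * eps)
```

Both gradients equal 1 − y/r, so any optimizer sees the two objectives as identical up to an additive constant.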
Since all of the functions that compose them are differentiable, we can at least employ the stochastic gradient descent method [42] to obtain solutions that locally minimize the objective function.
As mentioned above, there are CF methods that do not take v(y_{:,j}) as input, which means that there is no term to which z_j can be added in (19). In such cases, methods that take u(y_{i,:}) as input can, in most cases, be regarded as methods that take v(y_{:,j}) as input due to the symmetry of y_{:,j} and y_{i,:}, and thus our framework is applicable.
By using the I-divergence for d(·, ·) and adding z_j drawn from a gamma distribution-based VAE, the gamma-Poisson hybrid recommendation architecture can be applied to various CF methods, not limited to those based on matrix factorization.

A. DATASET SPECIFICATIONS
We evaluated our proposed method using both simulation and real-world data.

1) Simulation data
To obtain simulation data, we simulated the number of user-item interactions for 1000 users and for both 1500 and 1000 items. The variation in the number of items was intended to evaluate the influence of the number of items on learning the correspondence between content information and latent variables. We represented the user factor as the K-vector u′_i for each user, the item factor as the K-vector v′_j for each item, and the content factor as the K-vector z′_j for each item.
The content factor is observed and utilized as the content information. In addition, for each user and item, the exposure variable was drawn as e_{i,j} ∼ Bernoulli(p), where p is a parameter, and Bernoulli(p) is the Bernoulli distribution characterized by p. We then simulated count y_{i,j} as in (25), where the constant λ ∈ [0, 1] is used to control how much the content information contributes to the play count. Experiments were conducted for K = 10, p = 0.25, and λ ∈ {0.2, 0.4, 0.6, 0.8}.
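A numpy sketch of this simulation follows. The exact mixing form of the Poisson rate, u′_i⊤(λz′_j + (1 − λ)v′_j), and the unit-gamma priors on the factors are assumptions made for illustration, since equation (25) is not reproduced in this text:

```python
import numpy as np

rng = np.random.default_rng(0)
I, J, K = 1000, 1500, 10
p, lam = 0.25, 0.6

Up = rng.gamma(1.0, 1.0, size=(I, K))   # user factors u'_i
Vp = rng.gamma(1.0, 1.0, size=(J, K))   # item factors v'_j
Zp = rng.gamma(1.0, 1.0, size=(J, K))   # content factors z'_j (observed as content)

E = rng.binomial(1, p, size=(I, J))     # exposure e_ij ~ Bernoulli(p)

# lambda mixes the content-driven and item-only parts of the Poisson rate
rate = Up @ (lam * Zp + (1 - lam) * Vp).T
Y = E * rng.poisson(rate)               # unexposed pairs produce zero counts
```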

2) Real-world data
To obtain real-world data, we used the MSD-A dataset [14], which includes users and songs that are a subset of those in the Million Song Dataset [43] and the Echo Nest Taste Profile Subset, the official user dataset of the Million Song Dataset [44]. The MSD-A dataset provides richer information than the full dataset: each song has a 2048-dimensional feature vector obtained by embedding text information, including artist biographies and acoustic features of audio samples, obtained from the Web. We kept only users who had listened to at least 200 songs and only songs listened to by at least 1000 users. This resulted in a dataset comprising 881 users and 8866 songs.

B. EXPERIMENTAL CONDITIONS
We randomly split the data into training, validation, and testing sets at ratios of 70%, 5%, and 25% (dense case) and at ratios of 25%, 5%, and 70% (sparse case) to simulate the cold-start case (lack of observations).
As baselines, we used probabilistic matrix factorization (PMF) with a Gaussian likelihood [45] and with a Poisson likelihood [31] as the objective function, and we used the CVAE [12] as a hybrid recommendation baseline. The two PMFs correspond to the CVAE and the proposed method without content information, respectively. Hereafter, the PMF based on the Gaussian likelihood is referred to as "GF," and the PMF based on the Poisson likelihood is referred to as "PF." The mean squared error (MSE) and the normalized discounted cumulative gain (NDCG) [46] with log counts as the relevance were used as evaluation metrics. NDCG is the sum of the relevance measures discounted in accordance with their rankings (DCG), normalized by the DCG of the ideal recommendation result (IDCG). NDCG takes values from 0 to 1, with higher values indicating better recommendations.
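The NDCG metric with log counts as relevance can be computed per user as in the following sketch (tie handling and truncation conventions may differ from the definition in [46]):

```python
import numpy as np

def ndcg_at_n(scores, counts, n=10):
    """NDCG for one user, with log counts as the relevance measure."""
    rel = np.log1p(counts)                         # log-count relevance
    order = np.argsort(-scores)[:n]                # ranking predicted by the model
    ideal = np.sort(rel)[::-1][:n]                 # best possible ranking
    discount = 1.0 / np.log2(np.arange(2, n + 2))  # 1 / log2(rank + 1)
    dcg = np.sum(rel[order] * discount)
    idcg = np.sum(ideal * discount)                # DCG of the ideal result
    return dcg / idcg if idcg > 0 else 0.0
```

A predicted ranking that orders items exactly by their true counts yields an NDCG of 1, and any misordering lowers the score.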
To utilize the MSD-A dataset, we changed the content generation loss of the variational autoencoder in the CVAE from binary cross entropy to Gaussian likelihood (corresponding to (5) in the proposed method). We did this because we used continuous features of the content information in our experiment, whereas bag-of-words features [47], which are discrete, were used in the original work by Li and She [12].
For fair comparison, the CVAE and the proposed method had a common neural network structure (Fig. 3). The encoders had three layers (with 800, 400, and K units, respectively), and the activations were ReLU [48], sigmoid, and linear, in that order. The decoders also had three layers (units: 400, 800, F) with the same activations. Adam [49] was used as the optimization algorithm for the neural networks, with the learning rate set to 0.0001.
We performed a grid search over the hyperparameters of each method and compared the best results of each method. Early stopping with patience = 10, based on the likelihood of the validation set, was used to determine convergence [50].

C. RESULTS AND DISCUSSION
First, the performance of each method on the simulation data with 1500 items (J = 1500) is shown in Fig. 4: NDCG (top) and MSE (bottom) for the sparse case (left) and the dense case (right). The vertical axis represents performance, and the horizontal axis represents λ, the parameter used to control the contribution of content information.
We first compare the results for the ordinary collaborative filtering methods, i.e., GF and PF. PF was superior in terms of both NDCG and MSE in both the sparse and dense cases. Although content information is not utilized in either method, there was a slight difference in the metrics when λ was changed. This difference, which is not the main point of the discussion, is attributed to the simulation data being expressed as a mixture of components that do and do not contribute to the content information. As λ approached the center (λ = 0.5), the number of observations increased slightly. MSE improved as the number of observations increased, whereas NDCG worsened as the number of items that made up the ranking increased. This trend was observed for both PF and GF.
We next compare the results for the two hybrid recommendation methods, i.e., the proposed method and the CVAE. In the sparse case, both NDCG and MSE improved as λ was increased for both methods, meaning that the learning process was able to utilize the content information. The proposed method showed the best performance except for the MSE with λ = 0.8. In the dense case (right side of Fig. 4), the performances of the proposed method (with PF) and the CVAE (with GF) were almost the same for NDCG and MSE, regardless of the value of λ. This indicates that, in the dense case, there were enough observations to avert the need for content-derived supplementation. In addition, at least for this dataset, using a Poisson distribution resulted in a better fit than using a Gaussian one.
Next, we consider cases with fewer item observations. The performance of each method on the simulation data with 1000 items (J = 1000) is shown in Fig. 5, with the same layout as Fig. 4: NDCG (top) and MSE (bottom) for the sparse case (left) and the dense case (right). Since the trend in the dense case was not much different from that in the J = 1500 case, we focus on the sparse case.
In the sparse case, one trend that differs from the J = 1500 case is that GF was superior to PF in terms of both NDCG and MSE. In addition, although NDCG and MSE improved with increasing λ for both hybrid methods, the trend for smaller λ differed between the Poisson (proposed) and Gaussian (CVAE) approaches. The performance of the CVAE was worse than that of GF when λ = 0.2 and almost the same when λ = 0.4. In contrast, the performance of the proposed method was almost the same as that of PF at λ = 0.2 and much higher at λ = 0.4. Going back to the sparse case for J = 1500 (left side of Fig. 4), we can observe a slight spread between the performance of the proposed method and that of the CVAE when λ is small. This indicates that the content information can be better utilized by using a Poisson distribution.
The utilization of content information could have been affected by the scale-dependent variance of the Poisson model. Fig. 6 shows plots of the training data with the target variable (play count) on the x-axis and the estimated variable on the y-axis for the CVAE (left) and the proposed method (right). The dotted lines represent the 95% confidence interval for each model (Gaussian distribution for the CVAE; Poisson distribution for the proposed method).
A Gaussian distribution has a constant variance regardless of scale, whereas the variance of a Poisson distribution grows with its mean. We can observe these trends in the learning outcomes. Constant variance means that, for hybrid recommendation using a Gaussian distribution, the latent variable estimated from the content information must contribute to predicting values very strictly, even for relatively large values of the target variable. This may have caused overfitting in learning the relationship between the content information and the target variables. For λ = 0.6 and 0.8, the proposed method performed better in terms of NDCG, and the CVAE performed better in terms of MSE. Note that, related to these trends, assuming a Gaussian distribution is equivalent to minimizing the MSE. This can be interpreted to mean that using a Gaussian distribution minimizes the MSE without allowing for large variance at large values of the latent variable, whereas a Poisson distribution allows a large variance for large values and treats the other values as equally important, resulting in better performance.
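The difference in noise models can be seen in a small numerical experiment: Poisson samples have variance equal to their mean, while Gaussian noise with a fixed σ keeps the same variance at every scale (the rates and σ = 2 are arbitrary example values):

```python
import numpy as np

rng = np.random.default_rng(0)
rates = np.array([1.0, 10.0, 100.0])   # small to large target-variable scales
N = 200_000

# Poisson noise: variance grows with the mean (Var = mean)
pois_var = np.array([rng.poisson(r, N).var() for r in rates])

# Gaussian noise with fixed sigma: variance is constant regardless of scale
sigma = 2.0
gauss_var = np.array([(r + sigma * rng.standard_normal(N)).var() for r in rates])
```

This is exactly the behavior discussed above: the Gaussian model demands equally tight fits at every count level, while the Poisson model tolerates proportionally larger deviations at large counts.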
Finally, we discuss application to a real-world dataset, MSD-A. As shown in Table 2, in both the sparse and dense cases, the performance of the CVAE with content information was almost equal to or lower than that of GF, which is the same trend as that observed with a low content information contribution rate in the simulation data. In the real-world case, the play count of a music piece is affected not only by the content but also by many other factors such as marketing (e.g., advertisements) and the period of its availability. This means that the contribution of content information to the play count is not so strong. Even in such a case with weakly contributing content features, the proposed recommendation method, a hybrid method using a Poisson distribution, extracted beneficial information from the content information and performed well, except in the dense case, in which NDCG was almost equal to that of GF and the CVAE. These results suggest that the proposed hybrid recommendation method utilizes content information effectively for predicting the number of interactions and is thus able to aid in overcoming the cold-start problem.

V. CONCLUSION
Our proposed hybrid recommendation framework uses Poisson factorization for collaborative filtering and jointly describes a generative model for content information that shares latent variables with the collaborative filtering. The processes for generating content information and for generating implicit feedback are described in a probabilistic manner. The algorithm for inference and estimation is derived using the stochastic gradient variational Bayes method. Furthermore, we showed that the framework comprising the proposed method is applicable to a broader range of collaborative filtering, including state-of-the-art methods, and can be extended to a hybrid recommendation algorithm that handles the number of interactions. Experimental results show that the proposed method utilizes content information effectively for predicting the number of interactions and should thus aid in overcoming the cold-start problem.

IWAO TANUMA received a B.S. degree in science from the Tokyo University of Science, Tokyo, Japan, in 2010 and an M.S. degree in information physics & computing from the University of Tokyo, Tokyo, Japan, in 2012. He worked as a researcher in several laboratories of the Hitachi, Ltd. R&D Group and Hitachi America, Ltd. R&D from April 2012 to November 2021. He is currently working as a backend engineer at DWANGO Co., Ltd. while working on a Ph.D. in the Dept. of Statistical Science, Graduate University for Advanced Studies, Japan. His research interests include unsupervised and semi-supervised learning based on probabilistic latent factor models as well as potential applications.
TOMOKO MATSUI (Senior Member, IEEE) received a Ph.D. degree in computer science from the Tokyo Institute of Technology, Tokyo, Japan, in 1997. From 1988 to 2002, she was a researcher in several NTT laboratories, focusing on speaker and speech recognition. From 1998 to 2002, she was a senior researcher in the Spoken Language Translation Research Laboratory, ATR, Kyoto, focusing on speech recognition. In 2001, she was an invited researcher in the Acoustic and Speech Research Department, Bell Laboratories, Murray Hill, NJ, working on identifying effective confidence measures for verifying speech recognition results. She is currently a professor at The Institute of Statistical Mathematics, Tokyo, Japan, working on statistical spatial-temporal modeling for various applications, including speech and image recognition. She received the Best Paper Award from the Institute of Electronics, Information, and Communication Engineers of Japan, in 1993.