ES_USE: An Efficient Rating Prediction Method

In an online shopping scenario, learning a high-quality product embedding that captures the various aspects of a product is important for improving the accuracy of user rating prediction. Much research has addressed product embedding learning, for example by fusing user feedback with a product's appearance as side information. However, because a product has many aspects, taking only its appearance into account as side information is not sufficient to learn its embedding accurately. In this paper, we present a matrix co-factorization method that exploits the information hidden in so-called ''also-viewed'' and ''also-bought'' products, i.e., the lists of products that users have also viewed or also bought when viewing a target product. To improve rating prediction accuracy, our first step is to find similar users. However, if the dataset is very large, e.g., tens of millions of users, computing pairwise user similarities is very time-consuming. To deal with this problem, we use a compact binary sketch, the Even Sketch (ES), to estimate user similarity. Our experiments demonstrate the superiority of our method over a state-of-the-art baseline in producing high-accuracy rating predictions.


I. INTRODUCTION
Triggered by the Netflix Prize [1] in 2009, which focused on predicting user ratings of movies from past user feedback, many studies have addressed accurate user rating prediction [2]. Computing user similarities is a key building block of modern recommender systems. To this end, many works learn low-dimensional embeddings of users and items simultaneously [3]-[5]. Moreover, high-quality item embeddings yield higher accuracy in user rating prediction [6], so various kinds of item-related information, such as product reviews and product images, have been considered for learning better item embeddings [7]-[9].
In this paper, aiming at learning high-quality product embeddings, we discuss how product aspects influence users in different product domains. In a real-world online shopping scenario, e.g., Amazon or Taobao, a user may review or rate a product that he or she has bought. However, the reviews and ratings usually differ from customer to customer.
The associate editor coordinating the review of this manuscript and approving it for publication was Dominik Strzalka.
What leads to these different results? Each user obviously has his or her own preferences, which are mainly affected by products' inherent aspects such as appearance, functionality or specifications [10]. Moreover, the extent of each aspect's influence may differ across product domains [10]. More specifically, all these phenomena are embodied in ''also-viewed'' and ''also-bought'' historical records [10]-[12], which indicate the products that users have also viewed or also bought when viewing or buying a target product. Generally speaking, users and products can be viewed as sets and set elements, so set similarity estimation is one of the key challenges for rating prediction. In this paper, we present a method named ES (Even Sketch for user similarity estimation, detailed in Section IV), a compact binary sketch for estimating the Jaccard similarity of users.

A. MOTIVATION
For the sake of clarity, we first detail our motivation. A user may have also viewed a list of other pens before finally placing an order for one pen; besides, he has probably bought a bottle of ink. Similarly, a user may buy a set of batteries after buying a toy car, or buy shoe polish together with a pair of shoes. Generally speaking, a user may view several similar products before finally ordering a certain product, and may buy related products at the same time. As a result, when we shop online, we focus our attention on different aspects of products in different product domains.
After purchasing a product, the user may review and rate it from his or her perspective. The question, then, is how to accurately predict user ratings. To this end, most works have considered the appearance of products, particularly in visually-aware product domains, but the effort is uneven in other domains [10]. [10] presented a matrix co-factorization method named Visual Matrix Co-Factorization (VMCF) to improve user rating prediction accuracy, which considered ''also-viewed'' products but overlooked other attributes such as ''also-bought'' and ''bought-together''. In this paper, we present a rating prediction method named ES_USE (shown in Fig. 1) that combines three types of product attributes: ''also-viewed'', ''also-bought'' and ''bought-together''. Moreover, to estimate similarities among users efficiently on datasets of different scales, we present a compact binary sketch (detailed in Fig. 2), a variant of Jaccard similarity estimation. With this method we obtain better accuracy with much less computation and storage. Fig. 1 shows the high-level schema of our approach, which mainly consists of three steps. The first step is to compute user similarities. Then, based on the results of the previous step, we find similar users. Finally, we leverage the ''also-viewed'' and ''also-bought'' product records of those similar users to predict ratings. As shown in Fig. 1, we use the Even Sketch (ES for short, detailed in Fig. 2) to obtain user similarities based on historical ''also-viewed'', ''bought-together'' and ''also-bought'' records (as shown in Fig. 2a).
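As a toy illustration of these three steps (all names, data and the similarity threshold below are hypothetical, not part of our model):

```python
def jaccard(a, b):
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0

def find_similar_users(target, histories, threshold=0.2):
    # Steps 1-2: score every other user by the Jaccard similarity of
    # "also-viewed"/"also-bought" histories, keep those above a threshold.
    return [u for u, h in histories.items()
            if u != target and jaccard(histories[target], h) >= threshold]

def predict_rating(target, product, ratings, similar):
    # Step 3: average the ratings that the similar users gave the product.
    votes = [ratings[u][product] for u in similar
             if product in ratings.get(u, {})]
    return sum(votes) / len(votes) if votes else None

histories = {"betty": {"p1", "p2", "p3"}, "sophia": {"p2", "p3", "p4"},
             "carl": {"p9"}}
ratings = {"sophia": {"p5": 4.0}, "carl": {"p5": 1.0}}
similar = find_similar_users("betty", histories)
print(similar)                                          # ['sophia']
print(predict_rating("betty", "p5", ratings, similar))  # 4.0
```

Here only Sophia's history overlaps Betty's enough (Jaccard 0.5), so only her rating of p5 is used.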
After a real-world online shop has run for a period of time, there may be a huge number of also-viewed, also-bought and bought-together records in the repository, so user similarity calculation is really time-consuming. The Even Sketch is designed to solve exactly this problem, and its construction process is shown in Fig. 2b. The experimental results in Section V prove its effectiveness and accuracy.

B. OUR APPROACH
In this paper, we assume that each set S is summarized in a data structure D(S) of n' bits. To facilitate understanding, we detail the Even Sketch method with the following example.
When choosing a target product, a user may have also viewed a list of products of the same kind, and may also buy some related products. For example, suppose a skirt with product ID 0000031852 strikes Betty's fancy. Before placing the order, Betty may have also viewed other skirts with the product ID list ''B00CEUWY8K, B004FOEEHC, 0000031895, B00BC4GY9Y, B003XRKA7A, B00K18LKX2, B00EM7KAG6, B00AMQ17JA, B00D9C32NI, B002C3Y6WG, B00JLL4L5Y, B003AVNY6I, B008UBQZKU, B00D0WDS9A, B00613WDTQ, B00538F5OK, B005C4Y4F6'' (AV1 for short), and she may have also bought the products with ID list ''B002R0FA24, B00D23MC6W, B00D2K0PA0, B00538F5OK'' (AB1 for short). Similarly, for a target skirt, Sophia may have also viewed ''0000031895, B00BC4GY9Y, B003XRKA7A, B00K18LKX2, B00EM7KAG6, B00B0AVO54, B00E95LC8Q, B00GOR92SO, B007ZN5Y56, B00AL2569W, B00B608000, B008F0SMUC, B00BFXLZ8M'' (AV2 for short), and may have also bought ''B00D23MC6W, B00D2K0PA0, B007R2RM8W'' (AB2 for short). Considering the intersections of Betty's and Sophia's ''also viewed'' and ''also bought'' records, there are common portions, namely the two subsets ''0000031895, B00BC4GY9Y, B003XRKA7A, B00K18LKX2, B00EM7KAG6'' and ''B00D23MC6W, B00D2K0PA0'', respectively. In fact, it is quite possible that Betty and Sophia have both bought many products on a shopping website; in other words, there could be many such common subsets between their ''also viewed'' and ''also bought'' records. The experiments in Section V show that these common subsets contribute substantially to user similarity estimation, which we exploit to improve rating prediction accuracy.
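The overlap in this example translates directly into Jaccard similarities:

```python
AV1 = {"B00CEUWY8K", "B004FOEEHC", "0000031895", "B00BC4GY9Y",
       "B003XRKA7A", "B00K18LKX2", "B00EM7KAG6", "B00AMQ17JA",
       "B00D9C32NI", "B002C3Y6WG", "B00JLL4L5Y", "B003AVNY6I",
       "B008UBQZKU", "B00D0WDS9A", "B00613WDTQ", "B00538F5OK",
       "B005C4Y4F6"}                                    # Betty, also viewed
AV2 = {"0000031895", "B00BC4GY9Y", "B003XRKA7A", "B00K18LKX2",
       "B00EM7KAG6", "B00B0AVO54", "B00E95LC8Q", "B00GOR92SO",
       "B007ZN5Y56", "B00AL2569W", "B00B608000", "B008F0SMUC",
       "B00BFXLZ8M"}                                    # Sophia, also viewed
AB1 = {"B002R0FA24", "B00D23MC6W", "B00D2K0PA0", "B00538F5OK"}  # Betty
AB2 = {"B00D23MC6W", "B00D2K0PA0", "B007R2RM8W"}                # Sophia

def jaccard(s1, s2):
    # |intersection| / |union|
    return len(s1 & s2) / len(s1 | s2)

print(len(AV1 & AV2))      # 5 shared "also viewed" skirts
print(jaccard(AV1, AV2))   # 5 / 25 = 0.2
print(jaccard(AB1, AB2))   # 2 / 5  = 0.4
```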
However, if the dataset is very large, e.g., tens of millions of users' records, computing the similarities among users is very time-consuming. To address this problem, we present a compact binary sketch, whose construction process is sketched in Fig. 2b. Given two users, let U1 and U2 denote their sets of ''also viewed'' (or ''also bought'') products, respectively. If the two sets have a common part (e.g., AB1 and AB2), the common elements are hashed into the same bits, i.e., bits with the same location number (the shaded bits in Fig. 2b). Moreover, instead of setting a bit to 1, we flip a bit according to the hash value of each element in U1 and U2. Specifically, the i-th location records the parity of the number of elements whose hash values hit location i.
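The flip-instead-of-set idea can be sketched in a few lines; the concrete hash function (SHA-1 here) and sketch size are illustrative assumptions:

```python
import hashlib

def even_sketch(items, n_bits=64):
    # One location per bit; each element *flips* (rather than sets) the bit
    # its hash lands on, so each location stores the parity of the number
    # of elements hashed there.
    bits = [0] * n_bits
    for x in items:
        h = int(hashlib.sha1(x.encode()).hexdigest(), 16) % n_bits
        bits[h] ^= 1
    return bits

def xor(s1, s2):
    return [a ^ b for a, b in zip(s1, s2)]

U1 = {"ink", "penA", "penB", "penC"}
U2 = {"ink", "penA", "penD"}
# Elements common to U1 and U2 flip the same bit in both sketches, so
# they cancel under XOR: the XOR equals the sketch of the symmetric
# difference U1 Δ U2 = {penB, penC, penD}.
assert xor(even_sketch(U1), even_sketch(U2)) == even_sketch(U1 ^ U2)
print(sum(xor(even_sketch(U1), even_sketch(U2))))  # at most |U1 Δ U2| = 3
```

The cancellation of the common part is exactly what makes the XOR of two sketches a summary of the symmetric difference.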
In summary, the contributions of our paper are three-fold.
(1) To the best of our knowledge, few existing works have considered the information hidden in the so-called ''also-viewed'' and ''also-bought'' products in the user rating prediction scenario. We recognize the substantial significance of this hidden information and formalize the problem. (2) We employ a compact binary sketch (ES, Even Sketch for user similarity estimation) to estimate user similarity, which can deal with huge amounts of data and greatly reduces computation time. (3) We conduct experiments on real-world datasets, which demonstrate the effectiveness and efficiency of ES_USE.
The remainder of this paper is organized as follows. In Section II, we discuss the related work relevant to our research. We formalize the notions used in Section III and detail our approach in Section IV. We present the experiments in Section V and conclude this work in Section VI.

II. RELATED WORKS
Rating prediction is a hot topic in recommender systems research. Many studies have investigated how to utilize various kinds of product-related side information. For example, Tong et al. [13] presented a rating prediction algorithm named NF-SVM based on the analysis of users' natural noise and relationships. It first clusters users to sharpen the similarity attributes among users, then analyzes users' rating histories to obtain the attributes of users' natural noise. For recommendation, Shi et al. [14] observed the correlation between expenditures and rating scores and presented an Expenditure-aware Rating Prediction method (EARP) based on low-rank matrix factorization. Pradhan et al. [16] used multi-view clustering to cluster users or items by leveraging information from multiple modalities, which improved the accuracy of CF-based rating prediction systems. A framework named AB (Attribute Boosting) was presented in [17], which took full advantage of the interactions with attributes from the user aspect, the item aspect, and the attribute-type aspect, respectively. Similarly, Pan et al. [24] presented a novel approach to clustering Mashups into groups, which integrates structural similarity and semantic similarity using fuzzy AHP (fuzzy analytic hierarchy process).
Meanwhile, some novel methods have been proposed for rating prediction. In [15], Wu combined rough sets with a back-propagation neural network for audience rating prediction, which can predict complicated audience ratings with dynamic and non-linear factors. By modeling ratings over each user's latent interests and each item's latent topics, Harvey et al. [18] presented a Bayesian latent variable model for rating prediction. They first described a Gibbs sampling procedure for estimating parameters that can compete with the gradient-descent SVD methods commonly used in state-of-the-art systems, then extended the model so that rating estimation was improved significantly by means of user-dependent and item-dependent biases. To deal with the cold-start user/item problem, [19] presented a rating comparison strategy (RAPARE) to learn the latent profiles of cold-start users/items, which provides a fine-grained calibration of these latent profiles by exploring the differences between cold-start and existing users/items. Similarly, for prediction accuracy, [20] presented a sentiment-based rating prediction method (RPS), which not only calculates each user's sentiment on items/products, but also takes into account interpersonal sentimental influence and product reputation.
In [12], [21], McAuley et al. leveraged ''also-viewed'' product information to recommend visually alternative products in the clothing domain as a link prediction task. However, other product information was overlooked, and only the clothing domain was addressed. A similar work was presented in [10], which used ''also-viewed'' product information for rating prediction in arbitrary product domains. In this paper, we also address rating prediction, but we consider ''also-viewed'', ''bought-together'' and ''also-bought'' product information simultaneously, which improves user rating prediction accuracy in general domains.

III. PROBLEM FORMULATION
A. NOTATIONS
The notations of this paper are defined in Table 1.
Assume that there are n users and m products in an online shopping scenario, and let U = {u1, u2, . . . , un} and P = {p1, p2, . . . , pm} be the sets of users and products, respectively. The rating matrix R = [r ij ] n×m denotes the ratings assigned by users to products, i.e., the element r ij of R means that user i rates product j with value r ij. In different application scenarios, r ij can be either a real number or a binary value [10]. In this paper, the datasets are extracted from Amazon.com by McAuley et al., where r ij lies in the range [1, 5].

B. CONSTRUCTING PRODUCT-AFFINITY NETWORK
In this paper, we build two networks to embody the ''also viewed'' and ''also bought'' product information, respectively. The also-viewed product-affinity network is illustrated in Fig. 3: nodes denote products and directed edges represent the ''also viewed'' relationships among products. In this product-affinity network, neighboring products share common product information such as appearance, functionality and specifications. When a customer wants to buy a pair of leisure shoes, he or she may view several other pairs of shoes, whose manufacturers, styles and even prices may differ. Moreover, after placing an order for the target shoes, the customer may well buy a shoe brush, or buy a shirt after buying a jacket. These purchase habits indicate the also-bought relationships among products, described in Fig. 4. In Fig. 3 and Fig. 4, the edges are directed from a target product to its ''also viewed'' or ''also bought'' products respectively, so these two relationships are not symmetric. This paper uses two product-affinity matrices to represent the ''also viewed'' and ''also bought'' relationships, i.e., the also-viewed product-affinity matrix AV = [av ij ] m×m and the also-bought product-affinity matrix AB = [ab ij ] m×m, where av ij = 1 (resp. ab ij = 1) indicates that product j is an ''also viewed'' (resp. ''also bought'') product of product i.
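Building such a directed affinity matrix from the relation lists is straightforward; the product indices below are a toy assumption:

```python
import numpy as np

def affinity_matrix(relations, m):
    # relations maps a target product index to the indices of its
    # "also viewed" (or "also bought") products. Edges are directed
    # from target to neighbour, so the matrix is generally NOT symmetric.
    A = np.zeros((m, m), dtype=np.int8)
    for i, neighbours in relations.items():
        for j in neighbours:
            A[i, j] = 1
    return A

# Toy example: product 0 (shoes) -> also viewed products 1 and 2;
# product 0 -> also bought product 3 (a shoe brush).
AV = affinity_matrix({0: [1, 2]}, m=4)
AB = affinity_matrix({0: [3]}, m=4)
print(AV[0, 1], AV[1, 0])  # 1 0  (directed, asymmetric)
print(AB[0, 3])            # 1
```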

IV. METHOD
A. MODELING RATING
Given an online shopping scenario consisting of n users and m products, the relationships between users and products are denoted by a user-product matrix R = [r ij ] n×m, where each entry r ij represents the rating of user i for product j. U ∈ ℝ^(K×n) and P ∈ ℝ^(K×m) denote the user and product embedding matrices, with column vectors U_i and P_j denoting the user-specific and product-specific embedding vectors, respectively.
The conditional distribution over the observed ratings is defined as

p(R | U, P, σ²) = Π_{i=1..n} Π_{j=1..m} [f(r ij | g(U_i^T P_j), σ²)]^(I^R_ij),   (1)

where f(x | µ, σ²) denotes the probability density function of a Gaussian distribution with mean µ and variance σ², and I^R_ij equals 1 if user i rated product j, and 0 otherwise. The range of U_i^T P_j is restricted to [0, 1] by means of the logistic function g(x) = 1/(1 + e^(−x)). For the sake of simplicity, we also convert the ratings r ij into [0, 1]. Zero-mean spherical Gaussian priors [22] are placed on the hidden variables:

p(U | σ_U²) = Π_{i=1..n} f(U_i | 0, σ_U² I),   (2)

p(P | σ_P²) = Π_{j=1..m} f(P_j | 0, σ_P² I).   (3)

Based on Eqs. 2 and 3, the log-posterior distribution over the hidden variables is computed by the following formula:

ln p(U, P | R, σ², σ_U², σ_P²) = −(1/(2σ²)) Σ_i Σ_j I^R_ij (r ij − g(U_i^T P_j))² − (1/(2σ_U²)) Σ_i U_i^T U_i − (1/(2σ_P²)) Σ_j P_j^T P_j + C,   (4)

where C is a constant that does not depend on the hidden variables.
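This PMF-style log-posterior (squared error on observed entries plus two Gaussian-prior terms) can be evaluated in a few lines; the toy dimensions and variances below are illustrative assumptions:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def log_posterior(R, I, U, P, sigma2, sigma2_U, sigma2_P):
    # log p(U, P | R) up to an additive constant: a squared-error term
    # over the *observed* ratings plus two zero-mean Gaussian prior terms.
    err = I * (R - sigmoid(U.T @ P)) ** 2
    return (-err.sum() / (2 * sigma2)
            - (U ** 2).sum() / (2 * sigma2_U)
            - (P ** 2).sum() / (2 * sigma2_P))

rng = np.random.default_rng(0)
K, n, m = 5, 4, 6
U = rng.normal(size=(K, n)); P = rng.normal(size=(K, m))
R = rng.uniform(size=(n, m))          # ratings already scaled to [0, 1]
I = (rng.uniform(size=(n, m)) < 0.5)  # observation indicator I^R_ij
print(log_posterior(R, I, U, P, 1.0, 1.0, 1.0))
```

Maximizing this quantity over U and P is what Section IV-E turns into the objective function.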

B. MODELING ALSO-VIEWED RELATIONSHIPS
For the also-viewed product-affinity matrix M^avp, the conditional distribution over the observed matrix is defined as

p(M^avp | P, V, σ_avp²) = Π_{j=1..m} Π_{k=1..m} [f(M^avp_jk | g(P_j^T V_k), σ_avp²)]^(I^Mavp_jk),   (5)

where V ∈ ℝ^(K×m) is the ''also-viewed'' product embedding matrix, with V_k denoting the k-th ''also viewed'' product-specific column embedding vector. Given products j and k, their ''also viewed'' relationship is modeled by P_j^T V_k. I^Mavp_jk equals 1 if product k is one of the ''also viewed'' products of product j, and 0 otherwise. Similarly, we place zero-mean spherical Gaussian priors on the hidden P and V:

p(P | σ_P²) = Π_{j=1..m} f(P_j | 0, σ_P² I),  p(V | σ_V²) = Π_{k=1..m} f(V_k | 0, σ_V² I).   (6)

Thus, the log-posterior distribution over the hidden variables is computed by Eq. 7:

ln p(P, V | M^avp, σ_avp², σ_P², σ_V²) = −(1/(2σ_avp²)) Σ_j Σ_k I^Mavp_jk (M^avp_jk − g(P_j^T V_k))² − (1/(2σ_P²)) Σ_j P_j^T P_j − (1/(2σ_V²)) Σ_k V_k^T V_k + C.   (7)
C. MODELING ALSO-BOUGHT RELATIONSHIPS
Similar to the also-viewed relationships, the conditional distribution of the also-bought product-affinity matrix M^abp is given by Eq. 8:

p(M^abp | P, B, σ_abp²) = Π_{j=1..m} Π_{k=1..m} [f(M^abp_jk | g(P_j^T B_k), σ_abp²)]^(I^Mabp_jk),   (8)

where B ∈ ℝ^(K×m) is the ''also bought'' product embedding matrix, with B_k denoting the k-th ''also bought'' product-specific column embedding vector. P_j^T B_k models the ''also bought'' relationship between products j and k. I^Mabp_jk equals 1 if product k is one of the ''also bought'' products of product j, and 0 otherwise.
The zero-mean spherical Gaussian priors on the hidden P and B are defined by Eq. 9:

p(P | σ_P²) = Π_{j=1..m} f(P_j | 0, σ_P² I),  p(B | σ_B²) = Π_{k=1..m} f(B_k | 0, σ_B² I).   (9)

Finally, the log-posterior distribution over the hidden variables is computed by Eq. 10:

ln p(P, B | M^abp, σ_abp², σ_P², σ_B²) = −(1/(2σ_abp²)) Σ_j Σ_k I^Mabp_jk (M^abp_jk − g(P_j^T B_k))² − (1/(2σ_P²)) Σ_j P_j^T P_j − (1/(2σ_B²)) Σ_k B_k^T B_k + C.   (10)

D. EVEN SKETCH
As shown in Fig. 2, we use a compact binary sketch, the Even Sketch (ES), to estimate user similarity. The Even Sketch of a set S consists of an array l of n' bits. Without loss of generality, the i-th bit can be formally described as

l_i = |{x ∈ S : h(x) = i}| mod 2,

where h : S → [n'] is a random hash function and 0 ≤ i < n'. More specifically, l_i is the parity of the number of set elements that hash to the i-th location. The Even Sketch greatly reduces the storage space, especially when dealing with large-scale datasets, since we record only the parity of the number of elements hashing to each location. Moreover, we combine the Even Sketch values of two sets by means of an exclusive-or operation, so their common portion cancels out. For notational convenience, we think of ES(S1) and ES(S2) as the sets of bit positions containing 1, which means that their exclusive-or corresponds exactly to the symmetric difference.
Let m and n' be the size of a set S (e.g., a set of ''also viewed'' products) and the size of ES(S) in bits, respectively. In this work, hash functions are assumed to be fully random, so the process of constructing ES(S) can be regarded as a voting problem: assuming there are m voters and n' candidates, we record the parity of the number of votes each candidate receives.
We model the parity of the number of votes for any candidate as a simple two-state Markov chain, whose first and second states correspond to odd and even parity, respectively. Since the probability of changing state with each vote is 1/n', let p_i be the probability that a specific candidate has an even number of votes after i voters have cast their ballots; then

p_i = 1/2 + (1/2)(1 − 2/n')^i.

Generally speaking, let X_i be a 0-1 random variable corresponding to the parity of the number of voters who voted for the i-th candidate; then

E[X_i] = 1 − p_m = 1/2 − (1/2)(1 − 2/n')^m,

and we can estimate the expected number of 1s in ES(S) as E[|ES(S)|] = (n'/2)(1 − (1 − 2/n')^m). Given S1 and S2, through the method detailed in Fig. 2, we construct the Even Sketches ES(S1) and ES(S2) from k independent permutations (min-hash samples) of each set; to improve the accuracy, we should increase the number of independent permutations (i.e., the variable k). Writing the symmetric set difference of S1 and S2 as S1 Δ S2, the expected size of the symmetric difference of the two min-hash sample sets is

E[|S1 Δ S2|] = 2k(1 − J),

where J is the Jaccard similarity of S1 and S2. For simplicity, we think of ES(S1) and ES(S2) as the sets of bit positions containing 1, so the sketch of the symmetric difference of S1 and S2 is obtained by the exclusive-or of ES(S1) and ES(S2), and

E[|ES(S1) ⊕ ES(S2)|] = (n'/2)(1 − (1 − 2/n')^|S1 Δ S2|) ≈ (n'/2)(1 − e^(−4k(1−J)/n')),

where |ES(S1) ⊕ ES(S2)| denotes the number of 1s in the resulting structure. Using the observed |ES(S1) ⊕ ES(S2)| as a proxy for its expectation, the Jaccard similarity can be estimated as follows:

Ĵ = 1 + (n'/(4k)) ln(1 − 2|ES(S1) ⊕ ES(S2)|/n').
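The whole pipeline, min-hash sampling, parity sketching, XOR, estimator inversion, can be sketched as follows. The salted-SHA-1 min-hash and the use of Python's built-in hash for bit placement are illustrative assumptions, not the paper's exact hash functions:

```python
import hashlib, math

def minhash(items, k):
    # k independent permutations approximated by salted hashes; keep the
    # (permutation index, minimum value) pairs as the sample set.
    return {(i, min(int(hashlib.sha1(f"{i}:{x}".encode()).hexdigest(), 16)
                    for x in items))
            for i in range(k)}

def even_sketch(sample, n_bits):
    bits = [0] * n_bits
    for s in sample:
        bits[hash(s) % n_bits] ^= 1   # flip: store parity, not presence
    return bits

def estimate_jaccard(s1, s2, k, n_bits):
    z = sum(a ^ b for a, b in zip(even_sketch(minhash(s1, k), n_bits),
                                  even_sketch(minhash(s2, k), n_bits)))
    # invert E[z] ≈ (n'/2)(1 − exp(−4k(1−J)/n'))
    return 1.0 + n_bits / (4.0 * k) * math.log(1.0 - 2.0 * z / n_bits)

S = {f"p{i}" for i in range(50)}
print(estimate_jaccard(S, S, 64, 512))  # identical sets -> exactly 1.0
```

For identical sets the two sketches coincide, the XOR weight z is 0, and the estimator returns exactly 1; for disjoint sets z is large and the estimate falls toward 0.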

E. OBJECTIVE FUNCTION
In the previous sections, we described how to independently model the ratings, the ''also-viewed'' relationships and the ''also-bought'' relationships among products, and we detailed how to estimate user similarity. Maximizing the log-posterior over the hidden variables is equivalent to minimizing the following objective function [10], where the observation noise variances and prior variances are fixed hyper-parameters.
|| · ||_F denotes the Frobenius norm. The importance of ''also-viewed'' and ''also-bought'' products in the unified model is regulated by λ M avp and λ M abp, respectively. For the non-convex objective function shown in Eq. 19, we compute the gradient of each embedding variable, i.e., U_i, P_j, V_k, B_k, and learn a local minimum solution by gradient descent.
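A minimal gradient-descent sketch of this learning step, keeping only the rating term of the objective and plain Frobenius-norm (L2) regularization; the logistic link and the two affinity terms are omitted for brevity, and all hyper-parameter values are illustrative:

```python
import numpy as np

def train(R, I, K=5, lam=0.1, lr=0.05, epochs=200, seed=0):
    # Minimise  sum_ij I_ij (r_ij - U_i^T P_j)^2 + lam(||U||_F^2 + ||P||_F^2)
    # by full-batch gradient descent on U and P.
    rng = np.random.default_rng(seed)
    n, m = R.shape
    U = 0.1 * rng.normal(size=(K, n))
    P = 0.1 * rng.normal(size=(K, m))
    for _ in range(epochs):
        E = I * (R - U.T @ P)            # residuals on observed entries
        U += lr * (P @ E.T - lam * U)    # gradient step for U
        P += lr * (U @ E - lam * P)      # gradient step for P
    return U, P

R = np.array([[1.0, 0.8, 0.0], [0.9, 0.7, 0.2]])  # toy ratings in [0, 1]
I = np.ones_like(R)                                # all entries observed
U, P = train(R, I)
print(np.abs(R - U.T @ P).max())  # remaining training error
```

Because of the regularizers, the fitted U^T P shrinks slightly toward zero rather than interpolating R exactly; this is the expected behaviour of the penalized objective.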

F. COMPLEXITY ANALYSIS
ItemCF [23] is a basic item-based collaborative filtering algorithm. To generate a prediction for user u on item i, it first retrieves the pre-computed k most similar items for the target item i, then determines how many of those k items were purchased by user u. PMF [22] uses regularization parameters to provide a more flexible approach to regularization; its main drawback is computational cost, since instead of training a single model it has to train a multitude of models. VMCF [10] has a total complexity of O(ρ(K + D) + µK), where ρ denotes the average number of observed ratings in the rating matrix, µ the average number of observed elements in the product-affinity matrix, and K and D the numbers of dimensions of the product embedding and the product visual embedding, respectively. Our method mainly consists of three steps. The first step computes user similarities by means of ES (Even Sketch for user similarity estimation) and has complexity O(n × m1 + n × m2 + n × m3), where m1, m2 and m3 denote the average numbers of ''also-viewed'', ''bought-together'' and ''also-bought'' products, respectively. Then, for a user u, we find its similar users, which incurs complexity O(n). Finally, based on those similar users (our method thus effectively reduces the search space), we predict ratings. The last step takes O(ρ'K + µ1K + µ2K), where ρ' denotes the average number of observed ratings given by the similar users, and µ1 and µ2 denote the average numbers of observed elements in the also-viewed and also-bought product-affinity matrices, respectively. The above analysis shows that the complexity of ES_USE is linear. Moreover, ρ' ≪ ρ, m1 ≪ m, m2 ≪ m and m3 ≪ m, which contributes to the high efficiency of ES_USE.

V. EXPERIMENTS
A. DATASETS
In this section, we conduct experiments to evaluate our approach. We compare our ES_USE with several state-of-the-art methods on multiple public real-world datasets extracted from Amazon.com by McAuley et al. [12]. The datasets contain product reviews (i.e., ratings, text, helpfulness votes) and product metadata; the metadata includes price, title, a list of also-viewed products, a list of also-bought products, and so on. We use the lists of also-viewed and also-bought products to construct the also-viewed and also-bought product-affinity matrices, respectively. We preprocess all datasets so that each user has rated at least four products, and every product in the product-affinity network appears in the user ratings data. Table 2 details the statistics of our four datasets, i.e., Books, Baby, Office Products and Pet Supplies. Fig. 5 shows the number of rated products in each dataset.
We compare our ES_USE with ItemCF, PMF and VMCF, introduced in [23], [22] and [10], respectively. ItemCF is a traditional recommendation method based on product similarity, using Pearson's correlation coefficient as the similarity metric. PMF is a matrix factorization-based recommendation method that considers only user rating information. VMCF leverages ''also-viewed'' products for rating prediction. To give a clear overview of the baseline methods, we summarize their properties in Table 3.

B. EXPERIMENTAL SETTINGS
For comparison, we employ MSE (Mean Squared Error) [10], MAE (Mean Absolute Error) [26] and F1 score [25] to evaluate the performance of user rating prediction.
MSE = (1/N) Σ (r ij − r̂ ij)²,  MAE = (1/N) Σ |r ij − r̂ ij|,

where r ij denotes the rating that user i gave product j, r̂ ij denotes the corresponding predicted value, and N denotes the number of ratings. A lower MSE or MAE indicates better performance. We split the user ratings datasets into two parts: 70% for training and the remaining 30% for testing. For ItemCF we set the number of neighbors to 20. For all other methods we carry out a grid search with λ U, λ P ∈ {0.01, 0.1, 0.2, 0.5, 1.0}, λ M avp, λ M abp ∈ {0.01, 0.1, 0.2, 0.5, 1.0} and λ V, λ B ∈ {0.01, 0.1}, while K is fixed to 5. Figures 6-9 present the evaluation results on each dataset in terms of MSE. In all product domains (i.e., Books, Baby, Office Products and Pet Supplies), ES_USE obtains smaller MSE values, i.e., better prediction accuracy. This indicates that modeling the ''also-viewed'' and ''also-bought'' relationships improves the prediction accuracy of ES_USE. Moreover, incorporating the Even Sketch (a compact binary sketch for estimating the Jaccard similarity of users) into ES_USE yields further improvements in user rating prediction accuracy. Figures 10-12 detail the influence of the balancing hyper-parameters λ M avp and λ M abp, which regulate the importance of ''also-viewed'' and ''also-bought'' products in the unified model, respectively. In our experiments, we set λ = α × λ M avp + (1 − α) × λ M abp, where α is the weight of λ M avp. When λ M avp = 0 we ignore ''also-viewed'' product information but exploit ''also-bought'' product information, and vice versa. From Figures 10-12 we conclude that incorporating ''also-viewed'' and ''also-bought'' product information indeed improves user rating prediction accuracy, and that the optimal value of λ differs per product domain, mostly lying in the range [0.0, 1.0].
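Both error metrics can be computed in a few lines; the rating values below are hypothetical:

```python
def mse(actual, predicted):
    # Mean Squared Error over N rating pairs
    return sum((r - p) ** 2 for r, p in zip(actual, predicted)) / len(actual)

def mae(actual, predicted):
    # Mean Absolute Error over N rating pairs
    return sum(abs(r - p) for r, p in zip(actual, predicted)) / len(actual)

r_true = [5.0, 3.0, 4.0, 1.0]  # observed ratings r_ij
r_pred = [4.5, 3.5, 4.0, 2.0]  # predicted ratings
print(mse(r_true, r_pred))  # (0.25 + 0.25 + 0 + 1) / 4 = 0.375
print(mae(r_true, r_pred))  # (0.5 + 0.5 + 0 + 1) / 4  = 0.5
```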

C. EXPERIMENTAL RESULTS
In the third set of experiments, we evaluate the precision ratio of our ES_USE sketch on the four datasets. Figures 13-16 show the experimental results on Books, Baby, Office Products and Pet Supplies, respectively. On the whole, ES_USE has a high precision ratio: for different sizes of the Even Sketch in bits (n'), it almost always achieves a precision above 85%. As shown in Figures 13-16, when the sketch size varies from 100 bits to 600 bits, the precision ratio grows on all four datasets. Indeed, the bigger the Even Sketch, the more accurate it is: ''also-viewed'' and ''also-bought'' records are hashed into a larger bit space, which reduces the collision probability. ES_USE obtains relatively higher precision ratios on the Books and Pet Supplies datasets, achieving up to 92% and 90%, respectively. In addition, the purpose of the fourth set of experiments is to verify the effect of user similarity assessment on ES_USE. The size of ES_USE (i.e., n') is fixed at 512 or 1024 bits.
In these experiments, we compare the performance with user similarity assessment involved and with no user similarity involved (abbreviated as ES_USE-Y and ES_USE-N, respectively). As Figures 17-18 show, ES_USE-Y and ES_USE-N both have a high precision ratio, but ES_USE-Y is better; in particular, when n' = 1024, ES_USE-Y is 9.57% better than ES_USE-N, even though the precision ratio of ES_USE-N is above 0.82 on all four datasets. It is clear that ES_USE-Y is more accurate.
In the fifth set of experiments, we verify the precision of ES_USE for different Jaccard similarities on the Books dataset, with the Even Sketch size n' fixed at 512 or 1024 bits. As shown in Figures 19-20, when the Jaccard similarity of users varies from 0.7 to 0.9, the precision ratio grows slowly in both scenarios. It is worth noting that performance improves if we double the Even Sketch size from 512 to 1024 bits. This set of experiments further confirms the influence of user similarity assessment on ES_USE. Although the Even Sketch size influences the accuracy of user similarity, a larger value is not always better; our experiments suggest that about 800 bits is optimal.
To further evaluate the performance of our approach, we use the MAE metric (the smaller the better) in the sixth set of experiments, with the size of ES_USE (i.e., n') fixed at 1024 bits. The results are shown in Fig. 21. On all four datasets, the proposed ES_USE approach outperforms the other three methods in prediction accuracy, which we attribute to the adopted Even Sketch technique (a compact binary sketch for user similarity estimation). This set of results is consistent with Figures 6-9. Fig. 22 shows that the F1 score of the ES_USE-Y approach (with user similarity assessment involved) is larger than those of the other approaches. The F1 score is the harmonic mean of precision and recall and is a more objective performance measure, so we conclude that the accuracy of ES_USE-Y improves upon the other approaches. Our prediction method combines the ''also-viewed'', ''also-bought'' and ''bought-together'' product attributes, and user similarity assessment makes the predictions for users more comprehensive.
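For reference, the F1 score, the harmonic mean of precision and recall, is computed from prediction counts as follows (the counts below are hypothetical):

```python
def precision_recall_f1(tp, fp, fn):
    # tp: true positives, fp: false positives, fn: false negatives
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
    return precision, recall, f1

p, r, f = precision_recall_f1(tp=18, fp=2, fn=12)
print(p, r, f)  # 0.9, 0.6, and their harmonic mean 0.72
```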
Finally, we verify the F1 performance of ES_USE for different sketch sizes on the Books dataset. As shown in Fig. 23, when the Even Sketch size n' varies from 100 to 1000 bits, the F1 score grows slowly, reaching its maximum at n' = 800 bits. From these results we conclude the following. First, the Even Sketch, which estimates the Jaccard similarity among users, contributes a great deal to rating prediction accuracy. Second, a bigger Even Sketch generally means better prediction accuracy, but not always; about 800 bits is the optimal value.

D. DISCUSSION AND ANALYSIS
From these experiments conducted on large-scale datasets, we first find that ES_USE is more accurate than ItemCF, PMF and VMCF. Specifically, by means of the Even Sketch (ES for short), we can estimate the Jaccard similarity among users. For rating prediction, we search out similar users in advance; then, based on these similar users, we extract ''also-viewed'' and ''also-bought'' information. Together, these steps produce high prediction accuracy. Indeed, similar users tend to exhibit similar shopping behavior, reflected in their ''also-viewed'', ''also-bought'' and ''bought-together'' records.
In our method ES, we take three dimensions into consideration, i.e., the ''also-viewed'', ''also-bought'' and ''bought-together'' product records (as shown in Fig. 2a). The construction process of ES, a compact binary sketch for estimating the Jaccard similarity of users, is detailed in Fig. 2b. Theoretically, more Even Sketch bits mean better performance, but our experimental results show that this is not entirely the case. Given a dataset, performance improves as the ES size grows, but it starts to converge once ES reaches a certain number of bits. For datasets at the scale of Table 2, the convergence value is around 800 bits; different dataset scales may lead to different optimal values.
Although our method has the advantages mentioned above, there is still much to improve. First, the accuracy of the user similarity measure is critical, since ES_USE bases its rating prediction on it. Second, given a dataset, an effective method is needed to find the optimal Even Sketch size. Third, given a data item, it is difficult to hash it uniformly into the bits of ES. We will address these drawbacks in future work.

VI. CONCLUSION
In this paper, we propose ES_USE, which jointly incorporates ''also-viewed'', ''bought-together'' and ''also-bought'' product information; this information not only reflects various product aspects but also influences user ratings differently across product domains. We design a systematic user similarity measurement and conduct experiments on real-world data. The comprehensive experimental analysis shows the effectiveness and feasibility of ES_USE.
In future work, more relationships among products will be investigated, for example ''bought-after-buying''. We will also work on improving the accuracy and scalability of the Even Sketch, since it heavily influences the performance of the whole system.