A Hybrid-Preference Neural Model for Basket-Sensitive Item Recommendation

Basket-Sensitive Item Recommendation (BSIR) is a challenging task that aims to recommend an item to add to a user's current basket given the user's historical behaviors. The recommended item is supposed to be relevant to the items already in the basket. Previous works mainly produce recommendations based on the current basket alone, ignoring the inherent preference revealed by the user's long-term behaviors and failing to accurately distinguish the importance of items in the basket when detecting user intent. To tackle these issues, we propose a hybrid model, the Hybrid-Preference Neural Model (HPNM), where a user's inherent preference is recognized by modeling the historical sequential baskets and the recent preference is identified by focusing on the current basket. In detail, we apply an attention mechanism to distinguish the importance of items in a basket and generate an accurate basket representation. A GRU is utilized to model the basket-level sequential information and obtain the user's long-term preference, while the representation of the current basket is regarded as the user's short-term preference. We evaluate our proposals against state-of-the-art BSIR baselines on two public datasets, i.e., TaFeng and Foursquare. The experimental results show that HPNM achieves clear improvements over the baselines in terms of HLU and Recall. In addition, we find that HPNM with an attention mechanism yields a larger improvement over the baseline in terms of HLU and Recall on testing baskets with relatively few items.


I. INTRODUCTION
Recommender systems can help people address personalized needs amid the overloaded information on the internet [1,2], and are typically implemented by modeling users' historical behaviors [3][4][5]. User behaviors are often collected from a user's interactions with a set of items rather than a single item at a timestamp [6][7][8]. For instance, people may buy more than one product in a purchase event, or visit more than one place in a day during a journey. Such a set of items (i.e., products or places) can be regarded as a basket [9,10], where the items in a basket are often regarded as relevant to each other. Basket-sensitive item recommendation (BSIR) aims to produce a recommended item by considering a user's interactions with items in the historical baskets and the current basket [9], which represent the user's inherent preference and instant preference, respectively. Three factors are often considered here: the relationships among items within a particular basket, the items in the current basket, and the basket-level sequential information in historical baskets. Consider the example in Fig. 1: a user likes purchasing clothes on e-commerce platforms such as JD.com and Alibaba. At each timestamp, the user may select several items together, i.e., a basket, possibly because there is a discount when those products are purchased at the same time. Moreover, some items in a basket may not conform to the user's main interest because of the user's complex behavior patterns. For example, in Fig. 1, the user intends to buy some summer clothes in the current basket. However, during the interaction process, some summer snacks, e.g., an ice cream, may attract the user's attention and are then added into the basket together with the clothes. Thus, the items differ in importance for predicting the user's main interest in the current basket.
Moreover, the user's interest from his historical sequential baskets (i.e., all the baskets in the timeline) and the current basket (i.e., the last basket) can represent user's inherent attributes (such as the female gender) and the instant needs (i.e., the purpose of purchasing summer clothes), respectively. Both of them are important for predicting the user's next purchase. Furthermore, the sequential information in historical baskets should also be emphasized, which can reflect the preference migration of the user (e.g., user preference from winter dress to summer clothes). By taking the basket-level sequential signals into consideration, user's dynamic demands can be captured.
Generally, the major approaches for BSIR fall into two categories, i.e., association rule based recommendation [11] and factorization machine based recommendation [9]. Association rule based methods can discover correlations that exist in a large number of interactions. However, the generated rules are the same for all users. Although there exists a "personalized" variant that only uses rules whose left-hand-side items the user has interacted with [11], the personalized associations are still not well learnt. Factorization machine based methods consider the relationships among the user, the items in the current basket, and the target item to be recommended. Commonly, a linear combination is applied to take all relationships into consideration. However, such approaches cannot model the dynamic change of a user's preference. For instance, suppose a user bought a lot of winter clothes in winter and, after the winter, wants to buy clothes for spring or summer. A factorization machine based approach will recommend winter clothes again since the user has bought many of them, which is obviously unreasonable in summer. Moreover, the linear combination causes an issue of information loss [12,13].
In summary, the main challenges in Basket-Sensitive Item Recommendation (BSIR) are: 1) How to mine user's interest in each basket by modeling the items interacted in the basket through distinguishing their different importance? 2) How to take both user's inherent preference and instant needs into consideration and meanwhile capture the dynamic interest migration in the historical baskets?
To solve the above issues, we introduce a Hybrid-Preference Neural Model (HPNM) for basket-sensitive item recommendation. First, in the item level, an attention mechanism is applied to distinguish the importance of items in each basket to generate an accurate basket representation, which can partly solve the issue of information loss that exists in the process of the average pooling and the max pooling. Then, in the basket level, HPNM utilizes Gated Recurrent Units (GRU) to model a user's historical sequential baskets to generate the long-term preference. In addition, the representation of current basket is regarded as user's short-term preference, which is then combined with user's long-term preference as the final user preference for item prediction.
In general, the contributions of our work are summarized as follows:
(1) We introduce a hybrid neural model for basket-sensitive item recommendation which can capture user's long-term and short-term preferences simultaneously.
(2) We apply an attention mechanism to model the items in each basket to better distinguish the importance of items, alleviating the problem of information loss when representing the basket.
(3) We conduct comprehensive experiments on the TaFeng and Foursquare datasets, finding that our proposal achieves state-of-the-art performance, returning the target items at an earlier position in the recommendation list.

II. RELATED WORK
According to the number of items interacted by the user at each timestamp, there are item recommendation (one item at each timestamp) and basket-based recommendation (more than one item at each timestamp). In this section, we briefly introduce the previous works on item recommendation in Section II-A and basket-based recommendation in Section II-B.

A. ITEM RECOMMENDATION
General recommendation methods obtain a user's inherent preference by modeling his historical interactions, among which Collaborative Filtering (CF) is the most widely used approach, mainly utilizing Matrix Factorization (MF) on the user-item interaction matrix. Recently, neural methods have been proposed to better model the interactions between users and items. He et al. [12] propose Neural Collaborative Filtering (NCF), which combines generalized matrix factorization and a multi-layer perceptron to learn user-item interactions nonlinearly. Chen et al. [13] then propose Joint Neural Collaborative Filtering (JNCF) to extract deep feature representations of users and items, and to capture the deep interactions between them through the user-item interaction matrix. Furthermore, Wang et al. [14] utilize Graph Neural Networks (GNN) [15] to learn the high-order connectivities in the user-item interaction graph, performing embedding propagation to explicitly encode the collaborative signal. However, general recommendation methods focus merely on the inherent user preference; they neglect the migration of user interests and thus fail to capture user's instant needs.
Different from general recommendation methods, sequential recommendation is proposed to model users' dynamic interests [16][17][18]. For example, Hidasi et al. [16] propose GRU4REC, which utilizes Gated Recurrent Units (GRU) to model the sequential information in a session and obtain users' dynamic interests. Li et al. [19] introduce an attention mechanism into GRU4REC to capture users' main purpose. Liu et al. [20] distinguish item importance with an attention mechanism to obtain users' long-term preference and explicitly take the current action as the short-term interest to make predictions. Furthermore, Kang and McAuley [17] propose SASRec, which introduces a self-attention mechanism into sequential recommendation to identify the items related to the next prediction. Considering the ability of graph neural networks to model the complex transitions between items [15,21], GNNs have been widely applied in sequential recommendation. For instance, Wu et al. [22] first convert each session into a session graph and then use Gated Graph Neural Networks (GGNN) [23] to obtain an accurate representation of each item. However, all the above sequential recommenders are designed for item recommendation, without considering that a user may interact with a whole basket rather than a single item at each timestamp.
Generally, current item recommendation methods cannot be directly applied to basket-sensitive item recommendation because they either ignore the sequential information or merely consider interactions with a single item at each timestamp. In contrast, our proposed method takes both factors into consideration to achieve considerable recommendation accuracy.

B. BASKET-BASED RECOMMENDATION
Basket-based recommendation has been well investigated in recent years. In this section, we divide the major basket-based recommendation approaches into two categories, i.e., basket-sensitive item recommendation (BSIR) and next basket recommendation (NBR), according to their recommendation aims.
As for basket-sensitive item recommendation, association rules can uncover associations in historical data and can be employed in recommender systems [24,25]. Typical association rule based methods, like FP-tree [26], mainly focus on computational efficiency; in addition, these methods are not personalized. Furthermore, Sarwar et al. [11] apply association rules in recommendation and make it "personalized". However, the mined associations are still identical across users. Factorization machines have also been applied to BSIR. Le et al. [9] employ a factorization machine to consider relations such as user and target item, and basket item and target item; furthermore, their method integrates constraints for baskets with similar intent. So far, this method achieves the state-of-the-art performance in the task of BSIR [9].
Different from BSIR, Next Basket Recommendation (NBR) recommends a whole basket to a user given the historical behaviors [27]. Both traditional methods like Markov Chains [28] and deep learning based methods [2,29,30] have been well studied. For instance, Rendle et al. [28] propose Factorizing Personalized Markov Chains (FPMC) for next basket recommendation, which takes both the sequential information and a user's general taste into consideration. Wang et al. [31] utilize average and max pooling to aggregate the items in each basket and propose a hierarchical representation model (HRM) to represent user preference. In addition, Yu et al. [32] apply Recurrent Neural Networks (RNN) to NBR and achieve excellent performance, which verifies the effectiveness of RNN in NBR. Le et al. [33] propose to consider the relative importance of items when modeling the sequential information of baskets.
However, among the methods for aggregating the basket, average pooling treats each item equally and max pooling only takes the most significant value in every dimension, resulting in an issue of information loss. In this paper, we propose to apply an attention mechanism to distinguish the importance of items in each basket. Moreover, we consider user's long-term preference for recommendation by modeling the whole sequence of historical baskets.

III. APPROACHES
In this section, we describe our Hybrid-Preference Neural Model (HPNM) for Basket-Sensitive Item Recommendation (BSIR). We first formally present the BSIR task in Section III-A. The major architecture of HPNM is shown in Fig. 2. In the first stage, the item embeddings in the basket are inputted to an aggregation layer to generate the basket representation (See Section III-B). Then, long-term and short-term preferences are modeled separately and combined to generate user preference representation (See Section III-C). Finally, the user preference is inputted to a similarity layer, together with all item embeddings, to calculate the probability scores of all potentially recommended items and make prediction (See Section III-D).

A. PROBLEM FORMULATION AND NOTATION
Assuming there is a set of users denoted by U and a set of items denoted by I, the numbers of users and items are |U| and |I|, respectively. Each user is represented as u ∈ U and each item as i ∈ I. Given a user u, the historical baskets can be denoted as S^u = {S^u_1, S^u_2, ..., S^u_n}, which includes the user's current basket S^u_n. User u's basket at timestamp t is denoted by S^u_t and consists of a set of m items, i.e., S^u_t = {x_{t1}, x_{t2}, ..., x_{tm}}. Given the user's interaction history, the purpose of BSIR is to recommend an item to add to the current basket S^u_n.

To begin with, every item is represented by a one-hot vector, denoted e_i ∈ R^{|I|} for item i: the value at the i-th position of the one-hot vector is 1, while all other positions are set to 0. The one-hot vector is then fed into an embedding layer to obtain its embedding representation. Specifically, we create an item embedding matrix V ∈ R^{|I|×d} and regard the i-th row of V as the representation of item i. After the embedding layer, item i is represented as v_i ∈ R^d, where d is the dimension of the item embedding vector.

B. BASKET REPRESENTATION
In recommender systems, item prediction is commonly conducted by comparing the similarity of the user representation with the embeddings of the candidate items [12,34]. In basket-sensitive item recommendation, the generation of the user preference is hierarchical: before basket-level fusion, we first aggregate the item embeddings to obtain basket representations, which are then used for user preference modeling.
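As a minimal sketch of the embedding layer described above, the one-hot lookup is equivalent to selecting the i-th row of V. The item indices and sizes below are illustrative, not the paper's:

```python
import numpy as np

rng = np.random.default_rng(0)
num_items, d = 100, 8  # |I| and embedding dimension d (illustrative values)
V = rng.normal(0.0, 0.002, size=(num_items, d))  # item embedding matrix V ∈ R^{|I|×d}

def embed(item_ids):
    """One-hot lookup: multiplying e_i by V simply selects the i-th row of V."""
    return V[np.asarray(item_ids)]

basket = [3, 17, 42]       # hypothetical item indices in a basket S_t^u
item_vecs = embed(basket)  # shape (3, d), one row per item
```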
To identify the importance of different items in a basket, we first feed the item embeddings into an importance function. The resulting importance weights are then used to combine the item embeddings and vectorize the basket S^u_t as follows:

B^u_t = Σ_{i=1}^{m} λ_{ti} · g(v_{ti}), with λ_{ti} = f(v_{ti}, q), (1)

where g(·) is the representation function and f(·) is the importance function, λ_{ti} and v_{ti} are the importance and the embedding of the i-th item in S^u_t, and q is the vector for item importance calculation.

For the importance function f(·), we apply an attention mechanism [35] to produce the importance weights of all items in the basket, as shown in Fig. 3. An attention mechanism can dynamically determine the importance of different components and is widely applied in document classification [36], recommender systems [19,34,37], etc. It is usually used together with the softmax function, which normalizes the attention scores. We first feed the embedding vectors into a one-layer Multi-Layer Perceptron (MLP):

k_{ti} = tanh(W_k v_{ti} + b), (2)

where W_k ∈ R^{d×d} and b ∈ R^d are a weight matrix and a bias term to be learnt, respectively. Then, we initialize a vector for the attention mechanism, q ∈ R^d, to calculate the weight of each item. Following [34,36], the weights of the items in the basket are calculated with a softmax function:

λ_{ti} = exp(q^T k_{ti}) / Σ_{j=1}^{m} exp(q^T k_{tj}). (3)

For the representation function g(·) in Eq. (1), we calculate it as:

g(v_{ti}) = W_g v_{ti}, (4)

where W_g ∈ R^{d×d} is a weight matrix that projects v_{ti} into a new embedding space. After aggregating the items in the basket, the basket is eventually represented by B^u_t ∈ R^d as produced by Eq. (1), where d is the embedding dimension.
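The item-level attention aggregation producing B^u_t via Eq. (1) can be sketched in NumPy as follows. The tanh nonlinearity for the one-layer MLP and all parameter values are assumptions for illustration:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())  # subtract max for numerical stability
    return e / e.sum()

def aggregate_basket(item_vecs, W_k, b, q, W_g):
    """Attention-weighted basket representation B_t^u from item embeddings v_ti."""
    keys = np.tanh(item_vecs @ W_k.T + b)  # one-layer MLP over item embeddings
    lam = softmax(keys @ q)                # importance weights λ_ti, sum to 1
    projected = item_vecs @ W_g.T          # representation function g(v_ti) = W_g v_ti
    return lam @ projected, lam            # weighted sum and the attention weights
```

The returned weights `lam` are exactly the scores one would plot in an attention visualization.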

C. LONG-AND SHORT-TERM PREFERENCES
After generating the representation of all historical baskets, we identify user's final preference by considering the interaction history in two aspects, i.e., user's long-term preference and short-term preference, respectively.
As for the long-term preference, we apply a GRU to model the sequential information. The GRU applies a linear interpolation to combine the previous activation h_{t−1} and the candidate activation ĥ_t:

h_t = (1 − z_t) ⊙ h_{t−1} + z_t ⊙ ĥ_t, (5)

where z_t is the update gate that controls how much information in the former state h_{t−1} should be preserved. z_t can be formalized as:

z_t = σ(W_z B^u_t + U_z h_{t−1}), (6)

where B^u_t is the input basket representation at timestamp t. The candidate activation is computed as:

ĥ_t = tanh(W B^u_t + U (r_t ⊙ h_{t−1})), (7)

where r_t is the reset gate, controlling how much information of the previous state should be written into the candidate activation, formalized as:

r_t = σ(W_r B^u_t + U_r h_{t−1}), (8)

where σ is the sigmoid activation function, ⊙ is the Hadamard product, and W_z, U_z, W, U, W_r, U_r ∈ R^{d×d} are the trainable parameters of the network. We adopt the last hidden output of the GRU as user's long-term preference P^u_l.

As for the short-term preference, denoted as P^u_s, we regard the representation of the current basket as the current preference, i.e.,

P^u_s = B^u_n, (9)

where B^u_n is generated by the scheme introduced in Eq. (1).
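A single step of the basket-level GRU described above can be sketched as follows, as a standard GRU cell with sigmoid gates and a tanh candidate activation; the d×d parameter matrices are those named in the text, and bias terms are omitted as in the formulas:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(B_t, h_prev, P):
    """One GRU update: consumes a basket representation B_t^u, returns h_t."""
    z = sigmoid(P["W_z"] @ B_t + P["U_z"] @ h_prev)         # update gate z_t
    r = sigmoid(P["W_r"] @ B_t + P["U_r"] @ h_prev)         # reset gate r_t
    h_cand = np.tanh(P["W"] @ B_t + P["U"] @ (r * h_prev))  # candidate activation
    return (1.0 - z) * h_prev + z * h_cand                  # linear interpolation
```

Iterating `gru_step` over the sequence of basket representations and taking the final hidden state yields the long-term preference P^u_l.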

D. PREFERENCE FUSION AND PREDICTION
After obtaining user's long-term and short-term preferences, we combine them for item prediction. We identify user's final preference P^u as:

P^u = (W_l P^u_l) ⊙ (W_s P^u_s), (10)

where W_l, W_s ∈ R^{d×d} are trainable parameters and ⊙ is the Hadamard product used for combination. Thus, given user u's historical baskets and the set of candidate items I, we calculate the probability of each item i ∈ I being added to user's current basket by multiplying the user preference P^u with the embeddings of all candidate items, and a softmax function is utilized to normalize the prediction scores:

ŷ_i = exp((P^u)^T v_i) / Σ_{j∈I} exp((P^u)^T v_j), (11)

where v_i ∈ R^d is item i's embedding and ŷ = {ŷ_1, ŷ_2, ..., ŷ_{|I|}} are the prediction scores on all candidate items. Finally, we recommend the items with the highest probabilities.
To effectively train the model, we adopt the cross-entropy as the optimization objective to learn the parameters, defined as:

L = − Σ_{i=1}^{|I|} y_i log(ŷ_i), (12)

where y denotes the one-hot vector of the ground truth. That is, if the user interacts with item k, then y_i = 1 if i = k and y_i = 0 if i ≠ k.
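The fusion, scoring, and loss steps above can be sketched as follows; the Hadamard-product fusion form follows the text, while the concrete values below are a hypothetical minimal NumPy illustration:

```python
import numpy as np

def predict_scores(P_l, P_s, W_l, W_s, V):
    """Fuse long- and short-term preferences, then score every candidate item."""
    P_u = (W_l @ P_l) * (W_s @ P_s)    # combine preferences via a Hadamard product
    logits = V @ P_u                   # inner product with all item embeddings
    e = np.exp(logits - logits.max())  # stable softmax over the |I| candidates
    return e / e.sum()

def cross_entropy(y_hat, target):
    """With a one-hot ground truth, the loss reduces to -log of the target's score."""
    return -np.log(y_hat[target])
```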

IV. EXPERIMENTS
In this section, we list research questions that guide our experiments in Section IV-A, then briefly summarize the baseline models we choose to compare with as well as our proposed models in Section IV-B. Next, datasets and evaluation metrics are introduced in Section IV-C and Section IV-D, respectively. At last, we give the details of our experimental settings and hyper-parameters in Section IV-E.

A. RESEARCH QUESTIONS
To guide our experiments, we list three research questions:
RQ1 Can our proposed Hybrid-Preference Neural Model (HPNM) perform better than the state-of-the-art baselines for basket-sensitive item recommendation?
RQ2 How does the number of historical baskets affect the recommendation performance?
RQ3 How well do the HPNM models perform when the average number of items in the historical baskets changes?

B. MODEL SUMMARY
To prove the effectiveness of our proposal, we conduct experiments by comparing the performance against several baselines. We list the major models discussed in this paper below.
• POP [38]: A non-personalized recommendation method, which recommends the most popular items to each user.
• ASR [11]: An association rule based approach for basket recommendation, which collects all items that the user has interacted with in the historical baskets to make the recommendation "personalized".
• CBFM [9]: A personalized model for BSIR, which considers the relationships among the user, the items in the current basket, and the target item.
• SHAN [34]: A two-layer hierarchical attention network for item recommendation, which jointly applies two attention layers to generate the user representation.
• HPNM_ave: Our proposed Hybrid-Preference Neural Model, which considers user's long-term and short-term preferences; average pooling is utilized for basket representation.
• HPNM_max: The same as HPNM_ave, except that max pooling is utilized for basket representation.
• HPNM_att: The same as HPNM_ave, except that an attention mechanism is utilized for basket representation.

C. DATASETS
We evaluate the models on two publicly available datasets, i.e., TaFeng and Foursquare.
• TaFeng: A public retail market dataset covering products from many categories. It contains 817,741 transactions between 32,266 users and 23,812 items.
• Foursquare: A point-of-interest dataset containing users' check-ins at various places. It contains 194,108 transactions between 2,321 users and 5,596 items [39].
Following CBFM [9], we preprocess the datasets as follows before training. We filter out items with few transactions, i.e., fewer than 10 for TaFeng and fewer than 5 for Foursquare. Items that behave like "stop words", i.e., those present in a large fraction of transactions (5% for both datasets), are removed; there are 2 such items in TaFeng and none in Foursquare. Moreover, baskets with just one item are filtered out. We split the data into training/validation/test sets and retain the users who have more than 3 historical baskets. The details of the processed datasets are presented in Table 1.
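The filtering steps above can be sketched as follows. This is a hypothetical helper with the thresholds passed as parameters, not the authors' actual preprocessing script:

```python
from collections import Counter

def preprocess(baskets, min_count, stop_frac):
    """Drop rare items, drop 'stop-word' items present in too large a fraction
    of baskets, then drop baskets left with a single item."""
    counts = Counter(i for b in baskets for i in set(b))  # per-item basket frequency
    n = len(baskets)
    keep = {i for i, c in counts.items() if c >= min_count and c / n <= stop_frac}
    filtered = [[i for i in b if i in keep] for b in baskets]
    return [b for b in filtered if len(b) > 1]
```

For TaFeng this would correspond to `min_count=10` and `stop_frac=0.05`; for Foursquare, `min_count=5`.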
For both datasets, we use the last transaction of each user for testing, the penultimate transaction as the validation data, and the rest are regarded as the training data. The hyperparameters are tuned on validation data and applied in the test period.

D. EVALUATION METRICS
In order to evaluate the performance of different models, similar to CBFM [9], we use Recall@K and HLU as metrics.
Recall@K: The Recall@K score (i.e., R@K) is a well-known metric frequently used in recommendation tasks, which evaluates whether the target item appears in the recommendation list:

Recall@K = n_hit / N, (13)

where N is the number of test cases in the dataset and n_hit is the number of cases for which the target item is ranked in the top-K positions of the recommendation list. We set K to 10, 20 and 50 for comparisons in our experiments.

HLU: The Half-Life Utility (HLU) is an evaluation metric that considers the ranking position of the target item. HLU can be defined as:

HLU = (C / N) · Σ_{t ∈ T_test} 2^{−(r_t − 1)/(β − 1)}, (14)

where N is the number of all tests, T_test is the test set, r_t is the rank of the target item in the recommendation list, β is the half-life parameter, and C is the scaling parameter. Following CBFM [9], we set β to 5 and C to 100.
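Given the rank of the target item in each test case, the two metrics can be computed as below, a sketch consistent with the definitions above:

```python
import numpy as np

def recall_at_k(ranks, k):
    """Fraction of test cases whose target item is ranked within the top-K."""
    ranks = np.asarray(ranks)
    return float((ranks <= k).mean())

def hlu(ranks, beta=5, C=100):
    """Half-life utility: credit decays exponentially with the target's rank."""
    ranks = np.asarray(ranks, dtype=float)
    return float(C * np.mean(2.0 ** (-(ranks - 1.0) / (beta - 1.0))))
```

A target item ranked first in every test list yields the maximum HLU of C = 100.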

E. EXPERIMENTAL SETTINGS
We employ the Adaptive Moment Estimation (Adam) optimizer for the cross-entropy loss. The item embedding dimension is set to 8, and the maximum number of epochs is set to 500 for both datasets with early stopping. The hyper-parameters are optimized by a grid search on both datasets over the following ranges: learning rate α in {0.001, 0.005, 0.01, 0.1} and mini-batch size Bs in {32, 64, 128, 256, 512}.
In addition, the attention weights are initialized from a normal distribution N(0, 0.005²) and the biases are set to 0. All item embeddings are initialized from a normal distribution N(0, 0.002²).

V. RESULTS AND DISCUSSION
In this section, we first evaluate the overall performance of the discussed models in Section V-A. Then, we zoom in on the effect on item recommendation across various number of historical baskets in Section V-B as well as across various average number of items in historical baskets in Section V-C.

A. OVERALL PERFORMANCE
To answer RQ1, we conduct experiments of all discussed models on the TaFeng and Foursquare datasets, and present the results in terms of HLU and Recall@K in Table 2.
First of all, for the baselines, as shown in Table 2, we can find that between the two non-personalized approaches, ASR outperforms POP, indicating the utility of association rules mined from historical interactions. Moreover, the personalized approaches, i.e., CBFM and SHAN, generally outperform the non-personalized methods, i.e., POP and ASR, in terms of all metrics. In addition, SHAN beats CBFM on all metrics on both datasets. This could be explained by the fact that SHAN considers users' long-term as well as short-term preferences for item recommendation, while CBFM makes recommendations only based on users' short-term preference. Hence, in the later experiments, we use SHAN as the baseline for comparisons.
Next, we zoom in on the performance of our proposals. Between the pooling variants of our proposal (i.e., HPNM_ave and HPNM_max), HPNM_max shows obviously better performance than HPNM_ave. Clearly, HPNM_max beats SHAN in terms of all metrics on Foursquare, which indicates the utility of modeling the basket-level sequential information in the historical baskets. However, HPNM_max loses to SHAN on TaFeng, which may be because using max pooling to aggregate the item embeddings in a basket introduces bias. In contrast, our HPNM_att consistently beats the baseline SHAN in terms of all metrics on both TaFeng and Foursquare. This could be due to the fact that the attention mechanism in HPNM can adaptively select the important items to help generate an accurate basket representation, leading to the best performance on both datasets.
In detail, the improvements of HPNM_att over SHAN in terms of HLU are 3.76% on the TaFeng dataset and 17.03% on the Foursquare dataset. The Recall improvements of HPNM_att over SHAN are 2.40%, 2.99% and 0.76% for K = 10, 20 and 50 on TaFeng, respectively, as well as 16.47%, 15.27% and 16.12% on Foursquare. The higher improvement in terms of HLU indicates that our proposal returns the target item at an earlier position in the recommendation list.

B. EFFECT OF NUMBER OF HISTORICAL BASKETS
To answer RQ2, we analyze the impact of the number of historical baskets on the performance of item recommendation. We examine the performance of our three proposals and the baseline SHAN across various numbers of historical baskets. We group the tests according to the number of historical baskets: short (no more than 3), medium (between 4 and 6) and long (more than 6) on TaFeng, and short (no more than 12), medium (between 13 and 24) and long (more than 24) on Foursquare, as the users in Foursquare generally have longer histories than those in TaFeng. We plot the results of our proposals and SHAN in terms of Recall@10 and HLU on both datasets in Fig. 4.
For the TaFeng dataset, as shown in Fig. 4a and 4c, HPNM_att performs better than SHAN in most cases, with the exception that HPNM_att loses to SHAN for the group "long". In addition, on TaFeng, the performance of HPNM_ave in terms of both Recall@10 and HLU shows a decreasing trend in general, while the performance of HPNM_max first increases and then decreases on both metrics. The difference could be due to the fact that HPNM_ave suffers a serious issue of information loss when the number of historical baskets increases, while HPNM_max has the ability to distinguish the different importance of the items in these baskets to some extent. However, the RNN in the HPNM models has difficulty modeling overlong sequential information, leading to a performance drop for cases with many baskets. Moreover, HPNM_att consistently outperforms the other models on all three groups. As the basket number increases, the performance of HPNM_att keeps increasing in terms of HLU, while the performance in terms of Recall@10 first increases and then decreases. This indicates that it is more difficult to push the target item to the top position of the recommendation list for users with relatively more historical baskets.

On Foursquare, the performance of the HPNM models as well as SHAN in terms of both metrics shows an increasing trend in general as the number of historical baskets increases. We believe this could be due to the fact that Foursquare is a point-of-interest dataset: users' interests are narrowed to geographic locations and hence can be identified more accurately when there are more historical baskets.

C. EFFECT OF AVERAGE NUMBER OF ITEMS IN THE BASKET
To answer RQ3, we conduct experiments to analyze the impact of the average number of items in historical baskets. The results of our proposed models and the baseline SHAN in terms of Recall@10 and HLU are shown in Fig. 5. Note that since the average number of items in historical baskets varies sharply on TaFeng, we split the tests into groups, i.e., [1,2], [3,4], · · · , and [11,+∞). Taking Recall@10 as an example, from Fig. 5a and 5b we can observe that HPNM_att outperforms SHAN in most cases, except the longest group (i.e., [11,+∞)) on TaFeng and the groups with relatively small numbers on Foursquare. This phenomenon may be explained by the fact that the attention mechanism also has difficulty distinguishing item importance accurately in baskets with too many items. In addition, HPNM_att outperforms HPNM_ave and HPNM_max in most cases on both datasets, which shows the ability of the attention mechanism to distinguish item importance. Interestingly, as the average number of items increases, the performance of all models decreases in general on TaFeng while it increases in general on Foursquare. This could be explained by the fact that the average number of items in historical baskets is quite different on the two datasets, i.e., 7.2 for TaFeng and 3.0 for Foursquare. Specifically, the relatively larger number on TaFeng means that more unrelated items may be included in a basket, resulting in difficulty in generating an accurate basket representation as the average number of items increases. In addition, the gap between HPNM_att and SHAN obviously decreases as the average number increases on both datasets. Hence, we conclude that HPNM with the attention mechanism improves the performance more obviously on baskets with relatively few items. A similar phenomenon can also be found in terms of HLU, which indicates that the impacts of the average item number in historical baskets are similar for the Recall and HLU metrics.

D. VISUALIZATION OF ATTENTION MECHANISM
To intuitively show how the attention mechanism works when aggregating the items in a basket, we conduct a case study with an attention visualization. Specifically, we randomly select two training samples from the TaFeng dataset, i.e., users A and B, who have 5 and 6 historical baskets, respectively. The attention scores obtained by Eq. (3) are plotted in Fig. 6, where the depth of color denotes the attention value of the corresponding item. The numbers on the left and at the bottom denote the basket index of each user and the item index in each basket, respectively. In the attention visualization for each user, different rows have different lengths since the number of items in the user's historical baskets varies.
From Fig. 6, we can obtain some interesting findings. First, the attention scores are generally more diverse in long baskets than in short ones. For instance, in basket 1 of user A and basket 1 of user B, the attention scores on most items are small, while on several specific items (such as items 2 and 9 in basket 1 of user A) the attention scores are relatively larger. This is reasonable, since a basket with many items is likely to include irrelevant ones. Moreover, we can observe that even baskets of the same length have different attention score distributions, which further verifies the need to distinguish the importance of items for generating an accurate basket representation.

VI. CONCLUSION
In this paper, we propose a Hybrid-Preference Neural Model (HPNM) for basket-sensitive item recommendation that incorporates a user's long-term and short-term preferences. In particular, an attention mechanism is utilized to distinguish the importance of items in a basket when representing the basket. Our experimental results show that the proposed model can outperform the state-of-the-art baselines. In addition, the experimental results show that our proposed HPNM_att leads to a larger improvement over the baseline in terms of HLU and Recall on baskets with relatively few items. As future work, we would like to investigate item details to generate more accurate basket representations, e.g., by taking items' attributes into consideration. Moreover, it is promising to study the importance of different baskets in the sequential purchase history, so as to distinguish them when modeling the long-term preference.
In addition, we are interested in investigating different fusion strategies to combine the long-term and short-term preferences to generate an accurate user preference representation, as in [19,40].