Hierarchical Alignment With Polar Contrastive Learning for Next-Basket Recommendation

Next-basket recommendation methods focus on inferring the next basket from the corresponding basket sequence. Although many methods have been developed for the task, they usually suffer from data sparsity. The number of interactions between entities is relatively small compared to their huge bases, so it is crucial to mine as much hidden information as possible from the limited historical interactions for prediction. However, the existing methods mainly treat next-basket recommendation as a single-view sequential prediction problem, which leads to inadequate mining of the information hidden in multiple views and neglects other patterns in the historical interactions, making it difficult to learn high-quality representations and limiting the recommendation effect. To alleviate these issues, we propose a novel method named HapCL for next-basket recommendation, which mines information from multiple views and patterns with the help of polar contrastive learning. A hierarchical module is designed to mine multiple patterns of the historical interactions from different views at two levels. To mine self-supervised signals, we design a polar contrastive learning module with a novel graph-based augmentation approach. Experiments on three real-world datasets validate the effectiveness of HapCL.

It is impractical for users to go through all of the items. In order to help users find their target items quickly, recommendation systems came into being. Recommendation systems try to mine useful information from the historical interactions and match items to users based on the mined information [1], [2], [3]. Hence, the time that users spend finding their target items is greatly reduced.
The next-basket recommendation task, which aims to infer the items that the target user will interact with in the next basket based on the corresponding basket sequence, has drawn increasing attention [3], [4], [5], [6]. The task reflects the fact that the interactions of a user over a period of time do not necessarily follow a strict chronological order, which is also an important difference between next-basket recommendation and sequential recommendation [7], [8], [9], [10], [11]. However, a common issue in recommendation methods is data sparsity [12], [13], [14]. The number of interactions between users and items is relatively small compared to their huge bases, which results in insufficient data for learning various representations, e.g., basket representations, in recommendation methods. It is therefore crucial to make full use of the limited historical interactions. Some recommendation methods attempt to mine self-supervised signals from the original data and employ contrastive learning to help recommendation models learn high-quality representations [4], [10]. Since the time-aware data form of the historical interactions is not as easy to augment as images, augmentation approaches for next-basket recommendation remain to be explored. More specifically, Fig. 1 shows a basket sequence consisting of three baskets of various sizes. The items in every basket have no strict precedence, i.e., their order can be changed, so augmentation approaches such as rotation do not apply to this time-aware data. In addition, the existing time-aware methods [3], [15], [16], [17] mainly concentrate on the sequential pattern of the historical interactions in a single view, which results in inadequate mining of the information hidden in multiple views, while the mining of other patterns, e.g., the graph pattern, is neglected. Take the sequence shown in Fig.
1 for example: the lipstick and pressed powder in the first basket and the eyeliner in the second basket are all cosmetics. Sequential pattern mining pays attention to the view of category, so it tends to recommend eye shadow in the following basket. But from the view of interest, the purchase of cosmetics can also imply that the user attaches importance to appearance, a preference that perfume matches as well. And from the view of consumability, cosmetics and skin-care products are consumable: they are depleted during use and need to be re-purchased when used up, whereas hairpins can be reused, and users tend to buy a variety of styles to rotate through. Sequential methods that focus only on the sequential pattern overlook the complex relationships between baskets and items, which are intricately correlated across different views and need to be mined from more patterns. Note that the concrete views mentioned above are hypotheses introduced to illustrate our motivation. The views in the real world are implicit and too entangled to be named unambiguously. Different views can be explored and decoupled through the combination of different patterns.

Fig. 1. An example of the historical interactions in the form of a basket sequence of length 3. The three baskets have different sizes, and the order of items in every basket can be changed. The sequence can be displayed at both the basket level and the item level. Next-basket recommendation aims at inferring the items belonging to the third basket based on the preceding two baskets.
All these deficiencies lead to insufficient mining of information from the limited historical interactions, making it difficult to learn high-quality representations and limiting the recommendation effect. In this paper, we propose a novel method for next-basket recommendation, named HapCL, which designs a hierarchical alignment module and a polar contrastive learning module to alleviate the above issues. HapCL constructs a weighted bipartite graph from the historical interactions to help mine higher-order correlations between baskets and items. Basket sequences are encoded by a hierarchical module that mines information at the basket level and the item level, respectively. At both levels, hidden information is mined in multiple views and integrated into an overall probability prediction to capture the complex relationship between baskets and items. To make the predictions at the two levels supervise each other, they are constrained to achieve distribution alignment. Moreover, a novel graph-based augmentation approach and a polar contrastive learning strategy are designed for mining self-supervised signals. Every basket in a basket sequence is augmented to obtain a positive augmentation and a negative augmentation. The two augmentations are used for polar contrastive learning, an auxiliary task that helps the next-basket recommendation task mine more hidden information from the limited historical interactions and learn high-quality representations for baskets and items. To sum up, the contributions of this paper are as follows:
- We propose a novel HapCL method that designs hierarchical alignment with polar contrastive learning for next-basket recommendation. The method adopts a hierarchical framework to mine information from multiple views at both the basket level and the item level, taking full advantage of the limited historical interactions.
- We propose a graph-based augmentation approach to construct polar self-supervised signals for contrastive learning. Polar contrastive learning is applied as an auxiliary task to promote next-basket recommendation. The impact of the polar self-supervised signals is validated by an ablation study.
- We conduct extensive experiments on three real-world datasets to demonstrate the effectiveness of the proposed HapCL method.

The rest of this paper is arranged as follows. Section II briefly reviews related work. Section III details each part of the proposed method. Section IV presents extensive experiments that confirm the effectiveness of the proposed method. Finally, Section V concludes the paper.

II. RELATED WORK
In this section, we briefly review three lines of research related to our work, namely graph-based recommendation, next-basket recommendation and contrastive learning.

A. Graph-Based Recommendation
A graph is a form of data that represents associations between nodes through the connections between them, which fits well the associations between entities in recommendation. Some works have recognized this fit and, as deep neural networks have proven powerful for data mining in recent years, proposed graph-based deep learning methods that model the associations between entities with a graph and learn representations of the graph nodes for recommendation. [18] employs random walks to extract node sequences from the graph; all the sequences are regarded as a corpus from which each node is encoded into an embedding. [19] and [20] design graph convolutional neural networks to learn representations of the nodes in the graph and recommend items to users according to the similarity between the representations. Considering the different importance of different nodes, [21] introduces an attention mechanism into the graph convolutional neural network, which assigns different weights to different nodes when aggregating neighbors. [1] constructs a directed graph over item sequences and employs self-attention layers to assign different weights to the items in the sequences. Moreover, to provide more information for learning high-quality representations, some methods [5], [22], [23] add additional information, e.g., item attributes, properties of associations, and social relations of users, into the graph; these are outside the scope of this paper, where we focus on making the best use of the limited historical interactions.

B. Next-Basket Recommendation
With the assumption that the shopping preferences of users and the correlations between items are often reflected in the order of the historical interactions, temporal recommendation methods have been proposed to mine the sequential pattern hidden in the interactions and make recommendations based on it. Temporal recommendation can be divided into two main categories according to the units of the sequences: sequential recommendation, which recommends the next item based on an item sequence, and basket recommendation, which recommends the next basket based on a basket sequence. Note that the baskets in basket sequences consist of several items and have no fixed size. Although the units of the sequences differ between the two tasks, their solutions share many similarities. [24] combines the advantages of matrix factorization and Markov chains to consider both the overall taste of users and their current preferences. [6] encodes the basket sequence with a recurrent neural network to learn dynamic user representations and capture sequential features among baskets. Under the hypothesis that the correlation between items helps infer more coherent baskets, [3] mines this correlation with a hierarchical network and infers the next basket with it, validating the hypothesis. Some studies have noted that most next-basket recommendation methods focus on mining the sequential pattern but ignore other patterns, e.g., the graph pattern. [25] models the relationships among items, users and baskets with a heterogeneous graph and regards the next-basket recommendation task as a link prediction task. However, methods that effectively integrate the strengths of different patterns remain to be explored.

C. Contrastive Learning
Although deep learning has achieved great success in various fields in recent years, the issue of data sparsity still plagues many deep models. In the recommendation field, the amount of interaction data between users and items, i.e., the historical interactions, is relatively small compared to their huge bases, yet a large amount of labeled data is required to train the many parameters of deep models. The conflict between the supply of labeled data and the demand of the models requires us to make full use of the limited historical interactions and mine as much information as possible. Contrastive learning, which mines self-supervised signals from unlabeled data by constructing contrastive pairs, has been widely used in several fields to alleviate the issues caused by sparse data. The approach to data augmentation and the way of constructing contrastive pairs are the key problems in contrastive learning. In the field of computer vision, [26] proposes a texture-based augmentation and a patch-based augmentation to generate negative samples that preserve superfluous features instead of semantic ones for contrastive learning. [27] introduces Gaussian noise to perturb the feature embeddings during training, thus mining self-supervised signals. In the field of recommendation, [10] performs data augmentation by masking or segmenting the items in item sequences and learns the representations of items and attributes with the correlations among attributes, items, segments and sequences. [4] derives two sub-basket sequences at the item level according to the relevance among the items and encodes the sequence pairs for contrastive learning. Although there have been successful attempts to make use of contrastive learning to benefit recommendation tasks, research on combining contrastive learning with the next-basket recommendation task to address the issues caused by data sparsity is still limited.

III. THE PROPOSED METHOD
In this section, we first formalize the next-basket recommendation problem. Then, we present an overview of the proposed HapCL method, followed by a description of its two main modules, i.e., hierarchical alignment for next-basket recommendation and graph-based polar contrastive learning. Finally, the multi-task learning strategy is described.

A. Problem Formulation
The main notations used in this paper are summarized in Table I.

B. Overview
In order to mine as much information as possible from the historical interactions and alleviate the issues caused by data sparsity, we propose the HapCL method shown in Fig. 2. The method models the historical interactions with a weighted bipartite graph to mine the graph pattern, and designs a hierarchical module to mine the sequential pattern at the basket level and the item level. At both levels, every sequence is encoded into multiple representations to explore hidden information from different views. With the expectation that the predictions at different levels can help supervise each other, distribution alignment is applied to constrain them. Furthermore, we design a novel graph-based augmentation approach to construct positive and negative augmentations of baskets for polar contrastive learning, an auxiliary task that helps mine self-supervised signals and learn high-quality representations of baskets and items for recommendation.

C. Hierarchical Alignment for Next-Basket Recommendation

1) Graph-Based Embedding:
To extract higher-order correlations between baskets and items from the historical interactions, we construct the weighted bipartite graph G = (V, E), where V = P ∪ I denotes the set of nodes and E ⊆ {(b, i) | b ∈ P ∧ i ∈ I} denotes the set of edges representing the associations between baskets and items. The weight of every edge in G is the number of occurrences of the corresponding basket. For example, Fig. 3 shows the historical interactions of two users, which contain 4 baskets involving 9 items. The bipartite graph is constructed by adding edges between every basket node and the nodes of the items it contains. Since basket b_1 occurs twice in this example, the weight of the edges connected to b_1 is 2, while the weight of the edges connected to the other baskets, which occur only once, is 1. The graph helps propagate the correlations between baskets and items and aggregate higher-order information into their embeddings.
The basket node b in G is represented by the embedding e_b ∈ R^{1×d}, where d denotes the embedding dimension; similarly, e_i ∈ R^{1×d} represents the embedding of the item node i in G. To propagate information between nodes and aggregate higher-order information for updating the embeddings, while keeping the computational cost low, we perform the light graph convolution operation of [20] over the weighted edges, where w_ib denotes the weight of the edge connecting item node i and basket node b, N_i denotes the set of basket nodes connected to item node i (i.e., the baskets containing item i), N_b denotes the set of item nodes connected to basket node b (i.e., the items contained in basket b), and e_b and e_i denote the embeddings of basket b and item i after l layers of propagation, respectively. Note that the more often basket b appears in the historical interactions, the higher the weight w_ib, which reflects how frequently items belong to the same basket; the correlation between items and baskets is thus hidden in the weights and mined by iterative propagation and aggregation.
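As one concrete reading of this propagation step, the sketch below applies a LightGCN-style layer in which each edge's message is scaled by its weight w_ib and a symmetric square-root degree normalization. The normalization and all names here are illustrative assumptions, not the paper's exact rule:

```python
import numpy as np

def light_gc_layer(E_b, E_i, edges, weights):
    """One weighted LightGCN-style propagation layer (a sketch, not the
    paper's exact formulation). edges is a list of (basket, item) pairs;
    weights is a dict mapping (b, i) to the edge weight w_ib, i.e. the
    occurrence count of basket b."""
    deg_b = np.zeros(E_b.shape[0])
    deg_i = np.zeros(E_i.shape[0])
    for b, i in edges:
        deg_b[b] += 1
        deg_i[i] += 1
    new_b = np.zeros_like(E_b)
    new_i = np.zeros_like(E_i)
    for b, i in edges:
        norm = weights[(b, i)] / np.sqrt(deg_b[b] * deg_i[i])
        new_b[b] += norm * E_i[i]   # aggregate item neighbours into basket
        new_i[i] += norm * E_b[b]   # aggregate basket neighbours into item
    return new_b, new_i
```

Stacking L such layers yields the higher-order embeddings e^(l) described above.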
Only the embeddings in the 0th layer, i.e., e^(0)_b for baskets and e^(0)_i for items, are trained; the embeddings in higher layers are computed from them. After L iterations, we obtain L additional embeddings for every basket and every item, which encode higher-order information. They will

TABLE I SUMMARY OF THE MAIN NOTATIONS IN THIS PAPER
be further combined into the final embeddings. Since the graph G is sparse, it is difficult to learn high-quality embeddings for items and baskets. To alleviate this issue, we design a novel graph-based augmentation approach for polar contrastive learning in Section III-D, which derives self-supervised signals based on the original graph.
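One plausible form of this layer combination, following the mean-pooling convention of LightGCN [20] (an assumption rather than the paper's verbatim equation):

```latex
\mathbf{e}_b \;=\; \frac{1}{L+1}\sum_{l=0}^{L}\mathbf{e}_b^{(l)},
\qquad
\mathbf{e}_i \;=\; \frac{1}{L+1}\sum_{l=0}^{L}\mathbf{e}_i^{(l)}
```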
2) Hierarchical Multi-Head Predictor: For tasks with sparse data, it is crucial to mine as much hidden information as possible from the data.
The basket sequence S_u and its expanded version S_(u,I) are different formats of the same historical interactions, but they contain information at different levels, i.e., the basket level and the item level. It is worth mentioning that the items in any basket in S_(u,I) have no strict chronological order, which differs from the strictly ordered item sequences in sequential recommendation methods. The order of the historical interactions reflects the dynamic interests of users. We employ the Gated Recurrent Unit (GRU) [28] to encode the historical interactions and capture these dynamic interests, where e_t ∈ R^{1×d} denotes the embedding of the t-th basket in the basket-level sequence S_u of user u, s_t denotes the embedding of the t-th item in the item-level sequence S_(u,I) of user u, and the hidden state s_t ∈ R^{1×d_m} encodes the preceding t units, with d_m denoting the hidden dimension. The final representation of sequence S_u is denoted by s^(u,B), i.e., the hidden state at step |S_u|, which encodes the dynamic interests of user u at the basket level. Similarly, the final representation of sequence S_(u,I) is denoted by s^(u,I). Note that the GRU layer can be replaced with any layer capable of capturing sequential information, e.g., LSTM [29] or Transformer [30]. Since we adopt GRU4Rec [31] as a baseline in our experiments, we select GRU for HapCL to serve as a control that reflects the performance of HapCL itself.
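The recurrence above is the standard gated update of [28]; the sketch below implements those standard GRU equations in NumPy to make it concrete. The function name, parameter packing and dimensions are illustrative, not the paper's code:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_encode(seq, params, d_m):
    """Encode a sequence of embeddings with a minimal GRU cell.
    seq: (T, d) array of basket (or item) embeddings.
    params: (Wz, Uz, Wr, Ur, Wh, Uh) weight matrices.
    Returns the final hidden state of shape (d_m,), i.e. the
    sequence representation s^(u,B) or s^(u,I)."""
    Wz, Uz, Wr, Ur, Wh, Uh = params
    h = np.zeros(d_m)
    for x in seq:
        z = sigmoid(x @ Wz + h @ Uz)               # update gate
        r = sigmoid(x @ Wr + h @ Ur)               # reset gate
        h_tilde = np.tanh(x @ Wh + (r * h) @ Uh)   # candidate state
        h = (1 - z) * h + z * h_tilde              # blend old and new state
    return h
```

Biases are omitted for brevity; a framework GRU (e.g. a recurrent layer in any deep learning library) would be used in practice.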
With the representations s^(u,B) and s^(u,I), the probability that item i appears in the next basket following S_u can be inferred at each level, where g_i^(u,B) and g_i^(u,I) denote the probabilities at the basket level and the item level, respectively.
Inspired by [30], we map the embeddings of baskets and items into multiple hidden spaces with multiple heads, so as to explore hidden information from different views. Thus, every sequence infers multiple basket-level probabilities g^(u,B)_(i,1), ..., g^(u,B)_(i,H) and multiple item-level probabilities g^(u,I)_(i,1), ..., g^(u,I)_(i,H) for item i, where H denotes the number of heads. It is worth noting that the different views are not independent of each other; they attend to each other in different degrees. To capture this, we introduce an attention mechanism into the inference of the probabilities, where α^B_kh and α^I_kh denote the attention weight of the k-th view over the h-th view at the basket level and the item level, respectively. They are computed from s^(u,B)_k and s^(u,I)_k, the basket-level and item-level representations of S_u in the k-th view.
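The cross-view attention can be made concrete with a small sketch. We assume, for illustration only, that α_kh is a row-wise softmax over dot-products of the H view representations and that the per-view probabilities are mixed by these weights; the paper's exact score function may differ:

```python
import numpy as np

def attend_views(view_reps, view_probs):
    """Mix per-view probabilities with cross-view attention (a hypothetical
    instantiation of alpha_kh). view_reps: (H, d) representations of one
    sequence; view_probs: (H,) probabilities for one candidate item.
    Returns H attention-mixed probabilities, one per view."""
    scores = view_reps @ view_reps.T                  # (H, H) view affinities
    scores = scores - scores.max(axis=1, keepdims=True)
    alpha = np.exp(scores)
    alpha = alpha / alpha.sum(axis=1, keepdims=True)  # row-wise softmax
    return alpha @ view_probs                         # convex mix per view
```

Each output stays within the range of the input probabilities, since every row of alpha is a convex combination.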
Because the probabilities inferred from different views have different importance in the final prediction, it is necessary to assign them different weights. We combine the probabilities with learned weights, where w^B_h and w^I_h denote the weight of the h-th view at the basket level and the item level, respectively.
As the hierarchical network yields the two probabilities at different levels, the final prediction y^u_i of the probability that item i is in the next basket following S_u is calculated by combining them. The next-basket recommendation task is optimized with the loss L_rec, where B+_u denotes the set of items in the next basket following S_u, I \ B+_u denotes the set of items that are not in that basket, and ŷ^u_i denotes the label corresponding to y^u_i, with ŷ^u_i equal to 1 if item i ∈ B+_u and 0 otherwise.
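Given the 0/1 labels ŷ^u_i, L_rec reads as a binary cross-entropy over candidate items. The sketch below assumes that form (in practice the negatives from I \ B+_u may be sampled rather than enumerated):

```python
import math

def rec_loss(probs, labels):
    """Binary cross-entropy over candidate items: a plausible form of
    L_rec given the 0/1 labels described in the text (an assumption,
    not the paper's verbatim loss)."""
    eps = 1e-12  # guard against log(0)
    return -sum(y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps)
                for p, y in zip(probs, labels)) / len(probs)
```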
3) Distribution Alignment: The predictions inferred from the same historical interactions share the same target: the probability y^(u,B)_i inferred at the basket level and the probability y^(u,I)_i inferred at the item level both estimate the probability that item i is in the next basket following S_u. Therefore, we align the probability distributions of the two levels with a constraint over T, the set of all user-item pairs in the training set.
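One natural instantiation of this alignment constraint is a symmetric KL divergence between the two distributions. The sketch below assumes that choice; the paper's exact divergence may differ:

```python
import math

def align_loss(p_basket, p_item):
    """Symmetric KL between the basket-level and item-level probability
    distributions over items (one plausible alignment constraint, shown
    here as an assumption)."""
    eps = 1e-12

    def kl(p, q):
        return sum(pi * math.log((pi + eps) / (qi + eps))
                   for pi, qi in zip(p, q))

    return 0.5 * (kl(p_basket, p_item) + kl(p_item, p_basket))
```

The loss is zero when the two levels already agree and grows as their predictions diverge.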

D. Graph-Based Polar Contrastive Learning
In this section, we introduce the novel graph-based augmentation approach for polar contrastive learning in detail. In contrastive learning, how to perform data augmentation and how to construct self-supervised signals are the two key points, so we elaborate the approach around them.
1) Basket Augmentation: The intention of data augmentation is to transform the original data by adding some noise while retaining most of the information, so as to help with the learning of high-quality representations. For basket b, we can score its relevance to every item. According to the relevance, we find the item set I^pos_b, which consists of the K items most relevant to basket b, as well as the item set I^neg_b of the K least relevant items. With the item set I^pos_b of each basket, the positive graph G_pos is constructed by adding edges connecting basket b and the items i ∈ I^pos_b to the original graph G. Take the positive graph G_pos in Fig. 2 for example: the size of I^pos_b is set to 1 for each basket, e.g., {i_9} is the item set corresponding to basket b_3, so the edge connecting i_9 and b_3 is added, shown by a dotted line. Similarly, the negative graph G_neg is constructed with the item set I^neg_b of each basket. The weights of the additional edges are set to 1.
2) Polar Contrastive Learning: By iterative propagation and aggregation over the positive graph G_pos and the negative graph G_neg, i.e., a pair of polar augmented graphs, we obtain the positive embedding e^pos_b and the negative embedding e^neg_b of basket b, respectively. The multiple positive representations {p^u_1, ..., p^u_H} and the multiple negative representations {n^u_1, ..., n^u_H} of sequence S_u at the basket level are calculated from these embeddings. We concatenate the representations into p_u and n_u, which are regarded as the polar contrastive pair of basket sequence S_u for mining self-supervised signals. Considering a mini-batch with z basket sequences, we obtain z polar contrastive pairs. With the expectation that a polar contrastive pair from the same basket sequence should be more consistent with each other than pairs from different basket sequences, the objective of polar contrastive learning is defined over these pairs, where C− denotes the set of representations of the sequences in the polar pairs except p_u and n_u.
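The objective can be sketched as an InfoNCE-style loss in which (p_u, n_u) forms the positive pair and the representations in C− act as negatives. The temperature τ and the cosine similarity are our assumptions:

```python
import math

def cosine(a, b):
    """Cosine similarity between two vectors."""
    num = sum(x * y for x, y in zip(a, b))
    den = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return num / den

def polar_cl_loss(p_u, n_u, others, tau=0.1):
    """InfoNCE-style polar contrastive objective for one sequence:
    pull p_u and n_u (from the same sequence) together, and push p_u
    away from the representations in C- (other sequences in the batch).
    A sketch of the likely form, not the paper's exact equation."""
    pos = math.exp(cosine(p_u, n_u) / tau)
    neg = sum(math.exp(cosine(p_u, c) / tau) for c in others)
    return -math.log(pos / (pos + neg))
```

The loss approaches zero when the polar pair is far more similar than any cross-sequence pair.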

E. Multi-Task Learning
Since polar contrastive learning is introduced to help the next-basket recommendation task learn high-quality representations, we optimize them jointly with a multi-task learning strategy, where λ_1, λ_2 and λ_3 denote the trade-off weights of the losses, Φ denotes the learnable parameters of HapCL, and ||·||_2 denotes the L2 regularization with λ_4 being the regularization coefficient.
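A plausible form of the joint objective, assuming λ_1, λ_2 and λ_3 weight the recommendation, alignment and contrastive losses in that order (the pairing of each weight with its loss is our assumption):

```latex
\mathcal{L} \;=\; \lambda_1\,\mathcal{L}_{rec}
\;+\; \lambda_2\,\mathcal{L}_{ali}
\;+\; \lambda_3\,\mathcal{L}_{cl}
\;+\; \lambda_4\,\|\Phi\|_2^2
```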

IV. EXPERIMENTS
In this section, we conduct extensive experiments on three real-world datasets to evaluate the effectiveness of the proposed HapCL method. The experiments are designed to answer the following research questions:
RQ1. How does the proposed HapCL method perform compared to the state-of-the-art baselines on the next-basket recommendation task?
RQ2. Do the different components of the HapCL method benefit its performance?
RQ3. Can the polar contrastive pairs provide stronger self-supervised signals for the HapCL method?
RQ4. How do the key hyper-parameters, i.e., the number of additional edges in augmentation K, the number of graph convolution layers L, and the number of heads H, affect the performance?
A. Experimental Settings

1) Datasets: We adopt three real-world datasets in our experiments, namely Beauty, Grocery and Tafeng. Beauty and Grocery consist of the interactions of the subcategories "Beauty" and "Grocery", respectively, on Amazon, a well-known e-commerce platform, and Tafeng contains the transaction data of a Chinese grocery store. Following [3], a well-known next-basket recommendation method, users and items with fewer than 10 interaction records are discarded from all datasets. We generate a basket sequence for each user by sorting the corresponding interactions according to their timestamps, and filter out the sequences shorter than 3. The statistics of the three datasets after preprocessing are summarized in Table II.

2) Baselines: To verify the effectiveness of the HapCL method, we compare it against the following ten baselines:
- POP is a non-personalized method that recommends the items with the greatest number of historical interactions to each user.
- GRU4Rec [31] introduces a recurrent neural network to model the sequential information hidden in the historical interactions.
- STAMP [32] is a short-term attention/memory priority model, which captures the general interests of users from the long-term memory of a session context and their current interests from the short-term memory.
- SASRec [16] is a self-attention based model, which can not only capture long-term semantics but also make predictions based on relatively few actions with an attention mechanism.
- NextItNet [33] consists of a stack of dilated convolutional layers and can learn representations from both short-range and long-range item dependencies.
- LightSANs [7] extracts a number of latent interests by low-rank decomposed self-attention and generates context-aware representations by making use of item-to-interest interactions.
- CLEA [4] designs a denoising generator to automatically extract the items relevant to the target item, and proposes a two-stage anchor-guided contrastive learning scheme to guide relevance learning.
- Beacon [3] encodes baskets with a correlation matrix to take into account the relative importance of items and the correlations among item pairs.
- SINE [34] attempts to infer the set of concepts for each user adaptively and predicts the current intention of users.
- GCSAN [1] is a graph-based method that utilizes a graph neural network to capture local dependencies and a self-attention mechanism to learn long-range dependencies for prediction.

These baselines can be divided into three groups based on the type of technology involved: (1) the traditional method that does not consider sequential relationships, i.e., POP; (2) the methods focusing on the sequential pattern, i.e., GRU4Rec, STAMP, SASRec, NextItNet, LightSANs, CLEA and Beacon; (3) the methods that introduce graph structure to facilitate sequence modeling, i.e., SINE and GCSAN. It should be noted that models which require user profiles are not selected for the experiments, because the users in the next-basket recommendation task are anonymous.
3) Evaluation Metrics: The baskets of the same user are spliced into a basket sequence in chronological order for evaluation. Following [16], we adopt the leave-one-out strategy to evaluate the performance of the methods, which provides each user with M items as a basket for evaluation. We employ the widely used F1-score@M (F1@M) [3], [35], [36] and Hit Ratio@M (HR@M) [10], [25], [37] as evaluation metrics, with M set to 5 and 30.
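For reference, the two metrics can be computed per user as follows. These are the standard definitions, though minor variants of F1@M and HR@M exist in the literature:

```python
def hr_at_m(recommended, ground_truth):
    """Hit Ratio@M for one user: 1 if any ground-truth item appears
    among the M recommended items, else 0."""
    return 1.0 if set(recommended) & set(ground_truth) else 0.0

def f1_at_m(recommended, ground_truth):
    """F1@M for one user: harmonic mean of precision and recall of the
    M recommended items against the true next basket."""
    hits = len(set(recommended) & set(ground_truth))
    if hits == 0:
        return 0.0
    precision = hits / len(recommended)
    recall = hits / len(ground_truth)
    return 2 * precision * recall / (precision + recall)
```

The dataset-level scores are the averages of these per-user values.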
4) Parameter Settings: For a fair comparison, we initialize all embedding parameters randomly in the range (0, 1) and optimize them with the Adam optimizer with a learning rate of 1e-2. The mini-batch size and the embedding size are both set to 32. For the key hyper-parameters of the HapCL method, we tune the number of additional edges in augmentation K and the number of graph convolution layers L within {1, 2, 3, 4, 5}, and the number of heads H within {2, 3, 4, 5, 6}. All parameters are tuned on the validation set with early stopping whose patience is set to 10 epochs, and the results on the testing set are reported.

B. Overall Performance Comparison
The experimental results of different methods are summarized in Table III.
Here, we have the following observations:
- It is obvious that the proposed HapCL method shows competitive performance against the baselines on the three real-world datasets (RQ1). Although SINE and GCSAN both introduce graph structure to facilitate sequence modeling, SINE shows little change in performance across the three datasets, while GCSAN achieves relatively competitive performance. This indicates that mining multiple patterns alone cannot guarantee a performance improvement; different patterns need to be combined carefully to enhance the data mining ability of the method and achieve better performance together.
- Making use of the order of the historical interactions helps the methods mine more information from the limited historical interactions and facilitates the modeling of the relationship between baskets and items. The methods focusing on the sequential pattern and the methods that introduce graph structure show better performance than the traditional method that ignores the time information of the historical interactions. Specifically, POP achieves the worst performance due to its underutilization of the data.
- The performance improvement tends to be much larger on Beauty than on Grocery and Tafeng. Beauty is a relatively sparse dataset, while Grocery and Tafeng are relatively dense, so the results imply that the proposed method can alleviate the issue of data sparsity. Note that the Tafeng dataset has the largest average basket size, the longest average basket sequences and the fewest items among the three datasets, and all the methods tend to show their best performance on Tafeng, with Beauty at the other extreme, which confirms that data sparsity leads to poor performance.

C. Further Analysis
1) Ablation Study:
We design an ablation study to validate the effectiveness of the major components of the proposed HapCL method. Since hierarchical alignment and polar contrastive learning are the two most important components of HapCL, we make comparisons with the following variants:
• HapCL−cl: We remove the polar contrastive learning part of HapCL to examine its validity. The embeddings of the baskets and the items are obtained by iterative propagation and aggregation over the original weighted bipartite graph, and the positive graph and the negative graph are not constructed for mining the self-supervised signals.
• HapCL−ali: We remove the distribution alignment part of HapCL to examine its validity. The basket-level probability predictions and the item-level probability predictions are no longer required to be close to each other, but are merged into the final predictions directly.

TABLE IV
PERFORMANCE COMPARISON OF HAPCL AND ITS TWO VARIANTS

The results obtained by HapCL and its two variants on the three datasets are reported in Table IV.
We can see performance degradation to different degrees for the variants, which validates the effectiveness of the two components (RQ2). Moreover, on the Beauty dataset, whose data is relatively sparse among the three datasets, the absence of either of the two important components, i.e., the distribution alignment and the polar contrastive learning, tends to have a greater impact on the performance of the HapCL method. This indicates that both components enhance information mining and help with the learning of better representations, thus improving the performance.
The addition of the polar contrastive learning makes the HapCL method achieve a performance improvement of up to 39.39%. This suggests that the polar contrastive pairs indeed provide useful self-supervised signals, and thus enable HapCL to mine representative patterns and learn high-quality representations. In order to further verify the superiority of polar contrastive pairs, we design two other types of contrastive pairs for experiments, and the experimental results are reported in Table V. The contrastive type "Ori-Pos" denotes that self-supervised signals are mined based on the original graph and the positive graph, "Ori-Neg" denotes that they are mined based on the original graph and the negative graph, and "Pos-Neg" denotes that they are mined based on the positive graph and the negative graph. From the results of the three contrastive types, we find that the polar contrast, i.e., the contrastive type "Pos-Neg", shows overwhelming superiority over the other two types in all metrics on the three datasets. The results confirm the hypothesis that, compared with the non-polar contrastive types, i.e., "Ori-Pos" and "Ori-Neg", the polar contrastive pairs are more conducive to mining representative patterns (RQ3). It is worth noting that the performance of the contrastive type "Ori-Pos" is generally worse than that of "Ori-Neg", suggesting that telling the method what is negative information is more important than telling it what is positive information.
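The polar contrast can be sketched with an InfoNCE-style loss. This is a plausible formulation, not the paper's exact loss: it assumes that, for each node, the base embedding serves as the anchor, the positive-graph embedding as the positive sample, and the negative-graph embedding as the hard negative. All names, the temperature value, and the toy vectors are illustrative.

```python
import math

def cos(u, v):
    """Cosine similarity of two dense vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def polar_contrastive_loss(anchors, pos_view, neg_view, tau=0.2):
    """InfoNCE-style polar contrast: per node, pull the anchor toward its
    positive-graph embedding and push it away from its negative-graph one."""
    losses = []
    for a, p, n in zip(anchors, pos_view, neg_view):
        sp = math.exp(cos(a, p) / tau)
        sn = math.exp(cos(a, n) / tau)
        losses.append(-math.log(sp / (sp + sn)))
    return sum(losses) / len(losses)

# Toy check: anchors aligned with the positive view yield a small loss.
anchors = [[1.0, 0.0], [0.0, 1.0]]
pos     = [[0.9, 0.1], [0.1, 0.9]]
neg     = [[-1.0, 0.2], [0.3, -1.0]]
loss = polar_contrastive_loss(anchors, pos, neg)
```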
On the three datasets, the performance of the HapCL method degrades by up to 33.33% in the absence of the distribution alignment. This indicates that the distribution alignment allows the probability predictions at different levels, i.e., the basket level and the item level, to guide each other toward the right direction. This finding is further verified in Section IV-C-2.
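The alignment of the two levels' predictions can be sketched as a symmetric KL term between the basket-level and item-level distributions. The symmetric form and the fusion-by-averaging rule here are assumptions for illustration; the paper's exact alignment and merging rules may differ.

```python
import math

def kl(p, q, eps=1e-12):
    """KL divergence KL(p || q) between two probability vectors."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

def alignment_loss(p_basket, p_item):
    """Symmetric KL pulling the two levels' predictions toward each other."""
    return 0.5 * (kl(p_basket, p_item) + kl(p_item, p_basket))

def merged_prediction(p_basket, p_item):
    # Simple average fusion -- an assumption, not the paper's exact rule.
    return [0.5 * (a + b) for a, b in zip(p_basket, p_item)]

p_b = [0.7, 0.2, 0.1]  # toy basket-level prediction over 3 items
p_i = [0.6, 0.3, 0.1]  # toy item-level prediction
loss = alignment_loss(p_b, p_i)
```

When the two distributions agree, the loss vanishes; the larger the disagreement, the stronger the mutual-supervision signal.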
2) Hyper-Parameter Analysis: The number of additional edges in augmentation K, the number of graph convolution layers L and the number of heads in multi-head attention H are the three key hyper-parameters of the HapCL method. We analyze their impact on the performance of HapCL in this section (RQ4).
In the basket augmentation part, the HapCL method constructs polar contrastive pairs by adding new edges to connect baskets and items. Fixing the number of graph convolution layers L at 1 and the number of heads in multi-head attention H at 4, we test the effect of the number of additional edges in augmentation K within the range of {1, 2, 3, 4, 5}. The performance of the HapCL method with different K is plotted in Fig. 4, where the lines in dark green denote the results of HapCL. It can be observed that the HapCL method tends to show the best performance when K is 1, and the performance decreases when K is larger than 1. Moreover, more stable performance is achieved when K is smaller than 3. Considering that the average basket size of Beauty is 4.12, this suggests that there should not be too many additional edges, as too many new edges would obscure the extraction of the relationship between a basket and its corresponding items.
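The edge-adding augmentation can be sketched as follows. The relevance score and the top-K / bottom-K selection rule are assumptions for illustration: the positive view connects a basket to the K most relevant unseen items, and the negative view to the K least relevant ones.

```python
def augment_basket(basket_items, all_items, relevance, K=1):
    """Build polar views of one basket node by adding K edges each.
    `relevance` stands in for whatever scoring HapCL uses (an assumption);
    higher means more related to the basket."""
    candidates = [i for i in all_items if i not in basket_items]
    ranked = sorted(candidates, key=lambda i: relevance[i], reverse=True)
    pos_edges = ranked[:K]    # edges added in the positive graph
    neg_edges = ranked[-K:]   # edges added in the negative graph
    return pos_edges, neg_edges

items = list(range(9))
relevance = {i: 1.0 / (i + 1) for i in items}  # toy scores, item 0 most relevant
pos_edges, neg_edges = augment_basket({0, 1}, items, relevance, K=2)
```

Keeping K small relative to the average basket size (4.12 on Beauty) matches the observation that too many added edges blur the basket's original composition.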
Graph convolution is responsible for the information propagation and aggregation among the nodes of the weighted bipartite graph. The more graph convolution layers, the more higher-order information is contained in the embedding of each node. Fixing the number of additional edges K at 2 and the number of heads in multi-head attention H at 4, we test the effect of the number of graph convolution layers L within the range of {1, 2, 3, 4, 5}. The performance of the HapCL method with various L is plotted in Fig. 5, where the lines in violet denote the results of HapCL. The performance of the HapCL method reaches its optimum when L is 2, and decreases when the number of graph convolution layers is larger or smaller than 2. This makes sense because too few layers lead to insufficient mining of the graph pattern, while too many layers introduce too much noise, which disrupts the extraction of the information.
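Layer-wise propagation on the bipartite graph can be sketched in a LightGCN style, where each layer averages neighbor embeddings and the layer outputs are averaged at the end. This assumes uniform edge weights for simplicity, whereas HapCL propagates over a weighted graph.

```python
def propagate(embeddings, adj, num_layers):
    """LightGCN-style sketch: each layer replaces a node's embedding with
    the mean of its neighbors' embeddings; outputs of all layers (including
    layer 0) are averaged. Uniform weights are an assumption here."""
    layers = [embeddings]
    h = embeddings
    for _ in range(num_layers):
        h = {n: [sum(h[m][d] for m in nbrs) / len(nbrs)
                 for d in range(len(h[n]))]
             for n, nbrs in adj.items()}
        layers.append(h)
    return {n: [sum(layer[n][d] for layer in layers) / len(layers)
                for d in range(len(embeddings[n]))]
            for n in embeddings}

# Tiny bipartite graph: basket "b" connected to items "i1" and "i2".
emb = {"b": [0.0, 0.0], "i1": [1.0, 0.0], "i2": [0.0, 1.0]}
adj = {"b": ["i1", "i2"], "i1": ["b"], "i2": ["b"]}
out = propagate(emb, adj, num_layers=2)
```

With L = 2, the basket node already mixes in second-order information (items reachable via other baskets); stacking many more layers would over-smooth the embeddings, which matches the observed drop for large L.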
The diversity of the views on information mining is dominated by the number of heads in multi-head attention H. Fixing the number of additional edges K at 2 and the number of graph convolution layers L at 1, we test H within the range of {2, 3, 4, 5, 6}. The performance of the HapCL method with various H is plotted in Fig. 6, where the lines in turquoise denote the results of HapCL. The performance of HapCL fluctuates, and it is relatively good when the value of H is 2 or 5. Since the meaning of a view is not artificially defined, but is decided by HapCL itself based on the given number of views, the model shows good performance when the granularity of decomposition is appropriate for the information contained in the data, while an inappropriate granularity harms the performance.
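The exchange of information among the H views can be sketched as dot-product attention across view vectors, producing the row-stochastic weight matrix analyzed later in the attention heatmaps. This is a sketch of the idea, not the exact predictor layer; the dot-product scoring is an assumption.

```python
import math

def softmax(xs):
    m = max(xs)  # subtract max for numerical stability
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def view_attention(views):
    """The attention of view i over view j is the softmax (over j) of their
    dot products; each view's output is the weighted sum of all views."""
    H = len(views)
    weights = []
    for i in range(H):
        scores = [sum(a * b for a, b in zip(views[i], views[j])) for j in range(H)]
        weights.append(softmax(scores))  # row i of the H x H weight matrix
    outputs = [[sum(weights[i][j] * views[j][d] for j in range(H))
                for d in range(len(views[0]))] for i in range(H)]
    return weights, outputs

views = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # toy H = 3 view vectors
W, out = view_attention(views)
```

Each row of `W` is normalized separately, which is why the attention of view i over view j need not equal that of view j over view i.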
In addition, the performance of HapCL−ali with different numbers of additional edges K, different numbers of graph convolution layers L and different numbers of heads H is also plotted in Figs. 4, 5 and 6 in khaki, olive and orange, respectively. It is clear that most of the results of HapCL−ali are worse than the corresponding results of HapCL, and the same variation of the hyper-parameters leads to greater fluctuation in the performance of HapCL−ali. This analysis further verifies that the distribution alignment can not only achieve mutual supervision and improve the performance, but also enhance the stability of the method.
3) Attention Mechanism Analysis: The attention mechanism is introduced in the hierarchical multi-head predictor to boost the interchange of information among different views. While Fig. 6 shows the impact of the number of heads in multi-head attention, we further investigate the effectiveness of the introduced attention mechanism (RQ2) in this part. Two anonymous users are randomly selected from the testing set of Beauty, and their historical interactions are shown in the form of basket sequences in Fig. 7, where the items are represented by IDs. The basket sequence length of the first user is 2 and that of the second user is 4. We set the number of heads in multi-head attention H to 4, so as to obtain an attention weight matrix of size 4 × 4 at the basket level and at the item level, respectively. The matrices are displayed as heatmaps in Fig. 8 for a more intuitive observation. The cell in the ith row and jth column denotes the attention weight of the ith view over the jth view. Note that the attention of the ith view over the jth view and the attention of the jth view over the ith view are not equivalent. We find that the views tend to pay more attention to themselves at the basket level, while the views at the item level tend to obtain similar weights from different views. This phenomenon indicates the diverse characteristics of different formats of the same historical interactions, and the baskets that aggregate the information of items can generate more discriminative representations. The combination of sequential patterns at different levels is beneficial to the utilization of the limited historical interactions.

V. CONCLUSION
Since sparse data impedes the performance of recommendation methods, it is necessary to facilitate the mining of the limited historical interactions. The existing next-basket recommendation methods mostly neglect the fact that the information of the historical interactions is hidden in multiple views, and treat the next-basket recommendation task as a single-view sequential prediction problem. In this paper, we propose a novel method that combines hierarchical alignment with polar contrastive learning for next-basket recommendation (named HapCL). The historical interactions are modeled by a weighted bipartite graph for information propagation and aggregation, and a module with a hierarchical framework is designed to mine information from multiple views at two levels, i.e., the basket level and the item level. The probabilities inferred at the two levels are aligned and integrated into the final prediction. In order to mine self-supervised signals from the original data, we design a novel graph-based augmentation approach for constructing polar contrastive pairs. Extensive experiments on three real-world datasets validate the effectiveness of the proposed HapCL method in mining more information for the next-basket recommendation task.
Let I = {i_1, i_2, . . ., i_|I|} denote the set of items and U = {u_1, u_2, . . ., u_|U|} denote the set of users, where |I| and |U| represent the numbers of items and users, respectively. Each basket b = {i^b_1, i^b_2, . . ., i^b_|b|} is composed of |b| items in I that are interacted with by the same user over a period of time, and the basket size |b| may differ from one basket to another. Let P = {b_1, b_2, . . ., b_|P|} denote the set of baskets, where |P| represents the number of baskets. The historical interactions of user u ∈ U can be denoted by the basket sequence S_u = [b^u_1, b^u_2, . . ., b^u_|S_u|] in chronological order, where |S_u| represents the length of S_u. For every anonymous user u ∈ U, next-basket recommendation aims at inferring the |b_{|S_u|+1}| items that user u will interact with in the next basket based on the basket sequence S_u, where |b_{|S_u|+1}| is the size of the next basket.
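The formulation above can be expressed with simple data structures: a basket is an unordered set of item IDs, and a user's history is a chronologically ordered list of baskets. The function and variable names below are illustrative.

```python
from typing import Dict, FrozenSet, List

# A basket has no internal order, so it is modeled as a frozenset of item IDs;
# the sequence S_u is an ordered list of baskets.
BasketSeq = List[FrozenSet[int]]

def next_basket_task(histories: Dict[str, BasketSeq]):
    """For each user u with sequence S_u, the goal is to infer the items of
    basket b_{|S_u|+1}; here we just expose the inputs/targets split used
    for evaluation."""
    inputs, targets = {}, {}
    for user, seq in histories.items():
        inputs[user] = seq[:-1]   # observed baskets
        targets[user] = seq[-1]   # held-out next basket to predict
    return inputs, targets

histories = {"u1": [frozenset({1, 2, 5}), frozenset({2, 3}), frozenset({4})]}
inp, tgt = next_basket_task(histories)
```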

Fig. 2. An overview of the architecture of the proposed HapCL method, which can be divided into two parts, i.e., Hierarchical Alignment for Next-basket Recommendation and Graph-based Polar Contrastive Learning.

Fig. 3. An example of the construction of the weighted bipartite graph. The two sequences contain 4 baskets involving 9 items, and the corresponding weighted bipartite graph is shown in the grey block.

Fig. 4. The performance of HapCL and HapCL−ali with different numbers of additional edges in augmentation, denoted by K, on Beauty. The left subfigure corresponds to F1@M and the right subfigure corresponds to HR@M. K varies in {1, 2, 3, 4, 5}. The lines in dark green denote the results of HapCL and the lines in khaki denote the results of HapCL−ali.
Fig. 5. The performance of HapCL and HapCL−ali with different numbers of graph convolution layers, denoted by L, on Beauty. L varies in {1, 2, 3, 4, 5}. The lines in violet denote the results of HapCL and the lines in olive denote the results of HapCL−ali.

Fig. 6. The performance of HapCL and HapCL−ali with different numbers of heads in multi-head attention, denoted by H, on Beauty. The left subfigure corresponds to F1@M and the right subfigure corresponds to HR@M. H varies in {2, 3, 4, 5, 6}. The lines in turquoise denote the results of HapCL and the lines in orange denote the results of HapCL−ali.
Fig. 7. The historical interactions of two anonymous users from the testing set of Beauty, shown as basket sequences where the items are represented by IDs.

Fig. 8. Heatmaps of the attention weights of two anonymous users across different views on Beauty. The upper left subfigure corresponds to the weights of the first user at the basket level, the upper right subfigure to the weights of the first user at the item level, the lower left subfigure to the weights of the second user at the basket level and the lower right subfigure to the weights of the second user at the item level. The x-axis and y-axis, which vary within the range of {1, 2, 3, 4}, denote the serial numbers of views. Darker cells represent higher weights, and the specific values are marked in the corresponding cells.

TABLE II
DATASET STATISTICS

The statistics of the datasets are summarized in Table II. For each dataset, we divide the basket sequences into training set, validation set, and testing set in the ratio of 6:2:2.

TABLE III
OVERALL PERFORMANCE COMPARISON OF HAPCL AGAINST THE BASELINES. THE BEST RESULTS OBTAINED BY BASELINES ARE UNDERLINED, AND THE RESULTS OBTAINED BY THE PROPOSED METHOD ARE IN BOLD. THE PERFORMANCE IMPROVEMENT (IMPROV.) IS DEFINED AS IMPROV. = (OUR RESULT − THE BEST BASELINE RESULT) / THE BEST BASELINE RESULT

TABLE V
PERFORMANCE COMPARISON OF DIFFERENT CONTRASTIVE TYPES