Time-Aware Multibehavior Contrastive Learning for Social Recommendation

The social relationships among users can be effectively represented using graph structures, which has led to increasing interest in utilizing graph neural networks (GNNs) for social recommendation. However, the existing methods still have several issues: 1) The problem of sparse supervision signals in GNN-based recommendation models has not been well addressed. 2) The existing social recommendation methods often neglect the guiding effect of auxiliary behaviors on the target behavior, since only the data of the single target behavior are used for model training. 3) The dynamics of recommendations are rarely considered in GNN-based social recommendation algorithms. To address these issues, this article proposes a time-aware multibehavior contrastive learning framework. To achieve better personalized recommendation, we perform representation learning from multiview perspectives, incorporating temporal information and multibehavior interactions into social recommendation. A time-aware GNN is designed to model the dynamic dependency relationships between users and items, which enhances the dynamics of recommendations. Meanwhile, we propose a multibehavior contrastive learning framework to make better use of multibehavioral data and to address the problem of sparse supervision signals. Extensive experiments on three real-world datasets validate the superiority of our method, where the maximum improvement reaches 6.14% in terms of NDCG@5.


Chuyuan Wei, Member, IEEE, Chuanhao Hu, Chang-Dong Wang, Senior Member, IEEE, and Shuqiang Huang

I. INTRODUCTION
User preferences can often be inferred more accurately from users' social relationships, which has drawn extensive attention to social recommendation in recent years. Traditional social recommendation mainly relies on methods based on matrix factorization [1], [2]. However, due to the diversity and complexity of data forms and the sparseness of data content, traditional social recommendation systems are no longer capable of providing personalized recommendations.
Given the excellent performance of neural networks, many researchers have attempted to apply them to social recommendation. In particular, some studies use the attention mechanism for social relationship modeling. For example, Chen et al. [3] introduced the attention mechanism and designed the adaptive transfer neural network EATNN to capture the mutual relationships between different users. In Diffnet++ [4], a multilevel attention network was designed to learn user representations in the social and collaborative domains. Meanwhile, social recommendation methods based on graph neural networks (GNNs) have gained widespread attention because users' social relationships can be naturally stored in graph structures and GNNs perform well in graph representation learning. In SR-HGNN [5], global social context information was extracted through a hierarchical GNN based on a mutual information learning paradigm between low-level user embeddings and high-level global representations. Song et al. [6] used dynamic graph attention neural networks to model contextually relevant social influences as well as dynamic user behaviors. Although social recommendation models based on GNNs have achieved encouraging performance, the inherent sparsity of supervision signals in GNNs is still not well addressed.
Furthermore, some studies [5], [7] point out that most current social recommendation methods model only a single type of interaction between users and items, whereas user-item interaction behaviors in real scenarios are often diverse and time-sensitive. Given the variety of behaviors [8], recommendation models must be capable of modeling users' preferences from various behaviors, and recommendations must be updated over time. Accordingly, a growing body of research addresses multibehavior recommendation [9], [10]. However, most multibehavior recommendation systems fail to consider the guiding effect of auxiliary behaviors on target behaviors (e.g., purchase behaviors), since only the target behavior data are used to train the model. Recent research [9] has shown that auxiliary behaviors are extremely useful in modeling user behaviors. The timeliness of user behaviors has been extensively studied in sequential recommendation, whereas few works consider the dynamics of social recommendation in GNN-based recommendation algorithms. Some recent studies [6], [10] have taken time series information into account in social recommendation, but they are all limited to modeling a single type of user-item interaction, making it difficult to achieve optimal model performance.
To address the aforementioned issues, this article proposes a time-aware multibehavior contrastive learning framework (TMBCL) for social recommendation. In contrast to traditional social approaches that only consider user-item interactions and user social relationships, our model also takes item dependencies into account, so that more semantic information can be captured. Specifically, three essential components are designed: the user social network, the item contact network, and the user-item multibehavior interaction graph. We extract user-level and item-level subgraph structures from the user social network and item contact network, respectively, and perform recursive embedding propagation of user-item information and interaction times through a time-aware GNN, which models the dynamic dependencies between different behaviors. In addition, a multibehavior contrastive learning module is devised to address the problem of sparse supervision signals.
The main contributions of this article are as follows.
1) We emphasize the timeliness of user behaviors, taking temporal data as input and proposing a time-aware GNN to model dynamic behavioral dependencies.
2) We design a multibehavior contrastive learning framework to capture the fine-grained differences between different types of user behaviors, considering the guidance of auxiliary behaviors toward the target behaviors.
3) Extensive experiments consisting of comparison experiments and an ablation study on three real-world datasets are conducted to demonstrate the effectiveness as well as the necessity of our model.

The rest of this article is organized as follows. We provide a brief overview of related work in Section II, followed by a statement of the problem definition and our proposed method in Sections III and IV. Experimental comparisons and analysis are presented in Section V. Finally, Section VI concludes this article.

II. RELATED WORK

A. GNN-Based Recommendation Models
GNNs have emerged as a powerful paradigm in recommendation systems. They aim to capture intricate dependencies and interactions among nodes within graphs, making them particularly suitable for modeling complex user-item relationships. KGAT [11] leverages the knowledge graph as auxiliary information and employs a graph attention network (GAT) to capture higher-order relationships through recursive propagation. NGCF [12] integrates user-item interactions into embeddings using a graph convolutional network (GCN), explicitly considering high-order connectivity between users and items. TAGNN [13] enhances session-based recommendation by dynamically activating user interests based on different target items, enabling the generation of diverse representation vectors and enhancing model expressiveness.
However, in the case of GCN, feature transformations and nonlinear activation operations do not directly benefit collaborative filtering (CF). He et al. [14] demonstrated this through ablation experiments, which motivated the proposal of LightGCN as a simplified version of GCN. Subsequently, numerous researchers have extended the lightweight GCN model, combining it with advanced deep learning techniques to design more effective recommendation methods. For instance, SR-HGNN [15] employs a hybrid-order gated GNN to capture intricate dependencies, effectively mitigating the oversmoothing problem. LCFN [16] merges 1-D graph convolution with a low-pass collaborative filter (LCF), achieving efficient and effective feature extraction. MixGCF [17] combines the user-item graph structure with the aggregation process of GNNs and employs hop-mixing techniques to generate challenging negative samples.

B. Social Recommendation
Social recommendation is a type of recommendation method based on user social relationships, with the core idea that user decisions are influenced by their surrounding friends. SoRec [2] and TrustMF [1] are two well-known traditional social recommendation approaches. SoRec learns latent user features by jointly factorizing the rating matrix and the social relationship matrix. TrustMF factorizes the social trust network into a trust space and a trustworthy space, mapping users to two low-dimensional spaces to simulate the interactions between them. Later, Wang et al. [18] proposed SEREC, which incorporates the concept of social exposure into the matrix factorization model.
Recent social recommendation works have focused on incorporating advanced neural network techniques. Since user social relations can be explicitly described by graphs, social recommendation based on GNNs has gained popularity. Fan et al. [19] first used GNNs for social recommendation, proposing GraphRec to model social relationships. Song et al. [6] used a graph attention neural network to model user-related social relations, which are then combined with a recurrent neural network to learn users' current dynamic interests. Diffnet [20] and its variant Diffnet++ [4] model the recursive social diffusion process of users to capture implicit higher-order user relationships. S4Rec [36] is a social recommendation framework that integrates information from both semantic and structural views to enhance performance, where a deep graph model and a wide attentive SVD model are utilized for rating prediction. KCGN [7] uses multibehavior modeling for social recommendation, where a relationship-aware GNN is employed to capture social relationships containing different behavioral dependencies. Although graph-based social recommendation methods have achieved promising performance, research on handling temporal data remains limited.

C. Contrastive Learning for Recommendation
As a self-supervised learning method, contrastive learning has achieved remarkable performance not only in natural language processing [22] and computer vision [23], [24] but also in recommendation systems. Researchers have applied contrastive learning to various recommendation tasks. For example, the authors of [25] proposed a self-supervised collaborative filtering framework (SelfCF) to solve the general recommendation task. KGCL [26] combines GNNs with contrastive learning and uses knowledge graphs as auxiliary information.
The typical graph contrastive learning model, similar to SelfCF, generates new contrastive views using graph augmentation techniques and then performs contrastive learning between the views, which, however, can lose original information [27]. Therefore, some subsequent studies no longer use graph augmentation techniques to generate contrastive views. For example, Zou et al. [28] constructed local views and global views as contrastive views based on user interaction behaviors, and proposed a multilevel cross-view contrastive learning framework (MCCLK) to address KG-aware recommendation tasks. Similarly, Ma et al. [29] constructed bundle views as contrastive views, and designed CrossCBR to use the contrastive learning mechanism for the bundle recommendation task. Inspired by the existing contrastive learning recommendation models, we propose a novel multibehavior contrastive learning paradigm for social recommendation tasks.

III. PROBLEM FORMULATION
Given that $U = \{u_1, u_2, \ldots, u_m\}$ and $V = \{v_1, v_2, \ldots, v_n\}$ denote the sets of users and items, respectively, we define the user-item matrix $Y \in \mathbb{R}^{|U| \times |V|}$ to represent the different behaviors of users toward items. In the matrix $Y$, the element $y_{i,j}$ represents the interaction between user $u_i$ and item $v_j$. We also define the user-user matrix $X \in \mathbb{R}^{|U| \times |U|}$ to represent the relationships between users, and similarly the item-item matrix $Z \in \mathbb{R}^{|V| \times |V|}$ to represent the relationships between items. In addition, we construct the user social network $G_u$, the item contact network $G_v$, and the user-item interaction graph $G_{uv}$ on the basis of the above matrices, and we store each graph in the form of triples $\{h, e, t\}$, where $h$ and $t$ are graph nodes and $e$ is the relationship between nodes $h$ and $t$. Specifically, we define the relevant graph-structured data as follows.
User Social Network: In order to capture the social relationships of users as auxiliary information, we define the user social network $G_u = \{(u_i, e_{i,j}, u_j) \mid u_i, u_j \in U\}$, where $e_{i,j}$ represents the social relationship, and $e_{i,j} = 1$ when there is a relationship between users $u_i$ and $u_j$.
Item Contact Network: Similarly, to capture item dependencies as auxiliary information, we define the item contact network $G_v = \{(v_i, e_{i,j}, v_j) \mid v_i, v_j \in V\}$, where $e_{i,j}$ represents the connection between items $v_i$ and $v_j$. We have $e_{i,j} = 1$ when both items have the same interaction behavior with the same user or have similar attributes (e.g., color, type, etc.).
User-Item Multibehavior Interaction Graph: On the basis of the user-item matrix $Y$, we define the user-item multibehavior graph $G_{uv} = \{(u_i, e_{i,j}, v_j) \mid u_i \in U, v_j \in V\}$, where $e_{i,j}$ represents the interaction type of user $u_i$ with item $v_j$, and $e_{i,j} = k$ when the user has the $k$th type of interaction with the item. In particular, to capture the dynamic preferences of the user, we extract the timestamp information of each interaction behavior and store it in a time vector $T$ of the same shape.
Task Statement: Given the user-user matrix $X$, the user-item matrix $Y$, the item-item matrix $Z$, and the user-item interaction time data $T$, our social recommendation task is to capture dynamic dependencies and accurately predict items that users may interact with.
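The triple storage described above can be illustrated with a minimal sketch. The toy matrices and the helper name `matrix_to_triples` are hypothetical and chosen only for illustration; they are not taken from the authors' implementation.

```python
from typing import List, Tuple

Triple = Tuple[int, int, int]  # (head index, relation value, tail index)


def matrix_to_triples(matrix: List[List[int]]) -> List[Triple]:
    """Convert a dense matrix into (h, e, t) triples, skipping zero entries."""
    triples = []
    for h, row in enumerate(matrix):
        for t, e in enumerate(row):
            if e != 0:
                triples.append((h, e, t))
    return triples


# Toy data (hypothetical): 2 users, 3 items, behavior types 1..5.
X = [[0, 1], [1, 0]]                   # user-user relations
Y = [[5, 0, 2], [0, 4, 0]]             # user-item behavior types
Z = [[0, 1, 0], [1, 0, 0], [0, 0, 0]]  # item-item connections

G_u = matrix_to_triples(X)    # user social network
G_uv = matrix_to_triples(Y)   # user-item multibehavior graph
G_v = matrix_to_triples(Z)    # item contact network
```

In this sketch the relation value of a user-item triple is the behavior type $k$, while user-user and item-item triples carry the value 1, matching the definitions of $e_{i,j}$ given above.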

IV. PROPOSED METHOD
Fig. 1 depicts the architecture of TMBCL, where the user-item multibehavior graph is utilized as the main view, and the user social network and item contact network serve as the auxiliary views. Specifically, the model consists of the following modules.
1) Main View Representation Learning Module: In this module, user-item multibehavior interactions and time information are taken as input. First, the time information is encoded, and then a time-aware GNN is employed to learn dynamic multibehavior dependencies.
2) Auxiliary View Representation Learning Module: There are two such modules in the model, aiming to learn latent information from the user social network and item contact network. We extract connected subgraphs from the auxiliary views, and compute the cosine similarity between users and between items as the semantic similarity. Subsequently, we utilize a subgraph GCN to capture the topological relationships within the graphs.
3) Multibehavior Contrastive Learning Module: In this module, the embeddings obtained from module 1) are fused and subjected to contrastive learning between positive and negative behaviors. This aims to capture fine-grained differences between different types of user behaviors.
4) Multitask Optimization Module: Multitask joint optimization is executed to effectively leverage information and interdependencies among modules, ultimately enhancing the overall performance.

In the following, we introduce the above modules in detail.

A. Main View Representation Learning
To capture the dynamic behavioral dependencies, we use temporal information together with the user-item multibehavior data as input to recursively learn the representation of the main view using the time-aware GNN. Specifically, we first normalize the raw time data $t^{i,j}_k$ corresponding to the $k$th behavioral interaction between user $u_i$ and item $v_j$. Inspired by the temporal information modeling techniques [7], [30], we use the positional encoding mechanism of the transformer architecture to map the raw time data to the temporal embedding $p^{i,j}_k$ as in (1), where $f(\cdot)$ is the standardization function, and $(2n)$ and $(2n + 1)$ are the even and odd position indices of the temporal embedding, respectively. Combining the index of the time embedding with parity introduces cyclic information, as the alternating changes in parity can simulate the periodic variations in sequential data. In addition, $d$ is the latent dimension, and $k$ is the behavior type index.

Following that, we map $p^{i,j}_k$ and the user-item multibehavior data to the same space using a fully connected layer, construct the multibehavior messages of users and items with temporal information, and follow LightGCN [14] to propagate messages in the user-item multibehavior graph. Note that, unlike [14], our model takes the temporal information $p^{i,j}_k$ into account. Specifically, the information propagation under the $k$th behavior at the $(l + 1)$th layer is given by (2), where $\odot$ is element-wise multiplication, and $N_{k,u_i}$ and $N_{k,v_j}$ are the numbers of neighbors that have the $k$th interaction behavior with the target node. $e^{(l)}_{k,u_i}$ is the user embedding under the $k$th behavior in the $l$th layer, and $e^{(l)}_{k,v_j}$ is the item embedding under the $k$th behavior in the $l$th layer. In particular, $e^{(0)}_{k,u_i}$ and $e^{(0)}_{k,v_j}$ are the randomly initialized user and item embeddings, respectively.

Under the $k$th behavior, the user/item embeddings of each layer learned recursively are concatenated as in (3) to obtain embeddings containing different behavioral dependencies, where $L$ is the maximum number of layers, $\|$ is the concatenation operation, and $\tilde{e}_{k,u_i}$ and $\tilde{e}_{k,v_j}$ are the user/item embeddings under the $k$th behavior. Subsequently, we use the adaptive weight matrices $W_u \in \mathbb{R}^{m \times K}$ and $W_v \in \mathbb{R}^{n \times K}$ to distinguish the importance of embeddings with different behaviors, and then sum the user/item embeddings under the $K$ behaviors. Thus, we obtain the graph-level representations $\check{e}_{u_i}$ and $\check{e}_{v_j}$ via (4).
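The time encoding step can be sketched as follows. This is only an illustration under stated assumptions: the standardization function $f(\cdot)$ is taken to be min-max scaling, and the sinusoidal form is the standard Transformer positional encoding (sine at even indices, cosine at odd indices, base 10000); the exact constants used by TMBCL may differ. The function names are hypothetical.

```python
import math
from typing import List


def standardize(t: float, t_min: float, t_max: float) -> float:
    """Min-max scale a raw timestamp to [0, 1] (an assumed choice for f)."""
    if t_max == t_min:
        return 0.0
    return (t - t_min) / (t_max - t_min)


def temporal_encoding(t_norm: float, d: int) -> List[float]:
    """Sinusoidal encoding of a normalized time value into d dimensions.

    Even indices 2n use sine and odd indices 2n+1 use cosine, both with
    the frequency 1 / 10000^(2n/d) of the standard Transformer encoding.
    """
    p = []
    for idx in range(d):
        n = idx // 2
        angle = t_norm / (10000 ** (2 * n / d))
        p.append(math.sin(angle) if idx % 2 == 0 else math.cos(angle))
    return p


# Hypothetical Unix timestamps for one interaction and the dataset range.
t_norm = standardize(1_600_000_000, 1_500_000_000, 1_700_000_000)
p = temporal_encoding(t_norm, d=8)
```

The resulting vector `p` plays the role of $p^{i,j}_k$ before it is projected by the fully connected layer into the embedding space.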

B. Auxiliary View Representation Learning
The basic operation of GNNs is neighborhood aggregation over all nodes of the graph, which is effective for learning graph representations. However, it is prone to oversmoothing, and aggregating large-scale data incurs a huge memory and computational cost. To address these issues, we extract subgraphs from the auxiliary views and use subgraph convolutional networks [31] to learn topological information in the user social network $G_u$ and item contact network $G_v$, which reduces the computational cost of the auxiliary views. At the same time, considering the connections between users, we construct the similarity matrix $S_u$, whose values are calculated as in (5), where $\|\cdot\|$ denotes the L2 norm, and $e_{u_i}$ and $e_{u_j}$ are the embeddings learned for users $u_i$ and $u_j$ on the main view $G_{uv}$. When there is a connection between the two users in the auxiliary view $G_u$, we compute their cosine similarity as the semantic similarity stored in the similarity matrix $S_u \in \mathbb{R}^{m \times m}$. We then normalize $S_u$ as in (6) to prevent gradient vanishing and gradient explosion, where $D \in \mathbb{R}^{|U| \times |U|}$ is the pairwise degree matrix. Subsequently, we recursively propagate the user/item embeddings in the auxiliary view, and we aggregate the similarity into the user embeddings during propagation as in (7), where $N_{u_i}$ is the set of neighbor nodes and $s^{u}_{i,j}$ is the normalized similarity value between $u_i$ and $u_j$. We then sum up the user representations on each layer to obtain the local semantic representation as in (8), where the superscript $d$ is the hidden layer dimension, and $M$ is the maximum number of layers.

To capture the local topological information of the user social network $G_u$, we extract the connected subgraph $g_u$ from $G_u$ and design the adjacency matrix as in (9), where the superscript $s$ is the number of connected subgraphs. Then, based on the adjacency matrix $\hat{A}_u$ and the user embedding $E_{g_{u_i}}$, we obtain the subgraph-level representation as in (10). In particular, nodes from subgraph $g_u$ are selected as positive samples, while the negative samples are randomly selected from the user social network $G_u$, from which we construct the positive and negative sample pairs. Since cross-entropy is a sound objective for mutual information maximization, we design the loss function $L_{G_u}$ as in (11), where $N_{pos}$ and $N_{neg}$ are the numbers of positive and negative sample pairs, respectively, $\varphi(\cdot)$ is the indicator function, and $\sigma(\cdot)$ is the activation function. The loss $L_{G_v}$ for the item contact network is calculated in the same way as $L_{G_u}$. Therefore, we obtain the loss function $L_{G_{AUX}}$ for the auxiliary view by combining $L_{G_u}$ with $L_{G_v}$ through a balancing hyperparameter $\alpha$, as in (12).
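The similarity matrix construction can be sketched as below. The cosine similarity restricted to connected user pairs follows the description above; the symmetric normalization $D^{-1/2} S D^{-1/2}$, with degrees taken as row sums of absolute similarities, is an assumption made for illustration, and the function names and toy data are hypothetical.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def similarity_matrix(emb: List[Vector], adj: List[List[int]]) -> Matrix:
    """Cosine similarity for connected pairs only; 0.0 elsewhere."""
    m = len(emb)
    return [[cosine(emb[i], emb[j]) if adj[i][j] else 0.0 for j in range(m)]
            for i in range(m)]


def normalize(S: Matrix) -> Matrix:
    """Assumed normalization: S[i][j] / sqrt(deg[i] * deg[j])."""
    deg = [sum(abs(v) for v in row) for row in S]
    m = len(S)
    out = [[0.0] * m for _ in range(m)]
    for i in range(m):
        for j in range(m):
            if S[i][j] != 0.0 and deg[i] > 0.0 and deg[j] > 0.0:
                out[i][j] = S[i][j] / math.sqrt(deg[i] * deg[j])
    return out


# Toy main-view embeddings for three users and a path-shaped social graph.
emb = [[1.0, 0.0], [1.0, 1.0], [0.0, 1.0]]
adj = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
S_u = similarity_matrix(emb, adj)
S_hat = normalize(S_u)
```

Users 0 and 2 are not connected in the toy graph, so their entry stays zero regardless of how similar their embeddings are.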

C. Multibehavior Contrastive Learning
To address the sparsity of multibehavior interaction data and supervision signals, we design a fusion multibehavior contrastive learning framework. Considering the guidance of the auxiliary behaviors toward the target behavior, as well as the commonalities and differences between behaviors, we selectively fuse the embeddings of different behaviors to generate the positive behavior embedding $B_{pos}$ and the negative behavior embedding $B_{neg}$. Specifically, we follow the division of [7] to classify the interaction behavior data into five behavior types, namely negative, below average, average, above average, and positive behavior. Then, we fuse the fourth (i.e., above average) and fifth (i.e., positive) behavior representations as the positive behavior embedding $B_{pos}$, and the first three behavior representations as the negative behavior embedding $B_{neg}$. In order to map the embeddings of different behaviors to the same dimension, we use a multilayer perceptron to fuse the user/item embeddings of different behaviors as in (13), where $\phi(\cdot)$ is a multilayer perceptron with hidden layers, and $\tilde{e}_{k,u}$ and $\tilde{e}_{k,v}$ are the multibehavior representations calculated by (3). Then, we select the sample pair $(b^{u_i}_{pos}, b_{neg})$ from $B_{pos}$ and $B_{neg}$, and use InfoNCE [24] as the contrastive learning loss function, which is constructed as in (14), where $\tau$ is the temperature parameter, $\psi(\cdot)$ is the cosine similarity function, and $\varepsilon$ is a tradeoff hyperparameter.
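For reference, a generic InfoNCE loss with cosine similarity $\psi(\cdot)$ and temperature $\tau$ can be sketched as follows. How TMBCL selects anchors, positives, and negatives from $B_{pos}$ and $B_{neg}$, and how $\varepsilon$ weights the user and item terms, is not reproduced here; the pairing in the example is a hypothetical illustration.

```python
import math
from typing import List

Vector = List[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def info_nce(anchor: Vector, positive: Vector,
             negatives: List[Vector], tau: float = 0.2) -> float:
    """-log( exp(s_pos / tau) / (exp(s_pos / tau) + sum exp(s_neg / tau)) )."""
    pos = math.exp(cosine(anchor, positive) / tau)
    neg = sum(math.exp(cosine(anchor, n) / tau) for n in negatives)
    return -math.log(pos / (pos + neg))


# Hypothetical embeddings: the anchor agrees with its positive-behavior view.
anchor = [1.0, 0.0]
aligned = info_nce(anchor, positive=[1.0, 0.0], negatives=[[0.0, 1.0]])
misaligned = info_nce(anchor, positive=[0.0, 1.0], negatives=[[1.0, 0.0]])
```

The loss is small when the anchor is closer to its positive than to the negatives (`aligned`) and large in the opposite case (`misaligned`); a smaller `tau` sharpens this contrast.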

D. Multitask Optimization
To balance the tasks of different views and optimize the model, we use a multitask optimization strategy to combine the recommendation task with the self-supervised tasks. Specifically, we adopt the Bayesian Personalized Ranking (BPR) [33] loss $L_{BPR}$ as the loss function for the recommendation task, as in (15). The joint objective is given by (16), where $\Theta$ denotes the trainable model parameters, $\lambda_1$ and $\lambda_2$ are the hyperparameters that balance the auxiliary view loss and the contrastive learning loss, and $\lambda_3$ controls the strength of L2 regularization to alleviate overfitting.
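A minimal sketch of such a multitask objective is given below. It assumes the common additive form $L = L_{BPR} + \lambda_1 L_{G_{AUX}} + \lambda_2 L_{CL} + \lambda_3 \|\Theta\|_2^2$ and a mean-reduced pairwise BPR loss; both are assumptions for illustration rather than the exact formulation of the article, and all names and numbers are hypothetical.

```python
import math
from typing import List


def neg_log_sigmoid(x: float) -> float:
    """Numerically stable -ln(sigmoid(x)) = ln(1 + exp(-x))."""
    if x >= 0:
        return math.log1p(math.exp(-x))
    return -x + math.log1p(math.exp(x))


def bpr_loss(pos_scores: List[float], neg_scores: List[float]) -> float:
    """Mean of -ln(sigmoid(pos - neg)) over paired scores."""
    pairs = list(zip(pos_scores, neg_scores))
    return sum(neg_log_sigmoid(p - n) for p, n in pairs) / len(pairs)


def joint_loss(l_bpr: float, l_aux: float, l_cl: float, params: List[float],
               lam1: float, lam2: float, lam3: float) -> float:
    """Assumed additive combination of the task losses plus L2 penalty."""
    l2 = sum(w * w for w in params)
    return l_bpr + lam1 * l_aux + lam2 * l_cl + lam3 * l2


# Hypothetical predicted scores for observed (pos) and sampled (neg) items.
good_ranking = bpr_loss([2.0, 1.5], [0.0, 0.5])
bad_ranking = bpr_loss([0.0, 0.5], [2.0, 1.5])
total = joint_loss(1.0, 2.0, 3.0, [1.0, 2.0], lam1=0.1, lam2=0.2, lam3=0.01)
```

The BPR term only depends on score differences, so it rewards ranking observed items above sampled ones rather than fitting absolute score values.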

E. Time Complexity Analysis
For clarity, we summarize the training process of TMBCL in Algorithm 1, where $L$ and $M$ are the numbers of convolutional layers for the main view and auxiliary view, respectively, and $K$ is the number of behaviors. Starting from initializing the model parameters and embeddings (line 1), we then construct three graphs based on the input data (line 2). Subsequently, we perform $N$ iterations to iteratively compute the user/item embeddings $e^{(l)}_{k,u_i}$ and $e^{(l)}_{k,v_j}$ under the $k$th behavior layer by layer (line 7). In the $l$th layer, for a given user-item pair $(u_i, v_j)$, we aggregate the corresponding temporal vectors $p^{i,j}_k$ and neighborhood information to produce the output (line 8). Next, we derive distinct behavior representations $\tilde{e}_{k,u_i}$ / $\tilde{e}_{k,v_j}$ based on the learned user/item embeddings (line 10) and obtain the graph-level representations $\check{e}_{u_i}$ / $\check{e}_{v_j}$ (line 12). Concurrently, we conduct $M$ iterations to learn auxiliary view information (line 14). Based on the learned user/item embeddings, we calculate the loss functions for the different tasks (lines 18-19), until the convergence of our multitask optimization function (line 21).

Algorithm 1
12:     Obtain the graph-level representations $\check{e}_{u_i}$ and $\check{e}_{v_j}$ of the main view $G_{uv}$ via (4);
13:     for l = 1 to M do
14:       Perform representation learning for auxiliary views via (5)-(10);
15:     end for
16:   end for
17:   Perform multibehavior contrastive learning via (13) and (14);
18:   Calculate the pairwise BPR loss function $L_{BPR}$ and auxiliary view loss functions $L_{G_u}$ and $L_{G_v}$ via (11), (12), and (15);
19:   Multitask joint optimization via (16);
20:   Update model parameters;
21: until Equation (16) converges
22: return Top-K recommendations.
As deduced from the above, TMBCL initially takes $O((x + y + z) \times d)$ for constructing views, where $x$, $y$, and $z$ represent the numbers of edges in $G_u$, $G_{uv}$, and $G_v$, respectively. In addition, we also need $O(y \times d)$ for mapping temporal information. For the three views, our model requires , respectively, for iterative aggregation. Because multibehavior modeling is considered on the main view, learning on the main view actually demands $O((m + n \times K) \times d^2)$. Furthermore, similar to the majority of GCN methods, our model also needs $O((m + n \times K) \times d)$ for storing the learned embeddings of users and items.

V. EXPERIMENTS

A. Datasets
We evaluate TMBCL on three real-world datasets, i.e., Yelp, Epinions, and CiaoDVD, whose statistics are shown in Table I.

TABLE I STATISTICS OF EXPERIMENTAL DATASETS
Yelp. This dataset comes from the Yelp platform and contains a large number of businesses, reviews, and user data. The user-item interactions are segmented in the same way as in [7].
Epinions. This dataset is obtained from the social networking site Epinions and contains a large number of users' social relationships as well as behaviors toward different items. The rating scale goes from 1 to 5 (i.e., negative, below average, neutral, above average, positive), with each rating being considered as an individual behavioral interaction.
CiaoDVD. This dataset is compiled from a wide spectrum of DVD categories. It encompasses various DVDs, along with accompanying user reviews, ratings, and social connections between users. However, it does not include item-to-item connections. We have filtered out any irrelevant user connection information from this dataset.

B. Evaluation Protocols
We choose the hit rate (HR@10) and normalized discounted cumulative gain (NDCG@10) as our evaluation metrics to compare the performance of our proposed approach against the baseline methods.The leave-one-out method is adopted to divide the training and test sets, where each positive instance is associated with 99 negative samples to ensure a fair comparison.
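Under the leave-one-out protocol, each test case contains exactly one held-out positive item, so both metrics reduce to simple functions of its rank among the sampled candidates. The following sketch illustrates this; the function names and toy scores are hypothetical, the example uses fewer negatives than the 99 described above, and ties are resolved in favor of the positive item as an assumption.

```python
import math
from typing import List, Tuple


def rank_of_positive(pos_score: float, neg_scores: List[float]) -> int:
    """0-based rank of the held-out item among itself and the negatives."""
    return sum(1 for s in neg_scores if s > pos_score)


def hit_rate_at_k(rank: int, k: int) -> float:
    """1.0 if the held-out item appears in the top-k list, else 0.0."""
    return 1.0 if rank < k else 0.0


def ndcg_at_k(rank: int, k: int) -> float:
    """With one relevant item, the ideal DCG is 1, so NDCG = 1 / log2(rank + 2)."""
    return 1.0 / math.log2(rank + 2) if rank < k else 0.0


def evaluate(cases: List[Tuple[float, List[float]]], k: int) -> Tuple[float, float]:
    """Average HR@k and NDCG@k over (positive score, negative scores) cases."""
    ranks = [rank_of_positive(pos, negs) for pos, negs in cases]
    hr = sum(hit_rate_at_k(r, k) for r in ranks) / len(ranks)
    ndcg = sum(ndcg_at_k(r, k) for r in ranks) / len(ranks)
    return hr, ndcg


# Two toy test users with model scores for the positive and sampled negatives.
cases = [(0.9, [0.95, 0.5, 0.1]), (0.2, [0.1, 0.05])]
hr_at_1, ndcg_at_1 = evaluate(cases, k=1)
hr_at_3, ndcg_at_3 = evaluate(cases, k=3)
```

HR@k only records whether the held-out item is retrieved, whereas NDCG@k also rewards placing it nearer the top of the list.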

C. Baselines
1) TrustMF [1]: TrustMF is a classical social recommendation method based on matrix factorization.
2) GraphRec [19]: GraphRec aggregates social relationships between users from graph data in social recommendation via graph neural structures.
3) DiffNet [33]: DiffNet is a model with a hierarchical influence propagation structure to simulate recursive dynamic social diffusion in social recommendation.
4) DGRec [6]: DGRec is a dynamic graph attention network approach for modeling dynamic user interests and context-dependent social influences.
5) DANSER [34]: It consists of two graph attention networks that capture social influence by learning the representation of dual social effects.
6) LR-GCCF [35]: LR-GCCF combines linear residual graph convolution methods with graph-based CF models to better serve social recommendation.
7) ConsisRec [21]: ConsisRec addresses the social inconsistency problem by employing the relation attention mechanism to assign consistent relations.
8) S4Rec [36]: S4Rec is a social recommendation framework that utilizes a deep graph model and a wide attentive SVD model for rating prediction.
9) KCGN [7]: It uses a knowledge-aware coupled GNN to aggregate higher-order information between users and items.
10) MCCLK [28]: MCCLK is an advanced knowledge-aware multiview contrastive learning framework.

D. Parameter Settings
The parameter settings of the baseline models follow their original setups and are fine-tuned within the suggested parameter ranges. We implement the TMBCL framework in PyTorch and use the Adam optimizer to optimize the model parameters. To ensure the convergence speed of the model and avoid instability, the initial learning rate is set to 1e-3 and dynamically adjusted in the range of 1e-4 to 1e-2 during training. The embedding parameters are initialized with the Xavier method, and the embedding size is chosen from {32, 64, 128}. The batch size is chosen from {512, 1024, 2048}. The hyperparameters ε and α for balancing the weights of the learning tasks on the user and the item are set to 0.5 by default, and are adjusted according to the dataset characteristics in {0.1, 0.2, ..., 0.9}. The remaining hyperparameters follow the settings in [7] and are taken from {1e-2, 1e-3, 1e-4, 1e-5}.
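The search space described above can be written down as a small configuration sketch. The dictionary keys and the exhaustive grid enumeration are assumptions for illustration; the article states only the candidate values, not the search procedure.

```python
from itertools import product
from typing import Dict, Iterator, List

# Candidate values as stated in the text; key names are hypothetical.
search_space: Dict[str, List[float]] = {
    "embedding_size": [32, 64, 128],
    "batch_size": [512, 1024, 2048],
    "epsilon": [round(0.1 * i, 1) for i in range(1, 10)],
    "alpha": [round(0.1 * i, 1) for i in range(1, 10)],
}

# Settings that the text fixes rather than searches over.
fixed = {
    "optimizer": "adam",
    "init": "xavier",
    "lr_initial": 1e-3,
    "lr_min": 1e-4,
    "lr_max": 1e-2,
}


def iter_configs(space: Dict[str, List[float]]) -> Iterator[Dict[str, float]]:
    """Yield every combination of the candidate values (full grid)."""
    keys = list(space)
    for values in product(*(space[key] for key in keys)):
        yield dict(zip(keys, values))


configs = list(iter_configs(search_space))
```

A full grid over these four settings already contains 729 combinations, so in practice a tuning run would likely vary one or two settings at a time around the defaults of 0.5 for ε and α.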

E. Performance Comparison
In Tables II and III, we show the experimental results of TMBCL and ten baselines, where HR@k and NDCG@k are used as evaluation metrics, and k takes values of 5, 10, and 15.In general, TMBCL achieves the best results on Yelp, Epinions, and CiaoDVD.As an exemplary matrix decomposition-based social recommendation model, TrustMF achieves good performance on Epinions while compared with the mainstream social recommendation model.However, TrustMF does not perform well on Yelp and CiaoDVD, demonstrating that the traditional matrix factorization model cannot well solve the data sparsity problem.Overall, the recommendation models based on GNN outperform traditional matrix factorization methods, since the recursive propagation paradigm can capture high-order implicit connections from explicit interactions.However, compared with the basic GNN social recommendation model, such as GraphRec, there is no significant improvement for the neural network model with graph attention mechanism.A possible reason is that simulating the user's social relationships through calculating specific weights in the mathematical paradigm is challenging.
In addition, knowledge-aware recommendation methods achieve good results. Both KCGN and MCCLK are recommendation models with knowledge-aware capabilities, and MCCLK is a state-of-the-art multiview knowledge-aware framework. It is worth noting that both MCCLK and our model perform well on Yelp owing to their capability of handling sparsity in user-item interactions, user social connections, and item relationships. Nevertheless, TMBCL captures the data characteristics more effectively and outperforms MCCLK, with improvements of 2.44%, 3.64%, and 3.89% in terms of NDCG@5, NDCG@10, and NDCG@15 on Yelp, respectively. Moreover, KCGN performs well on all three datasets, especially on Epinions, where incorporating multibehavior modeling into social recommendation plays a leading role in its strong performance. Similar to KCGN, TMBCL also takes different user behavior interactions into account and uses the multibehavior contrastive learning framework to learn user behavior representations, in which the differences between behaviors are enlarged and the main relationships between them are learned. Notably, our model outperforms KCGN on Epinions, with improvements of 6.14%, 3.69%, and 5.11% in terms of NDCG@5, NDCG@10, and NDCG@15, respectively. A reasonable explanation is that our model introduces a similarity matrix in the auxiliary view learning phase, which allows it to better learn users' social relationships from rich social networks and to better extract item interrelationships from item contact networks. S4Rec is the best single-behavior social recommendation model and fully utilizes semantic and structural views, but its performance is still slightly inferior to that of KCGN and TMBCL on most metrics, which further demonstrates the importance of taking multiple behaviors into consideration for recommendation.
Compared with the Yelp and Epinions datasets, the CiaoDVD dataset is richer in user-item interactions but very sparse in social relationships. This characteristic requires social recommendation models to learn important social relationships from sparse data. In addition, this dataset lacks connectivity relationships between items. As a result, multiview models such as MCCLK and KCGN do not perform as well as they do on the first two datasets. However, TMBCL still outperforms the other methods on most of the metrics on this dataset owing to its ability to handle sparse data, which supports the generalizability and superiority of our approach.
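The improvement percentages quoted in this subsection can be read as relative gains over the strongest baseline. The sketch below assumes this relative definition (the text does not state the formula explicitly), and the scores in the example are made-up values rather than entries from Tables II and III.

```python
def relative_improvement(ours, best_baseline):
    """Relative gain of `ours` over `best_baseline`, in percent."""
    if best_baseline <= 0:
        raise ValueError("best_baseline must be positive")
    return (ours - best_baseline) / best_baseline * 100.0


# Hypothetical NDCG@5 scores, for illustration only.
baseline_score = 0.200
our_score = 0.210
print(f"{relative_improvement(our_score, baseline_score):.2f}%")
```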

F. Ablation Study
In this part, the ablation study is conducted, where the following six comparative variants of TMBCL are designed to demonstrate the effectiveness of the multibehavior contrastive learning framework, time-aware GNN, multiview learning framework, and subgraph information aggregation.
1) TMBCL-TA: We remove the time-aware component, so that temporal information is not used.
2) TMBCL-SG_u: Only use GCN to learn the representation of the user social network G_u.
3) TMBCL-SG_v: Only use GCN to learn the representation of the item contact network G_v.
4) TMBCL-SG_uv: Only use GCN to learn the representation of the auxiliary view.
5) TMBCL-CL: We remove the multibehavior contrastive learning module.
6) TMBCL-MV: We remove the auxiliary view representation learning module.
Table IV presents the experimental results of TMBCL and its six variants on Yelp and Epinions, from which the contribution of each component to the model can be observed. Overall, TMBCL still achieves the best performance. TMBCL-MV, which removes the multiview framework, performs the worst among all the variants, showing a decrease of 4.33%/6.22% and 4.54%/5.30% in terms of HR@10 and NDCG@10 on the two datasets, respectively. This suggests that both the social relations and the item connections are crucial for modeling user preferences. In addition, TMBCL-SG_uv, which removes the connected subgraph extraction and similarity computation on both the user social network G_u and the item contact network G_v, also shows a significant performance gap compared with TMBCL, which indicates that the subgraph extraction and similarity calculation components play an important role in improving the performance of the model.
Interestingly, we observe that TMBCL-SG_u performs better than TMBCL-SG_v on both datasets, indicating that the connected subgraph extraction and similarity calculation components on the item contact network are more important than those on the user social network. This could be because Yelp and Epinions contain less data on user social interactions than on item relationships. Furthermore, since the number of users on Epinions is less than one-tenth of the number of items and the data on user social interactions is scarcer than the data on item relationships, the connected subgraph extraction and similarity calculation components on the user social network are less important on this dataset. The performance of TMBCL-SG_u on Epinions (with only a slight decrease of 0.83%/1.30% in terms of HR@10 and NDCG@10) is consistent with this explanation. Moreover, the significant performance decrease of TMBCL-TA and TMBCL-CL demonstrates the importance of temporal data for recommendation and the effectiveness of the contrastive learning mechanism in alleviating the sparsity of supervision signals. In relative terms, however, the decrease of TMBCL-TA is not particularly large, with a decrease of about 1.49%/2.25% in terms of HR@10 and NDCG@10 on Epinions, which indicates that our proposed time-aware GNN still requires further improvement and refinement.

G. Analysis of Different Behaviors Contrastive Learning
To verify the rationality of the behavior fusion scheme used in contrastive learning, we design five comparative schemes, as shown in Fig. 2. Among them, TMBCL-B_ij denotes the scheme in which the ith behavior representation ẽ_{i,u}/ẽ_{i,v} and the jth behavior representation ẽ_{j,u}/ẽ_{j,v} are fused as the positive behavior, while the other behaviors are fused as the negative behavior. In particular, B_pos^u/B_pos^v in scheme TMBCL-B_5 is fused from ẽ_{5,u}/ẽ_{5,v} alone. TMBCL-B_54 is our default fusion scheme, as shown in (13). Fig. 2 reports the performance of the different fusion schemes on Yelp, where TMBCL-B_54 maintains the best performance. The comparison with the other fusion schemes shows that incorporating below-average behaviors into B_pos for contrastive learning clearly degrades the model's performance, which highlights the importance of applying contrastive learning appropriately.
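The partitioning behind the schemes TMBCL-B_ij can be sketched as follows. The sketch assumes five behaviors indexed 1 to 5 and uses an element-wise mean as the fusion operator; the actual fusion in (13) may differ, and the names `mean_fuse` and `split_and_fuse` are hypothetical.

```python
def mean_fuse(vectors):
    """Element-wise mean of equally sized vectors (assumed fusion operator)."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def split_and_fuse(behavior_embeddings, positive_ids):
    """Fuse the selected behaviors as positive and all remaining ones as negative.

    behavior_embeddings maps a behavior index to its embedding (a list of floats).
    """
    positive = [behavior_embeddings[b] for b in sorted(positive_ids)]
    negative = [
        emb for b, emb in sorted(behavior_embeddings.items())
        if b not in positive_ids
    ]
    if not positive or not negative:
        raise ValueError("both the positive and the negative group must be non-empty")
    return mean_fuse(positive), mean_fuse(negative)


# Toy 2-dimensional embeddings for behaviors 1..5 of a single user.
embeddings = {
    1: [1.0, 0.0],
    2: [0.0, 1.0],
    3: [1.0, 1.0],
    4: [2.0, 0.0],
    5: [0.0, 2.0],
}
b_pos, b_neg = split_and_fuse(embeddings, {5, 4})  # analogue of TMBCL-B_54
print(b_pos, b_neg)
```

Changing `positive_ids` to `{5}` or `{5, 1}` gives the analogues of TMBCL-B_5 and TMBCL-B_51 under the same assumptions.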
In addition, to analyze the reasons for the performance differences among the fusion schemes, we compute the alignment and uniformity of the embeddings following [37]. Alignment and uniformity are closely related to the effectiveness of recommendation, and smaller values indicate better alignment and uniformity, which often leads to superior learned representations. Fig. 3 shows the alignment and uniformity measures over 50 training epochs. TMBCL-B_51 exhibits the worst alignment and uniformity, followed by TMBCL-B_52. TMBCL-B_53 and TMBCL-B_5 show similar alignment and uniformity during the training process. It can be observed that incorporating below-average behaviors into B_pos leads to worse alignment and uniformity, which validates the rationality of the fusion scheme in (13).
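The two measures can be sketched as follows, assuming the widely used definitions: alignment is the mean squared distance between L2-normalized positive pairs, and uniformity is the logarithm of the mean Gaussian potential over all distinct pairs. Whether [37] uses exactly these exponents and the temperature t = 2 is an assumption of this sketch.

```python
import math
from itertools import combinations


def l2_normalize(vector):
    """Scale a vector to unit Euclidean norm."""
    norm = math.sqrt(sum(x * x for x in vector))
    if norm == 0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vector]


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def alignment(user_embs, item_embs):
    """Mean squared distance between normalized positive (user, item) pairs."""
    pairs = [(l2_normalize(u), l2_normalize(i)) for u, i in zip(user_embs, item_embs)]
    return sum(squared_distance(u, i) for u, i in pairs) / len(pairs)


def uniformity(embs, t=2.0):
    """Log of the mean Gaussian potential over all distinct embedding pairs."""
    normed = [l2_normalize(e) for e in embs]
    potentials = [
        math.exp(-t * squared_distance(a, b)) for a, b in combinations(normed, 2)
    ]
    return math.log(sum(potentials) / len(potentials))


# Toy data: two interacting (user, item) pairs with 2-dimensional embeddings.
users = [[1.0, 0.0], [0.0, 1.0]]
items = [[1.0, 0.1], [0.2, 1.0]]
print(alignment(users, items), uniformity(users + items))
```

Both quantities are lower when positive pairs lie close together and when the embeddings spread evenly over the unit sphere, which matches the reading that smaller values are better.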

H. Parameter Sensitivity Study
1) Impact of the Number of Time-Aware GNN Layers: Fig. 4 shows the impact of the number of recursive layers of the time-aware GNN on model performance on the Yelp dataset. We observe that the model achieves the best performance in terms of both metrics when the number of convolutional layers is set to l = 2, outperforming the cases of l = 0 and l = 1. Furthermore, the model with l = 1 performs better than that with l = 0, indicating that increasing the number of time-aware GNN layers can enhance performance. However, when l > 2, the model performance deteriorates, with the 3-layer time-aware GNN consistently achieving lower accuracy throughout the training process. We attribute this to the over-smoothing phenomenon that occurs when GNNs are stacked with too many layers.
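The over-smoothing effect mentioned above can be illustrated on a toy graph. The sketch below uses plain mean aggregation with self-loops rather than the time-aware GNN itself, so it only shows the general tendency of node representations to become indistinguishable as the number of propagation layers grows; the graph and feature values are made up.

```python
def mean_aggregate(features, neighbors):
    """One propagation layer: each node averages itself and its neighbors."""
    return [
        sum(features[j] for j in [i] + neighbors[i]) / (len(neighbors[i]) + 1)
        for i in range(len(features))
    ]


def spread_after_layers(features, neighbors, num_layers):
    """Range (max - min) of the node features after num_layers propagation steps."""
    hidden = list(features)
    for _ in range(num_layers):
        hidden = mean_aggregate(hidden, neighbors)
    return max(hidden) - min(hidden)


# Path graph 0 - 1 - 2 - 3 with scalar node features (toy data).
neighbors = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
features = [1.0, 0.0, 0.0, 0.0]
spreads = [spread_after_layers(features, neighbors, l) for l in range(9)]
for l, s in enumerate(spreads):
    print(f"layers={l}: spread={s:.4f}")
```

The spread shrinks with every additional layer, so a deeper stack leaves less signal for distinguishing nodes, even though the first few layers are what bring in useful neighborhood information.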
2) Impact of Balance Parameters: We vary the balance parameters λ_1 and λ_2 in (16) over {0.001, 0.01, 0.05, 0.1, 1}, and α in (12) and ε in (14) over {0.2, 0.3, 0.4, 0.5, 0.6, 0.7}. As shown in Fig. 5, as λ_1 and λ_2 decrease, the performance of TMBCL gradually improves on both datasets, reaches its peak at λ_1 = 0.05 and λ_2 = 0.01, and then starts to decline. The reason is that when λ_1 and λ_2 are large, the model tends to prioritize the two corresponding loss functions and their tasks, which results in insufficient training on the main view learning task and decreased performance. However, the main view contains rich time information and multibehavior interaction data, which can provide better guidance for the model to learn. Therefore, when λ_1 = 1, TMBCL performs very poorly on the Yelp dataset. As λ_1 and λ_2 decrease, the main view loss function becomes more dominant, allowing the model to focus on learning from the main view while leveraging the auxiliary views to improve representation learning.
α is a balance parameter of the auxiliary view. With larger values of α, the auxiliary view learning task focuses increasingly on the item contact network; with smaller values, it focuses on the user social network. In the Yelp and Epinions datasets, the sparsity of social and item relationships is different. Consequently, the conditions under which TMBCL reaches its peak performance vary between the two datasets. For instance, on the Yelp dataset, where social relationships are relatively sparse, the peak is achieved at α = 0.6, which implies that the model works better when it focuses on learning item relationships. This coincides with the fact that the item-connected relationships in this dataset are denser and contain more information. In addition, ε functions similarly to α as a balance parameter in the contrastive learning loss function. TMBCL reaches its peak performance at ε = 0.5 on both datasets.
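The role of the four balance parameters can be sketched as a weighted sum of task losses. The exact forms of (12), (14), and (16) are not reproduced here, so the convex combinations below, the assignment of λ_1 to the auxiliary view loss and λ_2 to the contrastive loss, and the direction in which ε weights the user and item terms are all assumptions made for illustration.

```python
def auxiliary_view_loss(loss_user_social, loss_item_contact, alpha):
    """Larger alpha shifts the weight toward the item contact network."""
    return (1.0 - alpha) * loss_user_social + alpha * loss_item_contact


def contrastive_loss(loss_cl_user, loss_cl_item, epsilon):
    """Epsilon balances the user-side and item-side contrastive terms."""
    return epsilon * loss_cl_user + (1.0 - epsilon) * loss_cl_item


def total_loss(loss_main, loss_aux, loss_cl, lambda_1=0.05, lambda_2=0.01):
    """Main view loss plus down-weighted auxiliary and contrastive losses."""
    return loss_main + lambda_1 * loss_aux + lambda_2 * loss_cl


# Made-up loss values for a single training step.
aux = auxiliary_view_loss(1.0, 2.0, alpha=0.6)
cl = contrastive_loss(3.0, 5.0, epsilon=0.5)
print(total_loss(1.0, aux, cl))
```

With small λ_1 and λ_2 the main view loss dominates the total, which mirrors the observation that overly large weights starve the main view learning task.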

VI. CONCLUSION
In this article, we focused on exploring contrastive learning in social recommendation and proposed a time-aware multibehavior contrastive learning framework, TMBCL. Within this framework, we conducted representation learning from multiview perspectives and incorporated time information as well as multibehavior interactions into social recommendation. A time-aware GNN was then specifically designed to model dynamic dependencies. In addition, we designed subgraph extraction and similarity calculation components, using a subgraph GCN to capture more semantic information from the auxiliary views. To address the sparsity of supervision signals, we designed a multibehavior contrastive learning framework. Extensive experiments on three public datasets demonstrated the superiority of our proposed method.
Chuyuan Wei (Member, IEEE) received the Ph.D. degree in computer science from the Beijing Institute of Technology, Beijing, China, in 2016.
He is currently an Associate Professor with the College of Electrical and Information Engineering, Beijing University of Civil Engineering and Architecture, Beijing, China. His current research interests include machine learning, data mining, and natural language processing. He has authored or coauthored some academic papers in international journals and conferences such as EMNLP, IJCIS, Sensors and Materials, and Chinese Journal of Electronics.
Dr. Wei is a Senior Member of the China Computer Federation.
Chuanhao Hu received the B.S. degree in software engineering from Linyi University, Linyi, China, in 2021. He is currently working toward the master's degree in mechanical engineering with the Beijing University of Civil Engineering and Architecture, Beijing, China. His research interests include recommendation systems, specifically exploring recommendation methodologies based on graph neural networks.
Authorized licensed use limited to the terms of the applicable license agreement with IEEE. Restrictions apply.

Fig. 1. Framework of TMBCL. From top to bottom, each layer corresponds to the learning of different views, and different learning objectives are set for each module. Furthermore, to balance different learning tasks, we have designed a multitask optimization module. The bottom-right corner of the figure explains the significance of the different colored circular nodes.

Fig. 3. Alignment and uniformity between user representations and item representations in different fusion schemes.

Fig. 4. Performance of time-aware GNN with different numbers of layers during the first 50 epochs.

Algorithm 1: Training Procedure of TMBCL.
Input: User-user matrix X, user-item matrix Y, item-item matrix Z, user-item interaction time data T.

TABLE II
PERFORMANCE OF TMBCL AND TEN COMPARED BASELINES ON YELP, EPINIONS, AND CIAODVD. THE BEST RESULTS IN THE BASELINE ARE BOLDED AND THE RESULTS OF OUR MODEL ARE BOLDED AND UNDERLINED. THE IMPROVEMENT OVER THE BEST DEEP LEARNING MODELS IS RECORDED

TABLE III
PERFORMANCE OF TMBCL AND TEN COMPARED BASELINES ON YELP, EPINIONS, AND CIAODVD. THE BEST RESULTS IN THE BASELINE ARE BOLDED AND THE RESULTS OF OUR MODEL ARE BOLDED AND UNDERLINED. THE IMPROVEMENT OVER THE BEST DEEP LEARNING MODELS IS RECORDED

TABLE IV
PERFORMANCE COMPARISON OF TMBCL AND ITS VARIANTS ON YELP AND EPINIONS