A Topicality Relevance-Aware Intent Model for Web Search

To accurately understand user information needs and provide better search experiences, various methods have been proposed to model user search intent from search logs. Most traditional methods based on query understanding only consider the similarity between the query and documents, ignoring the user's session interaction sequence. Some researchers have adopted neural network-based methods to model user search intent. However, most of these models focus on the user's session interaction sequence without considering the role of topic relevance in modeling user search intent. In this paper, we propose a novel topicality relevance-aware intent model (TRIM) for web search. TRIM consists of a topic relevance predictor and a user short-term intent predictor. The topic relevance predictor uses the BERT model to predict the topic relevance between the query and documents. The user short-term intent predictor uses session context information to predict the user's short-term search intent. We further investigate several fusion strategies that integrate topic relevance and user short-term search intent for user search intent prediction. We conduct experiments on two public web search datasets, TianGong-QRef and TianGong-ST. The experiments show that TRIM outperforms all baselines in the document ranking task. On the TianGong-QRef dataset, TRIM achieves a 15.19% increase in Mean Average Precision (MAP) over the best-performing baseline, M-Match. On the TianGong-ST dataset, TRIM achieves a 5.77% increase in MAP over the best-performing baseline, CARS. The experimental results indicate the effectiveness of topic relevance in user search intent modeling.


I. INTRODUCTION
Search engines are widely used tools for obtaining information from the web. Understanding the behavior of web search users can help search engines better understand their information needs and improve the performance of retrieval systems. However, according to the four-level theory of information needs [1], user information needs are dynamic: the same query may represent different search intents for different users. To address this problem, researchers model user search intent and re-rank the search results based on it.
The associate editor coordinating the review of this manuscript and approving it for publication was Ali Kashif Bashir.
Traditional methods based on query understanding typically assume that the query represents the user's original search intent. Therefore, these methods focus on analyzing the keywords, phrases, and their combinations in queries. Several studies [2], [3], [4], [5] have attempted to understand user search intent through query classification, which may lose important information. When the user's query is short, query classification may even have a negative impact on search results. To solve this problem, several studies have attempted to expand the query [6], [7], [8], [9], [10], [11]. However, query expansion may introduce words irrelevant to the original query, causing search results to deviate from the user's actual intent.
With the emergence of deep learning, researchers have focused on obtaining vector representations of queries and documents [12], [13], [14], [15]. Specifically, they first obtain interactive or individual representations of the query and documents, then compute the query-document similarity with a Multi-Layer Perceptron (MLP), and finally re-rank the search results. These methods model the semantics of queries better. However, at the beginning of a search, user search intent is often unclear, so it is difficult to predict the actual search intent without considering the user's session interaction sequence.
Recently, several context-aware methods [16], [17], [18], [19], [20], [21] have been proposed to model user search intent. They obtain representations of queries and documents and then use recurrent neural networks [22] or attention mechanisms [23] to model the session process. Although they better capture changes in the user's search behavior, these methods still struggle with long-term dependencies. Furthermore, they ignore the role of topical relevance. To address this, our model uses BERT to predict topic relevance. BERT has powerful language representation and feature extraction capabilities, and previous studies [24], [25] have demonstrated that it performs well in deep semantic matching tasks.
To address the shortcomings of existing methods, we propose the topicality relevance-aware intent model (TRIM) for web search. The idea is illustrated in Figure 1. We combine topic relevance with users' short-term interests to predict users' actual search intent and re-rank the search results. TRIM consists of a topic relevance predictor and a user short-term intent predictor. First, we use the BERT model to predict the topic relevance between the query and documents: we feed the query and documents into BERT to obtain their topic relevance score. Second, we use a Transformer to model the current session: we feed the user's current session interaction sequence into the Transformer to obtain a vector representation of the user's short-term search intent. Third, we investigate several fusion strategies that integrate topic relevance and user short-term search intent to predict user search intent, which yields the final document ranking score.
The contributions of this paper can be summarized as follows:
• We propose a novel topicality relevance-aware intent model named TRIM. It models users' short-term search intent and topic relevance with an end-to-end neural network and re-ranks the search results.
• TRIM achieves significantly better performance than all baselines in document ranking by utilizing contextual and topic relevance information.
• To explore the importance of topic relevance and user short-term search intent, we design several fusion strategies that integrate them for user search intent prediction. Experimental results show that topic relevance and user short-term search intent are of different importance in predicting users' actual search intent.
The rest of the paper is organized as follows. Related work is summarized in Section II. The proposed method is described in Section III. We introduce the experimental settings in Section IV and analyze the results in Section V. Section VI concludes the paper.

II. RELATED WORK
Based on an in-depth review of existing research, we divide the study of user search intent modeling into two research directions: (1) search intent modeling based on query understanding, and (2) search intent modeling based on session sequences.

A. SEARCH INTENT MODELING BASED ON QUERY UNDERSTANDING
Some works model user search intent by understanding the query. Broder systematically expounded the classification of query keywords, which laid a solid foundation for later research on query classification [2]. Recently, many neural query intent classification models have been proposed [3], [4], [5]. Specifically, Xu et al. [3] proposed a hybrid deep neural network model for query intent classification, encoding query representations with recurrent and recursive neural networks, respectively. Wang et al. [4] used a nature network algorithm to classify different building information-related queries. Yuan et al. [5] proposed a Multi-granularity Matching Attention Network to comprehensively extract features from the query and a query-category interaction matrix. However, query classification may lose important information, which can lead to inaccurate predictions of user search intent.
The majority of queries issued by users are short and ambiguous [26], [27], so the query often needs to be expanded. Savitha et al. [6] fused statistical information from a news corpus with the topic diversity of news articles to expand the query. Wang et al. [7] generated words from local word embeddings to expand the original query. Alqahtani et al. [8] proposed a hybrid COOT-based Cat and Mouse Optimization algorithm to select optimal candidate terms in automatic query expansion. These methods improve the recall of search engines and the diversity of search results. However, they lack feedback information from users.
Rocchio et al. [9] used explicit feedback from users to expand the query. However, it is difficult to obtain active feedback from users during retrieval. Thus, some studies attempt to expand queries through pseudo-relevance feedback. Specifically, Dev and Balasubramanian [10] utilized the semantic properties of context phrases that occur within the top-ranked retrieved documents to generate diversified query expansions. Nasir et al. [11] used information from a knowledge base to improve the pseudo-relevance feedback process and further expand the query. However, some of the pseudo-relevant documents obtained by the user's initial search may not be relevant to the query, which leads to a deviation in understanding the user's search intent.
FIGURE 1. The idea of our paper. When the user issues a query, the search engine returns a list of results ranked by relevance. The user then chooses the most relevant documents according to their search intent.
With the emergence of deep learning, most existing models use deep neural networks to obtain vector representations of queries and documents and then re-rank the search results based on the similarity between the query and document vectors [12], [13], [14]. Specifically, Hu et al. [12] proposed the ARC-I model, which obtains text representations with a two-layer 1-D convolutional network and then uses an MLP to compute their matching score. Compared with ARC-I, the ARC-II model [12] considers word order and the interaction between sentences, using a 2-D convolutional network to obtain a global vector representation. Xiong et al. [13] used kernel pooling to extract ranking features from the interaction matrix between the query and documents. Mitra et al. [14] simultaneously considered the interaction information and distributed representations of queries and documents. Li et al. [15] selected key blocks of a long document as the input of the BERT model.
However, users usually issue a series of queries to accomplish one search task or several related search tasks [28]. Therefore, representations of the query and documents alone may not encode the user's actual search intent. In addition to the representations of queries and documents, our model uses the user's session behavior sequence to enhance the modeling of user search intent, with the aim of capturing the actual search intent more accurately.

B. SEARCH INTENT MODELING BASED ON SESSION SEARCH
Some traditional approaches already use session context to infer search intent [29], [30], [31], [32], [33]. Specifically, Carterette et al. [29] showed that using session data can improve retrieval effectiveness. Van Gysel et al. [30] explored the viability of lexical query matching in session search and found that specialized models make better use of long session histories than naive term weighting methods.
With the emergence of deep learning, researchers have focused on designing neural context-aware ranking models [16], [17], [18], [19], [20], [21]. Specifically, Ahmad et al. [16], [17] encoded session contextual information using RNNs and attention mechanisms, jointly optimizing the ranking task and the next-query prediction task. Cheng et al. [18] learned user search intent from users' long-term and short-term behavior, using a multi-hop memory network to infer users' long-term search intent. Deng et al. [19] modeled multi-granularity user feedback information. Zuo et al. [20] modeled historical query changes in the user's current session. Chang et al. [21] modeled users' latent intent with a probabilistic approach and incorporated the latent intention model into an RNN-based sequential recommendation model.
However, these methods still struggle with long-term dependencies. The Transformer [23] effectively alleviates this problem, and BERT4Rec [34] has demonstrated the effectiveness of the Transformer on sequence problems. Owing to its powerful ability to leverage contextual information, we apply it to encode the user's current session. Moreover, all of these methods ignore the role of topical relevance. Our model uses the BERT model to predict the topic relevance between the query and documents, with the aim of predicting user search intent more comprehensively.
VOLUME 11, 2023

III. METHOD
In this section, we describe the architecture of TRIM. We first define the session search task and then introduce the three components of the model: the topic relevance predictor, the user short-term intent predictor, and the fusion layer.

A. PROBLEM DEFINITION
In session search, the search engine retrieves and ranks candidate documents according to the query and the session context submitted by the user. The session context contains the user's historical queries and the clicks for each query. We assume that for every query at least one document is clicked by the user. Given the current query $q_t$, the search session $S$ is represented as a sequence of (query, clicked document) pairs: $S = [(q_1, d_1), \ldots, (q_{t-1}, d_{t-1})]$, where $q_i$ is the $i$-th query in the session and $d_i$ is the clicked document for query $q_i$. The task is to rank the candidate documents in $D_t$ for query $q_t$.
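To make the problem definition concrete, the following Python sketch shows one possible in-memory layout of a single ranking instance: the history S of (query, clicked document) pairs, the current query q_t, and the candidate set D_t with optional click labels. The class and field names are hypothetical illustrations, not part of TRIM or of the TianGong dataset formats; documents are plain strings here, whereas the model operates on their vector representations.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class SessionInstance:
    """Hypothetical container for one session-search ranking instance."""
    history: List[Tuple[str, str]]  # S = [(q_1, d_1), ..., (q_{t-1}, d_{t-1})]
    current_query: str              # q_t
    candidates: List[str]           # D_t = [d_1, ..., d_N]
    labels: List[int] = field(default_factory=list)  # optional click labels C_i in {0, 1}

    def __post_init__(self) -> None:
        # The problem definition assumes every earlier query has at least one clicked document.
        for query, clicked in self.history:
            if not clicked:
                raise ValueError(f"query {query!r} has no clicked document")
        if self.labels and len(self.labels) != len(self.candidates):
            raise ValueError("expected one label per candidate document")

    def num_candidates(self) -> int:
        return len(self.candidates)  # N
```

Validating the "at least one click per historical query" assumption at construction time keeps malformed sessions from silently reaching the model.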

B. TRIM: TOPICALITY RELEVANCE-AWARE INTENT MODEL
The overall structure of TRIM is shown in Figure 2. The model can be divided into three parts. (1) Topic relevance predictor. We use the BERT model to obtain a topic relevance score between the query and each candidate document. (2) User short-term intent predictor. Based on the user's feedback within the current session, we use a Transformer to obtain the user's short-term search intent and then compute the user's short-term interest relevance score. (3) Fusion layer. We fuse the topic relevance score and the user's short-term interest relevance score into the final relevance score of each candidate document, and re-rank the candidate documents by this score.

1) TOPIC RELEVANCE PREDICTOR
Given the query $q_t$, we obtain the set of candidate documents returned for it, $D_t = \{d_1, d_2, \ldots, d_N\}$, where $N$ is the number of candidate documents. To obtain a joint representation of the query and each candidate document, we use the BERT model (bert-base-chinese). BERT maps every token of the input text into a low-dimensional vector space through its embedding layer and then converts the sequence into a text representation. The embedding layer consists of three parts: 1) Token embedding, which maps each token to a vector of fixed dimension. 2) Segment embedding, which enables the model to distinguish between the two input texts. 3) Position embedding, which enables the model to capture word order.
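The three embedding types are summed element-wise to form each token's input vector. The sketch below illustrates this with tiny hand-made lookup tables in plain Python; the function name, table sizes, and values are arbitrary assumptions, and the real bert-base-chinese embedding layer additionally applies layer normalization and dropout to the sum, which the sketch omits.

```python
from typing import List, Sequence

Vector = List[float]


def bert_input_embeddings(token_ids: Sequence[int], segment_ids: Sequence[int],
                          token_table: List[Vector], segment_table: List[Vector],
                          position_table: List[Vector]) -> List[Vector]:
    """Sum token, segment, and position embeddings for each input position."""
    if len(token_ids) != len(segment_ids):
        raise ValueError("token_ids and segment_ids must have the same length")
    if len(token_ids) > len(position_table):
        raise ValueError("sequence is longer than the position table")
    vectors = []
    for pos, (tok, seg) in enumerate(zip(token_ids, segment_ids)):
        # Element-wise sum of the three lookups; the position index is the token's offset.
        vectors.append([t + s + p for t, s, p in
                        zip(token_table[tok], segment_table[seg], position_table[pos])])
    return vectors


# Toy tables (dimension 2): 3 token ids, 2 segments, 4 positions.
token_table = [[1, 0], [0, 1], [1, 1]]
segment_table = [[100, 100], [200, 200]]
position_table = [[0, 0], [10, 10], [20, 20], [30, 30]]
print(bert_input_embeddings([2, 0], [0, 1], token_table, segment_table, position_table))
# → [[101, 101], [211, 210]]
```

Because the segment vector is added to every token, the same token receives a different input vector depending on whether it sits in the query part or the document part.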
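For the current query $q_t$, we concatenate the query and each candidate document $d_i \in D_t$ as the input of BERT and take the output representation of the [CLS] token:
$$s_t = \mathrm{BERT}\left([\mathrm{CLS}]\; q_t\; [\mathrm{SEP}]\; d_i\; [\mathrm{SEP}]\right)_{[\mathrm{CLS}]}$$
where $s_t$ is the vector representation of [CLS].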
We apply a linear layer to compute a relevance score from this representation. The relevance score can be regarded as an aggregation of local relevance information, and we take it as the topic relevance score $O_t$.
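The input construction and the linear scoring layer can be sketched as follows. This is an illustration under simplifying assumptions: each character is treated as one token (a stand-in for the real bert-base-chinese tokenizer), `max_len` is a hypothetical truncation limit, and the [CLS] vector, weights, and bias are placeholders rather than values from a trained model.

```python
from typing import List, Tuple

CLS, SEP = "[CLS]", "[SEP]"


def pack_query_document(query: str, document: str, max_len: int = 512) -> Tuple[List[str], List[int]]:
    """Build the sentence-pair input [CLS] q [SEP] d [SEP] and its segment ids.

    Assumption: one token per character, a simplification of the real tokenizer.
    The document is truncated so the whole sequence fits in max_len tokens.
    """
    q_tokens = list(query)
    room = max_len - len(q_tokens) - 3  # space left after [CLS] and the two [SEP] tokens
    if room < 0:
        raise ValueError("query does not fit in max_len")
    d_tokens = list(document)[:room]
    tokens = [CLS] + q_tokens + [SEP] + d_tokens + [SEP]
    # Segment 0 covers [CLS], the query, and the first [SEP]; segment 1 covers the rest.
    segments = [0] * (len(q_tokens) + 2) + [1] * (len(d_tokens) + 1)
    return tokens, segments


def topic_relevance_score(cls_vector: List[float], weight: List[float], bias: float) -> float:
    """Linear layer on the [CLS] representation s_t: O = w . s_t + b."""
    if len(cls_vector) != len(weight):
        raise ValueError("weight and [CLS] vector dimensions differ")
    return sum(w * s for w, s in zip(weight, cls_vector)) + bias
```

Truncating only the document keeps the whole query visible to the model, which matters when the document is long relative to the length limit.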
2) USER SHORT-TERM INTENT PREDICTOR
In this part, we use a Transformer to model the user's search and click behavior in the current session. The Transformer layer contains a Multi-head Self-attention layer and a Position-wise Feed-forward (FFN) layer. For the current query $q_t$ and the search session $S = [(q_1, d_1), \ldots, (q_{t-1}, d_{t-1})]$, we join each query with its clicked document to obtain their vector representation.
For the current query $q_t$, no document has yet been clicked, so $d_t$ is initialized as a zero vector. The resulting embedding matrix $E$ is processed by multi-head self-attention. Specifically, each head $i$ projects $E$ into queries $Q = EW_i^Q$, keys $K = EW_i^K$, and values $V = EW_i^V$:
$$\mathrm{head}_i = \mathrm{Attention}\left(EW_i^Q, EW_i^K, EW_i^V\right) = \mathrm{softmax}\left(\frac{EW_i^Q \left(EW_i^K\right)^{\top}}{\sqrt{d_k}}\right) EW_i^V$$
$$\mathrm{MultiHead}(E) = \mathrm{Concat}\left(\mathrm{head}_1, \ldots, \mathrm{head}_h\right)$$
where $\mathrm{head}_i$ is the $i$-th self-attention head, $d_k$ is the key dimension, $h$ is the number of heads, and the final output is the concatenation of all heads. We then apply a Point-wise Feed-Forward Network (FFN) to add non-linearity to the model. Here, $I_t$ denotes the output vector of the FFN layer, which serves as the user's short-term search intent. The function $\varphi(\cdot)$ is a multilayer perceptron (MLP) with $\mathrm{LeakyReLU}(\cdot)$ as the activation function, $\mathrm{LN}(\cdot)$ is layer normalization to stabilize the output, $D(\cdot)$ is a dropout layer with probability 0.1, and FFN denotes the Position-wise Feed-forward layer. Finally, we combine the current query $q_t$ with the short-term search intent $I_t$ through a non-linear transformation and apply a sigmoid function to generate the user's short-term interest relevance score $U_t$. The parameters of this transformation are $W_p$ and $b_p$, and the score is computed for each candidate document $d_i$, $i = 1, \ldots, N$, of query $q_t$, where $N$ is the number of candidate documents.
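The core of the short-term intent predictor is scaled dot-product self-attention over the session sequence. The plain-Python sketch below implements softmax(QK^T/sqrt(d_k))V for one head, concatenates several heads position by position as described above, and scores candidates with a sigmoid. It omits the FFN, layer normalization, and dropout. The scoring function `short_term_scores`, which applies a sigmoid to a linear map of the concatenated vector [I_t; d_i], is an assumed form, since the exact equation for U_t is not reproduced in this text; all function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain-Python matrix product a @ b."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row: Sequence[float]) -> List[float]:
    m = max(row)  # subtract the maximum for numerical stability
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention_head(E: Matrix, Wq: Matrix, Wk: Matrix, Wv: Matrix) -> Matrix:
    """One head: softmax(Q K^T / sqrt(d_k)) V with Q = E Wq, K = E Wk, V = E Wv."""
    Q, K, V = matmul(E, Wq), matmul(E, Wk), matmul(E, Wv)
    d_k = len(K[0])
    scores = [[sum(q * k for q, k in zip(q_row, k_row)) / math.sqrt(d_k) for k_row in K]
              for q_row in Q]
    return matmul([softmax(r) for r in scores], V)


def multi_head(E: Matrix, heads: List[Tuple[Matrix, Matrix, Matrix]]) -> Matrix:
    """Concatenate the outputs of all heads for each sequence position."""
    outputs = [attention_head(E, *h) for h in heads]
    return [[x for out in outputs for x in out[pos]] for pos in range(len(E))]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def short_term_scores(intent: List[float], docs: Matrix, w: List[float], b: float) -> List[float]:
    """Hypothetical U_t form: sigmoid(w . [I_t ; d_i] + b) for each candidate vector d_i."""
    return [sigmoid(sum(wi * xi for wi, xi in zip(w, intent + d)) + b) for d in docs]
```

Each output row is a convex combination of the value vectors, so every session position can draw on the whole session regardless of distance, which is the property the text relies on to ease long-term dependencies.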
FIGURE 2. The overall structure of TRIM. TRIM consists of a topic relevance predictor and a user short-term intent predictor. The topic relevance predictor estimates the topic relevance score $O_t$. The user short-term intent predictor produces the user's short-term interest relevance score $U_t$. TRIM integrates $O_t$ and $U_t$ through a fusion layer to predict users' search intent.

3) FUSION LAYER
In this part, we fuse the topic relevance score $O_t$ and the user's short-term interest relevance score $U_t$ and re-rank the documents according to the fused score $P_t$. Given the query $q_t$, there are 10 candidate documents under the query. We formulate the following three fusion strategies.
Strategy 1: When examining search results, users are often influenced by the position of the results [35], [36]: they tend to examine top-ranked documents and ignore lower-ranked ones. Users' click behavior is also complex, being influenced by factors such as presentation form and credibility. Therefore, clicks alone cannot accurately express a document's relevance. To this end, we formulate the following strategy.
First, the relevance scores $U_t$ and $O_t$ are sorted in descending order, giving the sorted score lists $L_u$ and $L_o$. Second, we reorder the candidate documents according to $L_u$ and $L_o$, giving the ranking lists $L_1$ and $L_2$ of the candidate documents. The relevance score of each candidate document is then computed from its positions in these lists, where $i \in \{1, 2, \ldots, 10\}$, and $j$ and $k$ denote the positions of document $d_i$ in the lists ($L_u$/$L_1$ and $L_o$/$L_2$, respectively).
Strategy 2: Following the data fusion algorithm CombSUM [37], we add the relevance scores $U_t$ and $O_t$ to obtain the final relevance score of the document, $P_t = U_t + O_t$. This strategy assumes that topic relevance and short-term user feedback are equally important.
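The sorting steps of Strategy 1 and the CombSUM rule of Strategy 2 can be sketched as follows. `rank_positions` recovers each document's 1-based position (the positions j and k) from a score list; the exact formula Strategy 1 uses to turn these positions into a score is not reproduced in this text, so the sketch stops at the positions rather than guessing it. All function names are hypothetical.

```python
from typing import List


def rank_positions(scores: List[float]) -> List[int]:
    """1-based position of each document when scores are sorted in descending order.

    Ties keep the original candidate order because Python's sort is stable.
    """
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    positions = [0] * len(scores)
    for position, doc in enumerate(order, start=1):
        positions[doc] = position
    return positions


def comb_sum(u_scores: List[float], o_scores: List[float]) -> List[float]:
    """Strategy 2 (CombSUM): P = U + O for every candidate document."""
    if len(u_scores) != len(o_scores):
        raise ValueError("score lists must have the same length")
    return [u + o for u, o in zip(u_scores, o_scores)]


def rerank(scores: List[float]) -> List[int]:
    """Candidate indices from most to least relevant."""
    return sorted(range(len(scores)), key=lambda i: -scores[i])
```

For example, with U = [0.25, 0.75, 0.5] and O = [0.5, 0.0, 0.5], CombSUM gives [0.75, 0.75, 1.0], so the third candidate is ranked first and the tied first and second candidates keep their input order.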
Strategy 3: We use a linear combination to obtain the final relevance score of each candidate document. This strategy assumes that topic relevance and short-term user feedback contribute differently to document relevance, so we use a trainable parameter to adjust the weights of $U_t$ and $O_t$.
We re-rank the candidate documents according to the relevance score $P_t$. To learn the weights and parameters of the model, we use the binary cross-entropy loss as the objective function:
$$\mathcal{L} = -\frac{1}{N}\sum_{i=1}^{N}\left[C_i \log \hat{C}_i + \left(1 - C_i\right)\log\left(1 - \hat{C}_i\right)\right]$$
where $N$ is the number of candidate documents, $C_i$ is the true label of the $i$-th candidate document under the current query $q_t$, and $\hat{C}_i$ is the predicted label.
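A minimal sketch of Strategy 3 and the training objective, assuming the linear combination takes the convex form P = alpha * U + (1 - alpha) * O with a single learnable scalar alpha (the text states only that one trainable parameter weights U_t and O_t). `grad_alpha` gives the analytic gradient of the mean binary cross-entropy with respect to alpha, which is what a gradient-based optimizer such as Adam would consume; the function names are hypothetical.

```python
import math
from typing import List


def linear_fusion(u_scores: List[float], o_scores: List[float], alpha: float) -> List[float]:
    """Assumed Strategy 3 form: P = alpha * U + (1 - alpha) * O."""
    return [alpha * u + (1.0 - alpha) * o for u, o in zip(u_scores, o_scores)]


def bce_loss(labels: List[int], preds: List[float], eps: float = 1e-12) -> float:
    """Mean binary cross-entropy over the N candidate documents of one query."""
    if len(labels) != len(preds):
        raise ValueError("need one prediction per label")
    total = 0.0
    for c, p in zip(labels, preds):
        p = min(max(p, eps), 1.0 - eps)  # clamp so log() never sees 0
        total += c * math.log(p) + (1 - c) * math.log(1.0 - p)
    return -total / len(labels)


def grad_alpha(labels: List[int], u_scores: List[float], o_scores: List[float], alpha: float) -> float:
    """dLoss/dalpha = (1/N) * sum_i (p_i - c_i) / (p_i (1 - p_i)) * (u_i - o_i).

    Assumes every fused prediction p_i lies strictly between 0 and 1.
    """
    preds = linear_fusion(u_scores, o_scores, alpha)
    return sum((p - c) / (p * (1.0 - p)) * (u - o)
               for c, p, u, o in zip(labels, preds, u_scores, o_scores)) / len(labels)
```

If the click labels agree with the documents that U_t scores highly, the gradient is negative, so gradient descent increases alpha and shifts weight toward the short-term intent score; a manually fixed weight (as in Strategy 2) cannot adapt in this way.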

IV. EXPERIMENT
In this section, we conduct experiments to answer the following research questions:
RQ1: Compared with the baseline models, does TRIM achieve the best performance in the document ranking task on the TianGong-QRef and TianGong-ST datasets?
RQ2: How effective is the topic relevance prediction component in modeling user intent?
RQ3: Which strategy performs best in integrating topic relevance and users' short-term feedback?
A. EXPERIMENTAL SETUP
1) IMPLEMENTATION DETAILS
The operating system is Windows, and the GPU is an NVIDIA TITAN V. We train our model with the Adam optimizer. Detailed model parameters are given in Table 1.

2) DATASET
We evaluate our proposed method on two public datasets: the TianGong-QRef search logs [38] and the TianGong-ST query logs [39]. The statistics of the datasets are given in Table 2. TianGong-QRef has longer sessions than TianGong-ST, so user interaction behavior in it is more complicated. TianGong-ST has shorter queries, and predicting user search intent from a short query is more difficult than from a long one.
3) EVALUATION METRICS
We evaluate ranking performance with MAP, NDCG@K, and MRR. MAP is defined as
$$\mathrm{MAP} = \frac{1}{Q}\sum_{q=1}^{Q}\mathrm{AveP}(q)$$
where $Q$ is the number of queries and $\mathrm{AveP}$ is the average precision of each query, computed as
$$\mathrm{AveP} = \frac{\sum_{k} P(k)\cdot \mathrm{rel}(k)}{N}$$
where $k$ is the ranking position in the search result list, $P(k)$ is the precision of the top $k$ results, $\mathrm{rel}(k)$ indicates whether the document at position $k$ is relevant (1 if relevant, 0 otherwise), and $N$ is the total number of relevant documents.
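The MAP computation can be sketched as follows for binary relevance labels listed in ranked order. As an assumption for the re-ranking setting, N (the number of relevant documents) is taken to be the number of relevant documents in the candidate list, and a query with no relevant document contributes 0; the function names are hypothetical.

```python
from typing import List, Sequence


def average_precision(rels: Sequence[int]) -> float:
    """AveP = sum_k P(k) * rel(k) / N for binary labels listed in ranked order."""
    hits, total = 0, 0.0
    for k, rel in enumerate(rels, start=1):
        if rel:
            hits += 1
            total += hits / k  # P(k): precision of the top k results, added only where rel(k) = 1
    return total / hits if hits else 0.0  # N = number of relevant documents in the list


def mean_average_precision(ranked_lists: List[Sequence[int]]) -> float:
    """MAP: mean of AveP over the Q queries."""
    return sum(average_precision(r) for r in ranked_lists) / len(ranked_lists)
```

For instance, the ranked labels [1, 0, 1] give AveP = (1/1 + 2/3) / 2 = 5/6.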
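NDCG@K is defined as
$$\mathrm{NDCG@}K = \frac{\mathrm{DCG@}K}{\mathrm{IDCG@}K}, \qquad \mathrm{DCG@}K = \sum_{i=1}^{K}\frac{2^{g_i}-1}{\log_2(i+1)}$$
where $g_i$ is the relevance label of the $i$-th document in the result list (1 if relevant, 0 otherwise; for binary labels $2^{g_i}-1 = g_i$). IDCG@K is the DCG@K of the ideal ranking, obtained by sorting the retrieved documents in descending order of relevance and computing DCG@K over the top $K$.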
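A sketch of NDCG@K in the same re-ranking setting: the ideal DCG is computed by sorting the same candidate labels in descending order. It uses the gain (2^g_i - 1) / log2(i + 1), which equals g_i / log2(i + 1) for binary labels; returning 0 when no candidate is relevant is an assumed convention, and the function names are hypothetical.

```python
import math
from typing import Sequence


def dcg_at_k(gains: Sequence[int], k: int) -> float:
    """DCG@K = sum_{i=1..K} (2^{g_i} - 1) / log2(i + 1)."""
    return sum((2 ** g - 1) / math.log2(i + 1) for i, g in enumerate(list(gains)[:k], start=1))


def ndcg_at_k(gains: Sequence[int], k: int) -> float:
    """NDCG@K = DCG@K / IDCG@K, with IDCG@K from the labels sorted in descending order."""
    ideal = dcg_at_k(sorted(gains, reverse=True), k)
    return dcg_at_k(gains, k) / ideal if ideal > 0 else 0.0
```

With K = 1, NDCG@1 is simply 1 if the top-ranked document is relevant and 0 otherwise, which is why it reacts sharply to changes at the first position.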
MRR is defined as
$$\mathrm{MRR} = \frac{1}{N}\sum_{i=1}^{N}\frac{1}{\mathrm{rank}(i)}$$
where $N$ is the number of queries and $\mathrm{rank}(i)$ is the rank of the first relevant document in the retrieval results for the $i$-th query.
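MRR can be sketched in the same way, again for binary labels in ranked order. Assigning a reciprocal rank of 0 to a query with no relevant document is an assumed convention, and the function names are hypothetical.

```python
from typing import List, Sequence


def reciprocal_rank(rels: Sequence[int]) -> float:
    """1 / rank of the first relevant document, or 0 if none is relevant."""
    for rank, rel in enumerate(rels, start=1):
        if rel:
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(ranked_lists: List[Sequence[int]]) -> float:
    """MRR over N queries."""
    return sum(reciprocal_rank(r) for r in ranked_lists) / len(ranked_lists)
```

For two queries whose first relevant documents sit at ranks 2 and 1, MRR = (1/2 + 1) / 2 = 0.75.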

4) BASELINES
To evaluate the effectiveness of TRIM, we compare it with two kinds of baselines.
(1) Ad-hoc ranking models. These models use only the query and document to compute the ranking score. 1) ARC-I [12] obtains representations of the query and candidate documents through a two-layer one-dimensional convolutional network. 2) ARC-II [12] uses a 2-D convolutional network to extract deep features from the interaction matrix between the query and candidate documents. 3) KNRM [13] uses kernel pooling to obtain ranking features from the interaction matrix between the query and documents. 4) DUET [14] integrates the representation-based and interaction-based methods.
(2) Context-aware ranking models. These models attempt to understand search intent by modeling the session context. 1) M-NSRF [16] and 2) M-Match [16] combine the current query and session information to rank documents, jointly learning the query suggestion and document ranking tasks. 3) CARS [17] introduces user clicks and an attention mechanism to obtain a better representation of the session context. 4) LostNet [18] uses a hierarchical session-based attention mechanism and a multi-hop memory network to infer the user's search intent.

V. RESULTS AND ANALYSIS
A. OVERALL PERFORMANCE COMPARISON (ANSWER RQ1)
We compare our model with the baseline models on the TianGong-QRef and TianGong-ST datasets; the results are shown in Table 3. We observe the following. (1) Our model outperforms all ad-hoc models on both datasets. Compared with the best ad-hoc baseline, DUET, our model achieves significant improvements in all evaluation metrics. Concretely, TRIM outperforms DUET by over 21.5% on MAP on the TianGong-QRef dataset and by 57.6% on the TianGong-ST dataset. These results show the importance of modeling the session context, which contains users' historical behavior. However, some session search models perform worse than some ad-hoc models; for example, M-NSRF and CARS are worse than DUET on the TianGong-QRef dataset. A possible reason is that DUET combines representation-based and interaction-based methods, which is more useful than the user's context information on the TianGong-QRef dataset.
(2) Our model performs best among all context-aware ranking models, with significant improvements in all evaluation metrics on both datasets. This demonstrates the effectiveness of modeling the topic relevance between the query and documents, which helps the model better understand users' search intent. Specifically, compared with the best context-aware baseline on the TianGong-ST dataset, CARS, TRIM improves MAP by over 5.7% and NDCG@1 by 9.3%. Compared with the best session search baseline on the TianGong-QRef dataset, M-Match, TRIM improves MAP by over 15.1% and NDCG@1 by 27.0%. Because the TianGong-ST dataset does not contain user ID information, users' long-term historical information cannot be obtained, so on TianGong-ST we compare with LostNet-Short [18], a variant of LostNet proposed by its authors. TRIM outperforms LostNet-Short by over 7.8% on MAP on TianGong-ST.
(3) The session search baselines mainly use RNNs to process the sequential information of the session context. Our model outperforms these RNN-based models, which suggests that the Transformer better learns deep representations of each feature in the behavior sequence.

B. ABLATION ANALYSIS (ANSWER RQ2)
To examine the influence of topic relevance on modeling users' search intent, we conduct ablation experiments on the TianGong-QRef and TianGong-ST datasets, removing one component at a time:
• w/o. TPR: we remove the topic relevance predictor and model users' search intent only through the user short-term search intent predictor.
• w/o. SFM: we remove the user short-term search intent predictor and model the user's search intent only through the topic relevance predictor.
The experimental results are shown in Table 4 and Table 5, from which we draw the following conclusions. (1) Introducing the topic relevance predictor improves the model's performance. When the topic relevance predictor is removed, performance on all evaluation metrics drops substantially. Specifically, on the TianGong-QRef dataset, MAP and MRR decrease by over 40% and NDCG@1 by more than 70%. On the TianGong-ST dataset, MAP and MRR decrease by over 55% and NDCG@1 by more than 70%. These results show the importance of the topic relevance predictor in modeling user search intent.
(2) The user short-term search intent predictor effectively models the user's intent. When it is removed, performance on all evaluation metrics decreases. Specifically, on the TianGong-QRef dataset, MAP and MRR drop by over 6% and NDCG@1 by over 14%. On the TianGong-ST dataset, MAP and MRR drop by over 2% and NDCG@1 by over 3%. These results show the effectiveness of the user's short-term intent modeling: both topic relevance and users' short-term feedback should be considered.

TABLES 4 AND 5. Ablation results. We sequentially perform the following steps: I. remove the topic relevance predictor; II. remove the user short-term search intent predictor. The best results are shown in bold.
TABLE 6. Performances of different fusion strategies on TianGong-QRef. The best results are shown in bold. Strategy 1 re-ranks the documents to obtain relevance scores. Strategy 2 adds the topic relevance score and the user's short-term interest relevance score. Strategy 3 assigns different weights to the topic relevance score and the user's short-term interest relevance score.

C. COMPARISON OF FUSION LAYER STRATEGIES (ANSWER RQ3)
We study the influence of different fusion strategies on the experimental results; the fusion strategies are described in Section III. The performance of the fusion strategies on TianGong-QRef and TianGong-ST is shown in Table 6 and Table 7, respectively. From these tables, we observe that strategy 3 achieves the best results in the fusion layer, and strategy 2 performs worse than strategy 3. This shows that topic relevance and users' short-term feedback have different influences on modeling user search intent.
TABLE 7. Performances of different fusion strategies on TianGong-ST. The best results are shown in bold. Strategy 1 re-ranks the documents to obtain relevance scores. Strategy 2 adds the topic relevance score and the user's short-term interest relevance score. Strategy 3 assigns different weights to the topic relevance score and the user's short-term interest relevance score.

VI. CONCLUSION
In this paper, we propose a topicality relevance-aware intent model (TRIM) for web search. TRIM consists of a topic relevance predictor and a user short-term intent predictor. We further investigate the performance of several fusion strategies that integrate topic relevance and user short-term intent to predict users' actual search intent.
We conduct extensive experiments on two open web search datasets and find that: 1) TRIM significantly outperforms all baselines on the document ranking task. On the TianGong-QRef dataset, TRIM achieves a 15.19% increase in MAP over the best-performing baseline, M-Match. On the TianGong-ST dataset, TRIM achieves a 5.77% increase in MAP over the best-performing baseline, CARS.
2) The ablation experiments indicate the effectiveness of topic relevance information in modeling user search intent. 3) Fusion strategy 3 has the best overall performance among all fusion strategies. We attribute this to the fact that learning the weights is more flexible than assigning them manually.
Modeling user search intent is challenging because of its complex and dynamic nature. We combine topic relevance and users' short-term intent to model user search intent. However, our work only considers the user's current session. Compared with the current session, longer-term session information reflects the user's more stable interests. In future work, we plan to use users' long-term sessions to enhance current session search. In addition, external domain knowledge can supplement the information learned by the model, so we would like to integrate it to further improve the performance of TRIM.
JIANPING LIU received the Ph.D. degree in information technology and digital agriculture from the Chinese Academy of Agricultural Sciences. He is currently a Lecturer in information sciences with the College of Computer Science and Engineering, North Minzu University. His research has been published in Library and Information Science Research, in 2019, and Data Science Journal, in 2020.
JIAN WANG received the Ph.D. degree in geoinformatics from the Chinese Academy of Sciences. He is currently a Professor in information sciences with the Agricultural Information Institute, Chinese Academy of Agricultural Sciences. His research was published in Sensor Letters, in 2010, Data Science Journal, in 2020, and Library and Information Science Research, in 2019.
YINGFEI WANG is currently pursuing the master's degree with the College of Computer Science and Engineering, North Minzu University. Her research interests include interactive information retrieval and click model. Her research has been published in the International Conference on Cloud Computing and Intelligent Systems (CCIS 2022).
XINTAO CHU is currently pursuing the master's degree with the College of Computer Science and Engineering, North Minzu University. His research interests include interactive information retrieval and deep learning. His research has been published in the International Conference on Cloud Computing and Intelligent Systems (CCIS 2022).