Identifying the user intent behind a query can help improve the precision of the document lists that Web search engines return. For this reason, recent work has focused on building query classifiers that follow the categories proposed in the scientific literature. These works rely on query representations built from two main sources of information: text and click-through data. Despite this, we have little understanding of the nature and behaviour of the variables used to characterize queries. In this work we analyse the behaviour of these variables in order to understand them better and to identify exactly which characteristics allow query classifiers to improve their precision. The analysis shows that text-based variables discriminate between the categories better than variables based on click-through data. Among these variables, the query length (the number of terms that make up a query), the Levenshtein distance between snippets and queries, and the PageRank metric are recommended features for query type classifiers.
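As an illustration of two of the text-based variables named above, the following is a minimal sketch of how query length and the Levenshtein distance between a snippet and a query might be computed. It is not taken from the paper: the function names, the whitespace tokenization, the lowercasing, and the choice of a character-level (rather than term-level) edit distance are all assumptions made for the example.

```python
def query_length(query: str) -> int:
    """Number of terms in the query (assumption: terms are whitespace-separated)."""
    return len(query.split())


def levenshtein(a: str, b: str) -> int:
    """Character-level edit distance: minimum insertions, deletions, substitutions."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop to save memory
    previous = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ch_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to the empty prefix of b
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute (or match)
            ))
        previous = current
    return previous[-1]


def text_features(query: str, snippet: str) -> dict:
    """Hypothetical feature extractor; lowercasing is an assumption, not from the paper."""
    return {
        "query_length": query_length(query),
        "snippet_distance": levenshtein(query.lower(), snippet.lower()),
    }


if __name__ == "__main__":
    print(text_features("cheap flights to paris", "Cheap flights to Paris from London"))
```

A classifier would consume such values alongside other features; PageRank is omitted here because it depends on link-graph data that cannot be derived from the query text alone.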