TY - GEN
T1 - Determining the user intent of web search engine queries
AU - Jansen, Bernard J.
AU - Booth, Danielle L.
AU - Spink, Amanda
PY - 2007
Y1 - 2007
N2 - Determining the user intent of Web searches is a difficult problem due to the sparse data available concerning the searcher. In this paper, we examine a method to determine the user intent underlying Web search engine queries. We qualitatively analyze samples of queries from seven transaction logs from three different Web search engines containing more than five million queries. From this analysis, we identified characteristics of user queries based on three broad classifications of user intent. The classifications of informational, navigational, and transactional represent the type of content destination the searcher desired as expressed by their query. We implemented our classification algorithm and automatically classified a separate Web search engine transaction log of over a million queries submitted by several hundred thousand users. Our findings show that more than 80% of Web queries are informational in nature, with about 10% each being navigational and transactional. In order to validate the accuracy of our algorithm, we manually coded 400 queries and compared the classification to the results from our algorithm. This comparison showed that our automatic classification has an accuracy of 74%. Of the remaining 25% of the queries, the user intent is generally vague or multi-faceted, pointing to the need to for probabilistic classification. We illustrate how knowledge of searcher intent might be used to enhance future Web search engines.
AB - Determining the user intent of Web searches is a difficult problem due to the sparse data available concerning the searcher. In this paper, we examine a method to determine the user intent underlying Web search engine queries. We qualitatively analyze samples of queries from seven transaction logs from three different Web search engines containing more than five million queries. From this analysis, we identified characteristics of user queries based on three broad classifications of user intent. The classifications of informational, navigational, and transactional represent the type of content destination the searcher desired as expressed by their query. We implemented our classification algorithm and automatically classified a separate Web search engine transaction log of over a million queries submitted by several hundred thousand users. Our findings show that more than 80% of Web queries are informational in nature, with about 10% each being navigational and transactional. In order to validate the accuracy of our algorithm, we manually coded 400 queries and compared the classification to the results from our algorithm. This comparison showed that our automatic classification has an accuracy of 74%. Of the remaining 25% of the queries, the user intent is generally vague or multi-faceted, pointing to the need to for probabilistic classification. We illustrate how knowledge of searcher intent might be used to enhance future Web search engines.
UR - http://www.scopus.com/inward/record.url?scp=35348844063&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=35348844063&partnerID=8YFLogxK
U2 - 10.1145/1242572.1242739
DO - 10.1145/1242572.1242739
M3 - Conference contribution
AN - SCOPUS:35348844063
SN - 1595936548
SN - 9781595936547
T3 - 16th International World Wide Web Conference, WWW2007
SP - 1149
EP - 1150
BT - 16th International World Wide Web Conference, WWW2007
T2 - 16th International World Wide Web Conference, WWW2007
Y2 - 8 May 2007 through 12 May 2007
ER -