

Identifying Similarities, Periodicities and Bursts for Online Search Queries

 

We present several methods for mining knowledge from the query logs of the MSN search engine. Using the query logs, we build a time series for each query word or phrase (e.g., 'Thanksgiving' or 'Christmas gifts'), where each element of the time series is the number of times the query is issued on a given day. All of the methods we describe use sequences of this form and can be applied to time-series data generally. Our primary goal is the discovery of semantically similar queries, which we achieve by identifying queries with similar demand patterns. Utilizing the best Fourier coefficients and the energy of the omitted components, we improve upon the state of the art in time-series similarity matching. The extracted sequence features are then organized in an efficient metric tree index structure. We also demonstrate how to efficiently and accurately discover the important periods in a time series. Finally, we propose a simple but effective method for identifying bursts (long- or short-term). Using the burst information extracted from a sequence, we are able to efficiently perform 'query-by-burst' on the database of time series. We conclude the presentation with a description of a tool that uses the described methods and serves as an interactive exploratory data discovery tool for the MSN query database.
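To make two of the ideas in the abstract concrete, here is a minimal sketch in Python with NumPy. It is an illustration under stated assumptions, not the paper's implementation. The function names (`best_k_fourier`, `detect_bursts`), the standardization step, the moving-average window, and the cutoff rule (mean plus `x` standard deviations) are all hypothetical choices made for this example; the abstract itself only says that the best Fourier coefficients, the energy of the omitted components, and a "simple but effective" burst method are used.

```python
import numpy as np


def best_k_fourier(series, k):
    """Compress a daily-count series to its k largest-magnitude Fourier
    coefficients, and report the energy of the coefficients left out.

    Assumption: the series is standardized first (zero mean, unit variance),
    so it must not be constant. Energies refer to the half spectrum
    returned by rfft.
    """
    x = np.asarray(series, dtype=float)
    x = (x - x.mean()) / x.std()
    coeffs = np.fft.rfft(x, norm="ortho")
    # Indices of the k coefficients with the largest magnitude, in ascending order.
    idx = np.sort(np.argsort(np.abs(coeffs))[-k:])
    kept = coeffs[idx]
    omitted_energy = np.sum(np.abs(coeffs) ** 2) - np.sum(np.abs(kept) ** 2)
    return idx, kept, omitted_energy


def detect_bursts(series, window=7, x=1.5):
    """Flag days whose moving average exceeds mean + x * std of the moving
    average. This cutoff rule is an assumption made for illustration.

    mode="same" pads the edges with zeros, so the first and last few
    moving-average values are biased low.
    """
    s = np.asarray(series, dtype=float)
    ma = np.convolve(s, np.ones(window) / window, mode="same")
    cutoff = ma.mean() + x * ma.std()
    return np.flatnonzero(ma > cutoff)


if __name__ == "__main__":
    # Synthetic demand: a weekly cycle over 52 weeks (364 days).
    days = np.arange(364)
    weekly = 100 + 10 * np.sin(2 * np.pi * days / 7)
    idx, kept, omitted = best_k_fourier(weekly, k=1)
    print("dominant period (days):", len(weekly) / idx[0])

    # Synthetic demand: flat baseline with a five-day spike.
    spiky = np.full(100, 10.0)
    spiky[50:55] = 100.0
    print("burst days:", detect_bursts(spiky))
```

A dominant coefficient at frequency index `i` in a series of length `n` corresponds to a period of `n / i` days, which is one simple way to read important periods off the same representation. Two compressed series could then be compared using their kept coefficients, with the omitted energy available to bound the error introduced by compression.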


 

