
Effective and efficient sampling methods for deep web aggregation queries

 

A large part of the data on the World Wide Web resides in the deep web. Executing structured, high-level queries on deep web data sources involves a number of challenges, several of which arise because query execution engines have very limited access to the data. In this paper, we consider the problem of executing aggregation queries involving data enumeration on these data sources, which requires sampling. The existing work in this area (HDSampler and its variants) is based on simple random sampling. We observe that this approach cannot obtain good estimates when the data is skewed. While there has been a lot of work on sampling skewed data, the existing methods rely on prior knowledge of the data and are therefore not applicable to hidden databases.
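The observation about skew can be illustrated with a small simulation. The following is a minimal sketch, not the paper's method: the two synthetic populations, the sample size, and the function names `srs_estimate_sum` and `relative_error_spread` are all assumptions made for illustration. It also assumes records can be drawn uniformly at random, which a real hidden database does not allow directly. The sketch estimates a SUM aggregate from repeated simple random samples and compares how much the relative error varies on evenly spread data versus data where a few records hold most of the total.

```python
import random
import statistics


def srs_estimate_sum(population, sample_size, rng):
    """Estimate the population total from one simple random sample."""
    sample = rng.sample(population, sample_size)
    # Scale the sample mean up to the full population size.
    return len(population) * statistics.fmean(sample)


def relative_error_spread(population, sample_size, trials, rng):
    """Standard deviation of the relative error over repeated samples."""
    true_total = sum(population)
    errors = [
        (srs_estimate_sum(population, sample_size, rng) - true_total) / true_total
        for _ in range(trials)
    ]
    return statistics.pstdev(errors)


rng = random.Random(0)  # fixed seed so the run is repeatable
n = 10_000

# Hypothetical evenly spread data: values drawn uniformly from [0, 100].
uniform_data = [rng.uniform(0, 100) for _ in range(n)]

# Hypothetical skewed data: 1% of the records carry most of the total.
skewed_data = [1.0] * 9_900 + [1000.0] * 100

uniform_spread = relative_error_spread(uniform_data, 100, 500, rng)
skewed_spread = relative_error_spread(skewed_data, 100, 500, rng)

print(f"relative error spread, uniform data: {uniform_spread:.3f}")
print(f"relative error spread, skewed data:  {skewed_spread:.3f}")
```

In the skewed population, the estimate depends almost entirely on how many of the rare large records happen to land in the sample, so the error spread is much wider than on the uniform data for the same sample size.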
