- Nevena Gojkovic Turunz
Sampling Methods for Deep Web Aggregation Queries
Effective and efficient sampling methods for deep web aggregation queries
A large part of the data on the World Wide Web resides in the deep web. Executing structured, high-level queries on deep web data sources involves a number of challenges, several of which arise because query execution engines have very limited access to the data. In this paper, we consider the problem of executing aggregation queries involving data enumeration on these data sources, which requires sampling. The existing work in this area (HDSampler and its variants) is based on simple random sampling. We observe that this approach cannot obtain good estimates when the data is skewed. While there has been a lot of work on sampling skewed data, the existing methods are based on prior knowledge of the data, and are therefore not applicable to hidden databases.
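To make the observation about skew concrete, here is a minimal Python sketch that estimates a SUM aggregate from a simple random sample on two synthetic populations. Everything in it is an illustrative assumption and not taken from the paper: the function names `srs_sum_estimate` and `median_relative_error`, the population sizes and values, and above all the direct call to `rng.sample`, which a real hidden database would not allow (samples there have to be drawn through a restricted query interface).

```python
import random
import statistics


def srs_sum_estimate(population, n, rng):
    """Estimate SUM(population) by scaling the mean of a simple random sample."""
    sample = rng.sample(population, n)  # sampling without replacement
    return len(population) * statistics.fmean(sample)


def median_relative_error(population, n, trials, rng):
    """Median of |estimate - truth| / truth over repeated independent samples."""
    true_sum = sum(population)
    errors = [
        abs(srs_sum_estimate(population, n, rng) - true_sum) / true_sum
        for _ in range(trials)
    ]
    return statistics.median(errors)


rng = random.Random(0)  # fixed seed so the run is repeatable

# Hypothetical populations: one with values close together, one where
# 10 of the 10,000 records carry almost the entire total.
uniform_like = [rng.uniform(90, 110) for _ in range(10_000)]
skewed = [1.0] * 9_990 + [100_000.0] * 10

err_uniform = median_relative_error(uniform_like, n=100, trials=200, rng=rng)
err_skewed = median_relative_error(skewed, n=100, trials=200, rng=rng)

print(f"median relative error, uniform-like data: {err_uniform:.4f}")
print(f"median relative error, skewed data:       {err_skewed:.4f}")
```

In the skewed population, a sample of 100 records usually contains none of the 10 heavy records, so the estimate falls far below the true total; when it does contain one, the estimate overshoots by a large factor. Sampling schemes designed for skew typically oversample the heavy region, but that requires knowing where it is, which is the prior knowledge the abstract notes is unavailable for hidden databases.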