Estimating deep web data source size by capture-recapture method

This paper addresses the problem of estimating the size of a deep web data source that is accessible only through queries. Because most deep web data sources are noncooperative, the size of a source can only be estimated by sending queries and analyzing the returned results. We propose an efficient estimator based on the capture-recapture method. First, we derive an equation relating the overlapping rate to the percentage of the data examined when random samples are retrieved from a uniform distribution. This equation is conceptually simple and leads to the derivation of an estimator for samples obtained by random queries.
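
To make the capture-recapture idea concrete, here is a minimal sketch of the classic two-sample (Lincoln-Petersen) estimator. It is the textbook form of the method, not the estimator derived in the paper, whose equation is not reproduced in this abstract. The function names `lincoln_petersen` and `simulate`, and the population and sample sizes, are illustrative assumptions. The simulation draws documents uniformly at random, which real query results generally do not do.

```python
import random


def lincoln_petersen(first_sample, second_sample):
    """Textbook two-sample capture-recapture estimate of population size.

    Assumption: both samples are drawn uniformly at random from the same
    fixed population. This is NOT the paper's estimator.
    """
    first = set(first_sample)
    second = set(second_sample)
    overlap = len(first & second)  # items "recaptured" in the second sample
    if overlap == 0:
        raise ValueError("samples do not overlap; the estimate is undefined")
    # N is approximately n1 * n2 / m, where m is the size of the overlap.
    return len(first) * len(second) / overlap


def simulate(true_size=10_000, sample_size=1_000, seed=0):
    """Draw two uniform random samples of document ids and estimate the size.

    All parameter values here are hypothetical, chosen only for illustration.
    """
    rng = random.Random(seed)
    population = range(true_size)
    first = rng.sample(population, sample_size)
    second = rng.sample(population, sample_size)
    return lincoln_petersen(first, second)


if __name__ == "__main__":
    print(f"estimated size: {simulate():.0f}")
```

The smaller the overlap between the two samples, the larger the estimated source. This is the intuition behind relating an overlapping rate to the fraction of the data examined.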
