Research Papers Library

Abstracting Keywords from Hypertext Documents

This paper presents a process for abstracting keywords from hypertext or plain-text documents. Like the keywords listed in a paper, the abstracted keywords identify the contents of a document. The process can be used, for example, to identify the contents of HTML documents returned by a search engine, so that users can quickly find the information they need. Like other related work, the process considers how often a word occurs in a document, but it also considers how often the word's synonyms occur. It also considers key phrases of two or three words. To make the word-frequency counts more accurate, a stemming algorithm removes suffixes before counting. In our tests, stemming consumed on average 56.7% of the total computation time, and the process abstracted on average 52% of the keywords supplied by the authors of the tested documents.
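
The steps named in the abstract (removing suffixes by stemming, merging synonyms, counting two- and three-word phrases, and ranking by frequency) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the stop-word list, the crude suffix stripper `stem` (standing in for a real stemming algorithm such as Porter's), the synonym table `SYNONYMS`, the phrase weighting, and all function names are hypothetical. `keyword_recall` is one plausible reading of the evaluation behind the 52% figure: the share of author-supplied keywords that the process recovers.

```python
import re
from collections import Counter

# Hypothetical stop-word list; the paper's actual list is not given.
STOP_WORDS = {"a", "an", "the", "of", "to", "and", "or", "in", "on", "for",
              "is", "are", "be", "by", "with", "that", "this", "it", "as"}

# Crude suffix stripper standing in for a real stemmer (e.g. Porter's);
# this suffix list is an illustrative assumption, not the paper's algorithm.
SUFFIXES = ("ations", "ation", "ings", "ing", "ers", "er", "es", "ed", "ly", "s")

# Hypothetical synonym table mapping a stem to a canonical stem, so that
# occurrences of synonyms are counted together.
SYNONYMS = {"lookup": "search", "retrieval": "search"}


def stem(word: str) -> str:
    """Strip the first matching suffix, keeping a stem of at least 3 letters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def canonical_terms(text: str) -> list:
    """Stem and synonym-map each word; None marks a phrase boundary."""
    terms = []
    for token in re.findall(r"[a-z]+|[.,;:!?]", text.lower()):
        if token in STOP_WORDS or not token.isalpha():
            terms.append(None)  # stop words and punctuation break phrases
        else:
            s = stem(token)
            terms.append(SYNONYMS.get(s, s))
    return terms


def extract_keywords(text: str, top_k: int = 5, min_phrase_count: int = 2) -> list:
    """Rank stems and 2-3 word phrases by (weighted) frequency."""
    terms = canonical_terms(text)
    # Single-term frequencies; synonyms are already merged into one stem.
    scores = Counter(t for t in terms if t is not None)
    # Two- and three-word phrases that do not cross a boundary.
    phrases = Counter()
    for n in (2, 3):
        for i in range(len(terms) - n + 1):
            gram = terms[i:i + n]
            if None not in gram:
                phrases[" ".join(gram)] += 1
    for phrase, count in phrases.items():
        if count >= min_phrase_count:
            # Assumed weighting: a phrase scores count * number of words.
            scores[phrase] = count * len(phrase.split())
    # Highest score first; ties broken alphabetically for determinism.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:top_k]]


def keyword_recall(extracted: list, author_keywords: list) -> float:
    """Fraction of author keywords (canonicalized) found among the extracted ones."""
    wanted = {" ".join(t for t in canonical_terms(kw) if t is not None)
              for kw in author_keywords}
    if not wanted:
        return 0.0
    return len(wanted & set(extracted)) / len(wanted)


if __name__ == "__main__":
    text = ("Search engines return documents. Searching documents by keywords "
            "helps users. A keyword lookup finds documents quickly. "
            "Search engines rank documents.")
    kws = extract_keywords(text)
    print(kws)  # ['document', 'search', 'search engin', 'engin', 'keyword']
    # 2 of the 3 author keywords are recovered.
    print(keyword_recall(kws, ["search engines", "keywords", "stemming"]))
```

In the example, "lookup" is counted as an occurrence of "search" through the synonym table, and the phrase "search engin" qualifies because it appears twice within sentence boundaries. The sketch reports stems rather than surface words; mapping each stem back to its most frequent original form would be a natural refinement.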
