- Dorothy Allen
Abstracting Keywords from Hypertext Documents
This paper presents a process for abstracting keywords from hypertext or plain-text documents. Like the keywords listed in a paper, the abstracted keywords identify a document's contents. The process can be used, for example, to identify the contents of HTML documents returned by a search engine so that users can quickly find the information they need. Like related work, the process considers how frequently a word occurs in a document, but it also counts the occurrences of the word's synonyms and considers key phrases consisting of two or three words. To make the word-frequency counts more accurate, a stemming algorithm is used to remove suffixes. In our tests, the stemming algorithm consumed on average 56.7% of the total computation time, and the process abstracted on average 52% of the keywords provided by the authors of the tested documents.
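The steps named in the abstract (stemming, merging synonyms, and counting two- and three-word phrases) can be illustrated with a minimal sketch. It is not the paper's implementation: the suffix stripper `naive_stem`, the `STOP_WORDS` list, the `SYNONYMS` table, and the function `extract_keywords` are all hypothetical stand-ins chosen for illustration, and the abstract does not say which stemming algorithm or synonym source the authors used.

```python
import re
from collections import Counter

# Hypothetical resources; a real system would use far larger ones.
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "in", "is", "for", "on", "are"}
SYNONYMS = {"car": "automobile"}  # maps a stem to the stem it is counted under


def naive_stem(word):
    """Strip a few common English suffixes (a stand-in, not the paper's stemmer)."""
    if word.endswith("ss"):
        return word
    for suffix in ("ing", "ed", "s"):
        # Keep at least three characters so short words are left alone.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def extract_keywords(text, top_n=5):
    """Return the top_n most frequent stems and 2- or 3-word phrases."""
    tokens = re.findall(r"[a-z]+", text.lower())
    # Drop stop words, stem the rest, then fold synonyms into one stem.
    stems = [naive_stem(t) for t in tokens if t not in STOP_WORDS]
    stems = [SYNONYMS.get(s, s) for s in stems]

    counts = Counter(stems)
    # Count phrases of two and three consecutive stems. Because stop words
    # were removed first, a phrase may span a dropped stop word.
    for n in (2, 3):
        for i in range(len(stems) - n + 1):
            counts[" ".join(stems[i:i + n])] += 1
    return counts.most_common(top_n)


sample = "Cars and automobiles. The car is parked."
print(extract_keywords(sample, top_n=1))
```

In this sketch "cars", "automobiles", and "car" all contribute to a single count, which is the effect the abstract attributes to combining stemming with synonym handling. Ranking single words and phrases in one shared counter is a simplification; the abstract does not describe how the authors weigh phrases against single words.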