Online Library Content Generation Using Focused Crawling Based Upon Meta Tags and Tf-Idf

2013 International Symposium on Computational and Business Intelligence. Pub Date: 2013-08-24. DOI: 10.1109/ISCBI.2013.73

Mukesh Kumar, R. Vig

Citations: 0

Abstract

An electronic library is a collection of digital information related to an individual domain and, by extension, to all domains. A focused crawler traverses the Web looking for the pages most relevant to a domain while discarding irrelevant pages, and is therefore helpful for generating e-content for a digital library dedicated to a particular domain. In this paper, a focused crawling technique for generating online content for an e-library is proposed. The applicability of the proposed approach is shown by retrieving documents that are highly related to a single domain. The quality of the pages included in the library is derived from a relevancy measure of each page against the content of domain-related pages.
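
The abstract does not spell out the relevancy measure, so the following is only a minimal sketch of one plausible reading of the title: the text of a page's meta tags is scored against a small set of domain-related pages using TF-IDF weights and cosine similarity. The function names (`meta_text`, `relevance`), the smoothed IDF formula, the centroid comparison, and the choice of the `keywords` and `description` meta tags are all assumptions made for illustration, not the authors' published method.

```python
import math
import re
from collections import Counter, defaultdict
from html.parser import HTMLParser


class MetaText(HTMLParser):
    """Collects the content of <meta name="keywords|description"> tags."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = (attributes.get("name") or "").lower()
        if name in ("keywords", "description"):
            self.parts.append(attributes.get("content") or "")


def meta_text(html):
    """Return the concatenated meta keyword/description text of a page."""
    parser = MetaText()
    parser.feed(html)
    return " ".join(parser.parts)


def tokenize(text):
    return re.findall(r"[a-z0-9]+", text.lower())


def idf_table(corpus_tokens):
    """Smoothed inverse document frequency over the domain pages (assumed form)."""
    n = len(corpus_tokens)
    doc_freq = Counter()
    for tokens in corpus_tokens:
        doc_freq.update(set(tokens))
    return {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in doc_freq.items()}


def tfidf(tokens, idf):
    """TF-IDF vector as a dict; terms unseen in the domain pages are ignored."""
    counts = Counter(tokens)
    total = sum(counts.values()) or 1
    return {t: (c / total) * idf[t] for t, c in counts.items() if t in idf}


def cosine(a, b):
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def relevance(page_text, domain_docs):
    """Cosine similarity between a page and the summed domain TF-IDF vectors."""
    domain_tokens = [tokenize(doc) for doc in domain_docs]
    idf = idf_table(domain_tokens)
    centroid = defaultdict(float)
    for tokens in domain_tokens:
        for term, weight in tfidf(tokens, idf).items():
            centroid[term] += weight
    return cosine(tfidf(tokenize(page_text), idf), centroid)
```

In a crawler built along these lines, a fetched page would be kept for the library, and its outgoing links followed, only when `relevance(meta_text(html), domain_docs)` exceeds a threshold; that threshold is a tuning parameter assumed here, not a value reported in the abstract.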



