Online Library Content Generation Using Focused Crawling Based Upon Meta Tags and Tf-Idf
Mukesh Kumar, R. Vig
2013 International Symposium on Computational and Business Intelligence, 2013-08-24
DOI: 10.1109/ISCBI.2013.73 (https://doi.org/10.1109/ISCBI.2013.73)
An electronic library is a collection of digital information related to an individual domain and, in turn, to all domains. A focused crawler traverses the Web looking for the pages most relevant to a domain while discarding irrelevant pages, and is therefore helpful for generating the e-content of a digital library for a particular domain. In this paper, a focused crawling technique for generating online content for an e-library is proposed. The applicability of the proposed approach is shown by retrieving documents that are highly related to a single domain. The quality of the pages included in the library is derived from a relevancy measure between each page and the content of domain-related pages.
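The abstract does not state the scoring formula, so the following is only a minimal sketch of how a page's relevance to a set of domain-related pages might be measured with TF-IDF and cosine similarity. The function names, the smoothed IDF variant, the centroid comparison, and the `meta_weight` parameter (which counts meta-tag text more than once) are all assumptions made for illustration, not the authors' published method.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def idf_table(tokenized_docs):
    """Smoothed inverse document frequency over the domain documents."""
    n = len(tokenized_docs)
    df = Counter(term for doc in tokenized_docs for term in set(doc))
    return {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}


def tfidf_vector(tokens, idf):
    """Sparse TF-IDF vector; terms absent from the IDF table are ignored."""
    counts = Counter(t for t in tokens if t in idf)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {t: (c / total) * idf[t] for t, c in counts.items()}


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def relevance(page_text, meta_text, domain_docs, meta_weight=2):
    """Score a page against domain pages (hypothetical scoring scheme).

    meta_text stands for the content of the page's meta tags; repeating
    it meta_weight times is an assumed way of emphasising it.
    """
    domain_tokens = [tokenize(d) for d in domain_docs]
    idf = idf_table(domain_tokens)
    # Sum the domain vectors into a single centroid to compare against.
    centroid = defaultdict(float)
    for doc in domain_tokens:
        for term, weight in tfidf_vector(doc, idf).items():
            centroid[term] += weight
    page_tokens = tokenize(page_text) + tokenize(meta_text) * meta_weight
    return cosine(tfidf_vector(page_tokens, idf), centroid)


if __name__ == "__main__":
    domain = [
        "focused crawler web pages digital library",
        "digital library crawler relevance pages",
        "web crawler tf idf relevance",
    ]
    print(relevance("a focused crawler collects pages for a digital library",
                    "crawler, digital library", domain))
    print(relevance("recipes for chocolate cake and cookies",
                    "baking, dessert", domain))
```

In a crawler built along these lines, pages scoring above some chosen threshold would be added to the library and their outgoing links queued, while lower-scoring pages would be discarded. The threshold itself would need to be tuned per domain.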