Removing non-informative blocks from the web pages
R. Gunasundari, S. Karthikeyan
2010 International Conference on Communication Control and Computing Technologies, published 2010-12-17
DOI: 10.1109/ICCCCT.2010.5670731 (https://doi.org/10.1109/ICCCCT.2010.5670731)
Citation count: 4
Abstract
With the enormous growth of the web, users easily get lost in its rich hyperlink structure. Developing user-friendly, automated tools that deliver relevant information without redundant links is therefore a primary task for website owners. Users are interested only in informative content, not in non-informative content blocks. Web pages often contain navigation sidebars, advertisements, search blocks, copyright notices, and similar elements that are not content blocks. The information contained in these non-content blocks can harm web mining, so it is important to separate the informative primary content blocks from the non-informative ones. This paper proposes three different algorithms for removing non-content blocks from web pages. Removing non-informative content blocks from web pages can yield significant savings in storage and time.