An Intelligent Topic Web Crawler Based on DTB
Ming-sheng Zhao, Peng Zhu, Tianchi He
2010 International Conference on Web Information Systems and Mining, 2010-10-23
DOI: 10.1109/WISM.2010.155 (https://doi.org/10.1109/WISM.2010.155)

Web crawling is a fundamental step in many Web applications, such as search engines and data mining. Building on a study of Web crawlers that filter URLs by different methods, this paper proposes an intelligent topic Web crawler based on a DTB (dynamic topic base). The crawler updates its topic base automatically, which improves the accuracy of URL filtering. Experimental results show that the proposed crawler fetches more topic-relevant Web pages while crawling less of the Web and taking less time.
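
The abstract does not say how the topic base is represented or updated, so the following is only a rough illustrative sketch of the general idea, not the authors' method. It assumes the topic base is a term-to-weight map, that a URL is scored by the average weight of the tokens in the URL and its anchor text, and that the base is updated by adding frequent unseen terms from pages judged relevant. The class `DynamicTopicBase`, its methods, the threshold, and the weights are all hypothetical.

```python
import re
from collections import Counter


class DynamicTopicBase:
    """Hypothetical topic base: a term -> weight map (an assumption, not from the paper)."""

    def __init__(self, seed_terms, threshold=0.2, learn_top_n=3):
        # Seed terms start with full weight; learned terms get a lower weight.
        self.weights = {term.lower(): 1.0 for term in seed_terms}
        self.threshold = threshold
        self.learn_top_n = learn_top_n

    @staticmethod
    def tokenize(text):
        # Lowercase alphanumeric tokens; enough for URLs and short English text.
        return re.findall(r"[a-z0-9]+", text.lower())

    def score(self, text):
        """Average topic weight per token (0.0 for empty text)."""
        tokens = self.tokenize(text)
        if not tokens:
            return 0.0
        return sum(self.weights.get(tok, 0.0) for tok in tokens) / len(tokens)

    def accept_url(self, url, anchor_text=""):
        """URL filter: keep the link only if it scores at or above the threshold."""
        return self.score(url + " " + anchor_text) >= self.threshold

    def update(self, page_text):
        """Learn new terms from a relevant page; return the terms that were added."""
        if self.score(page_text) < self.threshold:
            return []  # irrelevant pages do not change the topic base
        counts = Counter(
            tok for tok in self.tokenize(page_text)
            if len(tok) > 3 and tok not in self.weights
        )
        learned = [tok for tok, _ in counts.most_common(self.learn_top_n)]
        for tok in learned:
            self.weights[tok] = 0.5  # assumed lower weight for learned terms
        return learned


if __name__ == "__main__":
    dtb = DynamicTopicBase(["crawler", "search"])
    print(dtb.accept_url("http://example.com/web-crawler-guide", "crawler tutorial"))
    print(dtb.update("crawler search crawler indexing indexing frontier"))
```

In this sketch the filter and the update share one scoring function, so every term learned from a relevant page immediately influences which links are followed next. A real system would also need stop-word handling and some way to decay or remove terms that turn out to be off-topic.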