Research and Implementation of Web Structure-Based News Gathering System
Authors: Jianguo Chen, Minrong Lu, Xiao Ke
Published in: 2011 International Conference on Future Computer Science and Education (2011-08-20)
DOI: 10.1109/ICFCSE.2011.128 (https://doi.org/10.1109/ICFCSE.2011.128)
Based on an in-depth study of web information gathering technology, a web structure-based news gathering model is proposed. First, the model loads the gathering entry address and finds the news list page using the Information Gathering and Filter Algorithm. Next, it automatically identifies and refines the link addresses of the news content pages, according to the configured gathering rules combined with regular expressions. It then loads each target page (the news content page) and automatically gathers the news information with the algorithm. At the same time, it can filter out any information designated on this page, such as embedded advertising messages. Practical results show that the proposed model works well and can gather news information efficiently and automatically.
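The gathering flow described in the abstract could be sketched roughly as follows. This is only a minimal illustration under stated assumptions, not the authors' Information Gathering and Filter Algorithm: the rule patterns (`LINK_RULE`, `FILTER_RULES`, `TITLE_RULE`, `BODY_RULE`), the page markup, the `example.com` URLs, and the injected `fetch` function are all hypothetical, and "refining" a link address is interpreted here simply as resolving it to an absolute URL and dropping duplicates.

```python
import re
from urllib.parse import urljoin

# Hypothetical gathering rules; a real site would need its own patterns.
LINK_RULE = re.compile(r'href="([^"]*?/news/\d+\.html)"')            # content-page links
FILTER_RULES = [re.compile(r'<div class="ad">.*?</div>', re.S)]      # blocks to drop (ads)
TITLE_RULE = re.compile(r'<h1>(.*?)</h1>', re.S)
BODY_RULE = re.compile(r'<div id="content">(.*?)</div>', re.S)
TAG = re.compile(r'<[^>]+>')


def extract_links(list_html, base_url):
    """Find content-page links on a list page and resolve them against base_url."""
    seen, links = set(), []
    for href in LINK_RULE.findall(list_html):
        url = urljoin(base_url, href)   # relative -> absolute address
        if url not in seen:             # keep first occurrence only
            seen.add(url)
            links.append(url)
    return links


def strip_tags(fragment):
    """Replace tags with spaces and collapse runs of whitespace."""
    return " ".join(TAG.sub(" ", fragment).split())


def extract_news(page_html):
    """Remove filtered blocks first, then pull the title and body text."""
    for rule in FILTER_RULES:
        page_html = rule.sub("", page_html)
    title = TITLE_RULE.search(page_html)
    body = BODY_RULE.search(page_html)
    return {
        "title": strip_tags(title.group(1)) if title else "",
        "body": strip_tags(body.group(1)) if body else "",
    }


def gather(entry_url, fetch):
    """Load the entry (list) page, follow matching links, gather each article."""
    list_html = fetch(entry_url)
    articles = []
    for url in extract_links(list_html, entry_url):
        article = extract_news(fetch(url))
        article["url"] = url
        articles.append(article)
    return articles


# In-memory stand-in for the network, so the sketch runs offline.
PAGES = {
    "http://example.com/list.html":
        '<a href="/news/1.html">A</a> <a href="/about.html">About</a> '
        '<a href="/news/1.html">A again</a>',
    "http://example.com/news/1.html":
        '<h1>Hello</h1><div id="content"><p>First para.</p>'
        '<div class="ad">Buy now</div><p>Second.</p></div>',
}

articles = gather("http://example.com/list.html", PAGES.__getitem__)
print(articles)
```

Passing `fetch` in as a parameter keeps the extraction logic separate from page loading, so the same functions could be pointed at a real HTTP client. Regular expressions are used here because the abstract names them; they are fragile on nested or irregular markup, which is why the filter rules run before the body pattern is applied.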