Tibetan Web Information Collection System
Guixian Xu, D. Zhong, Xu Gao, Yuan Lin, Xiaobing Zhao, Guosheng Yang
2012 Fifth International Conference on Intelligent Networks and Intelligent Systems, 2012-11-01
https://doi.org/10.1109/ICINIS.2012.46
Nutch is an open-source web-search software project. This paper introduces the Tibetan web information collection system, which is built on Apache Nutch. It identifies shortcomings of the original program and proposes an improved method that enables Nutch to process Tibetan web pages and generate the required output files. The paper also shows how to update the collected data regularly and remove duplicate data. The system is useful for research on Tibetan information processing.
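The abstract does not describe how Tibetan pages are recognized or how duplicates are detected, so the following is only a minimal illustrative sketch, not the authors' method and not Nutch's API. It assumes that a page counts as Tibetan when most of its non-whitespace characters fall in the Unicode Tibetan block (U+0F00 to U+0FFF), and that duplicates are exact matches after whitespace normalization. The function names `tibetan_ratio`, `content_key`, and `collect`, and the `min_ratio` threshold, are hypothetical.

```python
import hashlib

# Unicode Tibetan block; using it as a language test is an assumption of this sketch.
TIBETAN_START, TIBETAN_END = 0x0F00, 0x0FFF


def tibetan_ratio(text: str) -> float:
    """Fraction of non-whitespace characters that fall in the Tibetan block."""
    chars = [c for c in text if not c.isspace()]
    if not chars:
        return 0.0
    hits = sum(TIBETAN_START <= ord(c) <= TIBETAN_END for c in chars)
    return hits / len(chars)


def content_key(text: str) -> str:
    """Hash of whitespace-normalized text, used as a duplicate key."""
    normalized = " ".join(text.split())
    return hashlib.md5(normalized.encode("utf-8")).hexdigest()


def collect(pages, min_ratio=0.5):
    """Return URLs of Tibetan pages, skipping pages whose text was already seen."""
    seen = set()
    kept = []
    for url, text in pages:
        if tibetan_ratio(text) < min_ratio:
            continue  # not predominantly Tibetan
        key = content_key(text)
        if key in seen:
            continue  # exact duplicate of an earlier page
        seen.add(key)
        kept.append(url)
    return kept
```

Calling `collect` with a list of `(url, text)` pairs returns the URLs to keep, in their original order. A real crawler would more likely need near-duplicate detection and encoding handling for legacy Tibetan fonts, which this sketch leaves out.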