Guixian Xu, D. Zhong, Xu Gao, Yuan Lin, Xiaobing Zhao, Guosheng Yang
Tibetan Web Information Collection System
2012 Fifth International Conference on Intelligent Networks and Intelligent Systems, 2012-11-01
DOI: 10.1109/ICINIS.2012.46 (https://doi.org/10.1109/ICINIS.2012.46)
Citations: 2
Abstract
Nutch is an open-source web search software project. This paper introduces the Tibetan web information collection system, which is based on Apache Nutch. It points out the shortcomings of the original program and proposes an improved method that uses Nutch to process Tibetan web pages and generate the required files. The paper also shows how to update the data regularly and remove duplicate data. The system is useful for the study of Tibetan information processing.