Model of Data Gathering and Processing on Tibetan and Uyghur Language
Yunfeng Weng, Hanxin Jia, Qing Ma
2012 Fifth International Conference on Intelligent Networks and Intelligent Systems, 2012-11-01
DOI: 10.1109/ICINIS.2012.81 (https://doi.org/10.1109/ICINIS.2012.81)
Citation count: 0
Abstract
This paper introduces a model for gathering and processing web data in the Tibetan and Uyghur languages. The model comprises a page crawler, content extraction, word segmentation, frequency statistics, and display. First, the software extracts each website's template and uses it to extract the title and content of each web page, then converts the HTML file to an XML file. Second, it segments the content of the XML file into words and counts them, storing the statistics in a database. Finally, a web page displays the results of the frequency statistics.
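
The stages after crawling can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the abstract does not say how templates are represented, how words are segmented, or which database is used. Here the "template" is reduced to a hypothetical element id (`content`), Tibetan text is split at the tsheg and shad marks (which yields syllables, a much coarser unit than real word segmentation), Uyghur text is split at whitespace and punctuation, and SQLite stands in for the database. All function, class, and table names are invented for the example.

```python
import re
import sqlite3
import xml.etree.ElementTree as ET
from collections import Counter
from html.parser import HTMLParser


class TemplateExtractor(HTMLParser):
    """Collect <title> text and the text inside the element with a given id.

    Assumption: the site template is just "content lives in one element id".
    The depth counter expects well-formed HTML (every non-void tag is closed).
    """
    VOID = {"br", "img", "hr", "meta", "link", "input"}

    def __init__(self, content_id):
        super().__init__()
        self.content_id = content_id
        self.in_title = False
        self.depth = 0  # > 0 while inside the content element
        self.title, self.content = [], []

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self.in_title = True
        elif tag in self.VOID:
            return
        elif self.depth:
            self.depth += 1
        elif dict(attrs).get("id") == self.content_id:
            self.depth = 1

    def handle_endtag(self, tag):
        if tag == "title":
            self.in_title = False
        elif self.depth and tag not in self.VOID:
            self.depth -= 1

    def handle_data(self, data):
        if self.in_title:
            self.title.append(data)
        elif self.depth:
            self.content.append(data)


def html_to_xml(html, content_id="content"):
    """Extract title and content, and return them as a small XML document."""
    parser = TemplateExtractor(content_id)
    parser.feed(html)
    parser.close()
    page = ET.Element("page")
    ET.SubElement(page, "title").text = "".join(parser.title).strip()
    # Join text fragments with spaces and normalise runs of whitespace.
    ET.SubElement(page, "content").text = " ".join(" ".join(parser.content).split())
    return ET.tostring(page, encoding="unicode")


# Delimiter sets are a simplification, not a real segmenter.
DELIMS = {
    "bo": r"[\u0F0B\u0F0D\u0F0E\s]+",         # Tibetan tsheg, shad, nyis shad
    "ug": r"[\s\u060C\u061B\u061F.,:;!?]+",   # whitespace, Arabic and Latin punctuation
}


def segment(text, lang):
    """Split text into tokens for the given language code ('bo' or 'ug')."""
    return [tok for tok in re.split(DELIMS[lang], text) if tok]


def store_frequencies(tokens, conn):
    """Add token counts to a word_freq table, creating it if needed."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS word_freq "
        "(word TEXT PRIMARY KEY, freq INTEGER NOT NULL)"
    )
    for word, n in Counter(tokens).items():
        conn.execute("INSERT OR IGNORE INTO word_freq VALUES (?, 0)", (word,))
        conn.execute("UPDATE word_freq SET freq = freq + ? WHERE word = ?", (n, word))
    conn.commit()


def top_words(conn, limit=10):
    """Return the most frequent words, as a display page might query them."""
    return conn.execute(
        "SELECT word, freq FROM word_freq ORDER BY freq DESC, word LIMIT ?",
        (limit,),
    ).fetchall()
```

A crawler would pass each downloaded page through `html_to_xml`, read the `content` element back out of the XML, call `segment` with the page's language code, and hand the tokens to `store_frequencies`. The display page would then only need a query like the one in `top_words`.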