Based on Rough Sets and the Associated Analysis of KNN Text Classification Research
Authors: Guo Ai-zhang, Yang Tao
Published in: 2015 14th International Symposium on Distributed Computing and Applications for Business Engineering and Science (DCABES), 2015-08-18
DOI: 10.1109/DCABES.2015.127
With the rapid development of network information technology, text, as a basic carrier of information, is growing exponentially. Existing text classification methods cannot extract information from such vast resources in a timely and accurate manner. To address this problem, the paper proposes a new text categorization method: a KNN algorithm based on rough sets and correlation analysis. First, we introduce the concept of rough sets. Within the text vector space of the training set, we divide the vector space of each class into a certain region and an uncertain region. A text vector that falls in a certain region can be assigned its category directly. A text vector that falls in an uncertain region is classified by a KNN text classification algorithm based on correlation analysis. Experimental results show that the KNN text classification algorithm based on rough sets and correlation analysis greatly improves the efficiency and accuracy of text categorization and can meet the requirements of processing large amounts of text data.
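The two-stage idea described in the abstract can be illustrated with a minimal Python sketch. The abstract does not say how the certain and uncertain regions are constructed, nor what the correlation analysis consists of, so everything concrete here is an assumption: each class region is approximated by a ball around the class centroid whose radius is the distance to the farthest training vector of that class, a query covered by exactly one ball is treated as "certain", and any other query falls back to a plain distance-weighted KNN vote that stands in for the paper's correlation-based KNN. The function names (`fit_regions`, `knn_vote`, `classify`) and the toy 2-D vectors are hypothetical.

```python
import math
from collections import defaultdict


def distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def fit_regions(samples):
    """Assumed region model: one ball (centroid, radius) per class label."""
    by_label = defaultdict(list)
    for vec, label in samples:
        by_label[label].append(vec)
    regions = {}
    for label, vecs in by_label.items():
        dim = len(vecs[0])
        centroid = [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]
        radius = max(distance(v, centroid) for v in vecs)
        regions[label] = (centroid, radius)
    return regions


def knn_vote(samples, query, k):
    """Distance-weighted KNN vote (a stand-in, not the paper's method)."""
    nearest = sorted(samples, key=lambda s: distance(s[0], query))[:k]
    votes = defaultdict(float)
    for vec, label in nearest:
        votes[label] += 1.0 / (1.0 + distance(vec, query))
    return max(votes, key=votes.get)


def classify(samples, regions, query, k=3):
    """Return (label, region_kind) for a query vector."""
    covering = [
        label
        for label, (centroid, radius) in regions.items()
        if distance(query, centroid) <= radius
    ]
    if len(covering) == 1:
        # Certain region: exactly one class ball covers the query.
        return covering[0], "certain"
    # Uncertain region: overlap or no coverage, so fall back to KNN.
    return knn_vote(samples, query, k), "uncertain"


# Toy training set: 2-D vectors in place of real text feature vectors.
training = [
    ((0.0, 0.0), "sports"), ((2.0, 0.0), "sports"), ((0.0, 2.0), "sports"),
    ((3.0, 3.0), "tech"), ((1.0, 3.0), "tech"), ((3.0, 1.0), "tech"),
]
regions = fit_regions(training)

near_sports = classify(training, regions, (0.2, 0.2))
in_overlap = classify(training, regions, (1.4, 1.4))
far_away = classify(training, regions, (10.0, 10.0))
```

In this sketch only queries in the overlap between class balls, or outside every ball, pay the cost of a full neighbor search, which is one plausible reading of where the efficiency gain claimed in the abstract could come from.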