Knowledge Based Neural Network for Text Classification
R. D. Goyal
2007 IEEE International Conference on Granular Computing (GRC 2007), 2007-11-02
DOI: 10.1109/GrC.2007.108 (https://doi.org/10.1109/GrC.2007.108)
Citation count: 42
Abstract
Automatic text classification has gained huge popularity with the advancement of information technology. The Bayesian method has been found highly appropriate for text classification, but it suffers from a number of problems. When there is a large number of categories, lack of uniformity in the training data becomes a serious problem. Some nodes may get few training documents, while others may get a very large number, so the classifier is biased toward some nodes over others. The presence of noisy data or outliers also creates problems. Moreover, when documents are very short, such as a line item describing a product, the problem becomes more difficult. In this paper we describe a method that combines the naive Bayesian text classification technique with neural networks to handle these problems. We start with a naive Bayesian classifier, which has linear separating surfaces. We then modify these separating surfaces using a neural network to find better ones, and hence better classification accuracy on validation data.
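The idea of starting from a naive Bayesian classifier and then adjusting its separating surfaces can be illustrated with a small sketch. The abstract does not give the network architecture or the training procedure, so everything below is an assumption made for illustration: a multinomial naive Bayes model is written as a linear scorer (weights are log word probabilities, biases are log class priors), and those weights initialise a single softmax layer that is fine-tuned by gradient descent on cross-entropy. A single layer only shifts the linear surfaces; the paper's network may well be nonlinear. The toy corpus of product line items and all function names are hypothetical.

```python
import math
from collections import Counter

def train_naive_bayes(docs, labels, alpha=1.0):
    """Multinomial NB with Laplace smoothing, expressed as a linear scorer:
    score(c | x) = W[c] . x + b[c], with x the word-count vector."""
    vocab = sorted({tok for d in docs for tok in d.split()})
    classes = sorted(set(labels))
    W, b = {}, {}
    for c in classes:
        counts = Counter(tok for d, y in zip(docs, labels) if y == c
                         for tok in d.split())
        total = sum(counts.values()) + alpha * len(vocab)
        W[c] = [math.log((counts[t] + alpha) / total) for t in vocab]
        b[c] = math.log(labels.count(c) / len(labels))
    return vocab, classes, W, b

def vectorize(doc, vocab):
    # Words outside the vocabulary are ignored.
    counts = Counter(doc.split())
    return [counts[t] for t in vocab]

def scores(x, classes, W, b):
    return [sum(w * xi for w, xi in zip(W[c], x)) + b[c] for c in classes]

def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def predict(doc, vocab, classes, W, b):
    z = scores(vectorize(doc, vocab), classes, W, b)
    return classes[z.index(max(z))]

def avg_log_loss(docs, labels, vocab, classes, W, b):
    loss = 0.0
    for d, y in zip(docs, labels):
        p = softmax(scores(vectorize(d, vocab), classes, W, b))
        loss -= math.log(p[classes.index(y)])
    return loss / len(docs)

def fine_tune(docs, labels, vocab, classes, W, b, lr=0.1, epochs=50):
    """Assumed refinement step: stochastic gradient descent on cross-entropy
    for a single softmax layer, starting from the NB weights."""
    W = {c: list(W[c]) for c in classes}  # copy so the NB model is kept
    b = dict(b)
    for _ in range(epochs):
        for d, y in zip(docs, labels):
            x = vectorize(d, vocab)
            p = softmax(scores(x, classes, W, b))
            for k, c in enumerate(classes):
                g = p[k] - (1.0 if c == y else 0.0)  # dLoss/dscore_c
                b[c] -= lr * g
                for j, xj in enumerate(x):
                    if xj:
                        W[c][j] -= lr * g * xj
    return W, b

# Hypothetical toy data: very short documents, like product line items.
docs = ["red cotton shirt", "blue cotton shirt",
        "steel kitchen knife", "steel bread knife"]
labels = ["apparel", "apparel", "kitchen", "kitchen"]

vocab, classes, W_nb, b_nb = train_naive_bayes(docs, labels)
W_ft, b_ft = fine_tune(docs, labels, vocab, classes, W_nb, b_nb)

print("NB loss:        ", avg_log_loss(docs, labels, vocab, classes, W_nb, b_nb))
print("fine-tuned loss:", avg_log_loss(docs, labels, vocab, classes, W_ft, b_ft))
print(predict("cotton shirt", vocab, classes, W_ft, b_ft))
```

In this sketch the fine-tuning runs on the training documents for brevity. To follow the abstract more closely, one would hold out a validation set and keep the refined weights only if accuracy on it improves over the naive Bayes starting point.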