Wei Liu, Renze Xiong, Ning N. Cheng, Yiming Y. Sun
Title: Text Classification Method with Combination of Fuzzy Relation and Feature Distribution Variance
Published in: Proceedings of the 2020 1st International Conference on Control, Robotics and Intelligent System
Publication date: 2020-10-27
DOI: 10.1145/3437802.3437829 (https://doi.org/10.1145/3437802.3437829)
Citations: 1
Abstract
To accurately express the fuzzy relation between word features and texts and the fuzzy relation between word features and categories, a text classification method based on Fuzzy Relation and Feature Distribution Variance (FRFDV) is proposed. The method first performs feature reduction and extracts category feature words according to the inter-category and intra-category distribution of features. It then defines the word feature set, the test text set, and the category set as fuzzy sets. Next, each text and each category is represented by a membership function of the word feature set with respect to the test text set and the category set, respectively. When word feature sets represent categories, the method takes into account both the membership degree of each feature in a category and the feature's distribution across categories; when feature sets represent test texts, it assigns different weights to category feature words and non-category feature words. Finally, a fuzzy set correlation formula is used to compute the correlation between the text and each category, and the text is assigned to the category with the largest correlation. A comparison with the XGBoost algorithm [Fang, 2020, Gong and Wang, 2018] and the SVM algorithm shows that the FRFDV-based text classification method is feasible: its accuracy is higher by 2% and 4%, respectively.
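The pipeline described in the abstract can be illustrated with a small Python sketch. The abstract does not give the actual membership functions, weights, or correlation formula, so everything concrete here is an assumption made for illustration: the category membership is taken as normalized in-category frequency scaled by the word's inter-category variance, the text membership as normalized term frequency with hypothetical weights `cat_weight` and `other_weight`, the category feature words as those above a hypothetical `threshold`, and the correlation as a min/max fuzzy similarity standing in for the paper's formula. The function names are likewise hypothetical.

```python
from collections import Counter
from statistics import pvariance


def category_memberships(train_docs):
    """Map each category to {word: membership}; train_docs is {category: [token lists]}."""
    # Relative frequency of every word inside each category.
    rel_freq = {}
    for cat, docs in train_docs.items():
        counts = Counter(w for doc in docs for w in doc)
        total = sum(counts.values())
        rel_freq[cat] = {w: c / total for w, c in counts.items()}

    cats = list(train_docs)
    vocab = {w for freqs in rel_freq.values() for w in freqs}
    # Assumed weighting: a word whose frequency differs strongly between
    # categories (high variance) is treated as more discriminative.
    variance = {w: pvariance([rel_freq[c].get(w, 0.0) for c in cats]) for w in vocab}
    max_var = max(variance.values()) or 1.0

    memberships = {}
    for cat in cats:
        peak = max(rel_freq[cat].values())
        memberships[cat] = {
            w: (f / peak) * (variance[w] / max_var) for w, f in rel_freq[cat].items()
        }
    return memberships


def category_feature_words(memberships, threshold=0.2):
    """Words whose membership in at least one category exceeds the (assumed) threshold."""
    return {w for mu in memberships.values() for w, m in mu.items() if m > threshold}


def text_membership(tokens, category_words, cat_weight=1.0, other_weight=0.5):
    """Fuzzy-set view of one text; category feature words get the larger weight."""
    counts = Counter(tokens)
    if not counts:
        return {}
    peak = max(counts.values())
    return {
        w: (c / peak) * (cat_weight if w in category_words else other_weight)
        for w, c in counts.items()
    }


def fuzzy_correlation(a, b):
    """Min/max similarity of two fuzzy sets (a stand-in, not the paper's formula)."""
    words = set(a) | set(b)
    num = sum(min(a.get(w, 0.0), b.get(w, 0.0)) for w in words)
    den = sum(max(a.get(w, 0.0), b.get(w, 0.0)) for w in words)
    return num / den if den else 0.0


def classify(tokens, memberships, category_words):
    """Return the category with the largest correlation, plus all scores."""
    text = text_membership(tokens, category_words)
    scores = {cat: fuzzy_correlation(text, mu) for cat, mu in memberships.items()}
    return max(scores, key=scores.get), scores


# Toy data, invented purely for illustration.
train_docs = {
    "sports": [["match", "team", "goal", "win"], ["team", "coach", "goal"]],
    "finance": [["stock", "market", "bank", "win"], ["bank", "loan", "market"]],
}
memberships = category_memberships(train_docs)
feature_words = category_feature_words(memberships)
label, scores = classify(["team", "goal", "goal", "win"], memberships, feature_words)
print(label, scores)
```

In this toy setup the word "win" occurs equally often in both categories, so its inter-category variance is zero and it contributes nothing to either category's fuzzy set; this is the kind of effect the feature distribution variance is meant to capture, although the paper's own weighting may differ.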