{"title":"文本分类采用基于分数的k-NN方法和术语到类别的相关性加权方案","authors":"Ahmed Ben Afia, H. Amiri","doi":"10.1504/IJSISE.2016.078268","DOIUrl":null,"url":null,"abstract":"Text categorisation is the task of deciding whether a document belongs to a set of pre-specified classes of documents. To reach this goal, a TC system must include two basic stages. First stage consists on features extraction using a term weighting scheme. Second stage is the classification using a machine learning algorithm. After proposing, a new term to category relevance weighting scheme, called TF.IDF.TCR, we focus on finding a new algorithm to perform classification step. Results of our experiments, in which we use many classifiers, show promising performances. On the other hand, using relevance to category to improve the term's discriminating power appears to be inapplicable when classifying an unlabelled document. As a solution, we propose a k-NN based approach using scores calculating in order to resolve the problem of unknown category.","PeriodicalId":56359,"journal":{"name":"International Journal of Signal and Imaging Systems Engineering","volume":"9 1","pages":"283"},"PeriodicalIF":0.6000,"publicationDate":"2016-08-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1504/IJSISE.2016.078268","citationCount":"0","resultStr":"{\"title\":\"Text classification using scores based k-NN approach and term to category relevance weighting scheme\",\"authors\":\"Ahmed Ben Afia, H. Amiri\",\"doi\":\"10.1504/IJSISE.2016.078268\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Text categorisation is the task of deciding whether a document belongs to a set of pre-specified classes of documents. To reach this goal, a TC system must include two basic stages. First stage consists on features extraction using a term weighting scheme. Second stage is the classification using a machine learning algorithm. After proposing, a new term to category relevance weighting scheme, called TF.IDF.TCR, we focus on finding a new algorithm to perform classification step. Results of our experiments, in which we use many classifiers, show promising performances. On the other hand, using relevance to category to improve the term's discriminating power appears to be inapplicable when classifying an unlabelled document. As a solution, we propose a k-NN based approach using scores calculating in order to resolve the problem of unknown category.\",\"PeriodicalId\":56359,\"journal\":{\"name\":\"International Journal of Signal and Imaging Systems Engineering\",\"volume\":\"9 1\",\"pages\":\"283\"},\"PeriodicalIF\":0.6000,\"publicationDate\":\"2016-08-12\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://sci-hub-pdf.com/10.1504/IJSISE.2016.078268\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"International Journal of Signal and Imaging Systems Engineering\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1504/IJSISE.2016.078268\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q3\",\"JCRName\":\"Engineering\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"International Journal of Signal and Imaging Systems Engineering","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1504/IJSISE.2016.078268","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"Engineering","Score":null,"Total":0}
Text classification using scores based k-NN approach and term to category relevance weighting scheme
Text categorisation is the task of deciding whether a document belongs to a set of pre-specified classes of documents. To reach this goal, a TC system must include two basic stages. First stage consists on features extraction using a term weighting scheme. Second stage is the classification using a machine learning algorithm. After proposing, a new term to category relevance weighting scheme, called TF.IDF.TCR, we focus on finding a new algorithm to perform classification step. Results of our experiments, in which we use many classifiers, show promising performances. On the other hand, using relevance to category to improve the term's discriminating power appears to be inapplicable when classifying an unlabelled document. As a solution, we propose a k-NN based approach using scores calculating in order to resolve the problem of unknown category.