What Number of Features is Optimal: A New Method Based on Approximation Function for Stance Detection Task
Authors: S. Vychegzhanin, E. Razova, E. Kotelnikov
Published in: Proceedings of the 9th International Conference on Information Communication and Management, 2019-08-23
DOI: 10.1145/3357419.3357430 (https://doi.org/10.1145/3357419.3357430)
Citations: 6
Abstract
Choosing a text representation model raises a crucial question: how many features are optimal? The optimality criterion is the minimum number of features that achieves (or preserves) the maximum performance. This article proposes a new method for determining the optimal number of features that takes both components of this criterion into account. The method first constructs the dependence of task performance on the number of features, then approximates this dependence using the Weibull distribution function, and finally determines the optimal number of features by analyzing the growth rate of the fitted function. We call this method DOFNAF (Determining the Optimal Feature Number by the Approximating Function). The method is evaluated on the stance detection task, which consists in identifying the position ("for" or "against") that the author of a text takes towards the object (or objects) under discussion. It is compared with constant methods, a method based on a function of the total number of features, a performance-maximum method, Recursive Feature Elimination with Cross-Validation (RFECV), and Correlation-based Feature Selection (CFS). DOFNAF selects fewer features than the existing methods while maintaining classification performance.
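The three steps named in the abstract (build the performance curve, approximate it with a Weibull distribution function, analyze its growth rate) can be illustrated with a minimal sketch. The abstract does not give the exact functional form, the fitting procedure, or the growth-rate criterion, so everything below is an assumption made for illustration: the curve is modeled as an amplitude times the Weibull CDF, the fit is a plain grid search, and the stopping rule is a hypothetical `min_gain` threshold on the per-feature increase of the fitted curve. The function names are likewise invented here and are not the authors' implementation.

```python
import math


def weibull_cdf(x, scale, shape):
    """Weibull cumulative distribution function for x >= 0."""
    return 1.0 - math.exp(-((x / scale) ** shape))


def fit_scaled_weibull(xs, ys, scales, shapes):
    """Least-squares fit of y ~ amp * weibull_cdf(x, scale, shape).

    Assumption: a coarse grid search over (scale, shape) stands in for
    whatever optimizer the paper uses. For each grid point the best
    amplitude has a closed form, so only two parameters are searched.
    """
    best = None
    for scale in scales:
        for shape in shapes:
            f = [weibull_cdf(x, scale, shape) for x in xs]
            denom = sum(v * v for v in f)
            if denom == 0.0:
                continue
            amp = sum(v * y for v, y in zip(f, ys)) / denom
            err = sum((amp * v - y) ** 2 for v, y in zip(f, ys))
            if best is None or err < best[0]:
                best = (err, amp, scale, shape)
    if best is None:
        raise ValueError("no usable (scale, shape) pair in the grid")
    return best[1], best[2], best[3]


def optimal_feature_count(amp, scale, shape, max_features, min_gain):
    """Largest n whose n-th feature still adds at least min_gain.

    Assumption: 'analyzing the growth rate' is reduced to thresholding
    the per-feature increase of the fitted curve. Scanning from the
    right skips the flat tail and avoids stopping in the slow initial
    region that a Weibull curve with shape > 1 has near zero.
    """
    n = max_features
    while n > 1:
        gain = amp * (weibull_cdf(n, scale, shape)
                      - weibull_cdf(n - 1, scale, shape))
        if gain >= min_gain:
            break
        n -= 1
    return n


if __name__ == "__main__":
    # Synthetic curve generated from the model itself, not data from the paper.
    feature_counts = [10, 50, 100, 200, 400, 800, 1600]
    scores = [0.8 * weibull_cdf(x, 200.0, 1.0) for x in feature_counts]
    amp, scale, shape = fit_scaled_weibull(
        feature_counts, scores,
        scales=[50.0, 100.0, 200.0, 400.0],
        shapes=[0.5, 1.0, 2.0],
    )
    print(optimal_feature_count(amp, scale, shape, 5000, 1e-4))
```

In practice `feature_counts` and `scores` would come from evaluating the classifier at several feature-set sizes. A smaller `min_gain` keeps more features, so the threshold is the knob that trades feature count against retained performance.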