Binary text classification using genetic programming with crossover-based oversampling for imbalanced datasets
Mona Khalifa A. Aljero, Nazife Dimililer
Turkish J. Electr. Eng. Comput. Sci., 2023-01-01
DOI: 10.55730/1300-0632.3978 (https://doi.org/10.55730/1300-0632.3978)
It is well known that classifiers trained on imbalanced datasets are usually biased toward the majority class. Such models can show high classification performance overall and on the majority class even when performance on the minority class is significantly lower. This paper presents a genetic programming (GP) model combined with a crossover-based oversampling technique for binary text classification on imbalanced datasets. The aim of the study is to address the class imbalance through oversampling and thereby improve the performance of the GP model. The proposed technique uses a crossover operator to generate new samples for the minority class in an imbalanced text dataset. Combining this crossover-based oversampling with GP improved performance: the proposed combination outperforms all GP applications that use the original dataset without resampling. Moreover, it surpassed GP approaches that use the synthetic minority oversampling technique (SMOTE) and random oversampling. A further comparison with the state of the art on five imbalanced text datasets shows superior performance of the proposed approach in terms of F1-score.
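To make the idea of crossover-based oversampling concrete, the following is a minimal sketch and not the authors' implementation. The abstract does not say which crossover operator is used or how documents are represented, so the sketch assumes fixed-length numeric feature vectors (for example term weights) and a one-point crossover between two randomly chosen minority samples. The function names `one_point_crossover` and `crossover_oversample` are hypothetical.

```python
import random
from typing import List, Sequence, Tuple

Vector = List[float]


def one_point_crossover(
    a: Sequence[float], b: Sequence[float], rng: random.Random
) -> Tuple[Vector, Vector]:
    """Swap the tails of two equal-length parents at a random cut point."""
    if len(a) != len(b):
        raise ValueError("parents must have the same length")
    if len(a) < 2:
        raise ValueError("parents need at least two features")
    # Cut strictly inside the vector so each child mixes both parents.
    cut = rng.randint(1, len(a) - 1)
    child1 = list(a[:cut]) + list(b[cut:])
    child2 = list(b[:cut]) + list(a[cut:])
    return child1, child2


def crossover_oversample(
    minority: Sequence[Sequence[float]], n_majority: int, seed: int = 0
) -> List[Vector]:
    """Grow the minority class to n_majority samples with synthetic children.

    Assumption: balancing to the majority class size; the target ratio used
    in the paper is not stated in the abstract.
    """
    originals = [list(v) for v in minority]
    needed = n_majority - len(originals)
    if needed <= 0:
        return originals
    if len(originals) < 2:
        raise ValueError("need at least two minority samples to cross over")
    rng = random.Random(seed)
    synthetic: List[Vector] = []
    while len(synthetic) < needed:
        parent1, parent2 = rng.sample(originals, 2)
        child1, child2 = one_point_crossover(parent1, parent2, rng)
        synthetic.append(child1)
        if len(synthetic) < needed:
            synthetic.append(child2)
    return originals + synthetic


if __name__ == "__main__":
    minority_vectors = [
        [1.0, 0.0, 0.0, 2.0],
        [0.0, 3.0, 0.0, 0.0],
        [0.0, 0.0, 4.0, 1.0],
    ]
    balanced = crossover_oversample(minority_vectors, n_majority=8)
    print(len(balanced), "minority samples after oversampling")
```

Unlike random oversampling, which duplicates existing samples, each child here recombines features from two different minority samples. Unlike SMOTE, which interpolates between neighbours, every feature value in a child is copied unchanged from one of its parents.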