Mikel Hernandez, Gorka Epelde, R. Gil-Redondo, N. Embade, Ane Alberdi, I. Macía, Ó. Millet
Proceedings of the 9th International Conference on Bioinformatics Research and Applications. Published 2022-09-18. DOI: 10.1145/3569192.3569200 (https://doi.org/10.1145/3569192.3569200)
Comparative Evaluation of Oversampling Techniques for Balancing Metabolic Profiles
Imbalanced data is a common problem when data analytics paradigms, such as statistical analyses and predictive models, are applied to binary and multiclass data, and it distorts classification metrics that are sensitive to imbalance, e.g., accuracy. Although pre-processing, algorithm-level, and hybrid approaches exist, none of them focuses specifically on balancing metabolic profiles for Metabolic Syndrome analysis. The insights and conclusions obtained from data analysis paradigms applied to metabolic data are relevant to the study of Metabolic Syndrome, yet statistical power may be lost when the Metabolic Syndrome related subclasses are imbalanced. Thus, there is a need to balance metabolic data to improve the insights derived from these types of analyses. In this context, this paper presents a comparative evaluation of six oversampling techniques for balancing metabolic profiles (SMOTE, B-SMOTE, ADASYN, ROS, K-SMOTE, and SVM-SMOTE). An imbalanced dataset with 16 classes, formed by the combinations of 4 binary metabolic conditions, is used for this analysis. Additionally, a methodology is defined to objectively evaluate and compare the six oversampling techniques in terms of representativity and variety. The results show that ROS and SMOTE were the best oversampling techniques for balancing metabolic data, generating high-quality synthetic profiles that resemble the real ones while balancing all classes equally. This demonstrates that metabolomics studies focused on Metabolic Syndrome can rely on these oversampling methods to improve their conclusions.
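To make the two best-performing techniques concrete, the following is a minimal standard-library sketch of how 4 binary conditions map to 16 class labels, and how ROS and a SMOTE-style interpolation bring every class up to the size of the largest one. It is not the authors' implementation: the function names (`class_label`, `random_oversample`, `smote_oversample`), the two-feature toy profiles, the class sizes, and `k=5` are all assumptions made for illustration.

```python
import math
import random
from collections import Counter, defaultdict


def class_label(conditions):
    """Encode 4 binary conditions (0/1) as one of 16 class labels (0..15)."""
    label = 0
    for bit in conditions:
        label = (label << 1) | int(bit)
    return label


def _group_by_class(X, y):
    by_class = defaultdict(list)
    for row, label in zip(X, y):
        by_class[label].append(row)
    return by_class


def random_oversample(X, y, rng):
    """ROS: duplicate randomly chosen real samples until all classes match the largest."""
    by_class = _group_by_class(X, y)
    target = max(len(rows) for rows in by_class.values())
    X_out, y_out = list(X), list(y)
    for label, rows in by_class.items():
        for _ in range(target - len(rows)):
            X_out.append(list(rng.choice(rows)))
            y_out.append(label)
    return X_out, y_out


def smote_oversample(X, y, rng, k=5):
    """SMOTE-style: interpolate between a sample and one of its k nearest same-class neighbours."""
    by_class = _group_by_class(X, y)
    target = max(len(rows) for rows in by_class.values())
    X_out, y_out = list(X), list(y)
    for label, rows in by_class.items():
        if len(rows) < 2:
            continue  # interpolation needs at least two real samples; such classes stay as-is
        for _ in range(target - len(rows)):
            base = rng.choice(rows)
            others = [r for r in rows if r is not base]
            neighbours = sorted(others, key=lambda r: math.dist(base, r))[:k]
            other = rng.choice(neighbours)
            gap = rng.random()  # position along the segment base -> other
            X_out.append([b + gap * (o - b) for b, o in zip(base, other)])
            y_out.append(label)
    return X_out, y_out


rng = random.Random(0)
# Hypothetical toy data: three of the 16 possible classes, with unequal sizes.
counts = {
    class_label((0, 0, 0, 0)): 12,
    class_label((0, 1, 0, 1)): 5,
    class_label((1, 1, 1, 1)): 3,
}
X, y = [], []
for label, n in counts.items():
    for _ in range(n):
        X.append([rng.gauss(label, 1.0), rng.gauss(-label, 1.0)])
        y.append(label)

X_ros, y_ros = random_oversample(X, y, rng)
X_smote, y_smote = smote_oversample(X, y, rng)
print("original:", dict(Counter(y)))
print("ROS:     ", dict(Counter(y_ros)))
print("SMOTE:   ", dict(Counter(y_smote)))
```

ROS only repeats existing profiles, so every added row is an exact copy of a real one, whereas the SMOTE-style step creates new points on line segments between real same-class profiles. That difference is the kind of trade-off an evaluation in terms of representativity and variety has to weigh.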