{"title":"An evolutionary approach to data valuation","authors":"Natalia Khuri, Sapana Bhandari, Esteban Murillo Burford, Nathan P. Whitener, Konghao Zhao","doi":"10.1145/3535508.3545522","journal":{"name":"Proceedings of the 13th ACM International Conference on Bioinformatics, Computational Biology and Health Informatics","volume":"128 1","pages":"0"},"publicationDate":"2022-08-07","publicationTypes":"Journal Article","isOpenAccess":false,"citationCount":"0","platform":"Semanticscholar","ListUrlMain":"https://doi.org/10.1145/3535508.3545522"}
Data valuation in machine learning comprises computational methods for estimating the importance of individual training instances. It has been used to remove noise, uncover biases, and improve the accuracy of trained models. Current data valuation techniques do not scale to large datasets and do not work for regression tasks, where the objective is to predict a numerical outcome rather than one of a small number of nominal class labels. In this work, an evolutionary approach for qualitative and quantitative data valuation is presented. The proposed approach is tested on regression and classification benchmarks, and on several bioinformatics and health informatics datasets. In addition, models trained with the most valuable subsets of data are validated on independently acquired test sets, demonstrating the generalizability as well as the practical utility of the proposed approach.
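
The abstract does not describe the algorithm itself, so the following is only a minimal sketch of what an evolutionary search over training subsets could look like for a regression task. It is not the authors' method. Everything in it is an assumption made for illustration: the toy data, the bit-mask encoding, the least-squares model, the fitness function (negative validation error), the genetic operators, and all function and parameter names. Each training instance is scored by how often it appears in the fittest subsets of the final population.

```python
import random


def fit_line(points):
    """Ordinary least-squares fit of y = a*x + b; returns (a, b)."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    var_x = sum((x - mean_x) ** 2 for x, _ in points)
    if var_x == 0:
        return 0.0, mean_y
    cov = sum((x - mean_x) * (y - mean_y) for x, y in points)
    a = cov / var_x
    return a, mean_y - a * mean_x


def fitness(mask, train, valid):
    """Negative validation MSE of a line fitted on the masked training subset."""
    subset = [p for p, keep in zip(train, mask) if keep]
    if len(subset) < 2:
        return float("-inf")  # too few points to fit a line
    a, b = fit_line(subset)
    return -sum((a * x + b - y) ** 2 for x, y in valid) / len(valid)


def evolve_values(train, valid, pop_size=30, generations=40, mut_rate=0.05, seed=0):
    """Hypothetical genetic algorithm over inclusion masks.

    Returns (values, best_mask), where values[i] is the fraction of the
    fittest half of the final population that includes training instance i.
    """
    rng = random.Random(seed)
    n = len(train)
    # Start from the full training set plus random subsets.
    pop = [[True] * n]
    pop += [[rng.random() < 0.5 for _ in range(n)] for _ in range(pop_size - 1)]

    def score(mask):
        return fitness(mask, train, valid)

    for _ in range(generations):
        pop.sort(key=score, reverse=True)
        elite = pop[: pop_size // 2]  # elitism: the best masks always survive
        children = []
        while len(elite) + len(children) < pop_size:
            p1, p2 = rng.sample(elite, 2)
            cut = rng.randrange(1, n)  # one-point crossover
            child = p1[:cut] + p2[cut:]
            # Bit-flip mutation.
            child = [(not g) if rng.random() < mut_rate else g for g in child]
            children.append(child)
        pop = elite + children

    pop.sort(key=score, reverse=True)
    top = pop[: pop_size // 2]
    values = [sum(m[i] for m in top) / len(top) for i in range(n)]
    return values, top[0]


def make_toy_data(seed=0):
    """Toy regression data (y = 2x + 1) with three deliberately corrupted labels."""
    rng = random.Random(seed)
    train = [(i / 2, 2 * (i / 2) + 1 + rng.gauss(0, 0.1)) for i in range(20)]
    noisy = [3, 11, 17]
    for i in noisy:
        x, y = train[i]
        train[i] = (x, y + 15.0)
    valid = [(0.25 + i / 2, 2 * (0.25 + i / 2) + 1) for i in range(20)]
    return train, valid, noisy


if __name__ == "__main__":
    train, valid, noisy = make_toy_data()
    values, best = evolve_values(train, valid)
    for i, v in enumerate(values):
        flag = " (corrupted label)" if i in noisy else ""
        print(f"instance {i:2d}: value {v:.2f}{flag}")
```

Because the sketch keeps the fittest half of each generation unchanged and seeds the population with the full training set, the best subset it returns can never score worse on the validation set than training on all instances. Whether the corrupted instances end up with low values depends on the data and the random seed.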