mlscorecheck: Testing the consistency of reported performance scores and experiments in machine learning

arXiv (Cornell University) | Published: 2023-11-13 | DOI: 10.48550/arxiv.2311.07541

György Kovács, Attila Fazekas

Citations: 0

Abstract
Addressing the reproducibility crisis in artificial intelligence through the validation of reported experimental results is a challenging task. It necessitates either the reimplementation of techniques or a meticulous assessment of papers for deviations from the scientific method and best statistical practices. To facilitate the validation of reported results, we have developed numerical techniques capable of identifying inconsistencies between reported performance scores and various experimental setups in machine learning problems, including binary/multiclass classification and regression. These consistency tests are integrated into the open-source package mlscorecheck, which also provides specific test bundles designed to detect systematically recurring flaws in various fields, such as retina image processing and synthetic minority oversampling.
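To make the idea of a consistency test concrete, here is a minimal sketch for the simplest binary-classification setting. It assumes a single test set with p positive and n negative samples, and scores reported as accuracy, sensitivity and specificity rounded to four decimals. The functions `scores_from_counts` and `find_consistent_counts` are hypothetical illustrations, not part of the mlscorecheck API. The search is also a plain brute-force enumeration of confusion matrices, not the numerical techniques the package implements.

```python
from typing import Dict, Optional, Tuple


def scores_from_counts(tp: int, tn: int, p: int, n: int) -> Dict[str, float]:
    """Compute accuracy, sensitivity and specificity from confusion-matrix counts."""
    return {
        "acc": (tp + tn) / (p + n),
        "sens": tp / p,
        "spec": tn / n,
    }


def find_consistent_counts(
    scores: Dict[str, float], p: int, n: int, eps: float
) -> Optional[Tuple[int, int]]:
    """Brute-force consistency check (hypothetical helper, not the mlscorecheck API).

    Enumerates every (tp, tn) pair for a test set with p positives and n negatives
    and returns the first pair whose scores all lie within eps of the reported
    values. eps is the rounding tolerance, e.g. 1e-4 for scores reported to four
    decimals (slightly more lenient than the exact half-unit bound of 5e-5).
    Returns None if no confusion matrix can produce the reported scores.
    """
    for tp in range(p + 1):
        for tn in range(n + 1):
            computed = scores_from_counts(tp, tn, p, n)
            if all(abs(computed[name] - value) <= eps for name, value in scores.items()):
                return tp, tn
    return None


if __name__ == "__main__":
    # Consistent report: tp=92, tn=194 gives acc=0.95333..., sens=0.92, spec=0.97.
    print(find_consistent_counts({"acc": 0.9533, "sens": 0.92, "spec": 0.97}, p=100, n=200, eps=1e-4))
    # Inconsistent report: sens and spec force tp=92, tn=194, whose accuracy is not 0.96.
    print(find_consistent_counts({"acc": 0.96, "sens": 0.92, "spec": 0.97}, p=100, n=200, eps=1e-4))
```

If no (tp, tn) pair reproduces all reported scores within the rounding tolerance, the figures cannot come from a single evaluation on a test set of the stated size. The enumeration takes O(p·n) steps, which is fine for small test sets but scales poorly, so it should be read only as an illustration of the consistency idea.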