基于机器学习的研究的可重复性:概述、障碍和驱动因素

IF 2.5 4区 计算机科学 Q3 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE
Ai Magazine Pub Date : 2025-04-14 DOI:10.1002/aaai.70002
Harald Semmelrock, Tony Ross-Hellauer, Simone Kopeinik, Dieter Theiler, Armin Haberl, Stefan Thalmann, Dominik Kowald
{"title":"基于机器学习的研究的可重复性:概述、障碍和驱动因素","authors":"Harald Semmelrock,&nbsp;Tony Ross-Hellauer,&nbsp;Simone Kopeinik,&nbsp;Dieter Theiler,&nbsp;Armin Haberl,&nbsp;Stefan Thalmann,&nbsp;Dominik Kowald","doi":"10.1002/aaai.70002","DOIUrl":null,"url":null,"abstract":"<p>Many research fields are currently reckoning with issues of poor levels of reproducibility. Some label it a “crisis,” and research employing or building machine learning (ML) models is no exception. Issues including lack of transparency, data or code, poor adherence to standards, and the sensitivity of ML training conditions mean that many papers are not even reproducible in principle. Where they are, though, reproducibility experiments have found worryingly low degrees of similarity with original results. Despite previous appeals from ML researchers on this topic and various initiatives from conference reproducibility tracks to the ACM's new Emerging Interest Group on Reproducibility and Replicability, we contend that the general community continues to take this issue too lightly. Poor reproducibility threatens trust in and integrity of research results. Therefore, in this article, we lay out a new perspective on the key barriers and drivers (both procedural and technical) to increased reproducibility at various levels (methods, code, data, and experiments). We then map the drivers to the barriers to give concrete advice for strategies for researchers to mitigate reproducibility issues in their own work, to lay out key areas where further research is needed in specific areas, and to further ignite discussion on the threat presented by these urgent issues.</p>","PeriodicalId":7854,"journal":{"name":"Ai Magazine","volume":"46 2","pages":""},"PeriodicalIF":2.5000,"publicationDate":"2025-04-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1002/aaai.70002","citationCount":"0","resultStr":"{\"title\":\"Reproducibility in machine-learning-based research: Overview, barriers, and drivers\",\"authors\":\"Harald Semmelrock,&nbsp;Tony Ross-Hellauer,&nbsp;Simone Kopeinik,&nbsp;Dieter Theiler,&nbsp;Armin Haberl,&nbsp;Stefan Thalmann,&nbsp;Dominik Kowald\",\"doi\":\"10.1002/aaai.70002\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p>Many research fields are currently reckoning with issues of poor levels of reproducibility. Some label it a “crisis,” and research employing or building machine learning (ML) models is no exception. Issues including lack of transparency, data or code, poor adherence to standards, and the sensitivity of ML training conditions mean that many papers are not even reproducible in principle. Where they are, though, reproducibility experiments have found worryingly low degrees of similarity with original results. Despite previous appeals from ML researchers on this topic and various initiatives from conference reproducibility tracks to the ACM's new Emerging Interest Group on Reproducibility and Replicability, we contend that the general community continues to take this issue too lightly. Poor reproducibility threatens trust in and integrity of research results. Therefore, in this article, we lay out a new perspective on the key barriers and drivers (both procedural and technical) to increased reproducibility at various levels (methods, code, data, and experiments). We then map the drivers to the barriers to give concrete advice for strategies for researchers to mitigate reproducibility issues in their own work, to lay out key areas where further research is needed in specific areas, and to further ignite discussion on the threat presented by these urgent issues.</p>\",\"PeriodicalId\":7854,\"journal\":{\"name\":\"Ai Magazine\",\"volume\":\"46 2\",\"pages\":\"\"},\"PeriodicalIF\":2.5000,\"publicationDate\":\"2025-04-14\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://onlinelibrary.wiley.com/doi/epdf/10.1002/aaai.70002\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Ai Magazine\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://onlinelibrary.wiley.com/doi/10.1002/aaai.70002\",\"RegionNum\":4,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q3\",\"JCRName\":\"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Ai Magazine","FirstCategoryId":"94","ListUrlMain":"https://onlinelibrary.wiley.com/doi/10.1002/aaai.70002","RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
引用次数: 0

摘要

许多研究领域目前都在考虑可重复性差的问题。一些人将其称为“危机”,使用或构建机器学习(ML)模型的研究也不例外。缺乏透明度、数据或代码、对标准的依从性差以及ML训练条件的敏感性等问题意味着许多论文在原则上甚至是不可复制的。然而,可重复性实验发现,与原始结果的相似度低得令人担忧。尽管之前机器学习研究人员对这个主题提出了呼吁,并且从会议可重复性跟踪到ACM新成立的可重复性和可复制性兴趣小组,我们认为一般社区仍然对这个问题过于轻视。低可重复性威胁到对研究结果的信任和完整性。因此,在本文中,我们将从一个新的角度来看待在不同层次(方法、代码、数据和实验)上提高再现性的关键障碍和驱动因素(程序和技术)。然后,我们将驱动因素映射到障碍,为研究人员提供具体的策略建议,以减轻他们自己工作中的可重复性问题,列出在特定领域需要进一步研究的关键领域,并进一步引发对这些紧迫问题所带来的威胁的讨论。
本文章由计算机程序翻译,如有差异,请以英文原文为准。

Reproducibility in machine-learning-based research: Overview, barriers, and drivers

Reproducibility in machine-learning-based research: Overview, barriers, and drivers

Many research fields are currently reckoning with issues of poor levels of reproducibility. Some label it a “crisis,” and research employing or building machine learning (ML) models is no exception. Issues including lack of transparency, data or code, poor adherence to standards, and the sensitivity of ML training conditions mean that many papers are not even reproducible in principle. Where they are, though, reproducibility experiments have found worryingly low degrees of similarity with original results. Despite previous appeals from ML researchers on this topic and various initiatives from conference reproducibility tracks to the ACM's new Emerging Interest Group on Reproducibility and Replicability, we contend that the general community continues to take this issue too lightly. Poor reproducibility threatens trust in and integrity of research results. Therefore, in this article, we lay out a new perspective on the key barriers and drivers (both procedural and technical) to increased reproducibility at various levels (methods, code, data, and experiments). We then map the drivers to the barriers to give concrete advice for strategies for researchers to mitigate reproducibility issues in their own work, to lay out key areas where further research is needed in specific areas, and to further ignite discussion on the threat presented by these urgent issues.

求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
Ai Magazine
Ai Magazine 工程技术-计算机:人工智能
CiteScore
3.90
自引率
11.10%
发文量
61
审稿时长
>12 weeks
期刊介绍: AI Magazine publishes original articles that are reasonably self-contained and aimed at a broad spectrum of the AI community. Technical content should be kept to a minimum. In general, the magazine does not publish articles that have been published elsewhere in whole or in part. The magazine welcomes the contribution of articles on the theory and practice of AI as well as general survey articles, tutorial articles on timely topics, conference or symposia or workshop reports, and timely columns on topics of interest to AI scientists.
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信