数据集伦理:前进需要后退

Arvind Narayanan
{"title":"数据集伦理:前进需要后退","authors":"Arvind Narayanan","doi":"10.1145/3461702.3462643","DOIUrl":null,"url":null,"abstract":"Machine learning research culture is driven by benchmark datasets to a greater degree than most other research fields. But the centrality of datasets also amplifies the harms associated with data, including privacy violation and underrepresentation or erasure of some populations. This has stirred a much-needed debate on the ethical responsibilities of dataset creators and users. I argue that clarity on this debate requires taking a step back to better understand the benefits of the dataset-driven approach. I show that benchmark datasets play at least six different roles and that the potential harms depend on the roles a dataset plays. By understanding this relationship, we can mitigate the harms while preserving what is scientifically valuable about the prevailing approach.","PeriodicalId":197336,"journal":{"name":"Proceedings of the 2021 AAAI/ACM Conference on AI, Ethics, and Society","volume":"2014 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2021-07-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"2","resultStr":"{\"title\":\"The Ethics of Datasets: Moving Forward Requires Stepping Back\",\"authors\":\"Arvind Narayanan\",\"doi\":\"10.1145/3461702.3462643\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Machine learning research culture is driven by benchmark datasets to a greater degree than most other research fields. But the centrality of datasets also amplifies the harms associated with data, including privacy violation and underrepresentation or erasure of some populations. This has stirred a much-needed debate on the ethical responsibilities of dataset creators and users. I argue that clarity on this debate requires taking a step back to better understand the benefits of the dataset-driven approach. I show that benchmark datasets play at least six different roles and that the potential harms depend on the roles a dataset plays. By understanding this relationship, we can mitigate the harms while preserving what is scientifically valuable about the prevailing approach.\",\"PeriodicalId\":197336,\"journal\":{\"name\":\"Proceedings of the 2021 AAAI/ACM Conference on AI, Ethics, and Society\",\"volume\":\"2014 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2021-07-21\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"2\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the 2021 AAAI/ACM Conference on AI, Ethics, and Society\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/3461702.3462643\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 2021 AAAI/ACM Conference on AI, Ethics, and Society","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3461702.3462643","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 2

摘要

与大多数其他研究领域相比,机器学习研究文化在更大程度上受到基准数据集的驱动。但数据集的中心化也放大了与数据相关的危害,包括侵犯隐私和某些人群的代表性不足或被抹去。这引发了一场关于数据集创建者和用户的道德责任的争论。我认为,要弄清楚这场争论,需要退一步来更好地理解数据集驱动方法的好处。我展示了基准数据集至少扮演六种不同的角色,并且潜在的危害取决于数据集扮演的角色。通过理解这种关系,我们可以减轻危害,同时保留主流方法的科学价值。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
The Ethics of Datasets: Moving Forward Requires Stepping Back
Machine learning research culture is driven by benchmark datasets to a greater degree than most other research fields. But the centrality of datasets also amplifies the harms associated with data, including privacy violation and underrepresentation or erasure of some populations. This has stirred a much-needed debate on the ethical responsibilities of dataset creators and users. I argue that clarity on this debate requires taking a step back to better understand the benefits of the dataset-driven approach. I show that benchmark datasets play at least six different roles and that the potential harms depend on the roles a dataset plays. By understanding this relationship, we can mitigate the harms while preserving what is scientifically valuable about the prevailing approach.
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信