理解人工智能数据集中年龄的表征和代表性

J. Park, Michael S. Bernstein, Robin N. Brewer, Ece Kamar, M. Morris
{"title":"理解人工智能数据集中年龄的表征和代表性","authors":"J. Park, Michael S. Bernstein, Robin N. Brewer, Ece Kamar, M. Morris","doi":"10.1145/3461702.3462590","DOIUrl":null,"url":null,"abstract":"A diverse representation of different demographic groups in AI training data sets is important in ensuring that the models will work for a large range of users. To this end, recent efforts in AI fairness and inclusion have advocated for creating AI data sets that are well-balanced across race, gender, socioeconomic status, and disability status. In this paper, we contribute to this line of work by focusing on the representation of age by asking whether older adults are represented proportionally to the population at large in AI data sets. We examine publicly-available information about 92 face data sets to understand how they codify age as a case study to investigate how the subjects' ages are recorded and whether older generations are represented. We find that older adults are very under-represented; five data sets in the study that explicitly documented the closed age intervals of their subjects included older adults (defined as older than 65 years), while only one included oldest-old adults (defined as older than 85 years). Additionally, we find that only 24 of the data sets include any age-related information in their documentation or metadata, and that there is no consistent method followed across these data sets to collect and record the subjects' ages. We recognize the unique difficulties in creating representative data sets in terms of age, but raise it as an important dimension that researchers and engineers interested in inclusive AI should consider.","PeriodicalId":197336,"journal":{"name":"Proceedings of the 2021 AAAI/ACM Conference on AI, Ethics, and Society","volume":"23 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2021-03-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"16","resultStr":"{\"title\":\"Understanding the Representation and Representativeness of Age in AI Data Sets\",\"authors\":\"J. Park, Michael S. Bernstein, Robin N. Brewer, Ece Kamar, M. Morris\",\"doi\":\"10.1145/3461702.3462590\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"A diverse representation of different demographic groups in AI training data sets is important in ensuring that the models will work for a large range of users. To this end, recent efforts in AI fairness and inclusion have advocated for creating AI data sets that are well-balanced across race, gender, socioeconomic status, and disability status. In this paper, we contribute to this line of work by focusing on the representation of age by asking whether older adults are represented proportionally to the population at large in AI data sets. We examine publicly-available information about 92 face data sets to understand how they codify age as a case study to investigate how the subjects' ages are recorded and whether older generations are represented. We find that older adults are very under-represented; five data sets in the study that explicitly documented the closed age intervals of their subjects included older adults (defined as older than 65 years), while only one included oldest-old adults (defined as older than 85 years). Additionally, we find that only 24 of the data sets include any age-related information in their documentation or metadata, and that there is no consistent method followed across these data sets to collect and record the subjects' ages. We recognize the unique difficulties in creating representative data sets in terms of age, but raise it as an important dimension that researchers and engineers interested in inclusive AI should consider.\",\"PeriodicalId\":197336,\"journal\":{\"name\":\"Proceedings of the 2021 AAAI/ACM Conference on AI, Ethics, and Society\",\"volume\":\"23 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2021-03-10\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"16\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the 2021 AAAI/ACM Conference on AI, Ethics, and Society\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/3461702.3462590\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 2021 AAAI/ACM Conference on AI, Ethics, and Society","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3461702.3462590","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 16

摘要

在人工智能训练数据集中,不同人口群体的多样化表示对于确保模型适用于大范围的用户非常重要。为此,最近在人工智能公平性和包容性方面的努力提倡创建在种族、性别、社会经济地位和残疾状况之间保持良好平衡的人工智能数据集。在本文中,我们通过询问老年人在人工智能数据集中是否与总体人口成比例地代表年龄,从而关注年龄的表示,从而为这一工作做出贡献。我们研究了关于92个面部数据集的公开信息,以了解它们是如何将年龄编纂为案例研究,以调查受试者的年龄是如何记录的,以及是否代表了老一辈。我们发现老年人的比例非常低;研究中有五个数据集明确记录了受试者的封闭年龄间隔,其中包括老年人(定义为65岁以上),而只有一个数据集包括老年人(定义为85岁以上)。此外,我们发现只有24个数据集在其文档或元数据中包含任何与年龄相关的信息,并且这些数据集没有遵循一致的方法来收集和记录受试者的年龄。我们认识到在年龄方面创建代表性数据集的独特困难,但将其作为对包容性人工智能感兴趣的研究人员和工程师应该考虑的重要维度。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
Understanding the Representation and Representativeness of Age in AI Data Sets
A diverse representation of different demographic groups in AI training data sets is important in ensuring that the models will work for a large range of users. To this end, recent efforts in AI fairness and inclusion have advocated for creating AI data sets that are well-balanced across race, gender, socioeconomic status, and disability status. In this paper, we contribute to this line of work by focusing on the representation of age by asking whether older adults are represented proportionally to the population at large in AI data sets. We examine publicly-available information about 92 face data sets to understand how they codify age as a case study to investigate how the subjects' ages are recorded and whether older generations are represented. We find that older adults are very under-represented; five data sets in the study that explicitly documented the closed age intervals of their subjects included older adults (defined as older than 65 years), while only one included oldest-old adults (defined as older than 85 years). Additionally, we find that only 24 of the data sets include any age-related information in their documentation or metadata, and that there is no consistent method followed across these data sets to collect and record the subjects' ages. We recognize the unique difficulties in creating representative data sets in terms of age, but raise it as an important dimension that researchers and engineers interested in inclusive AI should consider.
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信