Demographic reporting in biosignal datasets: a comprehensive analysis of the PhysioNet open access database

Impact factor 23.8 · Tier 1, Medicine · JCR Q1, Medical Informatics
Sarah Jiang, Perisa Ashar, Md Mobashir Hasan Shandhi, Jessilyn Dunn
Journal: Lancet Digital Health
Publication date: 2024-10-01
Publication type: Journal Article
DOI: 10.1016/S2589-7500(24)00170-5
URL: https://www.sciencedirect.com/science/article/pii/S2589750024001705
Citations: 0

Abstract

The PhysioNet open access database (PND) is one of the world's largest and most comprehensive repositories of biosignal data and is widely used by researchers to develop, train, and validate algorithms. To contextualise the results of such algorithms, understanding the underlying demographic distribution of the data is crucial: specifically, the race, ethnicity, sex or gender, and age of study participants. We sought to understand the underlying reporting patterns and characteristics of the demographic data of the datasets available on PND. Of the 181 unique datasets present in the PND as of July 6, 2023, 175 involved human participants, with less than 7% of studies reporting on all four of the key demographic variables. Furthermore, we found a higher rate of reporting sex or gender and age than race and ethnicity. In the studies that did include participant sex or gender, the samples were mostly male. Additionally, we found that most studies were done in North America, particularly in the USA. These imbalances and poor reporting of representation raise concerns regarding potential embedded biases in the algorithms that rely on these datasets. They also underscore the need for universal and comprehensive reporting practices to ensure equitable development and deployment of artificial intelligence and machine learning tools in medicine.
Source journal
CiteScore: 41.20
Self-citation rate: 1.60%
Articles published: 232
Review time: 13 weeks
About the journal: The Lancet Digital Health publishes important, innovative, and practice-changing research on any topic connected with digital technology in clinical medicine, public health, and global health. The journal's open access content crosses subject boundaries, building bridges between health professionals and researchers. By bringing together the most important advances in this multidisciplinary field, The Lancet Digital Health is the most prominent publishing venue in digital health. We publish a range of content types including Articles, Review, Comment, and Correspondence, contributing to promoting digital technologies in health practice worldwide.