Diversity, inclusivity and traceability of mammography datasets used in development of Artificial Intelligence technologies: a systematic review

IF 1.8; Zone 4, Medicine; Q3; Radiology, Nuclear Medicine & Medical Imaging
Elinor Laws, Joanne Palmer, Joseph Alderman, Ojasvi Sharma, Victoria Ngai, Thomas Salisbury, Gulmeena Hussain, Sumiya Ahmed, Gagandeep Sachdeva, Sonam Vadera, Bilal Mateen, Rubeta Matin, Stephanie Kuku, Melanie Calvert, Jacqui Gath, Darren Treanor, Melissa McCradden, Maxine Mackintosh, Judy Gichoya, Hari Trivedi, Xiaoxuan Liu
Clinical Imaging, volume 118, Article 110369. Published 2024-11-26. DOI: 10.1016/j.clinimag.2024.110369
https://www.sciencedirect.com/science/article/pii/S0899707124002997

Abstract

Purpose

There are many radiological datasets for breast cancer, some of which have supported the development of AI medical devices for breast cancer screening and image classification. This review aims to identify mammography datasets (including digitised screen film mammography, 2D digital mammography and digital breast tomosynthesis) used in the development of AI technologies and to present their characteristics, including the transparency of their documentation, their content, the populations included and their accessibility.

Materials and methods

MEDLINE and Google Dataset searches identified studies describing AI technology development and referencing breast imaging datasets up to June 2024. The characteristics of each dataset are summarised. In particular, the accompanying documentation was reviewed with a focus on diversity and inclusion of populations represented within each dataset.

Results

The literature search referenced 254 datasets: 190 were privately held, 36 had barriers that prevented access, and 28 were accessible. Most datasets originated from Europe, East Asia and North America. Individuals' attributes were poorly reported: 32 (12%) datasets reported race or ethnicity, and 76 (30%) reported female/male categories, with only one dataset explicitly defining whether these categories represented sex or gender.
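The counts above can be cross-checked with a few lines of arithmetic. The following is a minimal sketch and not part of the review itself: the variable and function names are illustrative, and it assumes that every percentage uses the 254 referenced datasets as its denominator.

```python
# Counts taken from the Results paragraph of the abstract.
total = 254
access = {"privately held": 190, "access barriers": 36, "accessible": 28}
reporting = {"race or ethnicity": 32, "female/male categories": 76}


def share(count, denominator):
    """Return count as a percentage of denominator, to one decimal place."""
    return round(100 * count / denominator, 1)


# The three access categories should partition the 254 datasets.
assert sum(access.values()) == total

# Assumption: all shares are computed against the full set of 254 datasets.
access_pct = {name: share(n, total) for name, n in access.items()}
reporting_pct = {name: share(n, total) for name, n in reporting.items()}

for name, pct in {**access_pct, **reporting_pct}.items():
    print(f"{name}: {pct}%")
```

Under that assumption, the three access categories sum to 254, and about 11% of the referenced datasets were accessible. The reported figures for race or ethnicity and for female/male categories correspond to 32/254 and 76/254 respectively.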

Conclusion

Through this review, we demonstrate gaps in the data landscape for mammography, highlighting poor representation globally. To ensure datasets in breast imaging have maximum utility for researchers, their characteristics should be documented and limitations of datasets, such as their representativeness of populations and settings, should inform scientific efforts to translate data-driven insights into technologies and discoveries.