MIDAS: a technology-enabled hub-and-spoke system for the collection and dissemination of high-quality medical datasets in India.

IF 3.3 | Zone 3, Medicine | Q2, Medical Informatics
Dibyajyoti Maity, Rohit Satish, Raghu Dharmaraju, Vijay Chandru, Rajesh Sundaresan, Harpreet Singh, Debnath Pal
BMC Medical Informatics and Decision Making, vol. 25, no. 1, article 252. Journal Article. Published 2025-07-06. DOI: 10.1186/s12911-025-03092-7 (https://doi.org/10.1186/s12911-025-03092-7). PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12232576/pdf/
Citations: 0

Abstract

Background: The need for better AI models fuels demand for ever-larger, high-quality datasets with significant diversity. Many medical imaging datasets have been published globally over the years, but they do not contain enough samples from the population of the Indian subcontinent, so AI models developed on them perform poorly when deployed in India. The Medical Imaging and Information Datasets (MIDAS) India initiative was launched to address this gap by developing standards, protocols, and policies for gathering medical imaging data nationwide.

Methods: MIDAS employs a hub-and-spoke system for data collection: each thematic hub works with a set of spokes to collect data for a specific disease or medical condition from primary, secondary, and tertiary health centers. Data gathering is guided by standard operating procedures developed collaboratively by the participating medical institutions. The annotation protocols rely on gold-standard tests, agreement between experts, or both to achieve the required labeling accuracy, depending on the data type and the intended purpose of the dataset.
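As a rough illustration of the annotation rule described in the Methods, the sketch below resolves a final label from a gold-standard result when one exists and otherwise from expert agreement. The function name `resolve_label`, the `min_agreement` threshold, and its default of two thirds are assumptions made for this example; the abstract does not state how MIDAS measures agreement or what threshold it uses.

```python
from collections import Counter
from typing import Optional, Sequence


def resolve_label(
    expert_labels: Sequence[str],
    gold_standard: Optional[str] = None,
    min_agreement: float = 2 / 3,  # assumed threshold, not taken from the paper
) -> Optional[str]:
    """Return a final label, or None if the case needs further review."""
    # A gold-standard test result, when available, takes precedence.
    if gold_standard is not None:
        return gold_standard
    if not expert_labels:
        return None
    # Otherwise accept the most common expert label if enough experts agree.
    label, votes = Counter(expert_labels).most_common(1)[0]
    if votes / len(expert_labels) >= min_agreement:
        return label
    return None
```

Under these assumptions, a case labeled by three experts is accepted when at least two of them agree, and a case with no sufficient agreement is returned as `None` so it can be escalated.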

Results: The MIDAS platform is accessible at https://midas.iisc.ac.in/. Two datasets, one for oral cancer and another for dural-based pathologies, are already available on MIDAS for free download. Many others are under development and review. Annotated and curated data are also available to registered users under various licenses, as shared by the platform partners. The datasets use standardized ontologies for annotations at both the image level and the level of pixel-level regions of interest. The annotations undergo a review process before being published and made available for download. Standards and guidelines for creating the datasets are still evolving because of the complexity of the elements involved. The challenges are especially steep for data originating from early or pre-onset stages of disease, such as dysplasia in oral cancer, where the manifestation of the disease features is sometimes unclear.
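One way to picture annotations at both the image level and the pixel-level region-of-interest level, gated by a review step, is the minimal data-structure sketch below. The class names, field names, and the polygon representation are hypothetical and chosen only for illustration; they do not describe the actual MIDAS schema or its ontology.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class RegionAnnotation:
    ontology_term: str              # hypothetical ontology term for the region
    polygon: List[Tuple[int, int]]  # pixel coordinates outlining the region


@dataclass
class ImageAnnotation:
    image_id: str
    ontology_term: str              # hypothetical image-level ontology term
    regions: List[RegionAnnotation] = field(default_factory=list)
    reviewed: bool = False          # set once the review process has passed

    def is_publishable(self) -> bool:
        # In this sketch, only reviewed annotations are released for download.
        return self.reviewed
```

In this sketch an annotation starts out unreviewed, and `is_publishable()` returns `True` only after a reviewer sets `reviewed`.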

Conclusion: MIDAS India aims to catalyze the AI-driven transformation of healthcare by providing high-quality annotated imaging data tailored to local needs. It supports innovation, regulatory assessment, and clinical adoption of AI tools, serving as a scalable model for other countries looking to build similar data infrastructure to enhance digital healthcare delivery.

Source journal: BMC Medical Informatics and Decision Making
CiteScore: 7.20
Self-citation rate: 5.70%
Articles published: 297
Review time: 1 month
About the journal: BMC Medical Informatics and Decision Making is an open access journal publishing original peer-reviewed research articles in relation to the design, development, implementation, use, and evaluation of health information technologies and decision-making for human health.