MIDAS: a technology-enabled hub-and-spoke system for the collection and dissemination of high-quality medical datasets in India
Authors: Dibyajyoti Maity, Rohit Satish, Raghu Dharmaraju, Vijay Chandru, Rajesh Sundaresan, Harpreet Singh, Debnath Pal
Journal: BMC Medical Informatics and Decision Making, 25(1):252 (Journal Article; impact factor 3.3; JCR Q2, Medical Informatics)
Published: 2025-07-06
DOI: https://doi.org/10.1186/s12911-025-03092-7
PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12232576/pdf/
Citation count: 0
Abstract
Background: The need for better AI models fuels demand for ever-larger, high-quality datasets with significant diversity. Many medical imaging datasets have been published globally over the years, but they do not contain enough samples from the population of the Indian subcontinent, which leads to subpar performance when AI models developed on them are deployed in India. The Medical Imaging and Information Datasets (MIDAS) India initiative was launched to address this gap by developing standards, protocols, and policies for gathering medical imaging data nationwide.
Methods: MIDAS employs a hub-and-spoke system for data collection, in which each thematic hub works with a set of spokes to collect data for a specific disease or medical condition from primary, secondary, and tertiary health centers. Data gathering is guided by standard operating procedures developed collaboratively by the participating medical institutions. The annotation protocols rely on gold-standard tests, agreement between experts, or a combination of the two to achieve the required labeling accuracy, depending on the data type and the intended purpose of the dataset.
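The annotation rule described in the Methods can be illustrated with a minimal sketch. This is not the MIDAS implementation: the function name `resolve_label`, the `min_agreement` threshold of two thirds, and the choice to let a gold-standard result override expert votes are all assumptions made here for illustration.

```python
from collections import Counter
from typing import Optional, Sequence


def resolve_label(
    expert_labels: Sequence[str],
    gold_standard: Optional[str] = None,
    min_agreement: float = 2 / 3,  # hypothetical threshold, not from the paper
) -> Optional[str]:
    """Return a final label, or None if the case needs further review."""
    # Assumption: a gold-standard test result, when present, takes precedence.
    if gold_standard is not None:
        return gold_standard
    if not expert_labels:
        return None
    # Otherwise fall back to agreement between experts.
    label, votes = Counter(expert_labels).most_common(1)[0]
    if votes / len(expert_labels) >= min_agreement:
        return label
    return None
```

A case that returns `None` would be routed back for additional expert review rather than published; how that escalation works in practice is not specified in the abstract.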
Results: The MIDAS platform is accessible at https://midas.iisc.ac.in/. Two datasets are already available for free download on MIDAS, one for oral cancer and another for dural-based pathologies. Many others are under development and review. Annotated and curated data shared by the platform partners are also available to registered users under various licenses. The datasets use standardized ontologies for annotations at both the image level and the level of pixel-level regions of interest. Annotations undergo a review process before being published and made available for download. Standards and guidelines for creating the datasets are still evolving because of the complexity of the elements involved. The challenges are steepest for data originating from early or pre-onset stages of disease, such as dysplasia in oral cancer, where the manifestation of the disease feature(s) is sometimes unclear.
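The two annotation levels and the review step mentioned in the Results could be represented along the following lines. This is a sketch under stated assumptions: the class names, field names, and the `publishable` helper are hypothetical and do not reflect the actual MIDAS schema or ontology codes.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class RegionAnnotation:
    ontology_code: str              # hypothetical code from a standardized ontology
    polygon: List[Tuple[int, int]]  # pixel coordinates outlining the region of interest


@dataclass
class ImageAnnotation:
    image_id: str
    ontology_code: str              # image-level label (hypothetical code)
    regions: List[RegionAnnotation] = field(default_factory=list)
    reviewed: bool = False          # set to True once the review step has passed


def publishable(annotations: List[ImageAnnotation]) -> List[ImageAnnotation]:
    """Keep only annotations that have passed review."""
    return [a for a in annotations if a.reviewed]
```

Keeping the review status on the record itself makes it straightforward to hold back unreviewed annotations from download, which mirrors the publish-after-review order described above.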
Conclusion: MIDAS India aims to catalyze the AI-driven transformation of healthcare by providing high-quality annotated imaging data tailored to local needs. It supports innovation, regulatory assessment, and clinical adoption of AI tools, serving as a scalable model for other countries looking to build similar data infrastructure to enhance digital healthcare delivery.
About the journal:
BMC Medical Informatics and Decision Making is an open access journal publishing original peer-reviewed research articles in relation to the design, development, implementation, use, and evaluation of health information technologies and decision-making for human health.