Yibei Chen, Dorota Jarecka, Sanu Ann Abraham, Remi Gau, Evan Ng, Daniel M Low, Isaac Bevers, Alistair Johnson, Anisha Keshavan, Arno Klein, Jon Clucas, Zaliqa Rosli, Steven M Hodge, Janosch Linkersdörfer, Hauke Bartsch, Samir Das, Damien Fair, David Kennedy, Satrajit S Ghosh
Journal of Medical Internet Research, volume 27, e63343. Published 2025-07-11. DOI: 10.2196/63343 (https://doi.org/10.2196/63343). Publication type: Journal Article.
Standardizing Survey Data Collection to Enhance Reproducibility: Development and Comparative Evaluation of the ReproSchema Ecosystem.
Background: Inconsistencies in survey-based (eg, questionnaire) data collection across biomedical, clinical, behavioral, and social sciences pose challenges to research reproducibility. ReproSchema is an ecosystem that standardizes survey design and facilitates reproducible data collection through a schema-centric framework, a library of reusable assessments, and computational tools for validation and conversion. Unlike conventional survey platforms that primarily offer graphical user interface-based survey creation, ReproSchema provides a structured, modular approach for defining and managing survey components, enabling interoperability and adaptability across diverse research settings.
Objective: This study examines ReproSchema's role in enhancing research reproducibility and reliability. We introduce its conceptual and practical foundations, compare it against 12 platforms to assess its effectiveness in addressing inconsistencies in data collection, and demonstrate its application through 3 use cases: standardizing required mental health survey common data elements, tracking changes in longitudinal data collection, and creating interactive checklists for neuroimaging research.
Methods: We describe ReproSchema's core components, including its schema-based design; reusable assessment library with >90 assessments; and tools to validate data, convert survey formats (eg, REDCap [Research Electronic Data Capture] and Fast Healthcare Interoperability Resources), and build protocols. We compared 12 platforms (Center for Expanded Data Annotation and Retrieval, formr, KoboToolbox, Longitudinal Online Research and Imaging System, MindLogger, OpenClinica, Pavlovia, PsyToolkit, Qualtrics, REDCap, SurveyCTO, and SurveyMonkey) against 14 findability, accessibility, interoperability, and reusability (FAIR) principles and assessed their support of 8 survey functionalities (eg, multilingual support and automated scoring). Finally, we applied ReproSchema to 3 use cases (NIMH-Minimal; the Adolescent Brain Cognitive Development and HEALthy Brain and Child Development Studies; and the Committee on Best Practices in Data Analysis and Sharing Checklist) to illustrate ReproSchema's versatility.
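The schema-based design and data validation described in the Methods can be illustrated with a minimal Python sketch. It is not ReproSchema's actual vocabulary or tooling: the `Item` and `Activity` classes, their field names, and the `validate_response` function are hypothetical stand-ins chosen only to show how a survey defined as structured data can be checked mechanically.

```python
from dataclasses import dataclass


@dataclass
class Item:
    """One survey question with its allowed responses (hypothetical fields)."""
    item_id: str
    question: dict   # language code -> question text (multilingual support)
    choices: dict    # stored value -> human-readable label
    required: bool = True


@dataclass
class Activity:
    """A reusable assessment: a versioned, ordered collection of items."""
    activity_id: str
    version: str
    items: list


def validate_response(activity, response):
    """Return a list of problems found in a response; an empty list means valid."""
    problems = []
    known_ids = {item.item_id for item in activity.items}
    for item in activity.items:
        if item.item_id not in response:
            if item.required:
                problems.append(f"{item.item_id}: missing required answer")
            continue
        if response[item.item_id] not in item.choices:
            problems.append(f"{item.item_id}: value not in allowed choices")
    for key in response:
        if key not in known_ids:
            problems.append(f"{key}: unknown item")
    return problems


# Illustrative assessment with a single multilingual item (invented content).
mood = Item(
    item_id="mood_1",
    question={
        "en": "How often have you felt down?",
        "es": "¿Con qué frecuencia se ha sentido decaído?",
    },
    choices={0: "Not at all", 1: "Several days", 2: "Most days", 3: "Every day"},
)
demo = Activity(activity_id="demo_mood", version="1.0.0", items=[mood])

print(validate_response(demo, {"mood_1": 2}))   # valid response
print(validate_response(demo, {"mood_1": 7}))   # value outside the choices
```

Because the survey definition is plain data rather than a form built in a graphical interface, the same definition could in principle be validated, versioned, and converted to other formats by separate tools.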
Results: ReproSchema provides a structured framework for standardizing survey-based data collection while ensuring compatibility with existing survey tools. Our comparison results showed that ReproSchema met 14 of 14 FAIR criteria and supported 6 of 8 key survey functionalities: provision of standardized assessments, multilingual support, multimedia integration, data validation, advanced branching logic, and automated scoring. Three use cases illustrating ReproSchema's flexibility include standardizing essential mental health assessments (NIMH-Minimal), systematically tracking changes in longitudinal studies (Adolescent Brain Cognitive Development and HEALthy Brain and Child Development), and converting a 71-page neuroimaging best practices guide into an interactive checklist (Committee on Best Practices in Data Analysis and Sharing).
Conclusions: ReproSchema enhances reproducibility by organizing survey-based data collection through a structured, schema-driven approach. It integrates version control, manages metadata, and ensures interoperability, maintaining consistency across studies and compatibility with common survey tools. Planned developments, including ontology mappings and semantic search, will broaden its use, supporting transparent, scalable, and reproducible research across disciplines.
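To make the idea of tracking changes across longitudinal data collection concrete, the following Python sketch compares two versions of an assessment stored as plain dictionaries. The `diff_items` function, the item identifiers, and the question texts are all hypothetical; the abstract does not describe how ReproSchema implements versioning, so this only illustrates the general technique of diffing structured survey definitions.

```python
def diff_items(old, new):
    """Compare two assessment versions, each a dict of item id -> definition dict.

    Returns which items were added, removed, or modified, and for modified
    items, which fields differ.
    """
    changes = {"added": [], "removed": [], "modified": {}}
    for item_id in sorted(old.keys() | new.keys()):
        if item_id not in old:
            changes["added"].append(item_id)
        elif item_id not in new:
            changes["removed"].append(item_id)
        elif old[item_id] != new[item_id]:
            changed_fields = sorted(
                key
                for key in old[item_id].keys() | new[item_id].keys()
                if old[item_id].get(key) != new[item_id].get(key)
            )
            changes["modified"][item_id] = changed_fields
    return changes


# Two invented collection waves of the same sleep questionnaire.
wave_1 = {
    "sleep_1": {"question": "How many hours do you sleep?", "type": "integer"},
    "sleep_2": {"question": "Do you nap?", "type": "boolean"},
}
wave_2 = {
    "sleep_1": {"question": "How many hours do you sleep on weeknights?", "type": "integer"},
    "sleep_3": {"question": "Do you use screens before bed?", "type": "boolean"},
}

report = diff_items(wave_1, wave_2)
print(report)
```

A report of this kind records that a question was reworded between waves, which matters when responses from different waves are later pooled or compared.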
About the journal:
The Journal of Medical Internet Research (JMIR) is a highly respected publication in the field of health informatics and health services. Founded in 1999, JMIR has been a pioneer in the field for over two decades.
As a leader in the industry, the journal focuses on digital health, data science, health informatics, and emerging technologies for health, medicine, and biomedical research. It is recognized as a top publication in these disciplines, ranking in the first quartile (Q1) by Impact Factor.
Notably, JMIR holds the prestigious position of being ranked #1 on Google Scholar within the "Medical Informatics" discipline.