Ambroise Wonkam, Nchangwi Syntia Munung, Mario Jonas, Wilson Mupfurirwa, Arthemon Nguweneza, Kevin Esoh, Chandre Oosterwyk-Liu, Zimkita Magangana, Khuthala Mnika, Valentina Ngo Bitoungui, Martha Kamkuemah, Kambe Banda, Nabeelah Samie, Jade Hotchkis, Victoria Nembaware, Andre-Pascal Kengne, Nicola Mulder
{"title":"The Sickle Africa Data Coordinating Centre (SADaCC): a data science hub for interdisciplinary sickle cell disease research and training.","authors":"Ambroise Wonkam, Nchangwi Syntia Munung, Mario Jonas, Wilson Mupfurirwa, Arthemon Nguweneza, Kevin Esoh, Chandre Oosterwyk-Liu, Zimkita Magangana, Khuthala Mnika, Valentina Ngo Bitoungui, Martha Kamkuemah, Kambe Banda, Nabeelah Samie, Jade Hotchkis, Victoria Nembaware, Andre-Pascal Kengne, Nicola Mulder","doi":"10.1093/database/baag007","DOIUrl":null,"url":null,"abstract":"<p><p>Sickle cell disease (SCD) is one of the most prevalent monogenic disorders worldwide, with the highest burden in Africa, where ~75% of the 7.74 million global cases occur. Scientific progress in understanding its epidemiology, clinical heterogeneity, and treatment outcomes has been constrained by heterogeneous, non-standardized, and non-interoperable datasets that limit data integration and cross-country analyses. To address this, the Sickle Africa Data Coordinating Centre (SADaCC) was established as the data science hub of the SickleInAfrica consortium to support the development and expansion of Pan-African SCD registry. SADaCC now coordinates one of the largest patient-consented SCD datasets globally, with data from over 40 000 persons living with SCD in seven countries (Ghana, Mali, Nigeria, Tanzania, Uganda, Zambia, and Zimbabwe) within the Sickle Pan-African Research Consortium (SPARCo), as well as genomic data from SADaCC satellite sites in Cameroon, South Africa, and Malawi. The registry is built on FAIR-compliant architecture, the Sickle Cell Disease Ontology, and powered by a suite of digital platforms such as REDCap, NextCloud, RStudio, GitHub, Docker, and Jupyter. In partnership with SPARCo, SADaCC is also piloting a biobank that will link biospecimens with data in the registry to advance multi-omics research. Beyond infrastructure, SADaCC leads training and/or research in big data analytics, genomics, bioethics, implementation science, qualitative research, and psychosocial studies. Ethical, legal, and social considerations are embedded across all operations with emphasis on equitable intra-African collaboration and patient involvement in research. Looking ahead, SADaCC will integrate real-time data streams, AI-driven analytics, and multi-omics data to drive big data and genetic medicine research for SCD in Africa.</p>","PeriodicalId":10923,"journal":{"name":"Database: The Journal of Biological Databases and Curation","volume":"2026 ","pages":""},"PeriodicalIF":3.6000,"publicationDate":"2026-01-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12923167/pdf/","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Database: The Journal of Biological Databases and Curation","FirstCategoryId":"99","ListUrlMain":"https://doi.org/10.1093/database/baag007","RegionNum":4,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"MATHEMATICAL & COMPUTATIONAL BIOLOGY","Score":null,"Total":0}
引用次数: 0
Abstract
Sickle cell disease (SCD) is one of the most prevalent monogenic disorders worldwide, with the highest burden in Africa, where ~75% of the 7.74 million global cases occur. Scientific progress in understanding its epidemiology, clinical heterogeneity, and treatment outcomes has been constrained by heterogeneous, non-standardized, and non-interoperable datasets that limit data integration and cross-country analyses. To address this, the Sickle Africa Data Coordinating Centre (SADaCC) was established as the data science hub of the SickleInAfrica consortium to support the development and expansion of Pan-African SCD registry. SADaCC now coordinates one of the largest patient-consented SCD datasets globally, with data from over 40 000 persons living with SCD in seven countries (Ghana, Mali, Nigeria, Tanzania, Uganda, Zambia, and Zimbabwe) within the Sickle Pan-African Research Consortium (SPARCo), as well as genomic data from SADaCC satellite sites in Cameroon, South Africa, and Malawi. The registry is built on FAIR-compliant architecture, the Sickle Cell Disease Ontology, and powered by a suite of digital platforms such as REDCap, NextCloud, RStudio, GitHub, Docker, and Jupyter. In partnership with SPARCo, SADaCC is also piloting a biobank that will link biospecimens with data in the registry to advance multi-omics research. Beyond infrastructure, SADaCC leads training and/or research in big data analytics, genomics, bioethics, implementation science, qualitative research, and psychosocial studies. Ethical, legal, and social considerations are embedded across all operations with emphasis on equitable intra-African collaboration and patient involvement in research. Looking ahead, SADaCC will integrate real-time data streams, AI-driven analytics, and multi-omics data to drive big data and genetic medicine research for SCD in Africa.
期刊介绍:
Huge volumes of primary data are archived in numerous open-access databases, and with new generation technologies becoming more common in laboratories, large datasets will become even more prevalent. The archiving, curation, analysis and interpretation of all of these data are a challenge. Database development and biocuration are at the forefront of the endeavor to make sense of this mounting deluge of data.
Database: The Journal of Biological Databases and Curation provides an open access platform for the presentation of novel ideas in database research and biocuration, and aims to help strengthen the bridge between database developers, curators, and users.