Review of autism spectrum disorder databases for the identification of candidate genes
Diana Martínez-Minguet, René Noel, Alberto García S, Mireia Costa, Oscar Pastor
Database: The Journal of Biological Databases and Curation, vol. 2025. Published 2025-01-18. DOI: 10.1093/database/baaf067 (https://doi.org/10.1093/database/baaf067)
Full text (PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12527254/pdf/
Abstract
Research into the genetics of autism spectrum disorder (ASD) seeks to unravel its complex genetic background by identifying genes associated with the condition at varying levels of confidence. While these findings hold significant potential for clinical applications, the dispersed nature of the scientific evidence makes it difficult to identify ASD candidate genes reliably. Although ASD candidate genes are gathered in genetic databases, these databases vary widely in their gene sets, biological information, and methods for classifying confidence levels, leading to inconsistencies and complicating research efforts. This study aims to identify ASD genetic databases and assess their quality and reliability, to support more robust identification of ASD candidate genes. Using a Systematic Mapping Study, we identified 13 specialized databases. We then followed a Data Quality Approach in two stages. In the first stage, we assessed the Accessibility, Currency, and Relevance dimensions to select the potentially relevant databases to be used as ASD candidate gene sources. In the second stage, we analysed the selected databases, assessing Completeness (at schema and data level) and Consistency between high-confidence ASD genes. The four selected databases are AutDB, SFARI Gene, GeisingerDBD, and SysNDD. SFARI Gene demonstrated the highest completeness at schema level (89%), while AutDB showed the highest completeness at data level (90%). However, only 1.5% consistency was observed across the four databases in their classification of high-confidence ASD candidate genes. Our findings highlight the unique contributions of each database and reveal substantial inconsistencies in gene classification, driven by differences in scoring criteria and the scientific evidence considered. These inconsistencies have important implications for both clinical users and researchers, as conclusions may vary depending on the database used. This study supports researchers when using ASD genetic databases, promoting consistent interpretation and improved clinical decisions.
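As a rough illustration of the two kinds of measure mentioned in the abstract, the sketch below computes a cross-database consistency score and a data-level completeness score. The abstract does not give the paper's exact formulas, so both definitions here are assumptions: consistency is taken as the share of genes rated high-confidence by every database relative to all genes rated high-confidence by at least one, and data-level completeness as the share of non-empty field values. The function names, field names, and gene identifiers are hypothetical placeholders, not data from the study.

```python
from typing import Dict, List, Optional, Set


def consistency(high_conf: Dict[str, Set[str]]) -> float:
    """Assumed definition: |intersection| / |union| of the
    high-confidence gene sets, one set per database."""
    gene_sets = list(high_conf.values())
    if not gene_sets:
        return 0.0
    union = set().union(*gene_sets)
    if not union:
        return 0.0
    shared = set.intersection(*gene_sets)
    return len(shared) / len(union)


def data_completeness(
    records: List[Dict[str, Optional[str]]], fields: List[str]
) -> float:
    """Assumed definition: fraction of (record, field) cells that
    hold a non-empty value."""
    total = len(records) * len(fields)
    if total == 0:
        return 0.0
    filled = sum(
        1 for rec in records for f in fields if rec.get(f) not in (None, "")
    )
    return filled / total


# Toy input with placeholder gene identifiers (NOT real database contents).
toy_high_conf = {
    "AutDB": {"GENE_A", "GENE_B", "GENE_C"},
    "SFARI Gene": {"GENE_A", "GENE_D"},
    "GeisingerDBD": {"GENE_A", "GENE_B", "GENE_E"},
    "SysNDD": {"GENE_A", "GENE_F"},
}

# Toy records with hypothetical field names.
toy_records = [
    {"symbol": "GENE_A", "chromosome": ""},
    {"symbol": "GENE_B", "chromosome": "1"},
]

if __name__ == "__main__":
    print(f"consistency: {consistency(toy_high_conf):.3f}")
    print(f"completeness: {data_completeness(toy_records, ['symbol', 'chromosome']):.2f}")
```

Under an intersection-over-union definition such as this one, a single database with a divergent scoring scheme is enough to pull the score down sharply, which fits the low cross-database agreement the abstract reports.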
About the journal:
Huge volumes of primary data are archived in numerous open-access databases, and with new generation technologies becoming more common in laboratories, large datasets will become even more prevalent. The archiving, curation, analysis and interpretation of all of these data are a challenge. Database development and biocuration are at the forefront of the endeavor to make sense of this mounting deluge of data.
Database: The Journal of Biological Databases and Curation provides an open access platform for the presentation of novel ideas in database research and biocuration, and aims to help strengthen the bridge between database developers, curators, and users.