Review of autism spectrum disorder databases for the identification of candidate genes.

IF 3.6 4区生物学 Q1 MATHEMATICAL & COMPUTATIONAL BIOLOGY

Database: The Journal of Biological Databases and Curation Pub Date : 2025-01-18 DOI:10.1093/database/baaf067

Diana Martínez-Minguet, René Noel, Alberto García S, Mireia Costa, Oscar Pastor

{"title":"Review of autism spectrum disorder databases for the identification of candidate genes.","authors":"Diana Martínez-Minguet, René Noel, Alberto García S, Mireia Costa, Oscar Pastor","doi":"10.1093/database/baaf067","DOIUrl":null,"url":null,"abstract":"<p><p>Research into the genetics of autism spectrum disorder (ASD) seeks to unravel its complex genetic background by identifying genes associated with the condition at varying levels of confidence. While these findings hold significant potential for clinical applications, the dispersed nature of scientific evidence presents a challenge for the reliable identification of ASD candidate genes. Although ASD candidate genes are gathered in genetic databases, these vary widely in the gene sets, biological information, and confidence level classification methods, leading to inconsistencies and complicating research efforts. This study aims to identify and assess the quality and reliability of ASD genetic databases to support more robust identification of ASD candidate genes. Using a Systematic Mapping Study, we identified 13 specialized databases. We then followed a Data Quality Approach in two stages, first assessing Accessibility, Currency, and Relevance dimensions to select the potentially relevant databases to be used as ASD candidate gene sources. The selected databases were analysed, assessing Completeness-at schema and data level-, and Consistency between high-confidence ASD genes. The four selected databases are: AutDB, SFARI Gene, GeisingerDBD, and SysNDD. SFARI Gene demonstrated the highest completeness at schema level (89%), while AutDB showed the highest completeness at data level (90%). However, only 1.5% consistency was observed across the four databases in their classification of high-confidence ASD candidate genes. Our findings highlight the unique contributions of each database and reveal substantial inconsistencies in gene classification, driven by differences in scoring criteria and the scientific evidence considered. These inconsistencies have important implications for both clinical users and researchers, as conclusions may vary depending on the database used. This study supports researchers when using ASD genetic databases, promoting consistent interpretation and improved clinical decisions.</p>","PeriodicalId":10923,"journal":{"name":"Database: The Journal of Biological Databases and Curation","volume":"2025 ","pages":""},"PeriodicalIF":3.6000,"publicationDate":"2025-01-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12527254/pdf/","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Database: The Journal of Biological Databases and Curation","FirstCategoryId":"99","ListUrlMain":"https://doi.org/10.1093/database/baaf067","RegionNum":4,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"MATHEMATICAL & COMPUTATIONAL BIOLOGY","Score":null,"Total":0}

引用次数: 0

Abstract

Research into the genetics of autism spectrum disorder (ASD) seeks to unravel its complex genetic background by identifying genes associated with the condition at varying levels of confidence. While these findings hold significant potential for clinical applications, the dispersed nature of scientific evidence presents a challenge for the reliable identification of ASD candidate genes. Although ASD candidate genes are gathered in genetic databases, these vary widely in the gene sets, biological information, and confidence level classification methods, leading to inconsistencies and complicating research efforts. This study aims to identify and assess the quality and reliability of ASD genetic databases to support more robust identification of ASD candidate genes. Using a Systematic Mapping Study, we identified 13 specialized databases. We then followed a Data Quality Approach in two stages, first assessing Accessibility, Currency, and Relevance dimensions to select the potentially relevant databases to be used as ASD candidate gene sources. The selected databases were analysed, assessing Completeness-at schema and data level-, and Consistency between high-confidence ASD genes. The four selected databases are: AutDB, SFARI Gene, GeisingerDBD, and SysNDD. SFARI Gene demonstrated the highest completeness at schema level (89%), while AutDB showed the highest completeness at data level (90%). However, only 1.5% consistency was observed across the four databases in their classification of high-confidence ASD candidate genes. Our findings highlight the unique contributions of each database and reveal substantial inconsistencies in gene classification, driven by differences in scoring criteria and the scientific evidence considered. These inconsistencies have important implications for both clinical users and researchers, as conclusions may vary depending on the database used. This study supports researchers when using ASD genetic databases, promoting consistent interpretation and improved clinical decisions.

查看原文本刊更多论文

筛选候选基因的自闭症谱系障碍数据库综述。

对自闭症谱系障碍（ASD）的遗传学研究试图通过在不同程度上确定与该疾病相关的基因来揭示其复杂的遗传背景。虽然这些发现具有重要的临床应用潜力，但科学证据的分散性为可靠地鉴定ASD候选基因提出了挑战。虽然ASD候选基因收集在遗传数据库中，但这些候选基因在基因集、生物信息和置信度分类方法上差异很大，导致不一致，使研究工作复杂化。本研究旨在鉴定和评估ASD遗传数据库的质量和可靠性，以支持更可靠的ASD候选基因鉴定。通过系统测绘研究，我们确定了13个专门的数据库。然后，我们分两个阶段采用数据质量方法，首先评估可访问性、流通性和相关性维度，以选择可能相关的数据库作为ASD候选基因来源。对选定的数据库进行分析，评估模式和数据级别的完整性以及高可信度ASD基因之间的一致性。选择的四个数据库是：AutDB、SFARI Gene、GeisingerDBD和SysNDD。SFARI Gene在图式水平上的完备性最高（89%），而AutDB在数据水平上的完备性最高（90%）。然而，四个数据库在高置信度ASD候选基因分类中仅观察到1.5%的一致性。我们的研究结果突出了每个数据库的独特贡献，并揭示了基因分类的实质性不一致，这是由评分标准和科学证据的差异所驱动的。这些不一致对临床用户和研究人员都有重要意义，因为结论可能因所使用的数据库而异。本研究支持研究人员使用ASD遗传数据库，促进一致的解释和改进的临床决策。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

求助全文

约1分钟内获得全文求助全文

来源期刊

Database: The Journal of Biological Databases and Curation MATHEMATICAL & COMPUTATIONAL BIOLOGY-

CiteScore

9.00

自引率

3.40%

发文量

100

审稿时长

>12 weeks

期刊介绍： Huge volumes of primary data are archived in numerous open-access databases, and with new generation technologies becoming more common in laboratories, large datasets will become even more prevalent. The archiving, curation, analysis and interpretation of all of these data are a challenge. Database development and biocuration are at the forefront of the endeavor to make sense of this mounting deluge of data. Database: The Journal of Biological Databases and Curation provides an open access platform for the presentation of novel ideas in database research and biocuration, and aims to help strengthen the bridge between database developers, curators, and users.