Detecting outliers in case-control cohorts for improving deep learning networks on Schizophrenia prediction.

Impact factor: 1.5; JCR quartile: Q3 (Mathematical & Computational Biology)
Journal of Integrative Bioinformatics. Pub Date: 2024-07-15. eCollection Date: 2024-06-01. DOI: 10.1515/jib-2023-0042
Daniel Martins, Maryam Abbasi, Conceição Egas, Joel P Arrais
Publication type: Journal Article
Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11377398/pdf/
Citations: 0

Abstract

This study examines the genetic and clinical aspects of schizophrenia (SCZ), a complex mental disorder with uncertain etiology. Deep Learning (DL) holds promise for analyzing large genomic datasets to uncover new risk factors. However, given reports of non-negligible misdiagnosis rates for SCZ, case-control cohorts may contain outlying genetic profiles, which can hinder the performance of classification models. The research used a case-control dataset drawn from the Swedish population. A gene-annotation-based DL architecture was developed and applied in two stages. First, the model was trained on the entire dataset to highlight differences between cases and controls. Then, samples likely to be misclassified were excluded, and the model was retrained on the refined dataset for performance evaluation. The results indicate that SCZ prevalence and misdiagnosis rates can affect case-control cohorts, potentially compromising future studies that rely on such datasets. However, by detecting and filtering outliers, the study demonstrates the feasibility of adapting DL methodologies to large-scale biological problems, producing results more aligned with existing heritability estimates for SCZ. This approach not only advances understanding of the genetic background of SCZ but also opens the door to adapting DL techniques for complex research in precision medicine for mental health.
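
The two-stage procedure described in the abstract (train on the full cohort, exclude samples likely to be misclassified, retrain on the refined cohort) can be illustrated with a minimal sketch. This is not the authors' implementation: a scikit-learn logistic regression stands in for the gene-annotation-based DL architecture, the cohort is synthetic with a fraction of deliberately flipped labels to mimic misdiagnosis, and the function names (`make_toy_cohort`, `flag_likely_misclassified`, `two_stage_evaluation`), the 0.5 threshold, and the use of out-of-fold predictions are all assumptions made for illustration.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict, cross_val_score


def make_toy_cohort(n_samples=400, n_features=20, flip_rate=0.15, seed=0):
    """Synthetic case-control data; a share of labels is flipped to mimic misdiagnosis."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n_samples, n_features))
    true_y = (X[:, 0] + X[:, 1] > 0).astype(int)   # hypothetical "true" status
    flipped = rng.random(n_samples) < flip_rate    # simulated misdiagnoses
    y = np.where(flipped, 1 - true_y, true_y)      # labels as recorded in the cohort
    return X, y, flipped


def flag_likely_misclassified(X, y, threshold=0.5, cv=5):
    """Stage 1: flag samples whose out-of-fold probability for their own label is low."""
    model = LogisticRegression(max_iter=1000)      # stand-in for the DL model
    proba = cross_val_predict(model, X, y, cv=cv, method="predict_proba")
    p_assigned = proba[np.arange(len(y)), y]       # probability of the recorded label
    return p_assigned < threshold


def two_stage_evaluation(X, y, cv=5):
    """Cross-validated accuracy on the full cohort and on the refined cohort."""
    model = LogisticRegression(max_iter=1000)
    acc_full = cross_val_score(model, X, y, cv=cv).mean()
    outlier = flag_likely_misclassified(X, y, cv=cv)
    keep = ~outlier
    # Stage 2: retrain and evaluate on the refined cohort only.
    acc_refined = cross_val_score(model, X[keep], y[keep], cv=cv).mean()
    return acc_full, acc_refined, outlier


if __name__ == "__main__":
    X, y, flipped = make_toy_cohort()
    acc_full, acc_refined, outlier = two_stage_evaluation(X, y)
    print(f"accuracy, full cohort:    {acc_full:.3f}")
    print(f"accuracy, refined cohort: {acc_refined:.3f}")
    print(f"samples flagged: {outlier.sum()} (simulated misdiagnoses: {flipped.sum()})")
```

In this toy setting the filtered samples are, by construction, the ones the stand-in model got wrong, so part of the accuracy gain on the refined cohort is mechanical; the sketch shows the workflow only and says nothing about the results reported in the paper.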
