Classify Alzheimer genes association using Naïve Bayes algorithm

Impact factor: 0.5 | JCR quartile: Q4 | Category: Genetics & Heredity

Human Gene | Pub Date: 2024-06-19 | DOI: 10.1016/j.humgen.2024.201309

Sushrutha Raj, Anchal Vishnoi, Alok Srivastava


Citations: 0

Abstract


Classify Alzheimer genes association using Naïve Bayes algorithm

Background

Alzheimer's disease, the most common form of dementia, accounts for 60–80% of dementia cases, and its prevalence is projected to increase as aging populations grow. By 2050, the number of individuals with Alzheimer's and dementia worldwide is expected to reach 152 million. Genetics plays a significant role, contributing to about 70% of the overall risk, which underscores the importance of understanding the genetic basis for developing targeted interventions. This study presents a system that combines text mining and machine learning techniques to identify and prioritize prospective candidate genes for Alzheimer's and to classify them into three association classes with weights.

Methods

The machine learning-based classifier was trained on a curated gold-standard dataset and validated with 10-fold cross-validation, which showed consistent performance across all folds of the data. The resulting ensemble learning system uses text mining and a Bayesian classification algorithm to categorize PubMed abstracts into three distinct groups: Yes, No, and Ambiguous. The trained classifier is then applied to previously unseen disease-specific prediction data to predict disease-gene associations.
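The classification and validation steps described above can be illustrated with a minimal sketch. It assumes a plain multinomial Naïve Bayes model with add-one smoothing over bag-of-words tokens and a simple interleaved 10-fold split; the abstract does not specify the features, the smoothing, or how the ensemble is built, so none of this should be read as the authors' implementation. The names `NaiveBayes`, `cross_validate`, the placeholder gene labels, and the template sentences are all hypothetical.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing (illustrative)."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        self.totals = {c: sum(self.word_counts[c].values()) for c in self.class_counts}
        return self

    def log_scores(self, text):
        """Unnormalized log posterior per class: log prior + sum of log likelihoods."""
        n_docs = sum(self.class_counts.values())
        v = len(self.vocab)
        scores = {}
        for c, count in self.class_counts.items():
            score = math.log(count / n_docs)
            for w in tokenize(text):
                if w in self.vocab:  # tokens never seen in training are skipped
                    score += math.log((self.word_counts[c][w] + 1) / (self.totals[c] + v))
            scores[c] = score
        return scores

    def predict_proba(self, text):
        """Normalize the log scores into posterior probabilities."""
        scores = self.log_scores(text)
        top = max(scores.values())
        exp = {c: math.exp(s - top) for c, s in scores.items()}
        z = sum(exp.values())
        return {c: e / z for c, e in exp.items()}

    def predict(self, text):
        scores = self.log_scores(text)
        return max(scores, key=scores.get)


def cross_validate(texts, labels, k=10):
    """Return per-fold accuracy; fold f holds out items f, f+k, f+2k, ..."""
    accuracies = []
    for fold in range(k):
        held_out = set(range(fold, len(texts), k))
        train = [(t, l) for i, (t, l) in enumerate(zip(texts, labels)) if i not in held_out]
        model = NaiveBayes().fit(*zip(*train))
        hits = sum(model.predict(texts[i]) == labels[i] for i in held_out)
        accuracies.append(hits / len(held_out))
    return accuracies


# Hypothetical toy "abstracts" built from templates with placeholder gene names.
TEMPLATES = {
    "Yes": "{g} variants were significantly associated with increased Alzheimer risk",
    "No": "no association was found between {g} and Alzheimer disease",
    "Ambiguous": "the role of {g} in Alzheimer disease remains unclear and inconsistent",
}
texts, labels = [], []
for i in range(1, 11):
    for label, template in TEMPLATES.items():
        texts.append(template.format(g=f"GENE{i}"))
        labels.append(label)

accs = cross_validate(texts, labels, k=10)
print("mean accuracy over folds:", sum(accs) / len(accs))
```

The toy sentences are far easier to separate than real PubMed abstracts, so the accuracy printed here says nothing about the 87.33% reported in the paper; the sketch only shows the shape of the train, hold out, and score loop.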

Results

With an average accuracy of 87.33% and a confidence level of 90.10% +/− 0.142, the protocol extracted 2031 associated genes, of which 1162, 489 and 1439 belong to the positive, negative and ambiguous classes respectively at a threshold of 0.9. Compared with established disease gene databases, our system identified 915 positive genes that had not been previously reported. The positive genes can support in-depth study, and the ambiguous genes are candidates for further exploration of their association with Alzheimer's disease.
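How the 0.9 threshold is applied is not spelled out in the abstract. One possible reading, sketched below purely as an assumption, is that each abstract yields a posterior per class, and a gene is placed in every class for which at least one of its abstracts reaches the cutoff. Under that reading a gene can fall into more than one class, which would be consistent with the three class counts summing to more than 2031. The function `genes_by_class`, the placeholder gene names, and the probabilities are hypothetical.

```python
from collections import defaultdict

THRESHOLD = 0.9  # cutoff quoted in the abstract; the way it is applied here is assumed


def genes_by_class(predictions, threshold=THRESHOLD):
    """Group genes under every class whose best abstract-level posterior meets the cutoff.

    predictions: iterable of (gene, {label: probability}) pairs, one per abstract.
    """
    best = defaultdict(dict)
    for gene, probs in predictions:
        for label, p in probs.items():
            best[gene][label] = max(p, best[gene].get(label, 0.0))

    groups = defaultdict(set)
    for gene, scores in best.items():
        for label, p in scores.items():
            if p >= threshold:
                groups[label].add(gene)
    return dict(groups)


# Hypothetical posteriors for placeholder genes, one entry per abstract.
example = [
    ("GENE_A", {"Yes": 0.95, "No": 0.03, "Ambiguous": 0.02}),
    ("GENE_A", {"Yes": 0.04, "No": 0.01, "Ambiguous": 0.95}),
    ("GENE_B", {"Yes": 0.05, "No": 0.92, "Ambiguous": 0.03}),
    ("GENE_C", {"Yes": 0.60, "No": 0.30, "Ambiguous": 0.10}),
]
groups = genes_by_class(example)
for label in sorted(groups):
    print(label, sorted(groups[label]))
```

In this toy input GENE_A lands in two classes because separate abstracts support each one, while GENE_C is left unassigned because no class reaches the cutoff.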

Conclusions

The system's ability to generate accurate predictions demonstrates its robustness and provides valuable insights into the genetic factors of Alzheimer's disease. Consequently, this study contributes to existing knowledge and paves the way for future research in this field.


Source journal

Human Gene (subject areas: Biochemistry, Genetics and Molecular Biology (General); Genetics)

CiteScore

1.60

Self-citation rate

0.00%

Publication volume

Review time

54 days