{"title":"使用奈夫贝叶斯算法对阿尔茨海默病基因关联进行分类","authors":"Sushrutha Raj , Anchal Vishnoi , Alok Srivastava","doi":"10.1016/j.humgen.2024.201309","DOIUrl":null,"url":null,"abstract":"<div><h3>Background</h3><p>Alzheimer's disease, the most common form of dementia, accounts for 60–80% of cases and its prevalence is projected to increase as aging populations grow. By 2050, the number of individuals with Alzheimer's and dementia worldwide is expected to reach 152 million. Genetics plays a significant role, contributing to about 70% of the overall risk, underscoring the importance of understanding the genetic basis for developing targeted interventions. This study presents a system that combines text mining and machine learning techniques to identify and prioritize prospective candidate genes for Alzheimer's and further classifies them into three association classes with weights.</p></div><div><h3>Methods</h3><p>The machine learning-based classifier was trained over a meticulously curated gold standard dataset and then rigorously validated utilizing a 10-fold cross-validation method, demonstrating its consistency across all the folds of the data. This developed ensemble learning system categorizes PubMed abstracts into three distinct groups: Yes, No, and Ambiguous using text mining and a Bayesian classification algorithm. The system further predicts disease-gene associations over unknown disease-specific prediction data by using the developed classifier.</p></div><div><h3>Results</h3><p>With an average accuracy of 87.33% and confidence level of 90.10% +/− 0.142, the protocol effectively extracted 2031 associated genes, of which 1162, 489 and 1439 belong to positive, negative and ambiguous classes respectively at the threshold of 0.9. In comparison between the established disease gene databases, our system identified 915 positive genes that had not been previously reported. One can use these positive genes for in-depth understanding and ambiguous genes for further exploration of their association with Alzheimer's disease.</p></div><div><h3>Conclusions</h3><p>The system's ability to generate accurate predictions demonstrates its robustness and provides valuable insights into the genetic factors of Alzheimer's disease. Consequently, this study contributes to existing knowledge and paves the way for future research in this field.</p></div>","PeriodicalId":29686,"journal":{"name":"Human Gene","volume":"41 ","pages":"Article 201309"},"PeriodicalIF":0.5000,"publicationDate":"2024-06-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Classify Alzheimer genes association using Naïve Bayes algorithm\",\"authors\":\"Sushrutha Raj , Anchal Vishnoi , Alok Srivastava\",\"doi\":\"10.1016/j.humgen.2024.201309\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><h3>Background</h3><p>Alzheimer's disease, the most common form of dementia, accounts for 60–80% of cases and its prevalence is projected to increase as aging populations grow. By 2050, the number of individuals with Alzheimer's and dementia worldwide is expected to reach 152 million. Genetics plays a significant role, contributing to about 70% of the overall risk, underscoring the importance of understanding the genetic basis for developing targeted interventions. This study presents a system that combines text mining and machine learning techniques to identify and prioritize prospective candidate genes for Alzheimer's and further classifies them into three association classes with weights.</p></div><div><h3>Methods</h3><p>The machine learning-based classifier was trained over a meticulously curated gold standard dataset and then rigorously validated utilizing a 10-fold cross-validation method, demonstrating its consistency across all the folds of the data. This developed ensemble learning system categorizes PubMed abstracts into three distinct groups: Yes, No, and Ambiguous using text mining and a Bayesian classification algorithm. The system further predicts disease-gene associations over unknown disease-specific prediction data by using the developed classifier.</p></div><div><h3>Results</h3><p>With an average accuracy of 87.33% and confidence level of 90.10% +/− 0.142, the protocol effectively extracted 2031 associated genes, of which 1162, 489 and 1439 belong to positive, negative and ambiguous classes respectively at the threshold of 0.9. In comparison between the established disease gene databases, our system identified 915 positive genes that had not been previously reported. One can use these positive genes for in-depth understanding and ambiguous genes for further exploration of their association with Alzheimer's disease.</p></div><div><h3>Conclusions</h3><p>The system's ability to generate accurate predictions demonstrates its robustness and provides valuable insights into the genetic factors of Alzheimer's disease. Consequently, this study contributes to existing knowledge and paves the way for future research in this field.</p></div>\",\"PeriodicalId\":29686,\"journal\":{\"name\":\"Human Gene\",\"volume\":\"41 \",\"pages\":\"Article 201309\"},\"PeriodicalIF\":0.5000,\"publicationDate\":\"2024-06-19\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Human Gene\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S2773044124000536\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q4\",\"JCRName\":\"GENETICS & HEREDITY\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Human Gene","FirstCategoryId":"1085","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S2773044124000536","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q4","JCRName":"GENETICS & HEREDITY","Score":null,"Total":0}
Classify Alzheimer genes association using Naïve Bayes algorithm
Background
Alzheimer's disease, the most common form of dementia, accounts for 60–80% of cases and its prevalence is projected to increase as aging populations grow. By 2050, the number of individuals with Alzheimer's and dementia worldwide is expected to reach 152 million. Genetics plays a significant role, contributing to about 70% of the overall risk, underscoring the importance of understanding the genetic basis for developing targeted interventions. This study presents a system that combines text mining and machine learning techniques to identify and prioritize prospective candidate genes for Alzheimer's and further classifies them into three association classes with weights.
Methods
The machine learning-based classifier was trained over a meticulously curated gold standard dataset and then rigorously validated utilizing a 10-fold cross-validation method, demonstrating its consistency across all the folds of the data. This developed ensemble learning system categorizes PubMed abstracts into three distinct groups: Yes, No, and Ambiguous using text mining and a Bayesian classification algorithm. The system further predicts disease-gene associations over unknown disease-specific prediction data by using the developed classifier.
Results
With an average accuracy of 87.33% and confidence level of 90.10% +/− 0.142, the protocol effectively extracted 2031 associated genes, of which 1162, 489 and 1439 belong to positive, negative and ambiguous classes respectively at the threshold of 0.9. In comparison between the established disease gene databases, our system identified 915 positive genes that had not been previously reported. One can use these positive genes for in-depth understanding and ambiguous genes for further exploration of their association with Alzheimer's disease.
Conclusions
The system's ability to generate accurate predictions demonstrates its robustness and provides valuable insights into the genetic factors of Alzheimer's disease. Consequently, this study contributes to existing knowledge and paves the way for future research in this field.