Harnessing deep learning for SNP-based disease prediction in genomics.
Authors: Colten Alme, Harun Pirim, M Mishkatur Rahman
Journal: International journal of information technology: an official journal of Bharati Vidyapeeth's Institute of Computer Applications and Management
Publication date: 2024-07-04
DOI: 10.1007/s41870-025-02624-8 (https://doi.org/10.1007/s41870-025-02624-8)
Full text (PMC PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12356146/pdf/
This study investigates the use of deep learning (DL) models to predict disease status from single nucleotide polymorphism (SNP) data. Eight Gene Expression Omnibus (GEO) datasets were processed with a consistent pipeline of genotype encoding, data cleaning, and several feature selection strategies. A range of DL architectures, including feedforward networks, autoencoders, convolutional neural networks (CNNs), and recurrent neural networks (RNNs), were then trained and evaluated. The novelty of this work lies in applying the same preprocessing, feature selection, and model training pipeline to every dataset, which allows a direct and fair comparison of model performance. The results showed that feedforward networks and autoencoders performed best on most datasets. This work offers a practical approach to applying deep learning in genomics that can be extended in future work.
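The abstract does not say which encoding, quality-control thresholds, or selection statistics were used. The Python below is a minimal sketch of what the encoding, cleaning, and feature-selection stages might look like. It assumes that genotype calls arrive as "AA"/"AB"/"BB" strings, with "NC" marking a no-call. The stages it illustrates are: additive 0/1/2 coding; call-rate and minor-allele-frequency (MAF) filters with hypothetical thresholds; mode imputation; and ranking SNPs by a Pearson chi-square statistic. All function names, thresholds, and toy data are hypothetical and illustrative. This is not the authors' pipeline.

```python
from collections import Counter
from typing import List, Optional

# Hypothetical additive coding: number of copies of the "B" allele.
# Any other call (e.g. "NC" for a no-call) is treated as missing (None).
ADDITIVE = {"AA": 0, "AB": 1, "BB": 2}

Matrix = List[List[Optional[int]]]


def encode(calls: List[List[str]]) -> Matrix:
    """Map a samples x SNPs table of genotype calls to 0/1/2, with None for missing."""
    return [[ADDITIVE.get(c) for c in row] for row in calls]


def qc_filter(x: Matrix, min_call_rate: float = 0.95, min_maf: float = 0.01) -> List[int]:
    """Return indices of SNP columns passing call-rate and MAF filters (hypothetical defaults)."""
    keep = []
    for j in range(len(x[0])):
        observed = [row[j] for row in x if row[j] is not None]
        if not observed or len(observed) / len(x) < min_call_rate:
            continue
        p = sum(observed) / (2 * len(observed))  # B-allele frequency
        if min(p, 1 - p) >= min_maf:
            keep.append(j)
    return keep


def impute_mode(x: Matrix, cols: List[int]) -> List[List[int]]:
    """Keep only `cols` and fill each missing entry with that SNP's most common genotype."""
    out = [[row[j] for j in cols] for row in x]
    for k in range(len(cols)):
        mode = Counter(r[k] for r in out if r[k] is not None).most_common(1)[0][0]
        for r in out:
            if r[k] is None:
                r[k] = mode
    return out


def chi2_score(genotypes: List[int], labels: List[int]) -> float:
    """Pearson chi-square statistic of the 2 x 3 (label x genotype) contingency table."""
    table = [[0, 0, 0], [0, 0, 0]]
    for g, y in zip(genotypes, labels):
        table[y][g] += 1
    n = len(labels)
    row_tot = [sum(r) for r in table]
    col_tot = [table[0][g] + table[1][g] for g in range(3)]
    stat = 0.0
    for y in range(2):
        for g in range(3):
            expected = row_tot[y] * col_tot[g] / n
            if expected > 0:  # skip genotype classes absent from the sample
                stat += (table[y][g] - expected) ** 2 / expected
    return stat


def select_top_k(x: List[List[int]], labels: List[int], k: int) -> List[int]:
    """Return column indices of the k SNPs with the largest chi-square statistic."""
    scores = [chi2_score([row[j] for row in x], labels) for j in range(len(x[0]))]
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)[:k]


if __name__ == "__main__":
    # Made-up toy data: 6 samples x 4 SNPs; labels 0 = control, 1 = case.
    calls = [
        ["AB", "AA", "BB", "AA"],
        ["AB", "AA", "NC", "AB"],
        ["AA", "AA", "BB", "BB"],
        ["NC", "AA", "AB", "AA"],
        ["BB", "AA", "BB", "AB"],
        ["AB", "AA", "AA", "BB"],
    ]
    labels = [0, 0, 1, 0, 1, 1]
    x = encode(calls)
    keep = qc_filter(x, min_call_rate=0.8, min_maf=0.05)
    print(keep)  # → [0, 2, 3]  (SNP 1 is monomorphic and is dropped)
    x_clean = impute_mode(x, keep)
    top = select_top_k(x_clean, labels, k=2)
    print([keep[i] for i in top])  # → [3, 0]  (original SNP column indices)
```

In a real evaluation, the filters and the chi-square ranking would be fit on the training split only and then applied unchanged to the held-out split. Otherwise, feature selection leaks label information into the performance estimate. The selected 0/1/2 columns can then be passed directly as input features to the feedforward, autoencoder, CNN, or RNN models.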