Wei Jing Fong, Hong Ming Tan, Rishabh Garg, Ai Ling Teh, Hong Pan, Varsha Gupta, Bernadus Krishna, Zou Hui Chen, Natania Yovela Purwanto, Fabian Yap, Kok Hian Tan, Kok Yen Jerry Chan, Shiao-Yng Chan, Nicole Goh, Nikita Rane, Ethel Siew Ee Tan, Yuheng Jiang, Mei Han, Michael Meaney, Dennis Wang, Jussi Keppo, Geoffrey Chern-Yee Tan
{"title":"比较从遗传变异中预测 CYP2D6 甲基化的特征选择和机器学习方法","authors":"Wei Jing Fong, Hong Ming Tan, Rishabh Garg, Ai Ling Teh, Hong Pan, Varsha Gupta, Bernadus Krishna, Zou Hui Chen, Natania Yovela Purwanto, Fabian Yap, Kok Hian Tan, Kok Yen Jerry Chan, Shiao-Yng Chan, Nicole Goh, Nikita Rane, Ethel Siew Ee Tan, Yuheng Jiang, Mei Han, Michael Meaney, Dennis Wang, Jussi Keppo, Geoffrey Chern-Yee Tan","doi":"10.3389/fninf.2023.1244336","DOIUrl":null,"url":null,"abstract":"<sec><title>Introduction</title><p>Pharmacogenetics currently supports clinical decision-making on the basis of a limited number of variants in a few genes and may benefit paediatric prescribing where there is a need for more precise dosing. Integrating genomic information such as methylation into pharmacogenetic models holds the potential to improve their accuracy and consequently prescribing decisions. Cytochrome P450 2D6 (<italic>CYP2D6</italic>) is a highly polymorphic gene conventionally associated with the metabolism of commonly used drugs and endogenous substrates. We thus sought to predict epigenetic loci from single nucleotide polymorphisms (SNPs) related to <italic>CYP2D6</italic> in children from the GUSTO cohort.</p></sec><sec><title>Methods</title><p>Buffy coat DNA methylation was quantified using the Illumina Infinium Methylation EPIC beadchip. CpG sites associated with <italic>CYP2D6</italic> were used as outcome variables in Linear Regression, Elastic Net and XGBoost models. We compared feature selection of SNPs from GWAS mQTLs, GTEx eQTLs and SNPs within 2 MB of the <italic>CYP2D6</italic> gene and the impact of adding demographic data. The samples were split into training (75%) sets and test (25%) sets for validation. In Elastic Net model and XGBoost models, optimal hyperparameter search was done using 10-fold cross validation. Root Mean Square Error and R-squared values were obtained to investigate each models’ performance. When GWAS was performed to determine SNPs associated with CpG sites, a total of 15 SNPs were identified where several SNPs appeared to influence multiple CpG sites.</p></sec><sec><title>Results</title><p>Overall, Elastic Net models of genetic features appeared to perform marginally better than heritability estimates and substantially better than Linear Regression and XGBoost models. The addition of nongenetic features appeared to improve performance for some but not all feature sets and probes. The best feature set and Machine Learning (ML) approach differed substantially between CpG sites and a number of top variables were identified for each model.</p></sec><sec><title>Discussion</title><p>The development of SNP-based prediction models for CYP2D6 CpG methylation in Singaporean children of varying ethnicities in this study has clinical application. With further validation, they may add to the set of tools available to improve precision medicine and pharmacogenetics-based dosing.</p></sec>","PeriodicalId":12462,"journal":{"name":"Frontiers in Neuroinformatics","volume":"10 1","pages":""},"PeriodicalIF":2.5000,"publicationDate":"2023-10-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Comparing feature selection and machine learning approaches for predicting CYP2D6 methylation from genetic variation\",\"authors\":\"Wei Jing Fong, Hong Ming Tan, Rishabh Garg, Ai Ling Teh, Hong Pan, Varsha Gupta, Bernadus Krishna, Zou Hui Chen, Natania Yovela Purwanto, Fabian Yap, Kok Hian Tan, Kok Yen Jerry Chan, Shiao-Yng Chan, Nicole Goh, Nikita Rane, Ethel Siew Ee Tan, Yuheng Jiang, Mei Han, Michael Meaney, Dennis Wang, Jussi Keppo, Geoffrey Chern-Yee Tan\",\"doi\":\"10.3389/fninf.2023.1244336\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<sec><title>Introduction</title><p>Pharmacogenetics currently supports clinical decision-making on the basis of a limited number of variants in a few genes and may benefit paediatric prescribing where there is a need for more precise dosing. Integrating genomic information such as methylation into pharmacogenetic models holds the potential to improve their accuracy and consequently prescribing decisions. Cytochrome P450 2D6 (<italic>CYP2D6</italic>) is a highly polymorphic gene conventionally associated with the metabolism of commonly used drugs and endogenous substrates. We thus sought to predict epigenetic loci from single nucleotide polymorphisms (SNPs) related to <italic>CYP2D6</italic> in children from the GUSTO cohort.</p></sec><sec><title>Methods</title><p>Buffy coat DNA methylation was quantified using the Illumina Infinium Methylation EPIC beadchip. CpG sites associated with <italic>CYP2D6</italic> were used as outcome variables in Linear Regression, Elastic Net and XGBoost models. We compared feature selection of SNPs from GWAS mQTLs, GTEx eQTLs and SNPs within 2 MB of the <italic>CYP2D6</italic> gene and the impact of adding demographic data. The samples were split into training (75%) sets and test (25%) sets for validation. In Elastic Net model and XGBoost models, optimal hyperparameter search was done using 10-fold cross validation. Root Mean Square Error and R-squared values were obtained to investigate each models’ performance. When GWAS was performed to determine SNPs associated with CpG sites, a total of 15 SNPs were identified where several SNPs appeared to influence multiple CpG sites.</p></sec><sec><title>Results</title><p>Overall, Elastic Net models of genetic features appeared to perform marginally better than heritability estimates and substantially better than Linear Regression and XGBoost models. The addition of nongenetic features appeared to improve performance for some but not all feature sets and probes. The best feature set and Machine Learning (ML) approach differed substantially between CpG sites and a number of top variables were identified for each model.</p></sec><sec><title>Discussion</title><p>The development of SNP-based prediction models for CYP2D6 CpG methylation in Singaporean children of varying ethnicities in this study has clinical application. With further validation, they may add to the set of tools available to improve precision medicine and pharmacogenetics-based dosing.</p></sec>\",\"PeriodicalId\":12462,\"journal\":{\"name\":\"Frontiers in Neuroinformatics\",\"volume\":\"10 1\",\"pages\":\"\"},\"PeriodicalIF\":2.5000,\"publicationDate\":\"2023-10-18\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Frontiers in Neuroinformatics\",\"FirstCategoryId\":\"3\",\"ListUrlMain\":\"https://doi.org/10.3389/fninf.2023.1244336\",\"RegionNum\":4,\"RegionCategory\":\"医学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q2\",\"JCRName\":\"MATHEMATICAL & COMPUTATIONAL BIOLOGY\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Frontiers in Neuroinformatics","FirstCategoryId":"3","ListUrlMain":"https://doi.org/10.3389/fninf.2023.1244336","RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"MATHEMATICAL & COMPUTATIONAL BIOLOGY","Score":null,"Total":0}
Comparing feature selection and machine learning approaches for predicting CYP2D6 methylation from genetic variation
Introduction
Pharmacogenetics currently supports clinical decision-making on the basis of a limited number of variants in a few genes and may benefit paediatric prescribing where there is a need for more precise dosing. Integrating genomic information such as methylation into pharmacogenetic models holds the potential to improve their accuracy and consequently prescribing decisions. Cytochrome P450 2D6 (CYP2D6) is a highly polymorphic gene conventionally associated with the metabolism of commonly used drugs and endogenous substrates. We thus sought to predict epigenetic loci from single nucleotide polymorphisms (SNPs) related to CYP2D6 in children from the GUSTO cohort.
Methods
Buffy coat DNA methylation was quantified using the Illumina Infinium Methylation EPIC beadchip. CpG sites associated with CYP2D6 were used as outcome variables in Linear Regression, Elastic Net and XGBoost models. We compared feature selection of SNPs from GWAS mQTLs, GTEx eQTLs and SNPs within 2 MB of the CYP2D6 gene and the impact of adding demographic data. The samples were split into training (75%) sets and test (25%) sets for validation. In Elastic Net model and XGBoost models, optimal hyperparameter search was done using 10-fold cross validation. Root Mean Square Error and R-squared values were obtained to investigate each models’ performance. When GWAS was performed to determine SNPs associated with CpG sites, a total of 15 SNPs were identified where several SNPs appeared to influence multiple CpG sites.
Results
Overall, Elastic Net models of genetic features appeared to perform marginally better than heritability estimates and substantially better than Linear Regression and XGBoost models. The addition of nongenetic features appeared to improve performance for some but not all feature sets and probes. The best feature set and Machine Learning (ML) approach differed substantially between CpG sites and a number of top variables were identified for each model.
Discussion
The development of SNP-based prediction models for CYP2D6 CpG methylation in Singaporean children of varying ethnicities in this study has clinical application. With further validation, they may add to the set of tools available to improve precision medicine and pharmacogenetics-based dosing.
期刊介绍:
Frontiers in Neuroinformatics publishes rigorously peer-reviewed research on the development and implementation of numerical/computational models and analytical tools used to share, integrate and analyze experimental data and advance theories of the nervous system functions. Specialty Chief Editors Jan G. Bjaalie at the University of Oslo and Sean L. Hill at the École Polytechnique Fédérale de Lausanne are supported by an outstanding Editorial Board of international experts. This multidisciplinary open-access journal is at the forefront of disseminating and communicating scientific knowledge and impactful discoveries to researchers, academics and the public worldwide.
Neuroscience is being propelled into the information age as the volume of information explodes, demanding organization and synthesis. Novel synthesis approaches are opening up a new dimension for the exploration of the components of brain elements and systems and the vast number of variables that underlie their functions. Neural data is highly heterogeneous with complex inter-relations across multiple levels, driving the need for innovative organizing and synthesizing approaches from genes to cognition, and covering a range of species and disease states.
Frontiers in Neuroinformatics therefore welcomes submissions on existing neuroscience databases, development of data and knowledge bases for all levels of neuroscience, applications and technologies that can facilitate data sharing (interoperability, formats, terminologies, and ontologies), and novel tools for data acquisition, analyses, visualization, and dissemination of nervous system data. Our journal welcomes submissions on new tools (software and hardware) that support brain modeling, and the merging of neuroscience databases with brain models used for simulation and visualization.