Ensemble learning algorithm for drug-target interaction prediction
Sudipta Pathak, Xingyu Cai
IEEE International Conference on Computational Advances in Bio and Medical Sciences (ICCABS), 2017-10-01
DOI: 10.1109/ICCABS.2017.8114292
Predicting drug-target interactions through simulation is an important problem with a large impact on drug discovery in the pharmaceutical industry. The FDA reports that bringing a new drug to market costs close to five billion dollars. Even a slight improvement in prediction accuracy may save millions of dollars of investment, thereby lowering the cost of production and making drugs more affordable to consumers. We propose a new algorithm that combines multiple heterogeneous sources of information to identify new interactions between drugs and targets. The algorithm, KronRLS-Stacking, uses a stacking-based approach to combine models in a linear or non-linear way. It is built on top of the RLS and KronRLS algorithms. The novelty of our approach lies in combining heterogeneous sources of information using the ensemble method known as stacking. The algorithm is also embarrassingly parallel and easy to distribute over multiple computing nodes. We compared our results with seventeen other algorithms, using, as they do, the area under the precision-recall curve (AUPR) as the measure of performance. The comparison covers the Nuclear Receptor (NR), GPCR, Ion Channel (IC), and Enzyme (E) datasets. KronRLS-Stacking obtained the highest AUPR on the NR, GPCR, and IC datasets. For every dataset we report the average over five runs, each of which used 5-fold cross-validation. We chose the 10 best-performing kernels on the validation set to generate all results on the test sets. Although KronRLS-Stacking has a slightly worse standard deviation, its lowest AUPR score is still better than that of the best-performing algorithms we compared against.
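The abstract does not give implementation details, so the following is only a minimal sketch of how a linear stacking of KronRLS base models could look, not the authors' code. It assumes the standard eigendecomposition form of KronRLS, a plain least-squares meta-learner fitted on out-of-fold predictions, and average precision as the AUPR estimate. The function names (`kron_rls`, `average_precision`, `stack_kron_rls`) and the parameters `lam`, `n_folds`, and `seed` are hypothetical. The top-10 kernel selection step is not shown.

```python
import numpy as np

def kron_rls(Kd, Kt, Y, lam=1.0):
    """KronRLS predictions from a drug kernel Kd, target kernel Kt, labels Y."""
    ld, Qd = np.linalg.eigh(Kd)          # Kd = Qd diag(ld) Qd^T
    lt, Qt = np.linalg.eigh(Kt)          # Kt = Qt diag(lt) Qt^T
    eig = np.outer(ld, lt)               # eigenvalues of the Kronecker kernel
    Z = (eig / (eig + lam)) * (Qd.T @ Y @ Qt)
    return Qd @ Z @ Qt.T

def average_precision(y_true, scores):
    """Average precision, a common estimate of the area under the PR curve."""
    order = np.argsort(-scores, kind="stable")
    hits = y_true[order].astype(float)
    precision_at_k = np.cumsum(hits) / np.arange(1, hits.size + 1)
    return float((precision_at_k * hits).sum() / hits.sum())

def stack_kron_rls(kernel_pairs, Y, train_mask, lam=1.0, n_folds=5, seed=0):
    """Linear stacking (assumed meta-learner: ordinary least squares)."""
    rng = np.random.default_rng(seed)
    train_idx = np.flatnonzero(train_mask.ravel())
    folds = np.array_split(rng.permutation(train_idx), n_folds)
    Y_train = np.where(train_mask, Y, 0.0)        # hide held-out pairs
    meta_X = np.zeros((Y.size, len(kernel_pairs)))
    for fold in folds:
        Y_hidden = Y_train.ravel().copy()
        Y_hidden[fold] = 0.0                      # also hide this inner fold
        Y_hidden = Y_hidden.reshape(Y.shape)
        for m, (Kd, Kt) in enumerate(kernel_pairs):
            # Out-of-fold prediction of base model m on the hidden entries.
            meta_X[fold, m] = kron_rls(Kd, Kt, Y_hidden, lam).ravel()[fold]
    # Meta-learner: weights that best map base predictions to labels.
    w, *_ = np.linalg.lstsq(meta_X[train_idx], Y.ravel()[train_idx], rcond=None)
    # Refit each base model on all training pairs and combine linearly.
    base = np.stack([kron_rls(Kd, Kt, Y_train, lam) for Kd, Kt in kernel_pairs],
                    axis=-1)
    return base @ w, w
```

Each base model depends only on its own kernel pair, so the calls to `kron_rls` inside the loops could be dispatched to separate workers, which matches the embarrassingly parallel property the abstract mentions. A non-linear combination would replace the least-squares step with a different meta-learner.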