Ensemble learning algorithm for drug-target interaction prediction

IEEE ... International Conference on Computational Advances in Bio and Medical Sciences : [proceedings]. IEEE International Conference on Computational Advances in Bio and Medical Sciences
Pub Date: 2017-10-01
DOI: 10.1109/ICCABS.2017.8114292

Sudipta Pathak, Xingyu Cai


Citations: 5

Abstract

Predicting drug-target interactions through simulation is an important problem with a large impact on drug discovery in the pharmaceutical industry. The FDA reports that it takes close to five billion dollars to bring a new drug to market. Even a slight improvement in prediction accuracy in this domain may save millions of dollars in investment, thereby lowering the cost of production and making drugs more affordable to consumers. We propose a new algorithm that combines multiple heterogeneous sources of information to identify new interactions between drugs and targets. The algorithm, KronRLS-Stacking, uses a stacking-based approach to combine models in a linear (or non-linear) way to address the drug-target interaction prediction problem. It is built on top of the RLS and KronRLS algorithms. The novelty of our approach lies in combining heterogeneous sources of information using the ensemble method called stacking. In addition, the algorithm is embarrassingly parallel and easy to distribute over multiple computing nodes. We compared our results with seventeen other algorithms. Like those algorithms, we use the area under the precision-recall curve (AUPR) as the measure of goodness. We compared results on the Nuclear Receptor (NR), GPCR, Ion Channel (IC), and Enzyme (E) datasets. KronRLS-Stacking obtained the highest AUPR on the NR, GPCR, and IC datasets. In the experiments, we average over five runs for every dataset, performing 5-fold cross-validation in each run. We chose the 10 best-performing kernels on the validation set to generate all results for the test datasets. Even though KronRLS-Stacking has a slightly worse standard deviation, our lowest AUPR score is still better than that of the best-performing algorithms we compared with.
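The abstract does not spell out the stacking procedure, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes each base model is a KronRLS predictor fitted on one (drug kernel, target kernel) pair, and that a ridge-regularized linear meta-learner is fitted on held-out pairs. The names `kron_rls`, `stack_predictions`, `val_mask`, and `ridge`, as well as the synthetic kernels and labels, are hypothetical and chosen for illustration.

```python
import numpy as np


def kron_rls(Kd, Kt, Y, lam=1.0):
    """KronRLS scores for every drug-target pair.

    Solves (Kd A Kt + lam * A = Y) and returns F = Kd A Kt using the
    eigendecompositions of the two kernels, so the Kronecker product
    kernel is never formed explicitly.
    """
    ld, Qd = np.linalg.eigh(Kd)          # Kd = Qd diag(ld) Qd^T
    lt, Qt = np.linalg.eigh(Kt)          # Kt = Qt diag(lt) Qt^T
    L = np.outer(ld, lt)                 # eigenvalues of the product kernel
    C = Qd.T @ Y @ Qt                    # labels in the joint eigenbasis
    return Qd @ (C * L / (L + lam)) @ Qt.T


def stack_predictions(drug_kernels, target_kernels, Y, val_mask,
                      lam=1.0, ridge=1e-3):
    """Linear stacking of KronRLS base models (assumed meta-learner: ridge)."""
    # Hide the validation pairs from the base learners.
    Y_train = np.where(val_mask, 0.0, Y)
    # One base model per kernel pair; each fit is independent of the others,
    # which is what makes this step easy to run in parallel.
    base = [kron_rls(Kd, Kt, Y_train, lam)
            for Kd in drug_kernels for Kt in target_kernels]

    def design(rows):
        # Columns are base-model scores, plus a constant bias column.
        X = np.stack(rows, axis=1)
        return np.hstack([X, np.ones((X.shape[0], 1))])

    # Fit the meta-learner only on pairs the base models did not see.
    X_val = design([F[val_mask] for F in base])
    y_val = Y[val_mask]
    w = np.linalg.solve(X_val.T @ X_val + ridge * np.eye(X_val.shape[1]),
                        X_val.T @ y_val)

    # Combine the base scores for all pairs with the learned weights.
    X_all = design([F.ravel() for F in base])
    return (X_all @ w).reshape(Y.shape), w


def random_kernel(n, rng, dim=4):
    """Synthetic positive semidefinite kernel, for demonstration only."""
    X = rng.normal(size=(n, dim))
    return X @ X.T


rng = np.random.default_rng(0)
n_drugs, n_targets = 6, 5
drug_kernels = [random_kernel(n_drugs, rng) for _ in range(2)]
target_kernels = [random_kernel(n_targets, rng) for _ in range(2)]
Y = (rng.random((n_drugs, n_targets)) < 0.3).astype(float)   # toy labels
val_mask = rng.random((n_drugs, n_targets)) < 0.3            # held-out pairs

scores, weights = stack_predictions(drug_kernels, target_kernels, Y, val_mask)
print(scores.shape, weights.shape)
```

In a fuller pipeline, the combined scores would be ranked and evaluated with AUPR under repeated 5-fold cross-validation, and only the best-performing kernels on the validation set would be kept, as the abstract describes; those steps are omitted here.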


Source journal

IEEE ... International Conference on Computational Advances in Bio and Medical Sciences : [proceedings]. IEEE International Conference on Computational Advances in Bio and Medical Sciences

Self-citation rate

0.00%

Articles published