A multi-view feature representation for predicting drugs combination synergy based on ensemble and multi-task attention models

IF 7.1 | Zone 2, Chemistry | Q1 CHEMISTRY, MULTIDISCIPLINARY

Journal of Cheminformatics, Volume 16 | Pub Date: 2024-09-27 | DOI: 10.1186/s13321-024-00903-3

Samar Monem, Aboul Ella Hassanien, Alaa H. Abdel-Hamid

Citations: 0

Abstract

This paper proposes a novel multi-view ensemble predictor model for identifying synergistic drug combinations by predicting both the synergy score value and the synergy class label of drug combinations on cancer cell lines. The proposed methodology represents drug features through four distinct views: Simplified Molecular-Input Line-Entry System (SMILES) features, molecular graph features, fingerprint features, and drug-target features. Similarly, cell line features are captured through four views: gene expression features, copy number features, mutation features, and proteomics features. Two techniques are employed to prevent the model from overfitting. First, each drug view is paired with each cell line view and fed into a multi-task attention deep learning model, which is trained to predict the synergy score value and the synergy class label simultaneously. Pairing the four drug views with the four cell line views yields sixteen input view combinations, so the multi-task model produces sixteen prediction values. Second, these prediction values are used as inputs to an ensemble model, which outputs the final prediction value. The ‘MVME’ model is evaluated on the O’Neil dataset, which comprises 38 distinct drugs combined across 39 distinct cancer cell lines, yielding 22,737 drug combination pairs. For the synergy score value, the proposed model achieves a mean square error (MSE) of 206.57, a root mean square error (RMSE) of 14.30, and a Pearson correlation coefficient of 0.76. For the synergy class label, the model achieves 0.90 accuracy, 0.96 precision, 0.57 kappa, 0.96 area under the ROC curve (ROC-AUC), and 0.88 area under the precision-recall curve (PR-AUC).
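
The regression and classification metrics reported above follow their standard definitions. As a minimal sketch, assuming binary labels with 1 = synergistic and plain Python lists, they can be computed as below; the function names `regression_metrics` and `classification_metrics` are hypothetical and not taken from the paper. ROC-AUC and PR-AUC need predicted probabilities rather than hard labels and are omitted for brevity (scikit-learn's `roc_auc_score` and `average_precision_score` are one common option, the latter being a standard PR-AUC estimate).

```python
import math


def regression_metrics(y_true, y_pred):
    """MSE, RMSE and Pearson correlation for synergy score predictions."""
    n = len(y_true)
    mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n
    rmse = math.sqrt(mse)
    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mean_t) ** 2 for t in y_true)
    var_p = sum((p - mean_p) ** 2 for p in y_pred)
    pearson = cov / math.sqrt(var_t * var_p)
    return mse, rmse, pearson


def classification_metrics(y_true, y_pred):
    """Accuracy, precision and Cohen's kappa for binary labels (1 = synergistic)."""
    pairs = list(zip(y_true, y_pred))
    n = len(pairs)
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = n - tp - tn - fp  # assumes labels are strictly 0 or 1
    accuracy = (tp + tn) / n
    precision = tp / (tp + fp) if tp + fp else 0.0
    # Chance agreement p_e from the marginal frequencies of true and predicted labels.
    p_e = ((tp + fn) * (tp + fp) + (tn + fp) * (tn + fn)) / (n * n)
    kappa = (accuracy - p_e) / (1.0 - p_e) if p_e < 1.0 else 0.0
    return accuracy, precision, kappa


if __name__ == "__main__":
    # Toy values for illustration only; they are not from the O'Neil dataset.
    print(regression_metrics([10.0, -5.0, 30.0, 0.0], [12.0, -3.0, 26.0, 1.0]))
    print(classification_metrics([1, 0, 1, 1, 0, 0], [1, 0, 0, 1, 1, 0]))
```

RMSE equals the square root of MSE only when both are computed over the same set of predictions; when per-fold values are averaged across cross-validation folds, the mean RMSE can be slightly lower than the square root of the mean MSE.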

Using four different drug feature views and four cancer cell line views, this paper proposes an enhanced synergistic drug combination model. Each view is then fed into a multi-task deep learning model that predicts the synergy score and the class label simultaneously. To address the challenge of managing the different views and their corresponding prediction values while avoiding overfitting, an ensemble model is applied.
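
The overall pipeline (four drug views crossed with four cell line views, one multi-task prediction per pair, then an ensemble over the sixteen outputs) can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the authors' implementation: the multi-task attention network itself is not reproduced, so per-pair predictions are taken as given; the joint loss (squared error plus binary cross-entropy, mixed by a weight `alpha`) is an assumed form of a multi-task objective; and a weighted average, optionally using hypothetical inverse-validation-MSE weights, stands in for the ensemble model, whose exact type is not stated here. All function and view identifiers are illustrative.

```python
import itertools
import math

# View names follow the abstract; the identifiers themselves are illustrative.
DRUG_VIEWS = ["smiles", "graph", "fingerprint", "target"]
CELL_VIEWS = ["expression", "copy_number", "mutation", "proteomics"]


def view_pairs():
    """Every (drug view, cell line view) combination: 4 x 4 = 16 pairs."""
    return list(itertools.product(DRUG_VIEWS, CELL_VIEWS))


def multitask_loss(score_pred, score_true, prob_pred, label_true, alpha=0.5):
    """Hypothetical joint objective for one sample: squared error on the
    synergy score plus binary cross-entropy on the synergy label.
    `alpha` is an assumed task weight, not a value from the paper."""
    eps = 1e-7
    p = min(max(prob_pred, eps), 1.0 - eps)  # clip to keep log() finite
    squared_error = (score_pred - score_true) ** 2
    bce = -(label_true * math.log(p) + (1 - label_true) * math.log(1.0 - p))
    return alpha * squared_error + (1.0 - alpha) * bce


def inverse_error_weights(val_mse):
    """Assumed weighting scheme: pairs with lower validation MSE get more weight."""
    inv = [1.0 / m for m in val_mse]
    total = sum(inv)
    return [w / total for w in inv]


def ensemble(per_view_scores, per_view_probs, weights=None, threshold=0.5):
    """Combine the 16 per-pair predictions into a final score, probability and label.
    A weighted average stands in for the paper's ensemble model."""
    n = len(per_view_scores)
    if weights is None:
        weights = [1.0 / n] * n
    total = sum(weights)
    score = sum(w * s for w, s in zip(weights, per_view_scores)) / total
    prob = sum(w * p for w, p in zip(weights, per_view_probs)) / total
    return score, prob, int(prob >= threshold)


if __name__ == "__main__":
    pairs = view_pairs()
    print(len(pairs), pairs[0])  # 16 ('smiles', 'expression')
    scores = [10.0] * 8 + [20.0] * 8  # made-up per-pair synergy scores
    probs = [0.6] * 16  # made-up per-pair synergy probabilities
    print(ensemble(scores, probs))
```

A weighted average is the simplest way to combine per-view outputs; training a meta-learner on the sixteen predictions (stacking) is a common alternative when enough validation data is available.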

Source Journal

Journal of Cheminformatics CHEMISTRY, MULTIDISCIPLINARY-COMPUTER SCIENCE, INFORMATION SYSTEMS

CiteScore: 14.10

Self-citation rate: 7.00%

Review time: 3 months

Journal description: Journal of Cheminformatics is an open access journal publishing original peer-reviewed research in all aspects of cheminformatics and molecular modelling. Coverage includes, but is not limited to: chemical information systems, software and databases, and molecular modelling; chemical structure representations and their use in structure, substructure, and similarity searching of chemical substance and chemical reaction databases; computer and molecular graphics; computer-aided molecular design; expert systems; QSAR; and data mining techniques.