A deep learning model based on the BERT pre-trained model to predict the antiproliferative activity of anti-cancer chemical compounds.

IF 2.3; Zone 3 (Environmental Science and Ecology); JCR Q3 (Chemistry, Multidisciplinary)
M Torabi, I Haririan, A Foroumadi, H Ghanbari, F Ghasemi
Journal: SAR and QSAR in Environmental Research, pp. 1-22
Published: 2024-11-28
Publication type: Journal Article
DOI: 10.1080/1062936X.2024.2431486 (https://doi.org/10.1080/1062936X.2024.2431486)
Citations: 0

Abstract

Identifying new compounds with minimal side effects to enhance patients' quality of life is the ultimate goal of drug discovery. Because experimental investigations are expensive and time-consuming, and data in traditional QSAR studies are scarce, deep transfer learning models such as BERT have recently been proposed. This study evaluated such a model's performance in predicting the anti-proliferative activity of compounds against five cancer cell lines (HeLa, MCF7, MDA-MB231, PC3, and MDA-MB) using over 3,000 synthesized molecules from PubChem. The results indicated that the model could predict the activity class of designed small molecules with acceptable accuracy for most cell lines, except for PC3 and MDA-MB. The model was further tested on an in-house dataset of approximately 25 small molecules per cell line, with activity classes based on IC50 values. It predicted the biological activity class for HeLa with an accuracy of 0.77±0.4 and showed acceptable performance for MCF7 and MDA-MB231, with accuracy between 0.56 and 0.66. However, the results were less reliable for PC3 and HepG2. In conclusion, the fine-tuned ChemBERTa model shows potential for predicting outcomes on in-house datasets.
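To make the evaluation described in the abstract more concrete, the following is a minimal sketch of how IC50 values might be turned into activity classes and how an accuracy figure of the form "mean ± standard deviation" could be computed over repeated runs. It is an illustration only, not the authors' pipeline: the 10 µM cutoff, the function names, the IC50 values, and the predictions are all hypothetical, since the abstract does not state the threshold, the number of classes, or how the spread was estimated.

```python
from statistics import mean, stdev

# Hypothetical cutoff: the abstract does not state the IC50 threshold used.
ACTIVE_IC50_UM = 10.0


def ic50_to_class(ic50_um, threshold=ACTIVE_IC50_UM):
    """Return 1 (active) if IC50 is at or below the threshold, else 0 (inactive)."""
    return 1 if ic50_um <= threshold else 0


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("label lists must be non-empty and of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def summarize(accuracies):
    """Mean and sample standard deviation of accuracy over repeated runs."""
    return mean(accuracies), stdev(accuracies)


# Made-up IC50 values in µM, for illustration only.
ic50_values = [0.8, 3.2, 12.5, 48.0, 9.9, 75.0]
y_true = [ic50_to_class(v) for v in ic50_values]

# Made-up class predictions from three hypothetical fine-tuning runs.
runs = [
    [1, 1, 0, 0, 1, 0],
    [1, 0, 0, 0, 1, 1],
    [1, 1, 1, 0, 1, 0],
]
accs = [accuracy(y_true, run) for run in runs]
m, s = summarize(accs)
print(f"accuracy = {m:.2f} ± {s:.2f}")
```

With only about 25 molecules per cell line, each misclassified molecule shifts accuracy by roughly 0.04, so the spread across runs is worth reporting alongside the mean.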

Source journal
CiteScore: 5.20
Self-citation rate: 20.00%
Articles published: 78
Review time: >24 weeks
About the journal: SAR and QSAR in Environmental Research is an international journal welcoming papers on the fundamental and practical aspects of the structure-activity and structure-property relationships in the fields of environmental science, agrochemistry, toxicology, pharmacology and applied chemistry. A unique aspect of the journal is the focus on emerging techniques for the building of SAR and QSAR models in these widely varying fields. The scope of the journal includes, but is not limited to, the topics of topological and physicochemical descriptors, mathematical, statistical and graphical methods for data analysis, computer methods and programs, original applications and comparative studies. In addition to primary scientific papers, the journal contains reviews of books and software and news of conferences. Special issues on topics of current and widespread interest to the SAR and QSAR community will be published from time to time.