PTML Model of ChEMBL Compounds Assays for Vitamin Derivatives

IF 4.3 | Zone 3, Materials Science | JCR Q1, ENGINEERING, ELECTRICAL & ELECTRONIC
Ricardo Santana*, Robin Zuluaga, Piedad Gañán, Sonia Arrasate, Enrique Onieva Caracuel, Humbert González-Díaz*
DOI: 10.1021/acscombsci.9b00166 | Journal: ACS Applied Electronic Materials | Published: 2020-02-03 | Journal Article | https://pubs.acs.org/doi/10.1021/acscombsci.9b00166
Citations: 8

Abstract

Determining the biological activity of vitamin derivatives is important because the organic synthesis of vitamin analogs is an active field of interest in medicinal chemistry, pharmaceuticals, and food additives. Accordingly, scientists from different disciplines perform preclinical assays (n_ij) under many different combinations of assay conditions (c_j). Indeed, the ChEMBL database includes results from 36,220 different biological activity bioassays of 21,240 different vitamins and vitamin derivatives. These assays are heterogeneous in their combinations of conditions c_j: they cover >500 biological activity parameters (c_0), >340 targets (c_1), >6200 cell types (c_2), >120 assay organisms (c_3), and >60 assay strains (c_4). The data include >1850 niacin assays, >1580 tretinoin assays, >1580 retinol assays, 857 ascorbic acid assays, and others. Because these combinatorial data are difficult for researchers to assimilate, we propose a model that combines perturbation theory (PT) and machine learning (ML): a PTML (PT + ML) combinatorial model for ChEMBL results on the biological activity of vitamins and vitamin derivatives. The linear discriminant analysis (LDA) model gave the following results for the training subset: specificity (%) = 90.38, sensitivity (%) = 87.51, and accuracy (%) = 89.89. For the external validation subset, it gave specificity (%) = 90.58, sensitivity (%) = 87.72, and accuracy (%) = 90.09. Other linear and nonlinear PTML models, namely logistic regression (LR), classification tree (CT), naïve Bayes (NB), and random forest (RF), were applied to compare predictive capacity. The PTML-LDA model, which uses combinatorial descriptors, predicted more accurately than these alternatives.
In addition, a principal component analysis (PCA) of chemical structure descriptors allowed us to characterize the high structural diversity of the chemical space studied. However, PTML models using chemical structure descriptors did not improve on the performance of the PTML-LDA model based on ALOGP and PSA. We conclude that the three-variable PTML-LDA model is a simple, adaptable tool for predicting the biological activity of vitamin derivatives across different combinations of experimental conditions.
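The PTML idea and the reported metrics can be illustrated with a minimal sketch. The abstract does not spell out the perturbation operators, so the code below assumes a common moving-average form, dV = V - <V>_cj, in which a descriptor such as ALOGP or PSA is centered on its mean over all assays sharing the same condition combination c_j; the paper's exact operator may differ. The function names `pt_deviations` and `classification_metrics` and the toy data are hypothetical. The second function shows how specificity, sensitivity, and accuracy follow from a binary confusion matrix.

```python
from collections import defaultdict
from statistics import mean


def pt_deviations(values, conditions):
    """Hypothetical PT (moving-average) descriptor.

    Returns dV_i = V_i - <V>_cj, where <V>_cj is the mean of the descriptor
    (e.g. ALOGP or PSA) over all records assayed under the same condition
    combination c_j. Assumed form; the paper's exact operator may differ.
    """
    groups = defaultdict(list)
    for v, c in zip(values, conditions):
        groups[c].append(v)
    # Mean descriptor value for each condition combination c_j.
    cond_mean = {c: mean(vs) for c, vs in groups.items()}
    return [v - cond_mean[c] for v, c in zip(values, conditions)]


def _pct(num, den):
    # Percentage, or NaN when the denominator is empty (e.g. no positives).
    return 100.0 * num / den if den else float("nan")


def classification_metrics(y_true, y_pred):
    """Specificity, sensitivity, accuracy (%) for binary labels (1 = active)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    specificity = _pct(tn, tn + fp)  # true-negative rate
    sensitivity = _pct(tp, tp + fn)  # true-positive rate
    accuracy = _pct(tp + tn, len(pairs))
    return specificity, sensitivity, accuracy


if __name__ == "__main__":
    # Toy data: here c_j is a (c_0 activity parameter, c_3 organism) tuple.
    alogp = [2.0, 4.0, 1.0, 3.0]
    cj = [("IC50", "Homo sapiens"), ("IC50", "Homo sapiens"),
          ("Ki", "Rattus norvegicus"), ("Ki", "Rattus norvegicus")]
    print(pt_deviations(alogp, cj))  # [-1.0, 1.0, -1.0, 1.0]

    sp, sn, acc = classification_metrics([1, 1, 1, 0, 0, 0, 0, 0],
                                         [1, 1, 0, 0, 0, 0, 0, 1])
    print(f"Sp={sp:.2f} Sn={sn:.2f} Acc={acc:.2f}")  # Sp=80.00 Sn=66.67 Acc=75.00
```

In a full pipeline, deviations like these would serve as inputs to a linear discriminant classifier, whose predictions on the training and external validation subsets are then scored with the three metrics above.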


Source journal: CiteScore 7.20 | Self-citation rate 4.30% | Articles published 567