Stack-HDAC3i: A high-precision identification of HDAC3 inhibitors by exploiting a stacked ensemble-learning framework

IF 4.2, Zone 3 (Biology), JCR Q1 (Biochemical Research Methods)

Methods, Pub Date: 2024-08-25, DOI: 10.1016/j.ymeth.2024.08.003

Watshara Shoombuatong, Ittipat Meewan, Lawankorn Mookdarsanit, Nalini Schaduangrat

Methods, Volume 230, Pages 147-157 (Journal Article). https://www.sciencedirect.com/science/article/pii/S1046202324001841

Citations: 0

Abstract



Epigenetics involves reversible modifications in gene expression without altering the genetic code itself. Among these modifications, histone deacetylases (HDACs) play a key role by removing acetyl groups from lysine residues on histones. Overexpression of HDACs is linked to the proliferation and survival of tumor cells. To combat this, HDAC inhibitors (HDACi) are commonly used in cancer treatments. However, pan-HDAC inhibition can lead to numerous side effects. Therefore, isoform-selective HDAC inhibitors, such as HDAC3 inhibitors (HDAC3i), could be advantageous for treating various medical conditions while minimizing off-target effects. Computational approaches that use only the SMILES notation, without any experimental evidence, have become increasingly popular and necessary for the initial discovery of novel potential therapeutic drugs. In this study, we develop an innovative and high-precision stacked-ensemble framework, called Stack-HDAC3i, which can directly identify HDAC3i using only the SMILES notation. Using an up-to-date benchmark dataset, we first employed both molecular descriptors and Mol2Vec embeddings to generate feature representations that cover multi-view information embedded in HDAC3i, such as structural and contextual information. Subsequently, these feature representations were used to train baseline models using nine popular machine learning (ML) algorithms. Finally, the probabilistic features derived from the selected baseline models were fused to construct the final stacked model. Both cross-validation and independent tests showed that Stack-HDAC3i is a high-accuracy prediction model with great generalization ability for identifying HDAC3i. Furthermore, in the independent test, Stack-HDAC3i achieved an accuracy of 0.926 and a Matthews correlation coefficient (MCC) of 0.850, which are 0.44–6.11% and 0.83–11.90% higher than those of its constituent baseline models, respectively.
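The stacking idea described in the abstract (baseline models per feature view, probabilistic features fused by a final model, evaluation by accuracy and MCC) can be illustrated with a small, dependency-free Python sketch. Everything in it is an assumption made for illustration and not the authors' implementation: the two "views" are synthetic Gaussian features standing in for molecular descriptors and Mol2Vec embeddings, the toy `CentroidModel` stands in for the nine ML algorithms, `LogisticMeta` is a hypothetical meta-learner, and the numbers it prints have no relation to the reported results.

```python
import math
import random

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

class CentroidModel:
    """Toy baseline (assumption): P(class 1) from distances to class centroids."""
    def fit(self, X, y):
        self.c = {}
        for label in (0, 1):
            rows = [x for x, t in zip(X, y) if t == label]
            self.c[label] = [sum(col) / len(rows) for col in zip(*rows)]
        return self

    def predict_proba(self, X):
        def d2(x, c):
            return sum((a - b) ** 2 for a, b in zip(x, c))
        return [sigmoid(d2(x, self.c[0]) - d2(x, self.c[1])) for x in X]

class LogisticMeta:
    """Hypothetical meta-learner: logistic regression via batch gradient descent."""
    def fit(self, Z, y, lr=0.5, epochs=500):
        self.w, self.b, n = [0.0] * len(Z[0]), 0.0, len(Z)
        for _ in range(epochs):
            gw, gb = [0.0] * len(self.w), 0.0
            for z, t in zip(Z, y):
                err = self._p(z) - t
                gb += err
                for j, v in enumerate(z):
                    gw[j] += err * v
            self.w = [w - lr * g / n for w, g in zip(self.w, gw)]
            self.b -= lr * gb / n
        return self

    def _p(self, z):
        return sigmoid(self.b + sum(w * v for w, v in zip(self.w, z)))

    def predict_proba(self, Z):
        return [self._p(z) for z in Z]

def out_of_fold(model_cls, X, y, k=5):
    # Each sample's probability comes from a model that never saw that sample.
    oof = [0.0] * len(X)
    for f in range(k):
        tr = [i for i in range(len(X)) if i % k != f]
        va = [i for i in range(len(X)) if i % k == f]
        m = model_cls().fit([X[i] for i in tr], [y[i] for i in tr])
        for i, p in zip(va, m.predict_proba([X[i] for i in va])):
            oof[i] = p
    return oof

def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def mcc(y_true, y_pred):
    # Matthews correlation coefficient from the 2x2 confusion matrix.
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    tn = sum(t == 0 and p == 0 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0

def make_view(y, dim, shift, rng):
    # Synthetic feature view (assumption): class 1 is shifted along every axis.
    return [[rng.gauss(shift * t, 1.0) for _ in range(dim)] for t in y]

rng = random.Random(0)
y_train = [i % 2 for i in range(200)]
y_test = [i % 2 for i in range(100)]
views_train = [make_view(y_train, 4, 1.0, rng), make_view(y_train, 6, 0.8, rng)]
views_test = [make_view(y_test, 4, 1.0, rng), make_view(y_test, 6, 0.8, rng)]

# Level 0: one baseline per view; out-of-fold probabilities become meta-features.
Z_train = list(zip(*[out_of_fold(CentroidModel, X, y_train) for X in views_train]))
base = [CentroidModel().fit(X, y_train) for X in views_train]
Z_test = list(zip(*[m.predict_proba(X) for m, X in zip(base, views_test)]))

# Level 1: the meta-learner fuses the probabilistic features.
meta = LogisticMeta().fit(Z_train, y_train)
y_pred = [int(p >= 0.5) for p in meta.predict_proba(Z_test)]
acc, score = accuracy(y_test, y_pred), mcc(y_test, y_pred)
print(f"stacked model on toy data: ACC={acc:.3f}, MCC={score:.3f}")
```

The out-of-fold step is the design choice worth noting: the meta-learner is trained only on probabilities from baselines that did not see the corresponding sample, which limits leakage from the first level into the second. In practice a library implementation of stacking (for example scikit-learn's `StackingClassifier`) would likely replace the hand-written classes above.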


Source journal

Methods (Biology: Biochemical Research Methods)

CiteScore: 9.80

Self-citation rate: 2.10%

Articles published: 222

Review time: 11.3 weeks

About the journal: Methods focuses on rapidly developing techniques in the experimental biological and medical sciences. Each topical issue, organized by a guest editor who is an expert in the area covered, consists solely of invited quality articles by specialist authors, many of them reviews. Issues are devoted to specific technical approaches with emphasis on clear detailed descriptions of protocols that allow them to be reproduced easily. The background information provided enables researchers to understand the principles underlying the methods; other helpful sections include comparisons of alternative methods giving the advantages and disadvantages of particular methods, guidance on avoiding potential pitfalls, and suggestions for troubleshooting.