Tuan Vinh, Thanh-Hoang Nguyen-Vo, Viet-Tuan Le, Xuan-Phuc Phan-Nguyen, Binh P Nguyen
Methods (impact factor 4.2; JCR Q1, Biochemical Research Methods). Published 2024-11-22. DOI: 10.1016/j.ymeth.2024.11.009 (https://doi.org/10.1016/j.ymeth.2024.11.009)
In silico identification of Histone Deacetylase inhibitors using Streamlined Masked Transformer-based Pretrained features.
Histone Deacetylases (HDACs) are enzymes that regulate gene expression by removing acetyl groups from histones. They are involved in various diseases, including neurodegenerative, cardiovascular, inflammatory, and metabolic disorders, as well as fibrosis in the liver, lungs, and kidneys. Successfully identifying potent HDAC inhibitors may offer a promising approach to treating these diseases. In addition to experimental techniques, researchers have introduced several in silico methods for identifying HDAC inhibitors. However, these existing computer-aided methods have shortcomings in their modeling stages, which limit their applications. In our study, we present a Streamlined Masked Transformer-based Pretrained (SMTP) encoder, which can be used to generate features for downstream tasks. The training process of the SMTP encoder was directed by masked attention-based learning, enhancing the model's generalizability in encoding molecules. The SMTP features were used to develop 11 classification models identifying 11 HDAC isoforms. We trained SMTP, a lightweight encoder, with only 1.9 million molecules, a smaller number than other known molecular encoders, yet its discriminant ability remains competitive. The results revealed that machine learning models developed using the SMTP feature set outperformed those developed using other feature sets in 8 out of 11 classification tasks. Additionally, chemical diversity analysis confirmed the encoder's effectiveness in distinguishing between two classes of molecules.
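The abstract does not spell out how the masked learning step is set up, so the following is only a minimal sketch of generic BERT-style token masking applied to a SMILES string, not the authors' procedure. The tokenizer pattern, the `[MASK]` symbol, the 15% mask rate, and the function names are all assumptions made for illustration; vorinostat, a well-known HDAC inhibitor, is used purely as sample input.

```python
import random
import re

# Hypothetical tokenizer: bracket atoms, two-letter halogens, then any single character.
SMILES_TOKEN = re.compile(r"\[[^\]]+\]|Br|Cl|.")
MASK = "[MASK]"  # hypothetical mask symbol, not taken from the paper


def tokenize(smiles):
    """Split a SMILES string into tokens using the pattern above."""
    return SMILES_TOKEN.findall(smiles)


def mask_tokens(tokens, mask_rate=0.15, seed=None):
    """Hide a random subset of tokens.

    Returns the corrupted sequence and a dict mapping each masked
    position to the original token an encoder would be asked to recover.
    """
    if not tokens:
        return [], {}
    rng = random.Random(seed)
    # Mask at least one token, and never more than the sequence holds.
    n_mask = min(len(tokens), max(1, round(len(tokens) * mask_rate)))
    positions = rng.sample(range(len(tokens)), n_mask)
    masked = list(tokens)
    targets = {}
    for i in positions:
        targets[i] = masked[i]
        masked[i] = MASK
    return masked, targets


# Sample input only: vorinostat (SAHA).
smiles = "ONC(=O)CCCCCCC(=O)Nc1ccccc1"
tokens = tokenize(smiles)
masked, targets = mask_tokens(tokens, mask_rate=0.15, seed=0)
print(" ".join(masked))
print(targets)
```

In a pretraining loop of this kind, the corrupted sequence would be fed to the encoder and the loss computed only at the positions stored in `targets`; the learned representations could then serve as input features for separate per-isoform classifiers.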
About the journal:
Methods focuses on rapidly developing techniques in the experimental biological and medical sciences.
Each topical issue, organized by a guest editor who is an expert in the area covered, consists solely of invited quality articles by specialist authors, many of them reviews. Issues are devoted to specific technical approaches with emphasis on clear detailed descriptions of protocols that allow them to be reproduced easily. The background information provided enables researchers to understand the principles underlying the methods; other helpful sections include comparisons of alternative methods giving the advantages and disadvantages of particular methods, guidance on avoiding potential pitfalls, and suggestions for troubleshooting.