IDRdecoder: a machine learning approach for rational drug discovery toward intrinsically disordered regions.

IF 3.9 Q2 MATHEMATICAL & COMPUTATIONAL BIOLOGY

Frontiers in bioinformatics. Pub Date: 2025-07-18. eCollection Date: 2025-01-01. DOI: 10.3389/fbinf.2025.1627836

Clara Shionyu-Mitusyama, Satoshi Ohmori, Subaru Hirata, Hirokazu Ishida, Tsuyoshi Shirai

Frontiers in bioinformatics, volume 5, article 1627836. Journal Article. Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12313641/pdf/

Citations: 0

Abstract

Introduction: Intrinsically disordered regions (IDRs) of proteins have traditionally been overlooked as drug targets. However, with growing recognition of their crucial role in biological activity and their involvement in various diseases, IDRs have emerged as promising targets for drug discovery. Despite this potential, rational methodologies for IDR-targeted drug discovery remain underdeveloped, primarily due to a lack of reference experimental data.

Methods: This study explores a machine learning approach to predict IDR functions, drug interaction sites, and interacting molecular substructures within IDR sequences. To address the data gap, stepwise transfer learning was employed. IDRdecoder sequentially generates predictions for IDR classification, interaction sites, and interacting ligand substructures. In the first step, the neural network was trained as an autoencoder on 26,480,862 predicted IDR sequences. It was then trained via transfer learning on 57,692 ligand-binding PDB sequences with higher IDR tendency to predict ligand-interacting sites and ligand types.
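The two-step training described in the Methods can be illustrated with a deliberately tiny sketch. It is not the authors' architecture: the per-residue embedding autoencoder, the latent size `DIM`, the frozen encoder in step 2, the function names, and the toy sequences are all assumptions made for illustration, and the ligand-type prediction step is omitted. Step 1 learns residue embeddings from unlabeled sequences by reconstruction; step 2 reuses those embeddings to train a small binding-site classifier on a much smaller labeled set.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}
N_AA = len(AMINO_ACIDS)
DIM = 4  # hypothetical latent size, chosen only for this toy example


def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    total = sum(e)
    return [v / total for v in e]


def sigmoid(x):
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def pretrain_autoencoder(sequences, epochs=30, lr=0.1, seed=0):
    """Step 1 (no labels): embed each residue and reconstruct it via softmax."""
    rng = random.Random(seed)
    enc = [[rng.uniform(-0.1, 0.1) for _ in range(DIM)] for _ in range(N_AA)]
    dec = [[rng.uniform(-0.1, 0.1) for _ in range(N_AA)] for _ in range(DIM)]
    for _ in range(epochs):
        for seq in sequences:
            for aa in seq:
                i = AA_INDEX[aa]
                h = enc[i][:]  # latent code of this residue
                logits = [sum(h[k] * dec[k][j] for k in range(DIM))
                          for j in range(N_AA)]
                grad = softmax(logits)
                grad[i] -= 1.0  # gradient of cross-entropy w.r.t. logits
                grad_h = [sum(dec[k][j] * grad[j] for j in range(N_AA))
                          for k in range(DIM)]
                for k in range(DIM):
                    for j in range(N_AA):
                        dec[k][j] -= lr * h[k] * grad[j]
                    enc[i][k] -= lr * grad_h[k]
    return enc


def train_site_head(enc, labeled, epochs=200, lr=0.5):
    """Step 2 (transfer): logistic head on top of the frozen encoder (assumption)."""
    w, b = [0.0] * DIM, 0.0
    for _ in range(epochs):
        for seq, labels in labeled:
            for aa, y in zip(seq, labels):
                h = enc[AA_INDEX[aa]]
                g = sigmoid(sum(wk * hk for wk, hk in zip(w, h)) + b) - y
                w = [wk - lr * g * hk for wk, hk in zip(w, h)]
                b -= lr * g
    return w, b


def predict_sites(enc, head, seq):
    """Per-residue probability of being a ligand-interacting site."""
    w, b = head
    return [sigmoid(sum(wk * hk for wk, hk in zip(w, enc[AA_INDEX[aa]])) + b)
            for aa in seq]


# Toy data (hypothetical): many unlabeled sequences, few labeled ones.
unlabeled = ["GSYGSYAAGS", "AAYSGGSYAG", "SGAYYGSAGS"]
labeled = [("GSYAGY", [0, 0, 1, 0, 0, 1])]  # 1 = interacting residue

encoder = pretrain_autoencoder(unlabeled)
head = train_site_head(encoder, labeled)
scores = predict_sites(encoder, head, "AGYS")
```

The point of the design is that the encoder sees only abundant unlabeled data, so the scarce labeled data has to fit only the small head. A realistic model would encode sequence context rather than single residues.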

Results: IDRdecoder was evaluated against 9 IDR sequences that have been experimentally characterized as drug targets. In the encoding space, specific GO terms related to the hypothesized functions of the evaluation IDR sequences were highly enriched. The model predicted drug-interacting sites and ligand types with areas under the curve (AUC) of 0.616 and 0.702, respectively. Compared with existing methods, including ProteinBERT, IDRdecoder showed moderately improved performance.
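For readers unfamiliar with the metric, the AUC figures can be read as the probability that a randomly chosen true site (or ligand type) receives a higher score than a randomly chosen negative. The sketch below computes ROC AUC in that pairwise form. It is a generic illustration with made-up labels and scores, not the authors' evaluation code, and the abstract does not state how predictions were pooled across the 9 sequences.

```python
def roc_auc(labels, scores):
    """ROC AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical per-residue labels (1 = interacting) and predicted scores.
example_labels = [0, 0, 1, 1]
example_scores = [0.1, 0.4, 0.35, 0.8]
example_auc = roc_auc(example_labels, example_scores)  # 3 of 4 pairs ranked correctly
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which places the reported 0.616 and 0.702 as modest but above-chance performance.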

Discussion: IDRdecoder is the first application for predicting drug interaction sites and ligands in IDR sequences. Analysis of the prediction results revealed characteristics beneficial for IDR-drug design; for instance, Tyr and Ala are preferred target sites, while flexible substructures, such as alkyl groups, are favored in ligand molecules.




Source journal

Frontiers in bioinformatics

CiteScore

2.60

Self-citation rate

0.00%

Articles published