Integrating (deep) machine learning and cheminformatics for predicting human intestinal absorption of small molecules

IF 2.6 | Zone 4 (Biology) | JCR Q2 BIOLOGY

Computational Biology and Chemistry. Pub Date: 2024-10-28. DOI: 10.1016/j.compbiolchem.2024.108270

Orchid Baruah, Upashya Parasar, Anirban Borphukan, Bikram Phukan, Pankaj Bharali, Selvaraman Nagamani, Hridoy Jyoti Mahanta

Volume 113, Article 108270. Journal Article. https://www.sciencedirect.com/science/article/pii/S1476927124002585

Citations: 0

Abstract

The oral route is the most preferred route for drug delivery, and oral drugs consequently represent the largest share of the pharmaceutical market. Human intestinal absorption (HIA) is closely related to oral bioavailability, making it an important factor in predicting drug absorption. In this study, we focus on predicting drug permeability at HIA as a marker for oral bioavailability. A set of 2648 compounds was collected from early as well as recent works and curated to build a robust dataset. Five machine learning (ML) algorithms were trained on a set of molecular descriptors of these compounds, selected after rigorous feature engineering. Additionally, two deep learning models, one based on a graph convolutional neural network (GCNN) and one on a graph attention network (GAT), were developed using the same set of compounds to exploit the predictive power of automatically extracted features. The numerical analyses show that, of the five ML models, Random Forest and LightGBM predicted with accuracies of 87.71 % and 86.04 % on the test set and 81.43 % and 77.30 % on the external validation set, respectively. The GCNN- and GAT-based models achieved final accuracies of 77.69 % and 78.58 % on the test set and 79.29 % and 79.42 % on the external validation set, respectively. We believe that deploying these models for screening oral drugs can provide promising results, and we have therefore deposited the dataset and models on GitHub (https://github.com/hridoy69/HIA).
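The abstract reports accuracy on a held-out test set and on a separate external validation set, but does not state how the split was made. The following is only a minimal sketch of that kind of evaluation, assuming a stratified split and binary labels (1 = high absorption, 0 = low absorption). The function names, the 80/20 ratio, the toy labels, and the majority-class baseline are all illustrative assumptions, not the authors' pipeline; the actual models are in the GitHub repository cited above.

```python
import random
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of labels predicted correctly."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("cannot compute accuracy on an empty set")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def stratified_split(labels, test_fraction, seed=0):
    """Return (train_idx, test_idx), keeping the class ratio in both parts.

    Hypothetical helper: the split strategy is an assumption, since the
    abstract does not describe it.
    """
    rng = random.Random(seed)
    by_class = {}
    for i, y in enumerate(labels):
        by_class.setdefault(y, []).append(i)
    train_idx, test_idx = [], []
    for idxs in by_class.values():
        rng.shuffle(idxs)
        n_test = round(len(idxs) * test_fraction)
        test_idx.extend(idxs[:n_test])
        train_idx.extend(idxs[n_test:])
    return sorted(train_idx), sorted(test_idx)


# Toy labels standing in for a curated compound set (not the real data).
labels = [1] * 80 + [0] * 20
train_idx, test_idx = stratified_split(labels, test_fraction=0.2, seed=42)

# Majority-class baseline: always predict the most common training label.
# A real model (e.g. a random forest on descriptors) would replace this.
majority = Counter(labels[i] for i in train_idx).most_common(1)[0][0]
y_test = [labels[i] for i in test_idx]
baseline_acc = accuracy(y_test, [majority] * len(y_test))
print(f"baseline test accuracy: {baseline_acc:.2%}")
```

A baseline of this kind is a useful reference point when reading accuracy figures on an imbalanced dataset: a model is only informative to the extent that it beats the majority-class rate on both the test set and the external set.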


Source journal

Computational Biology and Chemistry (Biology; Computer Science, Interdisciplinary Applications)

CiteScore: 6.10
Self-citation rate: 3.20%
Articles published: 142
Review time: 24 days

About the journal: Computational Biology and Chemistry publishes original research papers and review articles in all areas of computational life sciences. High-quality research contributions with a major computational component in the areas of nucleic acid and protein sequence research, molecular evolution, molecular genetics (functional genomics and proteomics), theory and practice of either biology-specific or chemical-biology-specific modeling, and structural biology of nucleic acids and proteins are particularly welcome. Exceptionally high-quality research work in bioinformatics, systems biology, ecology, computational pharmacology, metabolism, biomedical engineering, epidemiology, and statistical genetics will also be considered. Given their inherent uncertainty, protein modeling and molecular docking studies should be thoroughly validated. In the absence of experimental results for validation, molecular dynamics simulations along with detailed free energy calculations, for example, should be used as complementary techniques to support the major conclusions. Submissions of premature modeling exercises without additional biological insights will not be considered. Review articles will generally be commissioned by the editors and should not be submitted to the journal without explicit invitation. However, prospective authors are welcome to send a brief (one to three pages) synopsis, which will be evaluated by the editors.