Molecular Informatics: Latest Articles (Page 9)

Automatic generation of functional peptides with desired bioactivity and membrane permeability using Bayesian optimization.

IF 3.6, Zone 4 (Medicine)

Molecular Informatics Pub Date : 2024-04-01 Epub Date: 2024-02-19 DOI: 10.1002/minf.202300148

Itsuki Fukunaga, Yuki Matsukiyo, Kazuma Kaitoh, Yoshihiro Yamanishi

Abstract: Peptides are potentially useful drug modalities; however, cell membrane permeability is an obstacle in peptide drug discovery. Identifying bioactive peptides for a therapeutic target is also challenging because of the huge number of possible amino acid sequence patterns. In this study, we propose a novel computational method, PEptide generation system using Neural network Trained on Amino acid sequence data and Gaussian process-based optimizatiON (PENTAGON), to automatically generate new peptides with desired bioactivity and cell membrane permeability. In the algorithm, we mapped peptide amino acid sequences onto a latent space constructed using a variational autoencoder and searched for peptides with desired bioactivity and cell membrane permeability using Bayesian optimization. We used our proposed method to generate peptides with cell membrane permeability and bioactivity for each of nine therapeutic targets, such as the estrogen receptor (ER). Our proposed method outperformed a previously developed peptide generator in terms of similarity to known active peptide sequences and the length of generated peptide sequences.

Citations: 0

Synthetically accessible de novo design using reaction vectors: Application to PARP1 inhibitors.

IF 2.8, Zone 4 (Medicine)

Molecular Informatics Pub Date : 2024-04-01 Epub Date: 2024-02-06 DOI: 10.1002/minf.202300183

Gian Marco Ghiandoni, Stuart R Flanagan, Michael J Bodkin, Maria Giulia Nizi, Albert Galera-Prat, Annalaura Brai, Beining Chen, James E A Wallace, Dimitar Hristozov, James Webster, Giuseppe Manfroni, Lari Lehtiö, Oriana Tabarrini, Valerie J Gillet

Abstract: De novo design has been a hotly pursued topic for many years. Most recent developments have involved the use of deep learning methods for generative molecular design. Despite increasing levels of algorithmic sophistication, the design of molecules that are synthetically accessible remains a major challenge. Reaction-based de novo design takes a conceptually simpler approach and aims to address synthesisability directly by mimicking synthetic chemistry and driving structural transformations by known reactions that are applied in a stepwise manner. However, the use of a small number of hand-coded transformations restricts the chemical space that can be accessed, and there are few examples in the literature where molecules and their synthetic routes have been designed and executed successfully. Here we describe the application of reaction-based de novo design to the design of synthetically accessible and biologically active compounds as a proof of concept of our reaction vector-based software. Reaction vectors are derived automatically from known reactions and allow access to a wide region of synthetically accessible chemical space. The design was aimed at producing molecules that are active against PARP1 and that have improved brain penetration properties compared to existing PARP1 inhibitors. We synthesised a selection of the designed molecules according to the provided synthetic routes and tested them experimentally. The results demonstrate that reaction vectors can be applied to the design of novel molecules of biological relevance that are also synthetically accessible.

Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11475289/pdf/

Citations: 0

An ensemble-based approach to estimate confidence of predicted protein-ligand binding affinity values.

IF 3.6, Zone 4 (Medicine)

Molecular Informatics Pub Date : 2024-04-01 Epub Date: 2024-02-15 DOI: 10.1002/minf.202300292

Milad Rayka, Morteza Mirzaei, Ali Mohammad Latifi

Abstract: When designing a machine learning-based scoring function, we have access to only a limited number of protein-ligand complexes with experimentally determined binding affinity values, representing a fraction of all possible protein-ligand complexes. Consequently, it is crucial to report a measure of confidence and quantify the uncertainty in the model's predictions at test time. Here, we adopt the conformal prediction technique to evaluate the confidence of a prediction for each member of the core set of the CASF 2016 benchmark. The conformal prediction technique requires a diverse ensemble of predictors for uncertainty estimation. To this end, we introduce ENS-Score as an ensemble predictor, which includes 30 models with different protein-ligand representation approaches and achieves a Pearson's correlation of 0.842 on the core set of the CASF 2016 benchmark. We also comprehensively investigate the residual error of each data point to assess the normality of the distribution of the residual errors and their correlation with structural features of the ligands, such as hydrophobic interactions and halogen bonding. Finally, we provide a locally hosted web application to facilitate the usage of ENS-Score. All code needed to reproduce the results is provided at https://github.com/miladrayka/ENS_Score.

Citations: 0
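
The ensemble-based confidence estimate described in the abstract above can be illustrated with a minimal sketch of split conformal prediction for regression. This is not the ENS-Score code: the function names, the toy affinity values, and the choice of a nonconformity score normalized by the ensemble spread are all assumptions made for illustration.

```python
import math
from statistics import mean, pstdev


def ensemble_summary(member_predictions):
    """Return (mean, spread) of one sample's predictions across ensemble members."""
    return mean(member_predictions), pstdev(member_predictions)


def calibrate(calibration_predictions, calibration_targets, alpha=0.1, eps=1e-8):
    """Return the conformal quantile of normalized residuals on a calibration set."""
    scores = []
    for members, target in zip(calibration_predictions, calibration_targets):
        mu, sigma = ensemble_summary(members)
        # Nonconformity score: absolute error scaled by the ensemble spread (assumed form).
        scores.append(abs(target - mu) / (sigma + eps))
    scores.sort()
    n = len(scores)
    # Finite-sample corrected rank ceil((n + 1) * (1 - alpha)), capped at n.
    rank = min(n, math.ceil((n + 1) * (1 - alpha)))
    return scores[rank - 1]


def predict_interval(member_predictions, quantile, eps=1e-8):
    """Return (lower, upper) bounds around the ensemble mean for a new sample."""
    mu, sigma = ensemble_summary(member_predictions)
    half_width = quantile * (sigma + eps)
    return mu - half_width, mu + half_width


# Hypothetical toy data: three ensemble members, four calibration complexes.
toy_calibration_predictions = [
    [6.0, 6.2, 5.8],
    [4.9, 5.1, 5.0],
    [7.5, 7.0, 8.0],
    [3.0, 3.4, 3.2],
]
toy_calibration_targets = [6.5, 5.2, 7.1, 3.9]

toy_quantile = calibrate(toy_calibration_predictions, toy_calibration_targets, alpha=0.2)
toy_lower, toy_upper = predict_interval([5.0, 5.4, 5.2], toy_quantile)
print(f"interval: [{toy_lower:.2f}, {toy_upper:.2f}]")
```

A wider interval signals lower confidence: samples on which the ensemble members disagree more receive proportionally wider bounds.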

Application of machine learning-based read-across structure-property relationship (RASPR) as a new tool for predictive modelling: Prediction of power conversion efficiency (PCE) for selected classes of organic dyes in dye-sensitized solar cells (DSSCs).

IF 2.8, Zone 4 (Medicine)

Molecular Informatics Pub Date : 2024-04-01 Epub Date: 2024-02-19 DOI: 10.1002/minf.202300210

Souvik Pore, Arkaprava Banerjee, Kunal Roy

Abstract: The application of in silico approaches to predict material properties has been an effective alternative to experimental methods. Recently, the concepts of quantitative structure-property relationship (QSPR) and read-across (RA) methods were merged to develop a new chemoinformatic tool: read-across structure-property relationship (RASPR). The RASPR method is applicable to both large and small datasets, as it uses various similarity and error-based measures. It has also been observed that RASPR models tend to have increased external predictivity compared to the corresponding QSPR models. In this study, we have modeled the power conversion efficiency (PCE) of organic dyes used in dye-sensitized solar cells (DSSCs) using the quantitative RASPR (q-RASPR) method. We used relatively large classes of organic dyes for modelling: phenothiazines (n=207), porphyrins (n=281), and triphenylamines (n=229). We divided each dataset into training and test sets in three different combinations; with the training sets we developed three different QSPR models using structural and physicochemical descriptors and validated them with the corresponding test sets. The descriptors selected in these models were used to calculate the RASPR descriptors using a Java-based tool, RASAR Descriptor Calculator v2.0 (https://sites.google.com/jadavpuruniversity.in/dtc-lab-software/home), and data fusion was then performed by pooling the previously selected structural and physicochemical descriptors with the calculated RASPR descriptors. A further feature selection algorithm was employed to develop the final RASPR PLS models. We also developed different machine learning (ML) models with the descriptors selected in the QSPR PLS and RASPR PLS models, and found that models with RASPR descriptors surpassed models with only structural and physicochemical descriptors in external predictivity: RMSEP was reduced for phenothiazines from 1.16-1.25 to 1.07-1.18, for porphyrins from 1.60-1.79 to 1.45-1.53, and for triphenylamines from 1.27-1.54 to 1.20-1.47.

Citations: 0
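
The RMSEP figures quoted in the abstract above are root mean squared errors of prediction on the external test set. A minimal sketch of that calculation follows; the observed and predicted PCE values are hypothetical toy numbers, not data from the paper, and the two prediction lists merely stand in for a QSPR-style and a RASPR-style model.

```python
import math


def rmsep(observed, predicted):
    """Root mean squared error of prediction over a test set."""
    if not observed or len(observed) != len(predicted):
        raise ValueError("observed and predicted must be non-empty and of equal length")
    squared_errors = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical test-set PCE values (%) and two hypothetical sets of predictions.
toy_observed_pce = [5.0, 6.0, 7.0, 8.0]
toy_qspr_predictions = [6.0, 5.0, 8.0, 7.0]    # every error is 1.0
toy_raspr_predictions = [5.5, 5.5, 7.5, 7.5]   # every error is 0.5

print(rmsep(toy_observed_pce, toy_qspr_predictions))   # 1.0
print(rmsep(toy_observed_pce, toy_raspr_predictions))  # 0.5
```

A lower RMSEP on the same test set indicates better external predictivity, which is the comparison the abstract reports for each dye class.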

Cover Picture: (Mol. Inf. 3/2024)

IF 3.6, Zone 4 (Medicine)

Molecular Informatics Pub Date : 2024-03-21 DOI: 10.1002/minf.202480301

Citations: 0

Exploring data-driven chemical SMILES tokenization approaches to identify key protein-ligand binding moieties.

IF 3.6, Zone 4 (Medicine)

Molecular Informatics Pub Date : 2024-03-01 Epub Date: 2024-01-23 DOI: 10.1002/minf.202300249

Asu Busra Temizer, Gökçe Uludoğan, Rıza Özçelik, Taha Koulani, Elif Ozkirimli, Kutlu O Ulgen, Nilgun Karali, Arzucan Özgür

Abstract: Machine learning models have found numerous successful applications in computational drug discovery. A large body of these models represents molecules as sequences, since molecular sequences are easily available, simple, and informative. Sequence-based models often segment molecular sequences into pieces called chemical words, analogous to the words that make up sentences in human languages, and then apply advanced natural language processing techniques for tasks such as de novo drug design, property prediction, and binding affinity prediction. However, the chemical characteristics and significance of these building blocks, chemical words, remain unexplored. To address this gap, we employ data-driven SMILES tokenization techniques such as Byte Pair Encoding, WordPiece, and Unigram to identify chemical words and compare the resulting vocabularies. To understand the chemical significance of these words, we build a language-inspired pipeline that treats high-affinity ligands of protein targets as documents and selects the key chemical words making up those ligands based on tf-idf weighting. Experiments on multiple protein-ligand affinity datasets show that, despite differences in words, lengths, and validity among the vocabularies generated by different subword tokenization algorithms, the identified key chemical words exhibit similarity. Further, we conduct case studies on a number of targets to analyze the impact of key chemical words on binding. We find that these key chemical words are specific to protein targets and correspond to known pharmacophores and functional groups. Our approach elucidates chemical properties of the words identified by machine learning models and can be used in drug discovery studies to determine significant chemical moieties.

Citations: 0
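
The tf-idf step described in the abstract above, where the high-affinity ligands of each target form one document, can be sketched as follows. This is an illustrative sketch only: it assumes the ligands have already been split into chemical words by some tokenizer, uses the plain tf × ln(N/df) weighting (the paper's exact variant is not stated here), and the target names and token lists are hypothetical.

```python
import math
from collections import Counter


def key_chemical_words(documents, top_k=3):
    """Rank chemical words per target by tf-idf.

    documents maps a target name to the list of chemical words pooled
    from that target's high-affinity ligands.
    """
    n_docs = len(documents)
    # Document frequency: in how many targets' documents each word appears.
    doc_freq = Counter()
    for words in documents.values():
        doc_freq.update(set(words))

    ranked = {}
    for target, words in documents.items():
        counts = Counter(words)
        total = len(words)
        scores = {
            word: (count / total) * math.log(n_docs / doc_freq[word])
            for word, count in counts.items()
        }
        # Sort by descending score, breaking ties alphabetically for determinism.
        ordered = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
        ranked[target] = ordered[:top_k]
    return ranked


# Hypothetical tokenized ligands for three hypothetical targets.
toy_documents = {
    "target_A": ["c1ccccc1", "C(=O)N", "S(=O)(=O)N", "S(=O)(=O)N"],
    "target_B": ["c1ccccc1", "C(=O)N", "C(F)(F)F"],
    "target_C": ["c1ccccc1", "C(=O)O", "C(=O)O"],
}

toy_ranking = key_chemical_words(toy_documents)
for target, words in toy_ranking.items():
    print(target, words[0][0])
```

A word shared by every target, such as the benzene ring token in this toy data, receives a score of zero, while words concentrated in one target's ligands rise to the top; this is the sense in which the selected words are target-specific.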

In silico construction of a focused fragment library facilitating exploration of chemical space.

IF 3.6, Zone 4 (Medicine)

Molecular Informatics Pub Date : 2024-03-01 Epub Date: 2024-01-23 DOI: 10.1002/minf.202300256

Weijie Han, Xiaohe Xu, Qing Fan, Yingchao Yan, YanMin Zhang, Yadong Chen, Haichun Liu

Abstract: Fragment-based drug design (FBDD) has emerged as a captivating subject in computer-aided drug design, enabling the generation of novel molecules through the rearrangement of ring systems within known compounds. The construction of a focused fragment library plays a pivotal role in FBDD, requiring the compilation of all potentially bioactive ring systems capable of interacting with a specific target. In our study, we propose a workflow for the development of a focused fragment library and a combinatorial compound library. The fragment library comprises seed fragments and collected fragments. The extraction of seed fragments is guided by receptor information and is a prerequisite for establishing a focused library. Collected fragments, in contrast, are obtained using the feature graph method, which offers a simplified representation of fragments and strikes a balance between diversity and similarity when categorizing different fragments. The use of feature graphs facilitates the rational partitioning of chemical space at the fragment level, enabling the exploration of desired chemical space and enhancing the efficiency of compound library screening. Analysis demonstrates that our workflow enables the enumeration of a greater number of entirely new potential compounds, thereby aiding rational drug design.

Citations: 0

In Silico prediction of inhibitors for multiple transporters via machine learning methods.

IF 3.6, Zone 4 (Medicine)

Molecular Informatics Pub Date : 2024-03-01 Epub Date: 2024-02-06 DOI: 10.1002/minf.202300270

Hao Duan, Chaofeng Lou, Yaxin Gu, Yimeng Wang, Weihua Li, Guixia Liu, Yun Tang

Abstract: Transporters play an indispensable role in facilitating the transport of nutrients and signaling molecules and the elimination of metabolites and toxins in human cells. Contemporary computational methods have been employed in the prediction of transporter inhibitors. However, these methods often focus on isolated endpoints, overlooking the interactions between transporters and lacking good interpretability. In this study, we integrated a comprehensive dataset and constructed models to assess the inhibitory effects on seven transporters. Both conventional machine learning and multi-task deep learning methods were employed. The results demonstrated that the MLT-GAT model achieved superior performance, with an average AUC value of 0.882. Notably, our model excels not only in prediction performance but also in interpretability, aided by GNN-Explainer, which provided valuable insights into transporter inhibition. The reliability of the model's predictions positions it as a promising tool for transporter inhibition research. Related data and code are available at https://gitee.com/wutiantian99/transporter_code.git.

Citations: 0
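
The "average AUC" reported in the abstract above is a mean of per-transporter ROC AUC values. As a minimal sketch of how such a figure can be computed, the snippet below uses the pairwise-ranking definition of AUC; the task names, labels, and scores are hypothetical toy values, and averaging with equal weight per task is an assumption rather than a detail confirmed by the source.

```python
def roc_auc(labels, scores):
    """ROC AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a correct pair.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("both classes must be present to compute AUC")
    correct = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5
    return correct / (len(positives) * len(negatives))


def mean_auc(tasks):
    """Unweighted mean of per-task AUC values; tasks maps name -> (labels, scores)."""
    values = [roc_auc(labels, scores) for labels, scores in tasks.values()]
    return sum(values) / len(values)


# Hypothetical inhibitor labels (1 = inhibitor) and model scores for two tasks.
toy_tasks = {
    "transporter_1": ([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1]),  # AUC 0.75
    "transporter_2": ([1, 0, 1, 0], [0.8, 0.3, 0.6, 0.2]),  # AUC 1.0
}

print(mean_auc(toy_tasks))  # 0.875
```

The quadratic pairwise loop is chosen for clarity; for large datasets a rank-based implementation would be preferable.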

Cover Picture: (Mol. Inf. 2/2024)

IF 3.6, Zone 4 (Medicine)

Molecular Informatics Pub Date : 2024-02-23 DOI: 10.1002/minf.202480201

Citations: 0

Predicting the bandgap and efficiency of perovskite solar cells using machine learning methods.

IF 3.6, Zone 4 (Medicine)

Molecular Informatics Pub Date : 2024-02-01 Epub Date: 2024-01-04 DOI: 10.1002/minf.202300217

Asad Khan, Jeevan Kandel, Hilal Tayara, Kil To Chong

Abstract: Rapid and accurate prediction of the bandgap and efficiency of perovskite solar cells is a crucial challenge for various solar cell applications. Existing theoretical and experimental methods often measure these parameters accurately; however, they are costly and time-consuming. Machine learning-based approaches offer a promising and computationally efficient way to address this problem. In this study, we trained different machine learning (ML) models using previously reported experimental data. Among the different ML models, the CatBoostRegressor performed best for both bandgap and efficiency approximations. We evaluated the proposed model using k-fold cross-validation and investigated the relative importance of input features using Shapley Additive Explanations (SHAP). SHAP provides valuable insights into how features contribute to the proposed model's predictions. Furthermore, we validated the performance of the proposed model using an independent dataset, demonstrating its robustness and generalizability beyond the training data. Our findings show that machine learning-based approaches, with the aid of SHAP, can provide a promising and computationally efficient method for the accurate and rapid prediction of perovskite solar cell properties. The proposed model is expected to facilitate the discovery of new perovskite materials and is freely available on GitHub (https://github.com/AsadKhanJBNU/perovskite_bandgap_and_efficiency.git) for the perovskite community.

Citations: 0
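
The k-fold cross-validation mentioned in the abstract above can be sketched in a few lines. The snippet is illustrative only: it splits the data into contiguous folds without shuffling, and it substitutes a trivial mean-of-training-set predictor for the CatBoostRegressor used in the paper. The bandgap values and all function names are hypothetical.

```python
import math


def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) for k contiguous, near-equal folds."""
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_indices = list(range(start, start + size))
        train_indices = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_indices, test_indices
        start += size


def cross_validate_mean_baseline(targets, k):
    """Return the per-fold RMSE of a baseline that predicts the training mean."""
    fold_rmse = []
    for train_indices, test_indices in k_fold_indices(len(targets), k):
        # "Fit": the placeholder model is just the mean of the training targets.
        train_mean = sum(targets[i] for i in train_indices) / len(train_indices)
        mse = sum((targets[i] - train_mean) ** 2 for i in test_indices) / len(test_indices)
        fold_rmse.append(math.sqrt(mse))
    return fold_rmse


# Hypothetical bandgap values in eV.
toy_bandgaps = [1.5, 1.6, 1.55, 2.3, 1.7, 1.6]

toy_fold_rmse = cross_validate_mean_baseline(toy_bandgaps, k=3)
print([round(value, 3) for value in toy_fold_rmse])
```

Each sample is held out exactly once, so the per-fold errors together give an estimate of performance on data the model did not see during fitting. In practice the data would usually be shuffled before splitting.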