Journal of Chemical Information and Modeling: Latest Articles

SELFprot: Effective and Efficient Multitask Finetuning Methods for Protein Parameter Prediction.
IF 5.6, Zone 2, Chemistry
Journal of Chemical Information and Modeling, Pub Date: 2025-03-17, DOI: 10.1021/acs.jcim.4c02230
Marltan Wilson, Thomas Coudrat, Andrew Warden
Abstract: Accurately predicting protein-ligand interactions and enzymatic kinetics remains a challenge for computational biology. Here, we present SELFprot, a suite of modular transformer-based machine learning architectures that leverage the ESM2-35M model architecture for protein sequence and small molecule embeddings to improve predictions of complex biochemical interactions. SELFprot employs multitask learning and parameter-efficient finetuning through low-rank adaptation, allowing for adaptive, data-driven model refinement. Furthermore, ensemble learning techniques are used to enhance the robustness and reduce the prediction variance. Evaluated on the BindingDB and CatPred-DB data sets, SELFprot achieves competitive performance with notable improvements in parameter-efficient prediction of kcat, Km, Ki, Kd, IC50, and EC50 values as well as the classification of functional site residues. With comparable accuracy to existing models and an order of magnitude fewer parameters, SELFprot demonstrates versatility and efficiency, making it a valuable tool for protein-ligand interaction studies in bioengineering.
Citations: 0
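Note on low-rank adaptation: the SELFprot abstract above names parameter-efficient finetuning through low-rank adaptation (LoRA). As a rough sketch of the general technique only, a frozen weight matrix W is combined with a trainable rank-r product B·A, so that only r·(d_in + d_out) parameters are trained. The matrix sizes, rank, and scaling factor below are assumptions chosen for illustration; they are not taken from the paper, and this is not the authors' implementation.

```python
# Illustrative sketch of low-rank adaptation (LoRA); not the SELFprot code.
# All sizes, the rank, and alpha are assumed values for illustration.

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def lora_effective_weight(w, b, a, alpha, rank):
    """Return W + (alpha / rank) * (B @ A), the merged weight used at inference."""
    delta = matmul(b, a)
    scale = alpha / rank
    return [[w_ij + scale * d_ij for w_ij, d_ij in zip(w_row, d_row)]
            for w_row, d_row in zip(w, delta)]

def trainable_params(d_in, d_out, rank):
    """Count parameters trained by LoRA: A is rank x d_in, B is d_out x rank."""
    return rank * (d_in + d_out)

# Tiny example: a frozen 2 x 3 weight with a rank-1 update.
W = [[1.0, 0.0, 0.0],
     [0.0, 1.0, 0.0]]
B = [[1.0], [2.0]]        # d_out x rank
A = [[0.5, 0.0, -0.5]]    # rank x d_in
W_eff = lora_effective_weight(W, B, A, alpha=2.0, rank=1)

# Parameter counts for an assumed 480 x 480 projection adapted at rank 8.
full = 480 * 480
lora = trainable_params(480, 480, 8)
```

In this toy setting the adapter trains 7,680 parameters in place of 230,400, which is the kind of saving the term "parameter-efficient" refers to.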
Multitarget Natural Compounds for Ischemic Stroke Treatment: Integration of Deep Learning Prediction and Experimental Validation.
IF 5.6, Zone 2, Chemistry
Journal of Chemical Information and Modeling, Pub Date: 2025-03-14, DOI: 10.1021/acs.jcim.5c00135
Junyu Zhou, Chen Li, Yu Yue, Yong Kwan Kim, Sunmin Park
Abstract: Ischemic stroke's complex pathophysiology demands therapeutic approaches targeting multiple pathways simultaneously, yet current treatments remain limited. We developed an innovative drug discovery pipeline combining a deep learning approach with experimental validation to identify natural compounds with comprehensive neuroprotective properties. Our computational framework integrated SELFormer, a transformer-based deep learning model, and multiple deep learning algorithms to predict NC bioactivity against seven crucial stroke-related targets (ACE, GLA, MMP9, NPFFR2, PDE4D, and eNOS). The pipeline encompassed IC50 predictions, clustering analysis, quantitative structure-activity relationship (QSAR) modeling, and uniform manifold approximation and projection (UMAP)-based bioactivity profiling followed by molecular docking studies and experimental validation. Analysis revealed six distinct NC clusters with unique molecular signatures. UMAP projection identified 11 medium-activity (6 < pIC50 ≤ 7) and 57 high-activity (pIC50 > 7) compounds, with molecular docking confirming strong correlations between binding energies and predicted pIC50 values. In vitro studies using NGF-differentiated PC12 cells under oxygen-glucose deprivation demonstrated significant neuroprotective effects of four high-activity compounds: feruloyl glucose, l-hydroxy-l-tryptophan, mulberrin, and ellagic acid. These compounds enhanced cell viability, reduced acetylcholinesterase activity and lipid peroxidation, suppressed TNF-α expression, and upregulated BDNF mRNA levels. Notably, mulberrin and ellagic acid showed superior efficacy in modulating oxidative stress, inflammation, and neurotrophic signaling. This study establishes a robust deep learning-driven framework for identifying multitarget natural therapeutics for ischemic stroke. The validated compounds, particularly mulberrin and ellagic acid, are promising for stroke treatment development. Our findings demonstrate the effectiveness of integrating computational prediction with experimental validation in accelerating drug discovery for complex neurological disorders.
Citations: 0
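Note on the pIC50 bins: the ischemic stroke abstract above groups compounds as medium-activity (6 < pIC50 ≤ 7) or high-activity (pIC50 > 7). The sketch below applies those two thresholds using the standard definition pIC50 = -log10(IC50 in mol/L). The function names and the "low" label for values at or below 6 are assumptions made for illustration; the authors' pipeline is not shown in this listing.

```python
import math

def pic50_from_nm(ic50_nm):
    """Convert an IC50 given in nanomolar to pIC50 = -log10(IC50 in mol/L)."""
    if ic50_nm <= 0:
        raise ValueError("IC50 must be positive")
    # 1 nM = 1e-9 mol/L, so -log10(ic50_nm * 1e-9) = 9 - log10(ic50_nm).
    return 9.0 - math.log10(ic50_nm)

def activity_class(pic50):
    """Bin a pIC50 value using the thresholds quoted in the abstract."""
    if pic50 > 7:
        return "high"
    if pic50 > 6:
        return "medium"
    return "low"  # assumed label; the abstract names only the two upper bins

# Hypothetical IC50 values in nM, for illustration only.
examples = {nm: activity_class(pic50_from_nm(nm)) for nm in (20.0, 500.0, 5000.0)}
```

Under this definition a tenfold drop in IC50 raises pIC50 by one unit, so the high-activity bin corresponds to IC50 values below 100 nM.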
Employing Automated Machine Learning (AutoML) Methods to Facilitate the In Silico ADMET Properties Prediction.
IF 5.6, Zone 2, Chemistry
Journal of Chemical Information and Modeling, Pub Date: 2025-03-14, DOI: 10.1021/acs.jcim.4c02122
Herim Han, Bilal Shaker, Jin Hee Lee, Sunghwan Choi, Sanghee Yoon, Maninder Singh, Shaherin Basith, Minghua Cui, Sunil Ahn, Junyoung An, Soosung Kang, Min Sun Yeom, Sun Choi
Abstract: The rationale for using ADMET prediction tools in the early drug discovery paradigm is to guide the design of new compounds with favorable ADMET properties and ultimately minimize the attrition rates of drug failures. Artificial intelligence (AI) in in silico ADMET modeling has gained momentum due to its high-throughput and low-cost attributes. In this study, we developed a machine learning model capable of predicting 11 ADMET properties of chemical compounds. Each model was constructed by combining one of 40 classification algorithms including random forest (RF), extreme gradient boosting (XGB), support vector machine (SVM), and gradient boosting (GB) with one of three predefined hyperparameter configurations. This process can be efficiently performed using automated machine learning (AutoML) methods, which automatically search for the best combination of model algorithms and optimized hyperparameters. We developed optimal predictive models for 11 different ADMET properties using the Hyperopt-sklearn AutoML method. All of the developed models depicted an area under the ROC curve (AUC) >0.8. Furthermore, our developed models outperformed most of the ADMET properties and showed comparable performance in other properties when evaluated on external data sets and compared with published predictive models. Our results support the applicability of AutoML in ADMET prediction and will be helpful for ADMET prediction in early-stage drug discovery.
Citations: 0
BERT-AmPEP60: A BERT-Based Transfer Learning Approach to Predict the Minimum Inhibitory Concentrations of Antimicrobial Peptides for Escherichia coli and Staphylococcus aureus.
IF 5.6, Zone 2, Chemistry
Journal of Chemical Information and Modeling, Pub Date: 2025-03-14, DOI: 10.1021/acs.jcim.4c01749
Jianxiu Cai, Jielu Yan, Chonwai Un, Yapeng Wang, François-Xavier Campbell-Valois, Shirley W I Siu
Abstract: Antimicrobial peptides (AMPs) are a promising alternative for combating bacterial drug resistance. While current computer prediction models excel at binary classification of AMPs based on sequences, there is a lack of regression methods to accurately quantify AMP activity against specific bacteria, making the identification of highly potent AMPs a challenge. Here, we present a deep learning method, BERT-AmPEP60, based on the fine-tuned Bidirectional Encoder Representations from Transformers (BERT) architecture to extract embedding features from input sequences. Using the transfer learning strategy, we built regression models to predict the minimum inhibitory concentration (MIC) of peptides for Escherichia coli (EC) and Staphylococcus aureus (SA). In five independent experiments with 10% leave-out sequences as the test sets, the optimal EC and SA models outperformed the state-of-the-art regression method and traditional machine learning methods, achieving an average mean squared error of 0.2664 and 0.3032 (log μM), respectively. They also showed a Pearson correlation coefficient of 0.7955 and 0.7530, and a Kendall correlation coefficient of 0.5797 and 0.5222, respectively. Our models outperformed existing deep learning and machine learning methods that rely on conventional sequence features. This work underscores the effectiveness of utilizing BERT with transfer learning for training quantitative AMP prediction models specific for different bacterial species. The web server of BERT-AmPEP60 can be found at https://app.cbbio.online/ampep/home. To facilitate development, the program source codes are available at https://github.com/janecai0714/AMP_regression_EC_SA.
Citations: 0
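Note on the reported metrics: the BERT-AmPEP60 abstract above quotes a mean squared error, a Pearson correlation coefficient, and a Kendall correlation coefficient. The sketch below shows how these three quantities are commonly computed for paired observed and predicted values. It uses the tau-a form of Kendall's coefficient, which ignores ties; the abstract does not say which variant the authors used, so that choice is an assumption, and the sample numbers are invented purely for illustration.

```python
import math
from itertools import combinations

def mean_squared_error(y_true, y_pred):
    """Average of squared differences between observed and predicted values."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def pearson_r(x, y):
    """Pearson correlation: covariance divided by the product of spreads."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def kendall_tau_a(x, y):
    """Kendall tau-a: (concordant - discordant) pairs over all pairs."""
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        s = (x1 - x2) * (y1 - y2)
        if s > 0:
            concordant += 1
        elif s < 0:
            discordant += 1
    n_pairs = len(x) * (len(x) - 1) // 2
    return (concordant - discordant) / n_pairs

# Invented log-MIC values, for illustration only.
observed = [0.5, 1.0, 1.5, 2.0]
predicted = [0.6, 0.9, 1.7, 1.8]
mse = mean_squared_error(observed, predicted)
r = pearson_r(observed, predicted)
tau = kendall_tau_a(observed, predicted)
```

Pearson's coefficient measures linear agreement, while Kendall's coefficient depends only on rank order, which is why the two values reported for the same model can differ substantially.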
Deep-Learning Potential Molecular Dynamics Study on Nanopolycrystalline Al-Er Alloys: Effects of Er Concentration, Grain Boundary Segregation, and Grain Size on Plastic Deformation.
IF 5.6, Zone 2, Chemistry
Journal of Chemical Information and Modeling, Pub Date: 2025-03-14, DOI: 10.1021/acs.jcim.5c00008
Zhen Chang, Li Feng, Hong-Tao Xue, Yan-Hong Yang, Jun-Qiang Ren, Fu-Ling Tang, Xue-Feng Lu, Jun-Chen Li
Abstract: Understanding the tensile mechanical properties of Al-Er alloys at the atomic scale is essential, and molecular dynamics (MD) simulations offer valuable insights. However, these simulations are constrained by the unavailability of suitable interatomic potentials. In this study, the deep potential (DP) approach, aided by high-throughput first-principles calculations, was utilized to develop an Al-Er interatomic potential specifically for MD simulations. Systematic comparisons between the physical properties (e.g., energy-volume curves, melting point, elastic constants) predicted by the DP model and those obtained from density functional theory (DFT) demonstrated that the developed DP model for Al-Er alloys possesses reliable predictive capabilities while retaining DFT-level accuracy. Our findings confirm that Al3Er, Al2Er, and AlEr2 exhibit mechanical stability. The calculated melting point of Al3Er (1398 K) shows a 57 K deviation from the experimental value (1341 K). With the Er content increasing from 0.01% to 0.064 at.% in Al-Er alloys, the grain boundary (GB) concentration of Er atoms increases from 0.03 to 0.07% following Monte Carlo (MC) annealing optimization. The Al-0.05 at.%Er alloy exhibits the highest yield strength, with an increase of 0.128 GPa (6.1%) compared to pure Al. For Al-0.05 at.%Er alloys with varying average grain sizes, the GB concentration of Er atoms increases by about 1.4-1.6 times after MC annealing compared to the average Er content. Additionally, the Al-Er alloys reach the peak yield strength of 2.214 GPa when the average grain size is 11.72 nm. The GB segregation of Er atoms lowers the system energy and thus enhances stability. Notable changes in the segregation behavior of Er atoms were observed with increasing Er concentration and decreasing grain size. These results would facilitate the understanding of the mechanical characteristics of Al-Er alloys and offer a theoretical basis for developing advanced nanopolycrystalline Al-Er alloys.
Citations: 0
Join Persistent Homology (JPH)-Based Machine Learning for Metalloprotein–Ligand Binding Affinity Prediction
IF 5.6, Zone 2, Chemistry
Journal of Chemical Information and Modeling, Pub Date: 2025-03-13, DOI: 10.1021/acs.jcim.4c02309
Yaxing Wang, Xiang Liu, Yipeng Zhang, Xiangjun Wang, and Kelin Xia*
Abstract: With the crucial role of metalloproteins in respiration, oxidative stress protection, photosynthesis, and drug metabolism, the design and discovery of drugs that can target metalloproteins are extremely important. Recently, enormous potential has been shown by topological data analysis (TDA) and TDA-based machine learning models in various steps of drug design and discovery. Here, we propose, for the first time, join persistent homology (JPH) and JPH-based machine learning models for metalloprotein–ligand binding affinity prediction. Mathematically, dramatically different from persistent homology and extended persistent homology, our JPH employs a set of filtration functions to generate a multistage filtration for the join of the original simplicial complex and a specially designed test simplicial complex. From the featurization perspective, our JPH-based molecular descriptors can provide a more comprehensive characterization of the intrinsic topological information of the data. Our JPH descriptors are combined with the gradient boosting tree (GBT) model for metalloprotein–ligand binding affinity prediction. The benchmark dataset for metalloprotein–ligand complexes from PDBbind-v2020 is employed for the validation and comparison of our model. It has been found that our JPH-GBT model can outperform all of the existing models, as far as we know. This demonstrates the great potential of our join persistent homology in the characterization of molecular structures and functions.
Vol. 65, No. 6, pp. 2785–2793
Citations: 0
FDPSM: Feature-Driven Prediction Modeling of Pathogenic Synonymous Mutations
IF 5.6, Zone 2, Chemistry
Journal of Chemical Information and Modeling, Pub Date: 2025-03-13, DOI: 10.1021/acs.jcim.4c02139
Fangfang Jin, Na Cheng, Lihua Wang, Bin Ye*, and Junfeng Xia*
Abstract: Synonymous mutations, once considered to be biologically neutral, are now recognized to affect protein expression and function by altering the RNA splicing, stability, or translation efficiency. These effects can contribute to disease, making the prediction of the pathogenicity a crucial task. Computational methods have been developed to analyze the sequence features and biological functions of synonymous mutations, but existing methods face limitations, including scarcity of labeled data, reliance on other prediction tools, and insufficient representation of feature interrelationships. Here, we present FDPSM, a novel prediction method specifically designed to predict pathogenic synonymous mutations. FDPSM was trained on a robust data set of 4251 positive and negative training samples to enhance predictive accuracy. The method leveraged a comprehensive set of features, including genomic context, conservation, splicing effects, functional effects, and epigenomics, without relying on prediction scores from other mutation pathogenicity tools. Recognizing that original features alone may not fully capture the distinctions between pathogenic and benign synonymous mutations, we enhanced the feature set by extracting effective information from the interactions and distribution of these features. The experimental results showed that FDPSM significantly outperformed existing methods in predicting the pathogenicity of synonymous mutations, offering a more accurate and reliable tool for this important task. FDPSM is available at https://github.com/xialab-ahu/FDPSM.
Vol. 65, No. 6, pp. 3064–3076
Citations: 0
Multidependency Graph Convolutional Networks and Contrastive Learning for Drug Repositioning
IF 5.6, Zone 2, Chemistry
Journal of Chemical Information and Modeling, Pub Date: 2025-03-12, DOI: 10.1021/acs.jcim.4c02424
Yanglan Gan*, Shengnan Li, Guangwei Xu, Cairong Yan, and Guobing Zou
Abstract: The goal of drug repositioning is to expedite the drug development process by finding novel therapeutic applications for approved drugs. Using multifeature learning, different computational drug repositioning techniques have recently been introduced to predict possible drug–disease relationships. Nevertheless, current graph-based methods tend to model drug–disease interaction relationships without considering the semantic influence of node-specific side information on graphs. These approaches also suffer from the noise and sparsity inherent in the data. To address these limitations, we propose MDGCN, a novel drug repositioning method that incorporates multidependency graph convolutional networks and contrastive learning. Based on drug and disease similarity matrices and the drug–disease relationships matrix, this approach constructs multidependency graphs. It subsequently employs graph convolutional networks to spread side information between various graphs in each layer. Meanwhile, the weak supervision of drug–disease connections is effectively addressed by introducing cross-view and cross-layer contrastive learning to align node embedding across various views. Extensive experiments show that MDGCN performs better in drug–disease association prediction than seven advanced methods, offering strong support for investigating novel therapeutic indications for medications of interest.
Vol. 65, No. 6, pp. 3090–3103
Citations: 0
Fluor-Predictor: An Interpretable Tool for Multiproperty Prediction and Retrieval of Fluorescent Dyes
IF 5.6, Zone 2, Chemistry
Journal of Chemical Information and Modeling, Pub Date: 2025-03-12, DOI: 10.1021/acs.jcim.5c00127
Wenxiang Song, Le Xiong, Xinmin Li, Yuyang Zhang, Binya Wang, Guixia Liu, Weihua Li, Youjun Yang*, and Yun Tang*
Abstract: With the rapid advancements in the field of fluorescent dyes, accurate prediction of optical properties and efficient retrieval of dye-related data are essential for effective dye design. However, there is a lack of tools for comprehensive data integration and convenient data retrieval. Moreover, existing prediction models mainly focus on a single property of fluorescent dyes and fail to account for the diverse fluorophores and solutions in a systematic manner. To address this, we proposed Fluor-predictor, a multitask prediction model for fluorophores. This study integrates multiple dye databases and develops an interpretable graph neural network-based multitask regression model to predict four key optical properties of fluorescent dyes. We thoroughly examined the impact of factors such as data quality and the number of solvents on model performance. By leveraging atomic weight contributions, the model not only predicts these properties but also provides insights to guide structural modifications. In addition, we compiled and built a comprehensive database containing 36,756 records of fluorescence properties. To address the limitations of existing models in accurate prediction of Xanthene and Cyanine dyes, we then compiled 1148 Xanthene dye records and 1496 Cyanine dye records from the literature, comparing direct training with transfer learning approaches. The model achieved mean absolute errors (MAE) of 11.70 nm, 15.37 nm, 0.096, and 0.091 for predicting absorption wavelength (λabs), emission wavelength (λem), quantum yield (Φ) and molar extinction coefficient (Log(ε)), respectively. We integrated this work into a tool, Fluor-predictor, which supports comprehensive retrieval methods and multiproperty prediction. Fluor-predictor will facilitate data retrieval, prescreening, and structural modification of dyes.
Vol. 65, No. 6, pp. 2854–2867
Citations: 0
UMPPI: Unveiling Multilevel Protein-Peptide Interaction Prediction via Language Models.
IF 5.6, Zone 2, Chemistry
Journal of Chemical Information and Modeling, Pub Date: 2025-03-12, DOI: 10.1021/acs.jcim.4c02365
Shuwen Xiong, Jiajie Cai, Hua Shi, Feifei Cui, Zilong Zhang, Leyi Wei
Abstract: Protein-peptide interactions are essential to cellular processes and disease mechanisms. Identifying protein-peptide binding residues is critical for understanding peptide function and advancing drug discovery. However, experimental methods are costly and time-intensive, while existing computational approaches often predict interactions or binding residues separately, lack effective feature integration, or rely heavily on limited high-quality structural data. To address these challenges, we propose UMPPI (Unveiling Multilevel Protein-Peptide Interaction), a multiobjective framework based on the pretrained protein language model ESM2. UMPPI simultaneously predicts binary protein-peptide interactions and binding residues on both peptides and proteins through a multiobjective optimization strategy. By integrating ESM2 to encode sequences and extract latent structural information, UMPPI bridges the gap between sequence-based and structure-based methods. Extensive experiments demonstrated that UMPPI successfully captured binary interactions between peptides and proteins and identified the binding residues on peptides and proteins. UMPPI can serve as a useful tool for protein-peptide interaction prediction and identification of critical binding residues, thereby facilitating the peptide drug discovery process.
Citations: 0