HiMolformer: Integrating graph and sequence representations for predicting liver microsome stability with SMILES
Authors: Seokwoo Yun, Gibeom Nam, Jahwan Koo
Journal: Computational Biology and Chemistry, vol. 113, Article 108263 (published 2024-11-05)
Impact factor: 2.6 (JCR Q2, Biology)
DOI: 10.1016/j.compbiolchem.2024.108263
Source: https://www.sciencedirect.com/science/article/pii/S1476927124002512
Citation count: 0
Abstract
In the early stages of drug discovery and pre-clinical studies, understanding the metabolic stability of new molecules is crucial. Research on pre-trained deep learning models for molecular property prediction has progressed rapidly, and many such models have been released as open source. However, most of these models are trained on either a 2D graph or a 1D sequence, and the learned representation depends on the input format used. Combining multiple representations can therefore broaden what a model learns and may be a practical and effective way to improve performance.
We therefore propose a novel hybrid model for predicting metabolic stability that integrates representations from graph-based and sequence-based models pre-trained on molecular features. This approach combines the strengths of the 2D topological and 1D sequential information of molecules. We selected HiMol, a graph neural network (GNN) model, and Molformer, a sequence-based Transformer model, for integration, and named the resulting model HiMolformer. HiMolformer demonstrated superior performance compared to other models. We focus on a regression task using an empirical dataset from the Korea Chemical Bank (KCB), comprising 3,498 molecules with mouse liver microsome (MLM) and human liver microsome (HLM) data obtained from actual metabolic reaction experiments. To the best of our knowledge, this is the first attempt to develop MLM and HLM prediction models using regression with a single SMILES input. The source code of this model is available at https://github.com/YUNSEOKWOO/HiMolformer.
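The idea of integrating a graph embedding and a sequence embedding for a two-target regression can be illustrated with a minimal sketch. The abstract does not state how the two representations are combined, so concatenation followed by a single linear layer is an assumption made here for illustration only. The function names (`fuse`, `linear_head`, `rmse`), the toy vectors, and the weights are all hypothetical and are not taken from the HiMolformer repository; real embeddings would come from the pre-trained HiMol and Molformer encoders.

```python
from typing import List, Sequence


def fuse(graph_emb: Sequence[float], seq_emb: Sequence[float]) -> List[float]:
    """Concatenate the two embeddings (an assumed fusion strategy)."""
    return list(graph_emb) + list(seq_emb)


def linear_head(x: Sequence[float],
                weights: Sequence[Sequence[float]],
                bias: Sequence[float]) -> List[float]:
    """Apply one linear layer: one output per row of `weights`."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def rmse(pred: Sequence[float], target: Sequence[float]) -> float:
    """Root-mean-square error, a common metric for regression tasks."""
    squared = [(p - t) ** 2 for p, t in zip(pred, target)]
    return (sum(squared) / len(squared)) ** 0.5


# Toy stand-ins for encoder outputs (hypothetical values, not model output).
graph_emb = [0.2, -0.1, 0.5]   # would come from a graph encoder such as HiMol
seq_emb = [0.3, 0.7]           # would come from a sequence encoder such as Molformer

fused = fuse(graph_emb, seq_emb)

# Hypothetical regression head with two outputs: MLM and HLM stability.
weights = [
    [1.0, 0.0, 0.0, 0.0, 1.0],  # row producing the MLM prediction
    [0.0, 1.0, 1.0, 1.0, 0.0],  # row producing the HLM prediction
]
bias = [0.0, 0.0]

mlm_pred, hlm_pred = linear_head(fused, weights, bias)
```

In a trained model the head weights would be learned from the experimental MLM and HLM values, and `rmse` could be used to compare predictions against held-out measurements.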
About the journal:
Computational Biology and Chemistry publishes original research papers and review articles in all areas of computational life sciences. High quality research contributions with a major computational component in the areas of nucleic acid and protein sequence research, molecular evolution, molecular genetics (functional genomics and proteomics), theory and practice of either biology-specific or chemical-biology-specific modeling, and structural biology of nucleic acids and proteins are particularly welcome. Exceptionally high quality research work in bioinformatics, systems biology, ecology, computational pharmacology, metabolism, biomedical engineering, epidemiology, and statistical genetics will also be considered.
Given their inherent uncertainty, protein modeling and molecular docking studies should be thoroughly validated. In the absence of experimental results for validation, molecular dynamics simulations along with detailed free energy calculations, for example, should be used as complementary techniques to support the major conclusions. Submissions of premature modeling exercises without additional biological insights will not be considered.
Review articles will generally be commissioned by the editors and should not be submitted to the journal without explicit invitation. However, prospective authors are welcome to send a brief (one to three pages) synopsis, which will be evaluated by the editors.