Journal of Cheminformatics: latest articles

kMoL: an open-source machine and federated learning library for drug discovery
IF 7.1 · Zone 2 · Chemistry
Journal of Cheminformatics Pub Date : 2025-02-25 DOI: 10.1186/s13321-025-00967-9
Romeo Cozac, Haris Hasic, Jun Jin Choong, Vincent Richard, Loic Beheshti, Cyrille Froehlich, Takuto Koyama, Shigeyuki Matsumoto, Ryosuke Kojima, Hiroaki Iwata, Aki Hasegawa, Takao Otsuka, Yasushi Okuno
Abstract: Machine learning is quickly becoming integral to drug discovery pipelines, particularly quantitative structure-activity relationship (QSAR) and absorption, distribution, metabolism, and excretion (ADME) tasks. Graph Convolutional Network (GCN) models have proven especially promising due to their inherent ability to model molecular structures using graph-based representations. However, maximizing the potential of such models in practice is challenging, as companies prioritize data privacy and security over collaboration initiatives to improve model performance and robustness. kMoL is an open-source machine learning library with integrated federated learning capabilities developed to address such challenges. Its key features include state-of-the-art model architectures, Bayesian optimization, explainability, and federated learning mechanisms. It demonstrates extensive customization possibilities, advanced security features, straightforward implementation of user-specific models, and high adaptability to custom datasets without additional programming requirements. kMoL is evaluated through locally trained benchmark settings and distributed federated learning experiments using various datasets to assess the features and flexibility of the library, as well as the ability to facilitate fast and practical experimentation. Additionally, results of these experiments provide further insights into the performance trade-offs associated with federated learning strategies, presenting valuable guidance for deploying machine learning models in a privacy-preserving manner within drug discovery pipelines. kMoL is available on GitHub at https://github.com/elix-tech/kmol.
Scientific contribution: The primary scientific contribution of this research project is the introduction and evaluation of kMoL, an open-source machine learning library with integrated federated learning capabilities. By demonstrating advanced customization and security capabilities without additional programming requirements, kMoL represents an accessible yet secure open-source platform for collaborative drug discovery projects. Additionally, the experiment results provide further insights into the performance trade-offs associated with federated learning strategies, presenting valuable guidance for deploying machine learning models in a privacy-preserving manner within drug discovery pipelines.
Volume 17(1) · PDF: https://jcheminf.biomedcentral.com/counter/pdf/10.1186/s13321-025-00967-9
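The abstract does not say how kMoL aggregates model updates across participants. As a generic illustration of the federated learning idea it refers to, the sketch below implements federated averaging (FedAvg): a coordinator combines locally trained parameters, weighted by each participant's dataset size, so that raw data never leaves a site. The function name, the flat-list parameter layout, and the example numbers are assumptions made for illustration and are not taken from the kMoL code base.

```python
from typing import List, Sequence


def federated_average(
    client_weights: Sequence[Sequence[float]],
    client_sizes: Sequence[int],
) -> List[float]:
    """Combine per-client parameter vectors into one global vector.

    Each client's parameters are weighted by its share of the total
    number of training samples (the FedAvg rule). Hypothetical helper,
    not part of any real library.
    """
    if len(client_weights) != len(client_sizes) or not client_weights:
        raise ValueError("need one size per client and at least one client")
    total = sum(client_sizes)
    if total <= 0:
        raise ValueError("total number of samples must be positive")

    n_params = len(client_weights[0])
    averaged = [0.0] * n_params
    for weights, size in zip(client_weights, client_sizes):
        if len(weights) != n_params:
            raise ValueError("all clients must share the same parameter shape")
        for i, w in enumerate(weights):
            averaged[i] += w * size / total
    return averaged


# Two hypothetical sites: one with 1 sample, one with 3 samples.
global_weights = federated_average([[1.0, 2.0], [3.0, 4.0]], [1, 3])
```

In a real deployment the parameter vectors would be model tensors and the exchange would be secured, but the weighting logic stays the same.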
Citations: 0
DrugDiff: small molecule diffusion model with flexible guidance towards molecular properties
IF 7.1 · Zone 2 · Chemistry
Journal of Cheminformatics Pub Date : 2025-02-25 DOI: 10.1186/s13321-025-00965-x
Marie Oestreich, Erinc Merdivan, Michael Lee, Joachim L. Schultze, Marie Piraud, Matthias Becker
Abstract: With the cost/yield-ratio of drug development becoming increasingly unfavourable, recent work has explored machine learning to accelerate early stages of the development process. Given the current success of deep generative models across domains, we here investigated their application to the property-based proposal of new small molecules for drug development. Specifically, we trained a latent diffusion model, DrugDiff, paired with predictor guidance to generate novel compounds with a variety of desired molecular properties. The architecture was designed to be highly flexible and easily adaptable to future scenarios. Our experiments showed successful generation of unique, diverse and novel small molecules with targeted properties. The code is available at https://github.com/MarieOestreich/DrugDiff.
This work expands the use of generative modelling in the field of drug development from previously introduced models for proteins and RNA to the application to small molecules presented here. With small molecules making up the majority of drugs, but simultaneously being difficult to model due to their elaborate chemical rules, this work tackles a new level of difficulty in comparison to sequence-based molecule generation as is the case for proteins and RNA. Additionally, the demonstrated framework is highly flexible, allowing easy addition or removal of considered molecular properties without the need to retrain the model. This makes it highly adaptable to diverse research settings, and it shows compelling performance for a wide variety of targeted molecular properties.
Volume 17(1) · PDF: https://jcheminf.biomedcentral.com/counter/pdf/10.1186/s13321-025-00965-x
Citations: 0
Predictive modeling of biodegradation pathways using transformer architectures
IF 7.1 · Zone 2 · Chemistry
Journal of Cheminformatics Pub Date : 2025-02-17 DOI: 10.1186/s13321-025-00969-7
Liam Brydon, Kunyang Zhang, Gillian Dobbie, Katerina Taškova, Jörg Simon Wicker
Abstract: In recent years, the integration of machine learning techniques into chemical reaction product prediction has opened new avenues for understanding and predicting the behaviour of chemical substances. The necessity for such predictive methods stems from the growing regulatory and social awareness of the environmental consequences associated with the persistence and accumulation of chemical residues. Traditional biodegradation prediction methods rely on expert knowledge to perform predictions. However, creating this expert knowledge is becoming increasingly prohibitive due to the complexity and diversity of newer datasets, leaving existing methods unable to perform predictions on these datasets. We formulate the product prediction problem as a sequence-to-sequence generation task and take inspiration from natural language processing and other reaction prediction tasks. In doing so, we reduce the need for the expensive manual creation of expert-based rules.
Volume 17(1) · PDF: https://jcheminf.biomedcentral.com/counter/pdf/10.1186/s13321-025-00969-7
Citations: 0
ROASMI: accelerating small molecule identification by repurposing retention data
IF 7.1 · Zone 2 · Chemistry
Journal of Cheminformatics Pub Date : 2025-02-14 DOI: 10.1186/s13321-025-00968-8
Fang-Yuan Sun, Ying-Hao Yin, Hui-Jun Liu, Lu-Na Shen, Xiu-Lin Kang, Gui-Zhong Xin, Li-Fang Liu, Jia-Yi Zheng
Abstract: The limited replicability of retention data hinders its application in untargeted metabolomics for small molecule identification. While retention order models hold promise in addressing this issue, their predictive reliability is limited by uncertain generalizability. Here, we present the ROASMI model, which enables reliable prediction of retention order within a well-defined application domain by coupling data-driven molecular representation and mechanistic insights. The generalizability of ROASMI is proven by 71 independent reversed-phase liquid chromatography (RPLC) datasets. The application of ROASMI to four real-world datasets demonstrates its advantages in distinguishing coexisting isomers with similar fragmentation patterns and in annotating detection peaks without informative spectra. ROASMI is flexible enough to be retrained with user-defined reference sets and is compatible with other MS/MS scorers, enabling further improvements in small-molecule identification.
Volume 17(1) · PDF: https://jcheminf.biomedcentral.com/counter/pdf/10.1186/s13321-025-00968-8
Citations: 0
FluoBase: a fluorinated agents database
IF 7.1 · Zone 2 · Chemistry
Journal of Cheminformatics Pub Date : 2025-02-11 DOI: 10.1186/s13321-025-00949-x
Rafal Mulka, Dan Su, Wen-Shuo Huang, Li Zhang, Huaihai Huang, Xiaoyu Lai, Yao Li, Xiao-Song Xue
Abstract: Organofluorine compounds, owing to their unique physicochemical properties, play an increasingly crucial role in fields such as medicine, pesticides, and advanced materials. Fluorinated reagents are indispensable for developing efficient synthetic methods for organofluorine compounds and serve as the cornerstone of organofluorine chemistry. Equally important are fluorinated functional molecules, which contribute specific properties necessary for applications in pharmaceuticals, agrochemicals, and materials science. However, information about these agents' structure, properties, and functions is scattered throughout vast literature, making it inconvenient for synthetic chemists to access and utilize them effectively. Recognizing the need for a dedicated and organized resource, we present FluoBase, a comprehensive fluorinated agents database designed to streamline access to key information about fluorinated agents. FluoBase aims to become the premier resource for information related to fluorine chemistry, serving the scientific community and anyone interested in the applications of fluorine chemistry and machine learning for property predictions. FluoBase is freely available at https://fluobase.siochemdb.com.
Scientific contribution: FluoBase is a database designed to provide comprehensive information on the structures, properties, and functions of fluorinated agents and functional molecules. FluoBase aims to become the premier resource for fluorine chemistry, serving the scientific community and anyone interested in the applications of fluorine chemistry and machine learning for property predictions.
Volume 17(1) · PDF: https://jcheminf.biomedcentral.com/counter/pdf/10.1186/s13321-025-00949-x
Citations: 0
Positional embeddings and zero-shot learning using BERT for molecular-property prediction
IF 7.1 · Zone 2 · Chemistry
Journal of Cheminformatics Pub Date : 2025-02-05 DOI: 10.1186/s13321-025-00959-9
Medard Edmund Mswahili, JunHa Hwang, Jagath C. Rajapakse, Kyuri Jo, Young-Seob Jeong
Abstract: Recently, advancements in cheminformatics such as representation learning for chemical structures, deep learning (DL) for property prediction, data-driven discovery, and optimization of chemical data handling have led to increased demands for handling chemical simplified molecular input line entry system (SMILES) data, particularly in text analysis tasks. These advancements have driven the need to optimize components like positional encoding and positional embeddings (PEs) in transformer models to better capture the sequential and contextual information embedded in molecular representations. SMILES data represent complex relationships among atoms or elements, rendering them critical for various learning tasks within the field of cheminformatics. This study addresses the critical challenge of encoding complex relationships among atoms in SMILES strings to explore various PEs within the transformer-based framework to increase the accuracy and generalization of molecular property predictions. The success of transformer-based models, such as the bidirectional encoder representations from transformer (BERT) models, in natural language processing tasks has sparked growing interest from the domain of cheminformatics. However, the performance of these models during pretraining and fine-tuning is significantly influenced by positional information such as PEs, which help in understanding the intricate relationships within sequences. Integrating position information within transformer architectures has emerged as a promising approach. This encoding mechanism provides essential supervision for modeling dependencies among elements situated at different positions within a given sequence. In this study, we first conduct pretraining experiments using various PEs to explore diverse methodologies for incorporating positional information into the BERT model for chemical text analysis using SMILES strings. Next, for each PE, we fine-tune the best-performing BERT (masked language modeling) model on downstream tasks for molecular-property prediction. Here, we use two molecular representations, SMILES and DeepSMILES, to comprehensively assess the potential and limitations of the PEs in zero-shot learning analysis, demonstrating the model’s proficiency in predicting properties of unseen molecular representations in the context of newly proposed and existing datasets.
Scientific contribution: This study explores the unexplored potential of PEs using the BERT model for molecular property prediction. The study involved pretraining and fine-tuning the BERT model on various datasets related to COVID-19, bioassay data, and other molecular and biological properties using SMILES and DeepSMILES representations. The study details the pretraining architecture, fine-tuning datasets, and the performance of the BERT model with different PEs. It also explores zero-shot learning analysis and the model’s performance on various classification and regression tasks. I…
Volume 17(1) · PDF: https://jcheminf.biomedcentral.com/counter/pdf/10.1186/s13321-025-00959-9
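The abstract refers to "various PEs" without listing them. As one concrete reference point, the sketch below builds the fixed sinusoidal positional encoding from the original transformer formulation, where even dimensions use a sine and odd dimensions a cosine of the position scaled by a geometric range of wavelengths. Whether this particular scheme is among those compared in the study is an assumption; the function name and the sizes used are illustrative only.

```python
import math
from typing import List


def sinusoidal_positional_encoding(seq_len: int, d_model: int) -> List[List[float]]:
    """Return a seq_len x d_model table of sinusoidal position encodings.

    PE[pos][2k]   = sin(pos / 10000^(2k / d_model))
    PE[pos][2k+1] = cos(pos / 10000^(2k / d_model))
    """
    if d_model % 2 != 0:
        raise ValueError("d_model must be even for paired sin/cos dimensions")

    table = []
    for pos in range(seq_len):
        row = [0.0] * d_model
        # i walks over the even indices, so i == 2k in the formula above.
        for i in range(0, d_model, 2):
            angle = pos / (10000 ** (i / d_model))
            row[i] = math.sin(angle)
            row[i + 1] = math.cos(angle)
        table.append(row)
    return table


# One encoding row per token of a short, already tokenized SMILES string.
tokens = ["C", "C", "(", "=", "O", ")", "O"]
encodings = sinusoidal_positional_encoding(len(tokens), 8)
```

Each row would be added to the corresponding token embedding before the first transformer layer; learned or relative position schemes replace this table with trainable parameters or attention biases.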
Citations: 0
Barlow Twins deep neural network for advanced 1D drug–target interaction prediction
IF 7.1 · Zone 2 · Chemistry
Journal of Cheminformatics Pub Date : 2025-02-05 DOI: 10.1186/s13321-025-00952-2
Maximilian G. Schuh, Davide Boldini, Annkathrin I. Bohne, Stephan A. Sieber
Abstract: Accurate prediction of drug–target interactions is critical for advancing drug discovery. By reducing time and cost, machine learning and deep learning can accelerate this laborious discovery process. In a novel approach, BarlowDTI, we utilise the powerful Barlow Twins architecture for feature-extraction while considering the structure of the target protein. Our method achieves state-of-the-art predictive performance against multiple established benchmarks using only one-dimensional input. The use of our hybrid approach of deep learning and gradient boosting machine as the underlying predictor ensures fast and efficient predictions without the need for substantial computational resources. We also propose the use of an influence method to investigate how the model reaches its decision based on individual training samples. By comparing co-crystal structures, we find that BarlowDTI effectively exploits catalytically active and stabilising residues, highlighting the model’s ability to generalise from one-dimensional input data. In addition, we further benchmark new baselines against existing methods. Together, these innovations improve the efficiency and effectiveness of drug–target interaction predictions, providing robust tools for accelerating drug development and deepening the understanding of molecular interactions. Therefore, we provide an easy-to-use web interface that can be freely accessed at https://www.bio.nat.tum.de/oc2/barlowdti.
Our computationally efficient and effective hybrid approach, combining the deep learning model Barlow Twins and gradient boosting machines, outperforms state-of-the-art methods across multiple splits and benchmarks using only one-dimensional input. Furthermore, we advance the field by proposing an influence method that elucidates model decision-making, thereby providing deeper insights into molecular interactions and improving the interpretability of drug–target interaction predictions.
Volume 17(1) · PDF: https://jcheminf.biomedcentral.com/counter/pdf/10.1186/s13321-025-00952-2
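The abstract names the Barlow Twins architecture without stating its training objective. For orientation, the sketch below computes the redundancy-reduction loss from the general Barlow Twins formulation: two embedding batches are standardized per feature, their cross-correlation matrix is formed, and the loss pushes the diagonal towards 1 and the off-diagonal entries towards 0. This is a plain-Python illustration, not BarlowDTI's code; the function names and the default off-diagonal weight are assumptions.

```python
import math
from typing import List, Sequence


def _standardize_columns(batch: Sequence[Sequence[float]]) -> List[List[float]]:
    """Return features as columns, each with zero mean and unit variance."""
    n = len(batch)
    columns = []
    for j in range(len(batch[0])):
        col = [row[j] for row in batch]
        mean = sum(col) / n
        std = math.sqrt(sum((v - mean) ** 2 for v in col) / n)
        if std == 0.0:
            raise ValueError("feature %d is constant across the batch" % j)
        columns.append([(v - mean) / std for v in col])
    return columns


def barlow_twins_loss(
    z_a: Sequence[Sequence[float]],
    z_b: Sequence[Sequence[float]],
    off_diag_weight: float = 0.005,  # illustrative default, an assumption
) -> float:
    """Sum of (1 - C_ii)^2 plus weighted sum of C_ij^2 for i != j."""
    if len(z_a) != len(z_b) or len(z_a[0]) != len(z_b[0]):
        raise ValueError("both views must have the same batch and feature size")
    n = len(z_a)
    a = _standardize_columns(z_a)
    b = _standardize_columns(z_b)

    loss = 0.0
    for i, col_a in enumerate(a):
        for j, col_b in enumerate(b):
            # Cross-correlation between feature i of view A and feature j of view B.
            c_ij = sum(x * y for x, y in zip(col_a, col_b)) / n
            if i == j:
                loss += (1.0 - c_ij) ** 2
            else:
                loss += off_diag_weight * c_ij ** 2
    return loss
```

Identical views with decorrelated features give a loss of zero, while duplicated features are penalized through the off-diagonal term.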
Citations: 0
Improving drug repositioning with negative data labeling using large language models
IF 7.1 · Zone 2 · Chemistry
Journal of Cheminformatics Pub Date : 2025-02-04 DOI: 10.1186/s13321-025-00962-0
Milan Picard, Mickael Leclercq, Antoine Bodein, Marie Pier Scott-Boyer, Olivier Perin, Arnaud Droit
Abstract
Introduction: Drug repositioning offers numerous advantages, such as faster development timelines, reduced costs, and lower failure rates in drug development. Supervised machine learning is commonly used to score drug candidates but is hindered by the lack of reliable negative data (drugs that fail due to inefficacy or toxicity), which are difficult to access; this lowers prediction accuracy and generalization. Positive-Unlabeled (PU) learning has been used to overcome this issue by either randomly sampling unlabeled drugs or identifying probable negatives, but it still suffers from misclassification or oversimplified decision boundaries.
Results: We proposed a novel strategy using Large Language Models (GPT-4) to analyze all clinical trials on prostate cancer and systematically identify true negatives. This approach showed remarkable improvement in predictive accuracy on independent test sets with a Matthews Correlation Coefficient of 0.76 (± 0.33) compared to 0.55 (± 0.15) and 0.48 (± 0.18) for two commonly used PU learning approaches. Using our labeling strategy, we created a training set of 26 positive and 54 experimentally validated negative drugs. We then applied a machine learning ensemble to this new dataset to assess the repurposing potential of the remaining 11,043 drugs in the DrugBank database. This analysis identified 980 potential candidates for prostate cancer. A detailed review of the top 30 revealed 9 promising drugs targeting various mechanisms such as genomic instability, p53 regulation, or TMPRSS2-ERG fusion.
Conclusion: By expanding our negative data labeling approach to all diseases within the ClinicalTrials.gov database, our method could greatly advance supervised drug repositioning, offering a more accurate and data-driven path for discovering new treatments.
Volume 17(1) · PDF: https://jcheminf.biomedcentral.com/counter/pdf/10.1186/s13321-025-00962-0
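The results above are reported as a Matthews Correlation Coefficient (MCC), a metric that stays informative when positives and negatives are imbalanced, as with 26 positive and 54 negative drugs. The sketch below computes MCC from confusion-matrix counts using the standard formula; the function name is hypothetical and the counts in the usage line are made-up illustrations, not numbers from the paper.

```python
import math


def matthews_corrcoef_from_counts(tp: int, tn: int, fp: int, fn: int) -> float:
    """MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)).

    Ranges from -1 (total disagreement) through 0 (chance level)
    to +1 (perfect prediction).
    """
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0.0:
        # A row or column of the confusion matrix is empty; returning 0
        # is a common convention for this undefined case.
        return 0.0
    return (tp * tn - fp * fn) / denominator


# Hypothetical confusion matrix for a small held-out test set.
score = matthews_corrcoef_from_counts(tp=5, tn=10, fp=1, fn=2)
```

Unlike plain accuracy, the score only approaches 1 when both the positive and the negative class are predicted well.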
Citations: 0
PretoxTM: a text mining system for extracting treatment-related findings from preclinical toxicology reports
IF 7.1 · Zone 2 · Chemistry
Journal of Cheminformatics Pub Date : 2025-02-03 DOI: 10.1186/s13321-024-00925-x
Javier Corvi, Nicolás Díaz-Roussel, José M. Fernández, Francesco Ronzano, Emilio Centeno, Pablo Accuosto, Celine Ibrahim, Shoji Asakura, Frank Bringezu, Mirjam Fröhlicher, Annika Kreuchwig, Yoko Nogami, Jeong Rih, Raul Rodriguez-Esteban, Nicolas Sajot, Joerg Wichard, Heng-Yi Michael Wu, Philip Drew, Thomas Steger-Hartmann, Alfonso Valencia, Laura I. Furlong, Salvador Capella-Gutierrez
Abstract: Over the last few decades the pharmaceutical industry has generated a vast corpus of knowledge on the safety and efficacy of drugs. Much of this information is contained in toxicology reports, which summarise the results of animal studies designed to analyse the effects of the tested compound, including unintended pharmacological and toxic effects, known as treatment-related findings. Despite the potential of this knowledge, the fact that most of this relevant information is only available as unstructured text with variable degrees of digitisation has hampered its systematic access, use and exploitation. Text mining technologies have the ability to automatically extract, analyse and aggregate such information, providing valuable new insights into the drug discovery and development process. In the context of the eTRANSAFE project, we present PretoxTM (Preclinical Toxicology Text Mining), the first system specifically designed to detect, extract, organise and visualise treatment-related findings from toxicology reports. The PretoxTM tool comprises three main components: PretoxTM Corpus, PretoxTM Pipeline and PretoxTM Web App. The PretoxTM Corpus is a gold standard corpus of preclinical treatment-related findings annotated by toxicology experts. This corpus was used to develop, train and validate the PretoxTM Pipeline, which extracts treatment-related findings from preclinical study reports. The extracted information is then presented for expert visualisation and validation in the PretoxTM Web App.
Scientific contribution: While text mining solutions have been widely used in the clinical domain to identify adverse drug reactions from various sources, no similar systems exist for identifying adverse events in animal models during preclinical testing. PretoxTM fills this gap by efficiently extracting treatment-related findings from preclinical toxicology reports. This provides a valuable resource for toxicology research, enhancing the efficiency of safety evaluations, saving time, and leading to more effective decision-making in the drug development process.
Volume 17(1) · PDF: https://jcheminf.biomedcentral.com/counter/pdf/10.1186/s13321-024-00925-x
Citations: 0
APBIO: bioactive profiling of air pollutants through inferred bioactivity signatures and prediction of novel target interactions
IF 7.1 · Zone 2 · Chemistry
Journal of Cheminformatics Pub Date : 2025-01-31 DOI: 10.1186/s13321-025-00961-1
Eva Viesi, Ugo Perricone, Patrick Aloy, Rosalba Giugno
Abstract: More sophisticated representations of compounds attempt to incorporate not only information on the structure and physicochemical properties of molecules, but also knowledge about their biological traits, leading to the so-called bioactivity profile. The bioactive profiling of air pollutants is challenging and crucial, as their biological activity and toxicological effects have not been deeply investigated yet, and further exploration could shed light on the impact of air pollution on complex disorders. Therefore, a biological signature that simultaneously captures the chemistry and the biology of small molecules may be beneficial in predicting the behaviour of such ligands towards a protein target. Moreover, the interactivity between biological entities can be represented through combined feature vectors that can be given as input to a machine learning (ML) model to capture the underlying interaction. To this end, we propose a chemogenomic approach, called Air Pollutant Bioactivity (APBIO), which integrates compound bioactivity signatures and target sequence descriptors to train ML classifiers subsequently used to predict potential compound-target interactions (CTIs). We report the performance of the proposed methodology and, via external validation sets, demonstrate that it outperforms existing molecular representations in terms of model generalizability. We have also developed a publicly available Streamlit application for APBIO at ap-bio.streamlit.app, allowing users to predict associations between investigated compounds and protein targets.
Scientific contribution: We derived ex novo bioactivity signatures for air pollutant molecules to capture their biological behaviour and associations with protein targets. The proposed chemogenomic methodology enables the prediction of novel CTIs for known or similar compounds and targets through well-established and efficient ML models, deepening our insight into the molecular interactions and mechanisms that may have a deleterious impact on human biological systems.
Volume 17(1) · PDF: https://jcheminf.biomedcentral.com/counter/pdf/10.1186/s13321-025-00961-1
Citations: 0