{"title":"The published role of artificial intelligence in drug discovery and development: a bibliometric and social network analysis from 1990 to 2023","authors":"Murat Koçak, Zafer Akçalı","doi":"10.1186/s13321-025-00988-4","DOIUrl":"10.1186/s13321-025-00988-4","url":null,"abstract":"<div><p>Today, drug discovery and development is one of the fields where Artificial Intelligence (AI) is used extensively. Therefore, this study aims to systematically analyze the scientific literature on the application of AI in drug discovery and development to understand the evolution, trends, and key contributors within this rapidly growing field. By leveraging various bibliometric indicators and visualization techniques, we seek to explore the growth patterns, influential authors and institutions, collaboration networks, and emerging research trends within this domain. Bibliometric and network analysis methods (e.g., co-occurrence, co-authorship, and collaboration analyses) were used to achieve this goal. Bibliometric visualization tools such as the Bibliometrix R package, VOSviewer, and Litmaps were used for comprehensive data analysis. Scientific publications on AI in drug discovery and development were retrieved from the Web of Science Core Collection (WoS CC) database covering 1990–2023. In addition to visualization programs, the InCites database was also used for analysis and visualization. A total of 4,059 scientific publications written by 13,932 authors and published in 1,071 journals were included in the analysis. 
The results reveal that the most prolific authors are Ekins (n = 67), Schneider (n = 52), Hou Tj (n = 43), and Cao Ds (n = 34), while the most active institutions are the “Chinese Academy of Sciences” and “University of California.” The leading scientific journals are “Journal of Chemical Information and Modeling,” “Briefings in Bioinformatics,” and “Journal of Cheminformatics.” The most frequently used author keywords include “protein folding,” “QSAR,” “gene expression data,” “coronavirus,” and “genome rearrangement.” The average number of citations per scientific publication is 28.62, indicating the high impact of research in this field. A significant increase in publications was observed after 2014, with a peak in 2022, followed by a slight decline. International collaboration accounts for 28.06% of the publications, with the USA and China leading in both productivity and influence. The study also identifies key funding organizations, such as the National Natural Science Foundation of China (NSFC) and the United States Department of Health & Human Services, which have significantly supported advancements in this field. In conclusion, this study highlights the transformative role of AI in drug discovery and development, showcasing its potential to accelerate innovation and improve efficiency. 
The findings provide valuable insights into the current state of research, emerging trends, and future directions, offering a roadmap for researchers, industry professionals, and policymakers to further explore and leverage AI technologies in this domain.</p><p><b>Scientific contribution</b> This study provides a comprehensive bibliometric analysis of 4,059 scientific publications (1990–2023) to map the evolution, trends, and key contrib","PeriodicalId":617,"journal":{"name":"Journal of Cheminformatics","volume":"17 1","pages":""},"PeriodicalIF":7.1,"publicationDate":"2025-05-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://jcheminf.biomedcentral.com/counter/pdf/10.1186/s13321-025-00988-4","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143920516","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"化学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Predicting inhibitors of OATP1B1 via heterogeneous OATP-ligand interaction graph neural network (HOLIgraph)","authors":"Mehrsa Mardikoraem, Joelle N. Eaves, Theodore Belecciu, Nathaniel Pascual, Alexander Aljets, Bruno Hagenbuch, Erik M. Shapiro, Benjamin J. Orlando, Daniel R. Woldring","doi":"10.1186/s13321-025-01020-5","DOIUrl":"10.1186/s13321-025-01020-5","url":null,"abstract":"<div><p>Organic anion transporting polypeptides (OATPs) are membrane transporters crucial for drug uptake and distribution in the human body. OATPs can mediate drug-drug interactions (DDIs) in which the interaction of one drug with an OATP impairs the uptake of another drug, resulting in potentially fatal pharmacological effects. Predicting OATP-mediated DDIs is challenging due to limited information on OATP inhibition mechanisms and inconsistent experimental OATP inhibition data across different studies. This study introduces Heterogeneous OATP-Ligand Interaction Graph Neural Network (HOLIgraph), a novel computational model that integrates molecular modeling with a graph neural network to enhance the prediction of drug-induced OATP inhibition. By combining ligand (i.e., drug) molecular features with protein-ligand interaction data from rigorous docking simulations, HOLIgraph outperforms traditional DDI prediction models that rely solely on ligand molecular features. HOLIgraph achieved a median balanced accuracy of over 90 percent when predicting inhibitors of OATP1B1, significantly outperforming purely ligand-based models. Beyond improving inhibition prediction, the data used to train HOLIgraph can enable the characterization of protein residues involved in inhibitory drug-OATP interactions. We identified certain OATP1B1 residues that preferentially interact with inhibitors, including I46 and K49. 
We anticipate such interaction information will be valuable to future structural and mechanistic investigations of OATP1B1.</p></div>","PeriodicalId":617,"journal":{"name":"Journal of Cheminformatics","volume":"17 1","pages":""},"PeriodicalIF":7.1,"publicationDate":"2025-05-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://jcheminf.biomedcentral.com/counter/pdf/10.1186/s13321-025-01020-5","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143908853","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"化学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Application of 3D atom pair map in an attention model for enhanced drug virtual screening","authors":"Gina Ryu, Wankyu Kim","doi":"10.1186/s13321-025-01023-2","DOIUrl":"10.1186/s13321-025-01023-2","url":null,"abstract":"<p>This study demonstrates the utility of a novel molecular representation, the 3D atom pair map (APM), and of a deep learning model based on it for virtual screening, suggesting that many other prediction models would also benefit from adopting APM. An open-source script to generate 3D APMs is available at https://github.com/rimeless/APM</p>","PeriodicalId":617,"journal":{"name":"Journal of Cheminformatics","volume":"17 1","pages":""},"PeriodicalIF":7.1,"publicationDate":"2025-05-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://jcheminf.biomedcentral.com/counter/pdf/10.1186/s13321-025-01023-2","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143908751","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"化学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Prediction of blood–brain barrier and Caco-2 permeability through the Enalos Cloud Platform: combining contrastive learning and atom-attention message passing neural networks","authors":"Nikoletta-Maria Koutroumpa, Andreas Tsoumanis, Haralambos Sarimveis, Iseult Lynch, Georgia Melagraki, Antreas Afantitis","doi":"10.1186/s13321-025-01007-2","DOIUrl":"10.1186/s13321-025-01007-2","url":null,"abstract":"<div><p>In this study, we introduce a novel approach for predicting two key drug properties, blood–brain barrier (BBB) permeability and human intestinal absorption via Caco-2 permeability. Our methodology centers on a specialized neural network, the atom transformer-based Message Passing Neural Network (MPNN), which we have combined with contrastive learning techniques to enhance the process of representing and embedding molecular structures for more accurate property prediction. These models predict BBB and Caco-2 permeability, two critical factors in drug absorption and distribution that fall under the broader scope of ADMET (absorption, distribution, metabolism, excretion, and toxicity) properties. The models are readily accessible online through the Enalos Cloud Platform, which offers a user-friendly, AI-powered, ready-to-use web service that significantly streamlines the drug design process, enabling users to easily predict and understand the behavior of potential drug compounds within the human body.</p><p><b>Scientific Contribution</b> Our study combines an atom-attention Message Passing Neural Network (AA-MPNN) with contrastive learning (CL), which significantly improves predictive accuracy. Our model leverages self-supervised learning to expand the chemical space used in training, and self-attention mechanisms to focus on critical molecular features, enhancing both model accuracy and interpretability. 
Additionally, the ready-to-use web service based on our model democratizes access to predictive tools for the scientific and regulatory communities.</p></div>","PeriodicalId":617,"journal":{"name":"Journal of Cheminformatics","volume":"17 1","pages":""},"PeriodicalIF":7.1,"publicationDate":"2025-05-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://jcheminf.biomedcentral.com/counter/pdf/10.1186/s13321-025-01007-2","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143904854","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"化学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Leveraging AI to explore structural contexts of post-translational modifications in drug binding","authors":"Kirill E. Medvedev, R. Dustin Schaeffer, Nick V. Grishin","doi":"10.1186/s13321-025-01019-y","DOIUrl":"10.1186/s13321-025-01019-y","url":null,"abstract":"<div><p>Post-translational modifications (PTMs) play a crucial role in allowing cells to expand the functionality of their proteins and adaptively regulate their signaling pathways. Defects in PTMs have been linked to numerous developmental disorders and human diseases, including cancer, diabetes, heart, neurodegenerative and metabolic diseases. PTMs are important targets in drug discovery, as they can significantly influence various aspects of drug interactions including binding affinity. The structural consequences of PTMs, such as phosphorylation-induced conformational changes or their effects on ligand binding affinity, have historically been challenging to study on a large scale, primarily due to reliance on experimental methods. Recent advancements in computational power and artificial intelligence, particularly in deep learning algorithms and protein structure prediction tools like AlphaFold3, have opened new possibilities for exploring the structural context of interactions between PTMs and drugs. These AI-driven methods enable accurate modeling of protein structures including prediction of PTM-modified regions and simulation of ligand-binding dynamics on a large scale. In this work, we identified small molecule binding-associated PTMs that can influence drug binding across all human proteins listed as small molecule targets in the DrugDomain database, which we developed recently. 
The 6,131 identified PTMs were mapped to structural domains from the Evolutionary Classification of Protein Domains (ECOD) database.</p><p><b>Scientific contribution</b>: Using recent AI-based approaches for protein structure prediction (AlphaFold3, RoseTTAFold All-Atom, Chai-1), we generated 14,178 models of PTM-modified human proteins with docked ligands. Our results demonstrate that these methods can predict PTM effects on small molecule binding, but precise evaluation of their accuracy requires a much larger benchmarking set. We also found that phosphorylation of NADPH-Cytochrome P450 Reductase, observed in cervical and lung cancer, causes significant structural disruption in the binding pocket, potentially impairing protein function. All data and generated models are available from the DrugDomain database v1.1 (http://prodata.swmed.edu/DrugDomain/) and GitHub (https://github.com/kirmedvedev/DrugDomain). To our knowledge, this resource is the first to offer structural context for small molecule binding-associated PTMs on a large scale.</p></div>","PeriodicalId":617,"journal":{"name":"Journal of Cheminformatics","volume":"17 1","pages":""},"PeriodicalIF":7.1,"publicationDate":"2025-05-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://jcheminf.biomedcentral.com/counter/pdf/10.1186/s13321-025-01019-y","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143904752","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"化学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Improving the accuracy of prediction models for small datasets of Cytochrome P450 inhibition with deep learning","authors":"Elpri Eka Permadi, Reiko Watanabe, Kenji Mizuguchi","doi":"10.1186/s13321-025-01015-2","DOIUrl":"10.1186/s13321-025-01015-2","url":null,"abstract":"<div><p>The cytochrome P450 (CYP) superfamily metabolises a wide range of compounds; however, drug-induced CYP inhibition can lead to adverse interactions. Identifying potential CYP inhibitors is crucial for safe drug administration. This study investigated the application of deep learning techniques to the prediction of CYP inhibition, focusing on the challenges posed by limited datasets for the CYP2B6 and CYP2C8 isoforms. To tackle these limitations, we leveraged larger datasets for related CYP isoforms, compiling data from public databases on IC50 values for 12,369 compounds tested against seven CYP isoforms. We constructed single-task, fine-tuning, and multitask models, as well as multitask models incorporating imputation of the missing values. Notably, the multitask models with data imputation demonstrated significant improvement in CYP inhibition prediction over the single-task models. Using the most accurate prediction models, we evaluated the inhibitory activity of approved drugs against CYP2B6 and CYP2C8. Among the 1,808 approved drugs analysed, our multitask models with data imputation identified 161 and 154 potential inhibitors of CYP2B6 and CYP2C8, respectively. This study underscores the significant potential of multitask deep learning, particularly when utilising a graph convolutional network with data imputation, to enhance the accuracy of CYP inhibition predictions under conditions of limited data availability.</p><p><b>Scientific contribution</b></p><p>This study demonstrates that even with small datasets, accurate prediction models can be constructed by utilising related data effectively. 
In addition, imputing the missing values significantly improved the accuracy of CYP2B6 and CYP2C8 inhibition prediction.</p></div>","PeriodicalId":617,"journal":{"name":"Journal of Cheminformatics","volume":"17 1","pages":""},"PeriodicalIF":7.1,"publicationDate":"2025-04-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://jcheminf.biomedcentral.com/counter/pdf/10.1186/s13321-025-01015-2","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143888761","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"化学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"LAGNet: better electron density prediction for LCAO-based data and drug-like substances","authors":"Konstantin Ushenin, Kuzma Khrabrov, Artem Tsypin, Anton Ber, Egor Rumiantsev, Artur Kadurin","doi":"10.1186/s13321-025-01010-7","DOIUrl":"10.1186/s13321-025-01010-7","url":null,"abstract":"<div><p>The electron density is an important object in quantum chemistry that is crucial for many downstream tasks in drug design. Recent deep learning approaches predict the electron density around a molecule from atom types and atom positions. Most of these methods use the plane wave (PW) numerical method as a source of ground-truth training data. However, the drug design field mostly uses the Linear Combination of Atomic Orbitals (LCAO) for computation of quantum properties. In this study, we focus on predicting the electron density of drug-like substances and on training neural networks with LCAO-based datasets. Our experiments show that proper handling of the large amplitudes of core orbitals is crucial for training on LCAO-based data. We propose storing the electron density on standard grids instead of a uniform grid. This allowed us to reduce the number of probing points per molecule 43-fold and storage space requirements 8-fold. Finally, we propose a novel architecture based on the DeepDFT model, which we name LAGNet. 
It is specifically designed and tuned for drug-like substances and the <span>∇²</span>DFT dataset.</p></div>","PeriodicalId":617,"journal":{"name":"Journal of Cheminformatics","volume":"17 1","pages":""},"PeriodicalIF":7.1,"publicationDate":"2025-04-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://jcheminf.biomedcentral.com/counter/pdf/10.1186/s13321-025-01010-7","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143884591","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"化学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"E-GuARD: expert-guided augmentation for the robust detection of compounds interfering with biological assays","authors":"Vincenzo Palmacci, Yasmine Nahal, Matthias Welsch, Ola Engkvist, Samuel Kaski, Johannes Kirchmair","doi":"10.1186/s13321-025-01014-3","DOIUrl":"10.1186/s13321-025-01014-3","url":null,"abstract":"<p>Assay interference caused by small organic compounds continues to pose formidable challenges to early drug discovery. Various computational methods have been developed to identify compounds likely to cause assay interference. However, due to the scarcity of data available for model development, the predictive accuracy and applicability of these approaches are limited. In this work, we present E-GuARD, a novel framework designed to address data scarcity and imbalance by integrating self-distillation, active learning, and expert-guided molecular generation. E-GuARD iteratively enriches the training data with interference-relevant molecules, resulting in quantitative structure-interference relationship (QSIR) models with superior performance. We demonstrate the utility of E-GuARD on four high-quality data sets covering thiol reactivity, redox reactivity, nanoluciferase inhibition, and firefly luciferase inhibition. Our models reached MCC values of up to 0.47 for these data sets, with two-fold or higher improvements in enrichment factors compared to models trained without E-GuARD data augmentation. These results highlight the potential of E-GuARD as a scalable solution to mitigating assay interference in early drug discovery.</p><p>We present E-GuARD, an innovative framework that combines iterative self-distillation with guided molecular augmentation to enhance the predictive performance of QSAR models. By allowing models to learn iteratively from newly generated, informative compounds, E-GuARD helps them capture underrepresented structural patterns and improves performance on unseen data. 
When applied across different interference mechanisms, E-GuARD consistently outperformed standard approaches. E-GuARD establishes the foundation for further research into dynamic data enrichment and more robust molecular modeling.</p>","PeriodicalId":617,"journal":{"name":"Journal of Cheminformatics","volume":"17 1","pages":""},"PeriodicalIF":7.1,"publicationDate":"2025-04-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://jcheminf.biomedcentral.com/counter/pdf/10.1186/s13321-025-01014-3","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143884315","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"化学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Moldina: a fast and accurate search algorithm for simultaneous docking of multiple ligands","authors":"Radek Halfar, Jiří Damborský, Sérgio M. Marques, Jan Martinovič","doi":"10.1186/s13321-025-01005-4","DOIUrl":"10.1186/s13321-025-01005-4","url":null,"abstract":"<div><p>Protein-ligand docking is a computational method routinely used in many structural biology applications. It usually involves one receptor and one ligand. The docking of multiple ligands, however, can be important in several situations, such as the study of synergistic effects, substrate and product inhibition, or competitive binding. This can be a challenging and computationally demanding process. Here we present Moldina (Multiple-Ligand Molecular Docking over AutoDock Vina), a new algorithm built upon AutoDock Vina. By integrating Particle Swarm Optimization into the established AutoDock Vina framework, we provide a tool capable of accelerating drug discovery and computational enzymology. In comprehensive testing against AutoDock Vina, the algorithm achieved comparable accuracy in predicting ligand binding conformations while reducing computational time by up to several hundred-fold. 
Moldina and the benchmark data are freely available at https://opencode.it4i.eu/permed/moldina-multiple-ligand-molecular-docking-over-autodock-vina and https://github.com/It4innovations/moldina-multiple-ligand-molecular-docking-over-autodock-vina.</p></div>","PeriodicalId":617,"journal":{"name":"Journal of Cheminformatics","volume":"17 1","pages":""},"PeriodicalIF":7.1,"publicationDate":"2025-04-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://jcheminf.biomedcentral.com/counter/pdf/10.1186/s13321-025-01005-4","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143880345","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"化学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Translating community-wide spectral library into actionable chemical knowledge: a proof of concept with monoterpene indole alkaloids","authors":"Sarah Szwarc, Adriano Rutz, Kyungha Lee, Yassine Mejri, Olivier Bonnet, Hazrina Hazni, Adrien Jagora, Rany B. Mbeng Obame, Jin Kyoung Noh, Elvis Otogo N’Nang, Stephenie C. Alaribe, Khalijah Awang, Guillaume Bernadat, Young Hae Choi, Vincent Courdavault, Michel Frederich, Thomas Gaslonde, Florian Huber, Toh-Seok Kam, Yun Yee Low, Erwan Poupon, Justin J. J. van der Hooft, Kyo Bin Kang, Pierre Le Pogam, Mehdi A. Beniddir","doi":"10.1186/s13321-025-01009-0","DOIUrl":"10.1186/s13321-025-01009-0","url":null,"abstract":"<div><p>With over 3000 representatives, the monoterpene indole alkaloid (MIA) class is among the most diverse families of plant natural products. Exploring the MS/MS spectral space of these complex compounds with chemoinformatic and computational mass spectrometry tools offers a valuable opportunity to extract and share chemical insights from this emblematic family of natural products (NPs). In this work, we first present a substantially updated version of the MIADB, a database that now contains 422 MS/MS spectra of MIAs (up from 172 initial entries) and has been uploaded to the GNPS library. We then introduce an innovative workflow that leverages hundreds of fragmentation spectra to support the FAIRification, extraction and dissemination of chemical knowledge. This workflow extracts spectral patterns matching finely defined MIA skeletons. These extracted signatures can then be queried against complex biological extract datasets using MassQL. Applying this strategy to an LC-MS/MS dataset of 75 plant extracts, we demonstrated the efficiency of this approach in identifying the diversity of MIA skeletons present in the analyzed samples. 
Additionally, our work enabled the digitization of structural data for diverse MIA skeletons by converting them into machine-readable formats, thereby facilitating their dissemination within the scientific community.</p><p><b>Scientific contribution</b> A comprehensive investigation of the monoterpene indole alkaloid chemical space, aiming to highlight skeleton-dependent fragmentation similarity trends and to generate valuable spectrometric signatures that could be used as queries.</p></div>","PeriodicalId":617,"journal":{"name":"Journal of Cheminformatics","volume":"17 1","pages":""},"PeriodicalIF":7.1,"publicationDate":"2025-04-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://jcheminf.biomedcentral.com/counter/pdf/10.1186/s13321-025-01009-0","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143883668","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"化学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}