{"title":"AI-driven prediction of drug activity against Toxoplasma gondii: Data augmentation and deep neural networks for limited datasets","authors":"Natalia V. Karimova , Ravithree D. Senanayake","doi":"10.1016/j.aichem.2025.100084","DOIUrl":"10.1016/j.aichem.2025.100084","url":null,"abstract":"<div><div>Toxoplasmosis, caused by <em>Toxoplasma gondii</em> (<em>T. gondii</em>), is a serious global health concern, particularly in immunocompromised individuals. Inhibiting the enzyme TgDHFR is a promising strategy for developing treatments. This Artificial Intelligence (AI)-driven Quantitative Structure-Activity Relationship (QSAR) study applies deep neural networks (DNNs) to predict pIC<sub>50</sub> values for potential inhibitors, using 2D and 3D molecular descriptors and fingerprints. To address training data limitations, we introduced a novel methodology combining targeted descriptor selection, Gaussian noise-based data augmentation, and an ensemble of DNNs. This approach significantly enhanced model performance, increasing the R² from 0.75 with the original dataset to 0.85. The model was further validated using two FDA-approved drugs for <em>T. gondii</em> treatment—pyrimethamine and trimethoprim—yielding relative errors of 3.35 % and 2.15 % in pIC<sub>50</sub> predictions compared to experimental values. Finally, the model was applied to screen FDA-approved drugs after filtering out molecules that did not align with the characteristics of the training dataset. The predicted pIC<sub>50</sub> values were further used to calculate ligand efficiency (LE), binding efficiency index (BEI), lipophilic ligand efficiency (LLE), and surface efficiency index (SEI), identifying the most promising TgDHFR inhibitors for further investigation. 
By leveraging AI and a data augmentation approach, this study provides a powerful tool for pIC<sub>50</sub> predictions of TgDHFR inhibitors, which can be adapted to other systems.</div></div>","PeriodicalId":72302,"journal":{"name":"Artificial intelligence chemistry","volume":"3 1","pages":"Article 100084"},"PeriodicalIF":0.0,"publicationDate":"2025-02-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143350515","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Small-dataset-orientated data-driven screening for catalytic propane activation","authors":"Jiaqi Chen , Junqing Li , Ziyi Liu, Shitao Sun, Shijia Zhou, Dongqi Wang","doi":"10.1016/j.aichem.2024.100083","DOIUrl":"10.1016/j.aichem.2024.100083","url":null,"abstract":"<div><div>This work aims at the proper application of machine learning screening of direct propane dehydrogenation (PDH) reaction and oxidative dehydrogenation (ODH) of propane, which are two main protocols to convert propane to propylene and featured by limited available experimental data. Current studies mainly adopt trial-and-error strategy, which is time consuming and raises concerns on environment and health owing to the release of chemical waste. This motivates the introduction of data-driven research paradigm to alleviate the deficiency of the traditional trial-and-error strategy, which however relies on large quantity of high quality data. In this work, a dataset enveloping PDH and ODH data was constructed, and the performance of machine learning algorithms in the study of light alkane activation was evaluated, based on which a strategy appropriate for small dataset was proposed: for small unbalanced datasets, it is sensible to train the model by treating the dataset as a whole rather than to fuse multiple specific models based on divided smaller pieces of data. The results show that the trained models using ensemble algorithms exhibited the best predictability of propylene selectivity, i.e. CatBoost and random forest for PDH and LightGBM for ODH, respectively. Based on the optimal model, the key influencing factors in PDH and ODH were identified. This study demonstrates the proper use of data-driven strategy in the catalytic science, which can be adopted in other scientific problems that suffer from the limited available high quality data and contribute to the gain of novel understanding, e.g. 
the rational design and optimization of the catalytic systems.</div></div>","PeriodicalId":72302,"journal":{"name":"Artificial intelligence chemistry","volume":"3 1","pages":"Article 100083"},"PeriodicalIF":0.0,"publicationDate":"2024-12-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143100098","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Machine learning for active sites prediction of quinoline derivatives","authors":"Jie Sun, Zi-Hao Li, Yi-Fei Yang, Shu-Yu Zhang","doi":"10.1016/j.aichem.2024.100082","DOIUrl":"10.1016/j.aichem.2024.100082","url":null,"abstract":"<div><div>Privileged structures, like quinoline, have diverse biological activities, and their synthetic versatility makes them crucial for drug design. In traditional synthesis methods, the C-H functionalization of quinoline can be effectively achieved using different conditions, especially transition metal catalysis. Machine learning (ML) techniques enable rapid prediction of C-H functionalization, facilitating drug design and synthesis. In this study, a generalizable approach to predict site selectivity is accomplished by using artificial neural network (ANN), which is suitable for the site prediction of derivatives of quinoline. In an 80/10/10 training/validation/testing split of 2467 compounds, the model takes SMILES strings as input format and uses six quantum chemical descriptors to identify reactive site(s) of the compound. On the external validation set, 86.5% of all molecules were correctly predicted. This model allows chemists to rapidly predict which site is more likely to produce electrophilic substitution reaction.</div></div>","PeriodicalId":72302,"journal":{"name":"Artificial intelligence chemistry","volume":"3 1","pages":"Article 100082"},"PeriodicalIF":0.0,"publicationDate":"2024-12-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143100097","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Machine learning approaches for modelling of molecular polarizability in gold nanoclusters","authors":"Abhishek Ojha , Satya S. Bulusu , Arup Banerjee","doi":"10.1016/j.aichem.2024.100080","DOIUrl":"10.1016/j.aichem.2024.100080","url":null,"abstract":"<div><div>The polarizability of molecules describes their response to an external electric field. It quantifies the ability of a system to form an induced dipole moment when subjected to an electric field. In this work, we investigated isotropic polarizability and anisotropy in the polarizability of gold nanoclusters using various machine-learning algorithms. We utilized high-order invariant descriptors based on spherical harmonics, integrated with machine-learning models like artificial neural network, Gaussian process regression, and kernel ridge regression. Our results demonstrate the efficacy of machine-learning in accurately predicting the polarizability of gold nanoclusters. We find that ANN-based model performs better than the other models.</div></div>","PeriodicalId":72302,"journal":{"name":"Artificial intelligence chemistry","volume":"2 2","pages":"Article 100080"},"PeriodicalIF":0.0,"publicationDate":"2024-11-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142653223","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Evaluation of machine learning models for the accelerated prediction of density functional theory calculated 19F chemical shifts based on local atomic environments","authors":"Sophia Li , Emma Wang , Leia Pei , Sourodeep Deb , Prashanth Prabhala , Sai Hruday Reddy Nara , Raina Panda , Shiven Eltepu , Marx Akl , Larry McMahan , Edward Njoo","doi":"10.1016/j.aichem.2024.100078","DOIUrl":"10.1016/j.aichem.2024.100078","url":null,"abstract":"<div><div>The introduction of fluorine in compounds plays a crucial role in drug development as it greatly influences their final pharmacokinetic and dynamic properties. Due to the prevalence of fluorine in FDA-approved drugs in recent years, identifying the mechanisms driving their chemical transformations has become crucial in the drug discovery landscape. <sup>19</sup>F NMR spectroscopy is a powerful analytical technique that allows for the examination of fluorine-containing compounds, offering valuable information about their structure, dynamics, and reactivity. NMR spectra can be interpreted through the leveraging of Density Functional Theory (DFT). However, the screening of compounds and discovery of feasible drug candidates is limited due to its computational cost. Here, we present a machine learning approach to accelerate the prediction of DFT-calculated <sup>19</sup>F NMR chemical shifts. The fluorine atoms’ features in the models were derived from their local three-dimensional environments, representing their neighboring atoms within a radius of <em>n</em> Å away from the given fluorine atom in the compound. A comparative analysis of thirteen regression models was conducted using features extracted from 501 fluorinated compounds in our laboratory’s chemical inventory. Among the models, Gradient Boosting Regression (GBR) exhibited the highest performance, achieving a mean absolute error of 3.31 ppm with a local environment radius of 3 Å. 
This demonstrates a comparable accuracy to DFT calculations while reducing computational time from several hundred seconds to milliseconds. 3 Å was also found to be the most optimal radius across all models when encoding features for local atomic environments.</div></div>","PeriodicalId":72302,"journal":{"name":"Artificial intelligence chemistry","volume":"2 2","pages":"Article 100078"},"PeriodicalIF":0.0,"publicationDate":"2024-10-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142535587","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Leveraging graph neural networks to predict Hammett’s constants for benzoic acid derivatives","authors":"Vaneet Saini , Ranjeet Kumar","doi":"10.1016/j.aichem.2024.100079","DOIUrl":"10.1016/j.aichem.2024.100079","url":null,"abstract":"<div><div>The Hammett constants, σ<sub>m</sub> and σ<sub>p</sub>, reflect the electron-withdrawing and electron-donating abilities of substituents on aromatic compounds, and have been successfully used in various structure-activity relationship studies. However, determining these constants experimentally is both resource-intensive and time-consuming. In this study, we explore the use of graph neural networks (GNNs) to predict Hammett constant parameters using graph-based features. This innovative approach aims to provide rapid and efficient predictions of σ<sub>m</sub> and σ<sub>p</sub> values, eliminating the need for extensive computational and experimental setups. By leveraging the power of GNNs, we hope to streamline the process of obtaining these critical parameters, thereby facilitating more efficient reaction design and enhancing the applicability of linear free energy relationship studies in chemical research.</div></div>","PeriodicalId":72302,"journal":{"name":"Artificial intelligence chemistry","volume":"2 2","pages":"Article 100079"},"PeriodicalIF":0.0,"publicationDate":"2024-10-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142535586","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Molecular similarity: Theory, applications, and perspectives","authors":"Kenneth López-Pérez , Juan F. Avellaneda-Tamayo , Lexin Chen , Edgar López-López , K. Eurídice Juárez-Mercado , José L. Medina-Franco , Ramón Alain Miranda-Quintana","doi":"10.1016/j.aichem.2024.100077","DOIUrl":"10.1016/j.aichem.2024.100077","url":null,"abstract":"<div><p>Molecular similarity pervades much of our understanding and rationalization of chemistry. This has become particularly evident in the current data-intensive era of chemical research, with similarity measures serving as the backbone of many Machine Learning (ML) supervised and unsupervised procedures. Here, we present a discussion on the role of molecular similarity in drug design, chemical space exploration, chemical “art” generation, molecular representations, and many more. We also discuss more recent topics in molecular similarity, like the ability to efficiently compare large molecular libraries.</p></div>","PeriodicalId":72302,"journal":{"name":"Artificial intelligence chemistry","volume":"2 2","pages":"Article 100077"},"PeriodicalIF":0.0,"publicationDate":"2024-08-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S2949747724000356/pdfft?md5=7238a1972b367d1732b52f425b046ba9&pid=1-s2.0-S2949747724000356-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142150935","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Large-language models: The game-changers for materials science research","authors":"Songlin Yu , Nian Ran , Jianjun Liu","doi":"10.1016/j.aichem.2024.100076","DOIUrl":"10.1016/j.aichem.2024.100076","url":null,"abstract":"<div><p>Large Language Models (LLMs), such as GPT-4, are precipitating a new \"industrial revolution\" by significantly enhancing productivity across various domains. These models encode an extensive corpus of scientific knowledge from vast textual datasets, functioning as near-universal generalists with the ability to engage in natural language communication and exhibit advanced reasoning capabilities. Notably, agents derived from LLMs can comprehend user intent and autonomously design, plan, and utilize tools to execute intricate tasks. These attributes are particularly advantageous for materials science research, an interdisciplinary field characterized by numerous complex and time-intensive activities. The integration of LLMs into materials science research holds the potential to fundamentally transform the research paradigm in this field.</p></div>","PeriodicalId":72302,"journal":{"name":"Artificial intelligence chemistry","volume":"2 2","pages":"Article 100076"},"PeriodicalIF":0.0,"publicationDate":"2024-08-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S2949747724000344/pdfft?md5=e80906f3aecc3736b5e0dcac5da9017c&pid=1-s2.0-S2949747724000344-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142095479","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Conf-GEM: A geometric information-assisted direct conformation generation model","authors":"Zhijiang Yang , Youjun Xu , Li Pan , Tengxin Huang , Yunfan Wang , Junjie Ding , Liangliang Wang , Junhua Xiao","doi":"10.1016/j.aichem.2024.100074","DOIUrl":"10.1016/j.aichem.2024.100074","url":null,"abstract":"<div><p>Molecular conformations generation (MCG) aims to efficiently obtain reasonable and stable three-dimensional (3D) atomic coordinates of the atoms in the molecule from scratch, providing a structural foundation for molecular representation learning models and advanced downstream molecular design tasks such as molecular property prediction, molecular generation, and molecular docking. Existing MCG methods mostly rely on indirect distance-based strategies, which can result in geometrically unrealistic conformations, or direct coordinate-based methods, which have larger search spaces and are prone to overfitting. Therefore, this study introduces Conf-GEM, a novel geometric information-assisted direct conformation generation model based on E-GeoGNN, a geometrically augmented 3D graph neural network with multiple scales. Pre-training and divide-and-conquer strategies are integrated into the proposed model. Conf-GEM outperforms RDKit and nine deep-learning-based MCG models on the GEOM-QM9 and GEOM-Drugs datasets, achieving conformational coverage of 96.69% and 96.07%, respectively, without force field optimization. It also excels on the X-ray diffraction crystal structure dataset with up to 97.04% conformational coverage. In conclusion, Conf-GEM provides a novel solution for stabilizing 3D conformations generation. 
We provide an online prediction service (<span><span>https://confgem.cmdrg.com</span><svg><path></path></svg></span>) with a user-friendly interface for researchers.</p></div>","PeriodicalId":72302,"journal":{"name":"Artificial intelligence chemistry","volume":"2 2","pages":"Article 100074"},"PeriodicalIF":0.0,"publicationDate":"2024-07-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S2949747724000320/pdfft?md5=48affbdd2252ef50c6eb12dedcdeacc7&pid=1-s2.0-S2949747724000320-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141845096","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Top 20 influential AI-based technologies in chemistry","authors":"Valentine P. Ananikov","doi":"10.1016/j.aichem.2024.100075","DOIUrl":"10.1016/j.aichem.2024.100075","url":null,"abstract":"<div><p>The beginning and ripening of digital chemistry is analyzed focusing on the role of artificial intelligence (AI) in an expected leap in chemical sciences to bring this area to the next evolutionary level. The analytic description selects and highlights the top 20 AI-based technologies and 7 broader themes that are reshaping the field. It underscores the integration of digital tools such as machine learning, big data, digital twins, the Internet of Things (IoT), robotic platforms, smart control of chemical processes, virtual reality and blockchain, among many others, in enhancing research methods, educational approaches, and industrial practices in chemistry. The significance of this study lies in its focused overview of how these digital innovations foster a more efficient, sustainable, and innovative future in chemical sciences. 
This article not only illustrates the transformative impact of these technologies but also draws new pathways in chemistry, offering a broad appeal to researchers, educators, and industry professionals to embrace these advancements for addressing contemporary challenges in the field.</p></div>","PeriodicalId":72302,"journal":{"name":"Artificial intelligence chemistry","volume":"2 2","pages":"Article 100075"},"PeriodicalIF":0.0,"publicationDate":"2024-07-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S2949747724000332/pdfft?md5=a101cdd9b75aa2e13939289fee50e2d5&pid=1-s2.0-S2949747724000332-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141849804","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}