{"title":"Physics-informed neural networks and beyond: enforcing physical constraints in quantum dissipative dynamics†","authors":"Arif Ullah, Yu Huang, Ming Yang and Pavlo O. Dral","doi":"10.1039/D4DD00153B","DOIUrl":"10.1039/D4DD00153B","url":null,"abstract":"<p >Neural networks (NNs) accelerate simulations of quantum dissipative dynamics. Ensuring that these simulations adhere to fundamental physical laws is crucial, but has been largely ignored in the state-of-the-art NN approaches. We show that this may lead to implausible results measured by violation of the trace conservation. To recover the correct physical behavior, we develop physics-informed NNs (PINNs) that mitigate the violations to a good extent. Beyond that, we propose a novel uncertainty-aware approach that enforces perfect trace conservation by design, surpassing PINNs.</p>","PeriodicalId":72816,"journal":{"name":"Digital discovery","volume":" 10","pages":" 2052-2060"},"PeriodicalIF":6.2,"publicationDate":"2024-09-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://pubs.rsc.org/en/content/articlepdf/2024/dd/d4dd00153b?page=search","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142206603","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Regio-MPNN: predicting regioselectivity for general metal-catalyzed cross-coupling reactions using a chemical knowledge informed message passing neural network†","authors":"Baochen Li, Yuru Liu, Haibin Sun, Rentao Zhang, Yongli Xie, Klement Foo, Frankie S. Mak, Ruimao Zhang, Tianshu Yu, Sen Lin, Peng Wang and Xiaoxue Wang","doi":"10.1039/D4DD00244J","DOIUrl":"10.1039/D4DD00244J","url":null,"abstract":"<p >As a fundamental problem in organic chemistry, synthesis planning aims at designing energy and cost-efficient reaction pathways for target compounds. In synthesis planning, it is crucial to understand regioselectivity, or the preference of a reaction over competing reaction sites. Precisely predicting regioselectivity enables early exclusion of unproductive reactions and paves the way to designing high-yielding synthetic routes with minimal separation and material costs. However, it is still at the emerging state to combine chemical knowledge and data-driven methods to make practical predictions for regioselectivity. At the same time, metal-catalyzed cross-coupling reactions have profoundly transformed medicinal chemistry, and thus become one of the most frequently encountered types of reactions in synthesis planning. In this work, we for the first time introduce a chemical knowledge informed message passing neural network (MPNN) framework that directly identifies the intrinsic major products for metal-catalyzed cross-coupling reactions with regioselective ambiguity. Integrating both first principles methods and data-driven methods, our model achieves an overall accuracy of 96.51% on the test set of eight typical metal-catalyzed cross-coupling reaction types, including Suzuki–Miyaura, Stille, Sonogashira, Buchwald–Hartwig, Hiyama, Kumada, Negishi, and Heck reactions, outperforming other commonly used model types. 
To integrate electronic effects with steric effects in regioselectivity prediction, we propose a quantitative method to measure the steric hindrance effect. Our steric hindrance checker can successfully identify regioselectivity induced solely by steric hindrance. Notably under practical scenarios, our model outperforms 6 experimental organic chemists with an average working experience of over 10 years in the organic synthesis industry in terms of predicting major products in regioselective cases. We have also exemplified the practical usage of our model by fixing routes designed by open-access synthesis planning software and improving reactions by identifying low-cost starting materials. To assist general chemists in making prompt decisions about regioselectivity, we have developed a free web-based AI-empowered tool. Our code and web tool have been made available at https://github.com/Chemlex-AI/regioselectivity and https://ai.tools.chemlex.com/region-choose, respectively.</p>","PeriodicalId":72816,"journal":{"name":"Digital discovery","volume":" 10","pages":" 2019-2031"},"PeriodicalIF":6.2,"publicationDate":"2024-09-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://pubs.rsc.org/en/content/articlepdf/2024/dd/d4dd00244j?page=search","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142206569","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Extracting recalcitrant redox data on fluorophores to pair with optical data for predicting small-molecule, ionic isolation lattices†","authors":"Michaela K. Loveless, Minwei Che, Alec J. Sanchez, Vikrant Tripathy, Bo W. Laursen, Sudhakar Pamidighantam, Krishnan Raghavachari and Amar H. Flood","doi":"10.1039/D4DD00137K","DOIUrl":"10.1039/D4DD00137K","url":null,"abstract":"<p >Redox and optical data of organic fluorophores are essential for using design rules and property screening to identify new candidate dyes capable of forming optical materials. One such optical material is small-molecule, ionic isolation lattices (SMILES), which have properties defined by the optical and electrochemical properties of the fluorophores used. While optical data are available and readily extracted, the promise of digital discovery to mine the data and identify new dye candidates for making new fluorescent compounds is limited by experimental electrochemical data, which is reported with varying quality. We report methods to extract data from 20 000+ literature-reported dyes for generating a library of both redox and optical data constituted by 206 dye-solvent entries. Wide heterogeneity in data collection and reporting practices predicated use of a workflow involving manual data extraction, expert annotations of data quality and validation. Chemometric analysis shows distributions of solvents, electrolytes, and reference electrodes used in electrochemistry and the distributions of dye families and molecular weights. Data were extracted and screened to identify fluorophores predicted to form fluorescent solids based on SMILES. Screening used three design rules requiring dyes to be cationic, have a redox window within −1.9 and +1.5 V (<em>vs.</em> ferrocene), and a size less than 2 nm. 
A set of 47 dyes are compliant with all design rules showcasing the potential for using paired electrochemical-optical data in a workflow for designing optical materials.</p>","PeriodicalId":72816,"journal":{"name":"Digital discovery","volume":" 10","pages":" 2105-2117"},"PeriodicalIF":6.2,"publicationDate":"2024-09-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://pubs.rsc.org/en/content/articlepdf/2024/dd/d4dd00137k?page=search","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142206567","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Application of machine learning for predicting G9a inhibitors†‡","authors":"Mariya L. Ivanova, Nicola Russo, Nadia Djaid and Konstantin Nikolic","doi":"10.1039/D4DD00101J","DOIUrl":"10.1039/D4DD00101J","url":null,"abstract":"<p > <em>Object and significance</em>: the G9a enzyme is an epigenomic regulator, making gene expression directly dependent on how various substances in the cell affect this enzyme. Therefore, it is crucial to consider this impact in any biochemical research involving the development of new compounds introduced into the body. While this can be examined experimentally, it would be highly advantageous to predict these effects using computer simulations. <em>Purpose</em>: the purpose of the model was to assist in answering the question of the potential effect that a compound under development could have on the G9a activity, and thus reduce the need for laboratory experiments and facilitate faster and more productive research and development. <em>Solution</em>: the paper proposes a cost-effective machine learning model that determines whether a compound is an active G9a inhibitor. The proposed approach utilises the already existing very extensive PubChem database. The starting point was the quantitative high-throughput screening assay for inhibitors of histone lysine methyltransferase G9a (also available on PubChem) which screened around 350 000 compounds. For these compounds, datasets of 60 features were created. Then different ML algorithms were deployed to find the best performing one, which can then be used to predict if some untested compound would actively inhibit G9a. <em>Results</em>: six different ML classifiers have been implemented on five dataset variations. Different variants of the dataset were created by using two different data balancing approaches and including or not the influence of water solubility at a pH of 7.4. 
The most successful combination was a dataset with five features and a random forest classifier that reached 90% accuracy. The classifier was trained with 60 244 and tested with 15 062 compounds. Feature reduction was obtained by analysing three different feature importance algorithms, which resulted in not only feature reduction but also some insights for further biochemical research.</p>","PeriodicalId":72816,"journal":{"name":"Digital discovery","volume":" 10","pages":" 2010-2018"},"PeriodicalIF":6.2,"publicationDate":"2024-09-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://pubs.rsc.org/en/content/articlepdf/2024/dd/d4dd00101j?page=search","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142206570","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Pellet dispensomixer and pellet distributor: open hardware for nanocomposite space exploration via automated material compounding†","authors":"Miguel Hernández-del-Valle, Jorge Ilarraza-Zuazo, Enrique Dios-Lázaro, Javier Rubio, Joris Audoux and Maciej Haranczyk","doi":"10.1039/D4DD00198B","DOIUrl":"10.1039/D4DD00198B","url":null,"abstract":"<p >The development of novel polymer-based nanocomposites necessitates the experimental preparation and characterization of numerous compositions to identify optimal formulations. For thermoplastic-based materials, the compounding process typically involves the labor-intensive tasks of dispensing, weighing, mixing, and extruding solid components such as polymers and additives. Herein, we present an open hardware solution that aims to automate this process. Our setup system is designed to streamline material surveying tasks associated with experimental design or closed-loop, self-driving laboratories. Our hardware setup consists of two main components: a multi-material pellet dispenser, which simplifies the preparation of targeted compositions from a range of master batches, and a pellet collector-distributor, which efficiently gathers and distributes processed materials into various containers throughout the experiment.</p>","PeriodicalId":72816,"journal":{"name":"Digital discovery","volume":" 10","pages":" 2032-2040"},"PeriodicalIF":6.2,"publicationDate":"2024-08-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://pubs.rsc.org/en/content/articlepdf/2024/dd/d4dd00198b?page=search","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142226318","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Towards a science exocortex","authors":"Kevin G. Yager","doi":"10.1039/D4DD00178H","DOIUrl":"10.1039/D4DD00178H","url":null,"abstract":"<p >Artificial intelligence (AI) methods are poised to revolutionize intellectual work, with generative AI enabling automation of text analysis, text generation, and simple decision making or reasoning. The impact to science is only just beginning, but the opportunity is significant since scientific research relies fundamentally on extended chains of cognitive work. Here, we review the state of the art in agentic AI systems, and discuss how these methods could be extended to have even greater impact on science. We propose the development of an exocortex, a synthetic extension of a person's cognition. A science exocortex could be designed as a swarm of AI agents, with each agent individually streamlining specific researcher tasks, and whose inter-communication leads to emergent behavior that greatly extend the researcher's cognition and volition.</p>","PeriodicalId":72816,"journal":{"name":"Digital discovery","volume":" 10","pages":" 1933-1957"},"PeriodicalIF":6.2,"publicationDate":"2024-08-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://pubs.rsc.org/en/content/articlepdf/2024/dd/d4dd00178h?page=search","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142206571","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Digital chemistry: navigating the confluence of computation and experimentation – definition, status quo, and future perspective","authors":"Stefan Bräse","doi":"10.1039/D4DD00130C","DOIUrl":"10.1039/D4DD00130C","url":null,"abstract":"<p >Digital chemistry represents a transformative approach integrating computational methods, digital data, and automation within the chemical sciences. It is defined by using digital toolkits and algorithms to simulate, predict, accelerate, and analyze chemical processes and properties, augmenting traditional experimental methods. The current status quo of digital chemistry is marked by rapid advancements in several key areas: high-throughput screening, machine learning models, quantum chemistry, and laboratory automation. These technologies have enabled unprecedented speeds in discovering and optimizing new molecules, materials, and reactions. Digital retrosynthesis and structure–active prediction tools have supported these endeavors. Furthermore, integrating large-language models and robotics in chemistry labs (<em>e.g.</em> demonstrated in self-driving labs) have begun to automate routine tasks and complex decision-making processes. Looking forward, the future of digital and digitalized chemistry is poised for significant growth, driven by the increasing accessibility of computational resources, the expansion of chemical databases, and the refinement of artificial intelligence algorithms. This evolution promises to accelerate innovation in drug discovery, materials science, and sustainable manufacturing, ultimately leading to more efficient, cost-effective, and environmentally friendly chemical research and production. 
The challenge lies in advancing the technology itself, fostering interdisciplinary collaboration, and ensuring the ethical use of digital tools in chemical research.</p>","PeriodicalId":72816,"journal":{"name":"Digital discovery","volume":" 10","pages":" 1923-1932"},"PeriodicalIF":6.2,"publicationDate":"2024-08-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://pubs.rsc.org/en/content/articlepdf/2024/dd/d4dd00130c?page=search","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142206572","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Machine learning the screening factor in the soft bond valence approach for rapid crystal structure estimation†","authors":"Keisuke Kameda, Takaaki Ariga, Kazuma Ito, Manabu Ihara and Sergei Manzhos","doi":"10.1039/D4DD00152D","DOIUrl":"10.1039/D4DD00152D","url":null,"abstract":"<p >The development of novel functional ceramics is critically important for several applications, including the design of better electrochemical batteries and fuel cells, in particular solid oxide fuel cells. Computational prescreening and selection of such materials can help discover novel materials but is also challenging due to the high cost of electronic structure calculations which would be needed to compute the structures and properties of interest such as the material's stability and ion diffusion properties. The soft bond valence (SoftBV) approach is attractive for rapid prescreening among multiple compositions and structures, but the simplicity of the approximation can make the results inaccurate. In this study, we explore the possibility of enhancing the accuracy of the SoftBV approach when estimating crystal structures by adapting the parameters of the approximation to the chemical composition. Specifically, on the examples of perovskite- and spinel-type oxides that have been proposed as promising solid-state ionic conductors, the screening factor – an independent parameter of the SoftBV approximation – is modeled using linear and non-linear methods as a function of descriptors of the chemical composition. We find that making the screening factor a function of composition can noticeably improve the ability of the SoftBV approximation to correctly model structures, in particular new, putative crystal structures whose structural parameters are yet unknown. We also analyze the relative importance of nonlinearity and coupling in improving the model and find that while the quality of the model is improved by including nonlinearity, coupling is relatively unimportant. 
While using a neural network showed practically no improvement over linear regression, the recently proposed GPR-NN method that is a hybrid between a single hidden layer neural network and kernel regression showed substantial improvement, enabling the prediction of structural parameters of new ceramics with accuracy on the order of 1%.</p>","PeriodicalId":72816,"journal":{"name":"Digital discovery","volume":" 10","pages":" 1967-1979"},"PeriodicalIF":6.2,"publicationDate":"2024-08-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://pubs.rsc.org/en/content/articlepdf/2024/dd/d4dd00152d?page=search","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142226347","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Linear graphlet models for accurate and interpretable cheminformatics†","authors":"Michael Tynes, Michael G. Taylor, Jan Janssen, Daniel J. Burrill, Danny Perez, Ping Yang and Nicholas Lubbers","doi":"10.1039/D4DD00089G","DOIUrl":"10.1039/D4DD00089G","url":null,"abstract":"<p >Advances in machine learning have given rise to a plurality of data-driven methods for predicting chemical properties from molecular structure. For many decades, the cheminformatics field has relied heavily on structural fingerprinting, while in recent years much focus has shifted toward leveraging highly parameterized deep neural networks which usually maximize accuracy. Beyond accuracy, to be useful and trustworthy in scientific applications, machine learning techniques often need intuitive explanations for model predictions and uncertainty quantification techniques so a practitioner might know when a model is appropriate to apply to new data. Here we revisit graphlet histogram fingerprints and introduce several new elements. We show that linear models built on graphlet fingerprints attain accuracy that is competitive with the state of the art while retaining an explainability advantage over black-box approaches. We show how to produce precise explanations of predictions by exploiting the relationships between molecular graphlets and show that these explanations are consistent with chemical intuition, experimental measurements, and theoretical calculations. 
Finally, we show how to use the presence of unseen fragments in new molecules to adjust predictions and quantify uncertainty.</p>","PeriodicalId":72816,"journal":{"name":"Digital discovery","volume":" 10","pages":" 1980-1996"},"PeriodicalIF":6.2,"publicationDate":"2024-08-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://pubs.rsc.org/en/content/articlepdf/2024/dd/d4dd00089g?page=search","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142206573","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Digichem: computational chemistry for everyone†","authors":"Oliver S. Lee, Malte C. Gather and Eli Zysman-Colman","doi":"10.1039/D4DD00147H","DOIUrl":"https://doi.org/10.1039/D4DD00147H","url":null,"abstract":"<p >We describe a new tool for the efficient management of computational chemistry. Digichem is a program that automates and simplifies nearly the entire computational pipeline, including large-scale batch submission of calculations, analysis and results parsing, the generation of 3D density plots and 2D graphs of calculation data, storage and retrieval of calculation results to a database, and automated handling of multi-step jobs. The program is designed to reduce the tedium and likelihood of human error for researchers of all skill-levels but is particularly targeted towards novice users who otherwise may find the barrier to entry to computational chemistry unnecessarily high. To date, this program has been used to successfully run and analyse over 50 000 individual calculations, evidencing its usefulness and utility. The Digichem program is presently released under a free-to-use license, and components of the Digichem system are additionally available under an open-source license.</p>","PeriodicalId":72816,"journal":{"name":"Digital discovery","volume":" 9","pages":" 1695-1713"},"PeriodicalIF":6.2,"publicationDate":"2024-08-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://pubs.rsc.org/en/content/articlepdf/2024/dd/d4dd00147h?page=search","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142169784","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}