Samuel G. Espley, Samuel S. Allsop, David Buttar, Simone Tomasi and Matthew N. Grayson
{"title":"Distortion/interaction analysis via machine learning†","authors":"Samuel G. Espley, Samuel S. Allsop, David Buttar, Simone Tomasi and Matthew N. Grayson","doi":"10.1039/D4DD00224E","DOIUrl":"https://doi.org/10.1039/D4DD00224E","url":null,"abstract":"<p >Machine learning (ML) models have provided a highly efficient pathway to quantum-mechanically accurate reaction barrier predictions. Previous approaches have, however, stopped at prediction of these barriers instead of developing predictive capabilities in reactivity analysis tasks such as distortion/interaction–activation strain analysis. Such methods can provide insight into reactivity trends and ultimately guide rational reaction design. In this work we present the novel application of ML to the rapid and accurate prediction of distortion and interaction DFT energies across four datasets (three existing and one new dataset). We also show how our models can accurately predict on unseen, high-impact literature examples where DFT-level distortion/interaction analysis has previously been used to explain reactivity trends for cycloadditions. This work thus provides support for ML to be utilised further in reactivity analysis across different reaction classes at a fraction of the cost of traditional methods such as DFT.</p>","PeriodicalId":72816,"journal":{"name":"Digital discovery","volume":" 12","pages":" 2479-2486"},"PeriodicalIF":6.2,"publicationDate":"2024-10-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://pubs.rsc.org/en/content/articlepdf/2024/dd/d4dd00224e?page=search","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142777998","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
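For orientation, the distortion/interaction (activation strain) model named in this abstract splits the activation energy of a bimolecular step into the energy needed to deform reactants A and B into their transition-state geometries and the interaction energy between those deformed fragments. In the notation below (ours, not necessarily the paper's), E_f(R) is the energy of fragment f at geometry R, R_f^‡ is its geometry inside the transition state and R_f^0 its isolated equilibrium geometry; the interaction term is then obtained as the difference ΔE‡ − ΔE‡_dist.

```latex
\Delta E^{\ddagger} = \Delta E^{\ddagger}_{\mathrm{dist}} + \Delta E^{\ddagger}_{\mathrm{int}},
\qquad
\Delta E^{\ddagger}_{\mathrm{dist}} = \sum_{f \in \{A,\,B\}} \left[ E_f\!\left(\mathbf{R}_f^{\ddagger}\right) - E_f\!\left(\mathbf{R}_f^{0}\right) \right]
```

The two terms on the right are the quantities the models described above are trained to predict in place of DFT.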
Andreas L. Gimpel, Wendelin J. Stark, Reinhard Heckel and Robert N. Grass
{"title":"Challenges for error-correction coding in DNA data storage: photolithographic synthesis and DNA decay†","authors":"Andreas L. Gimpel, Wendelin J. Stark, Reinhard Heckel and Robert N. Grass","doi":"10.1039/D4DD00220B","DOIUrl":"https://doi.org/10.1039/D4DD00220B","url":null,"abstract":"<p >Efficient error-correction codes are crucial for realizing DNA's potential as a long-lasting, high-density storage medium for digital data. At the same time, new workflows promising low-cost, resilient DNA data storage are challenging their design and error-correcting capabilities. This study characterizes the errors and biases in two new additions to the state-of-the-art workflow in DNA data storage: photolithographic synthesis and DNA decay. Photolithographic synthesis offers low-cost, scalable oligonucleotide synthesis but suffers from high error rates, necessitating sophisticated error-correction schemes, for example codes introducing within-sequence redundancy combined with clustering and alignment techniques for retrieval. On the other hand, the decoding of oligo fragments after DNA decay promises unprecedented storage densities, but complicates data recovery by requiring the reassembly of full-length sequences or the use of partial sequences for decoding. Our analysis provides a detailed account of the error patterns and biases present in photolithographic synthesis and DNA decay, and identifies considerable bias stemming from sequencing workflows. We implement our findings into a digital twin of the two workflows, offering a tool for developing error-correction codes and providing benchmarks for the evaluation of codec performance.</p>","PeriodicalId":72816,"journal":{"name":"Digital discovery","volume":" 12","pages":" 2497-2508"},"PeriodicalIF":6.2,"publicationDate":"2024-10-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://pubs.rsc.org/en/content/articlepdf/2024/dd/d4dd00220b?page=search","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142778009","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
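Simulators of DNA storage workflows are usually built on an insertion/deletion/substitution (IDS) channel. The sketch below is a deliberately minimal toy version of such a channel, assuming independent, position-uniform errors; that assumption is exactly what the study above finds does not hold for photolithographic synthesis and sequencing, so this is not the authors' digital twin. The function name `noisy_copy` and the error rates are hypothetical.

```python
import random


def noisy_copy(seq: str, p_sub: float, p_ins: float, p_del: float,
               rng: random.Random) -> str:
    """Pass a DNA sequence through a toy insertion/deletion/substitution channel.

    Each base is deleted with probability p_del, replaced by a different base
    with probability p_sub, and otherwise copied unchanged. After each input
    position a random base is inserted with probability p_ins. Errors are
    independent and position-uniform, which is a simplifying assumption.
    """
    if not 0.0 <= p_del + p_sub <= 1.0:
        raise ValueError("p_del + p_sub must lie in [0, 1]")
    bases = "ACGT"
    out = []
    for base in seq:
        r = rng.random()
        if r < p_del:
            pass  # deletion: the base is dropped
        elif r < p_del + p_sub:
            out.append(rng.choice([b for b in bases if b != base]))  # substitution
        else:
            out.append(base)  # error-free copy
        if rng.random() < p_ins:
            out.append(rng.choice(bases))  # insertion after this position
    return "".join(out)


# Hypothetical rates, chosen only for illustration.
rng = random.Random(0)
reads = [noisy_copy("ACGTTGCAACGT", 0.02, 0.01, 0.02, rng) for _ in range(5)]
```

A realistic simulator would replace the constant rates with rates that depend on position, sequence context and workflow, which is the kind of information the characterization above provides.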
Ganesh Chandan Kanakala, Bhuvanesh Sridharan and U. Deva Priyakumar
{"title":"Spectra to structure: contrastive learning framework for library ranking and generating molecular structures for infrared spectra†","authors":"Ganesh Chandan Kanakala, Bhuvanesh Sridharan and U. Deva Priyakumar","doi":"10.1039/D4DD00135D","DOIUrl":"https://doi.org/10.1039/D4DD00135D","url":null,"abstract":"<p >Inferring complete molecular structure from infrared (IR) spectra is a challenging task. In this work, we propose SMEN (Spectra and Molecule Encoder Network), a framework for scoring molecules against given IR spectra. The proposed framework uses contrastive optimization to obtain similar embeddings for a molecule and its spectrum. For this study, we consider the QM9 dataset, whose molecules contain up to 9 heavy atoms, and obtain simulated spectra. Using the proposed method, we can rank the molecules using embedding similarity and obtain a Top 1 accuracy of ∼81%, Top 3 accuracy of ∼96%, and Top 10 accuracy of ∼99% on the evaluation set. We extend SMEN to build a generative transformer for direct molecule prediction from IR spectra. The proposed method can significantly help molecule library ranking tasks and aid the problem of inferring molecular structures from spectra.</p>","PeriodicalId":72816,"journal":{"name":"Digital discovery","volume":" 12","pages":" 2417-2423"},"PeriodicalIF":6.2,"publicationDate":"2024-10-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://pubs.rsc.org/en/content/articlepdf/2024/dd/d4dd00135d?page=search","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142777509","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
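The ranking step in this abstract (score every library molecule against a spectrum by embedding similarity, then report Top-k accuracy) can be sketched as follows. The abstract does not name the similarity function, so cosine similarity is an assumption here, and the 2-D vectors are hypothetical stand-ins for the outputs of the spectrum and molecule encoders.

```python
import math
from typing import List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def rank_library(spectrum_emb: Sequence[float],
                 library_embs: List[Sequence[float]]) -> List[int]:
    """Indices of library molecules, most similar to the spectrum first."""
    scores = [cosine(spectrum_emb, m) for m in library_embs]
    return sorted(range(len(library_embs)), key=lambda i: scores[i], reverse=True)


def top_k_accuracy(spectrum_embs: List[Sequence[float]],
                   library_embs: List[Sequence[float]],
                   true_idx: List[int], k: int) -> float:
    """Fraction of spectra whose true molecule is among the k best-ranked candidates."""
    hits = sum(t in rank_library(s, library_embs)[:k]
               for s, t in zip(spectrum_embs, true_idx))
    return hits / len(spectrum_embs)


# Toy 2-D embeddings standing in for encoder outputs (hypothetical values).
library = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
spectra = [[0.9, 0.1], [0.1, 0.9], [0.6, 0.5]]
truth = [0, 1, 0]
```

With these toy vectors the third spectrum ranks molecule 2 first and its true match second, so Top 1 accuracy is 2/3 while Top 2 accuracy is 1.0; the same gap between Top 1 and Top k is what the reported ∼81% / ∼96% / ∼99% figures measure.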
{"title":"Application of object detection and action recognition toward automated recognition of chemical experiments†","authors":"Ryosuke Sasaki, Mikito Fujinami and Hiromi Nakai","doi":"10.1039/D4DD00015C","DOIUrl":"https://doi.org/10.1039/D4DD00015C","url":null,"abstract":"<p >Developments in deep learning-based computer vision technology have significantly improved the performance of applied research. Applying image recognition methods to manually conducted chemical experiments is promising for digitizing traditional practices in terms of experimental recording, hazard management, and educational applications. This study investigated the feasibility of automatically recognizing manual chemical experiments using recent image recognition technology. Both object detection and action recognition were evaluated, that is, the identification of the locations and types of objects in images and the inference of human actions in videos. The image and video datasets for the chemical experiments were originally constructed by capturing scenes from actual organic chemistry laboratories. The assessment of inference accuracy indicates that image recognition methods can effectively detect chemical apparatuses and classify manipulations in experiments.</p>","PeriodicalId":72816,"journal":{"name":"Digital discovery","volume":" 12","pages":" 2458-2464"},"PeriodicalIF":6.2,"publicationDate":"2024-10-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://pubs.rsc.org/en/content/articlepdf/2024/dd/d4dd00015c?page=search","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142777996","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"In situ synthesis within micron-sized hydrogel reactors created via programmable aerosol chemistry†","authors":"Luokun Zhang and S. Hessam M. Mehr","doi":"10.1039/D4DD00139G","DOIUrl":"https://doi.org/10.1039/D4DD00139G","url":null,"abstract":"<p >Recent progress in materials science and complex chemical systems has highlighted the critical role of containers in directing and modulating reactivity. Micron-sized reactors are especially attractive due to their significantly different surface/volume ratios compared to traditional laboratory glassware, while still providing high experimental throughput and being easily observable using optical microscopy. Despite their promise, there is a gap in adapting chemical synthesis protocols to work within microspheres. We demonstrate a programmable aerosol chemistry setup that automates the generation of calcium alginate microspheres and allows them to be used as micro-reactors for exploration of chemical reactivity. A range of reactions can be adapted for <em>in situ</em> synthesis within the forming microspheres by pre-loading the precursor solutions with solid and soluble reagents, exemplified by our preparation of Prussian blue and quinhydrone. The micro-reactors are permeable, allowing rapid uptake and release of small molecule reagents and products. Larger particles trapped within the calcium alginate matrix can also be released, triggered <em>via</em> rapid disassembly of the microspheres in response to calcium binders like EDTA. As our standard programmable apparatus is extensible to broad reagent types and reaction stoichiometries, we expect that its adoption will accelerate exploration of chemical reactivity and discovery within micro-reactors.</p>","PeriodicalId":72816,"journal":{"name":"Digital discovery","volume":" 12","pages":" 2424-2433"},"PeriodicalIF":6.2,"publicationDate":"2024-10-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://pubs.rsc.org/en/content/articlepdf/2024/dd/d4dd00139g?page=search","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142777978","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Bradley P. Sutliff, Peter A. Beaucage, Debra J. Audus, Sara V. Orski and Tyler B. Martin
{"title":"Sorting polyolefins with near-infrared spectroscopy: identification of optimal data analysis pipelines and machine learning classifiers†‡","authors":"Bradley P. Sutliff, Peter A. Beaucage, Debra J. Audus, Sara V. Orski and Tyler B. Martin","doi":"10.1039/D4DD00235K","DOIUrl":"https://doi.org/10.1039/D4DD00235K","url":null,"abstract":"<p >Polyolefins (POs) are the largest class of polymers produced worldwide. Despite the intrinsic chemical similarities within this class of polymers, they are often physically incompatible. This combination presents a significant hurdle for high-throughput recycling systems that strive to sort various types of plastics from one another. Previous research has shown that near-infrared spectroscopy (NIR) can sort POs from other plastics, but these approaches generally fall short of sorting POs from one another. In this work, we enhance NIR spectroscopy-based sortation by screening over 12 000 machine-learning pipelines to enable sorting of PO species beyond what is possible using current NIR databases. These pipelines include a series of scattering corrections, filtering and differentiation, data scaling, dimensionality reduction, and machine learning classifiers. Common scattering corrections and preprocessing steps include scatter correction, linear detrending, and Savitzky–Golay filtering. Dimensionality reduction techniques such as principal component analysis (PCA), functional principal component analysis (fPCA) and uniform manifold approximation and projection (UMAP) were also investigated for classification enhancements. This analysis of preprocessing steps and classification algorithm combinations identified multiple data pipelines capable of successfully sorting PO materials with over 95% accuracy. Through rigorous testing, this study provides recommendations for consistently applying preprocessing and classification techniques without over-complicating the data analysis. This work also provides a set of preprocessing steps, a chosen classifier, and tuned hyperparameters that may be useful for benchmarking new models and data sets. Finally, the approach outlined here is ready to be applied by the developers of materials sortation equipment so that we can improve the value and purity of recycled plastic waste streams.</p>","PeriodicalId":72816,"journal":{"name":"Digital discovery","volume":" 11","pages":" 2341-2355"},"PeriodicalIF":6.2,"publicationDate":"2024-10-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://pubs.rsc.org/en/content/articlepdf/2024/dd/d4dd00235k?page=search","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142595181","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
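One pipeline of the kind screened in this study (scatter correction, linear detrending, Savitzky-Golay filtering, then a classifier) can be sketched with the standard library alone. Several choices here are assumptions, not the paper's selected pipeline: standard normal variate (SNV) stands in for "scatter correction", the filter uses the classic 5-point quadratic Savitzky-Golay weights, dimensionality reduction is skipped, and a nearest-centroid rule stands in for the tuned classifier. The class labels "PE"/"PP" in the usage are hypothetical.

```python
import statistics
from typing import Dict, List

Spectrum = List[float]


def snv(spectrum: Spectrum) -> Spectrum:
    """Standard normal variate: center and scale each spectrum on its own."""
    mu = statistics.fmean(spectrum)
    sd = statistics.pstdev(spectrum)
    if sd == 0:
        raise ValueError("constant spectrum cannot be SNV-scaled")
    return [(x - mu) / sd for x in spectrum]


def detrend(spectrum: Spectrum) -> Spectrum:
    """Remove the least-squares straight line fitted against channel index."""
    n = len(spectrum)
    x_mean = (n - 1) / 2
    y_mean = statistics.fmean(spectrum)
    sxx = sum((x - x_mean) ** 2 for x in range(n))
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(spectrum))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return [y - (intercept + slope * x) for x, y in enumerate(spectrum)]


# Savitzky-Golay smoothing weights for a 5-point window and quadratic fit (sum = 35).
SG5_QUAD = (-3, 12, 17, 12, -3)


def savgol5(spectrum: Spectrum) -> Spectrum:
    """5-point quadratic Savitzky-Golay smoothing; the two points at each end are kept."""
    out = list(spectrum)
    for i in range(2, len(spectrum) - 2):
        out[i] = sum(w * spectrum[i + j - 2] for j, w in enumerate(SG5_QUAD)) / 35
    return out


def preprocess(spectrum: Spectrum) -> Spectrum:
    """One candidate pipeline: scatter correction, then detrending, then smoothing."""
    return savgol5(detrend(snv(spectrum)))


def fit_centroids(X: List[Spectrum], y: List[str]) -> Dict[str, Spectrum]:
    """Nearest-centroid classifier: average the spectra of each class channel by channel."""
    groups: Dict[str, List[Spectrum]] = {}
    for xi, yi in zip(X, y):
        groups.setdefault(yi, []).append(xi)
    return {label: [statistics.fmean(col) for col in zip(*rows)]
            for label, rows in groups.items()}


def predict(centroids: Dict[str, Spectrum], x: Spectrum) -> str:
    """Assign x to the class whose centroid is closest in squared Euclidean distance."""
    return min(centroids,
               key=lambda c: sum((a - b) ** 2 for a, b in zip(x, centroids[c])))
```

In use, every training and test spectrum would pass through `preprocess` before `fit_centroids` and `predict`. Screening thousands of pipelines, as the study does, amounts to swapping each stage for alternatives and comparing held-out accuracy.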
Tim Rensmeyer, Ben Craig, Denis Kramer and Oliver Niggemann
{"title":"High accuracy uncertainty-aware interatomic force modeling with equivariant Bayesian neural networks†","authors":"Tim Rensmeyer, Ben Craig, Denis Kramer and Oliver Niggemann","doi":"10.1039/D4DD00183D","DOIUrl":"https://doi.org/10.1039/D4DD00183D","url":null,"abstract":"<p > <em>Ab initio</em> molecular dynamics simulations of material properties have become a cornerstone in the development of novel materials for a wide range of applications such as battery technology and catalysis. Unfortunately, their high computational demand can make them unsuitable in many applications. Consequently, surrogate modeling <em>via</em> neural networks has become an active field of research. Two of the major obstacles to their practical application in many cases are assessing the reliability of the neural network predictions and the difficulty of generating suitable datasets to train the neural network in the first place. Bayesian neural networks offer a promising framework for modeling uncertainty, active learning and improving data efficiency and robustness by incorporating prior physical knowledge. However, due to the high computational demand and slow convergence of the gold-standard approach of Markov chain Monte Carlo (MCMC) sampling methods, variational inference <em>via</em> Monte Carlo dropout is currently the only sampling method successfully applied in this domain. Since MCMC methods have often displayed superior uncertainty quantification, developing a suitable MCMC method in this domain would be a significant advance in making neural network-based molecular dynamics simulations more practically viable. In this paper, we demonstrate that convergence for state-of-the-art models with high-quality MCMC methods can still be achieved in a practical amount of time by introducing a novel parameter-specific adaptive step size scheme. In addition, we introduce a new stochastic neural network model based on the NequIP architecture and demonstrate that, when combined with our novel sampling algorithm, we obtain predictions with state-of-the-art accuracy as well as a significantly improved measure of uncertainty over Monte Carlo dropout. Lastly, we show that the proposed algorithm can even outperform deep ensembles while sampling from a single Markov chain.</p>","PeriodicalId":72816,"journal":{"name":"Digital discovery","volume":" 11","pages":" 2356-2366"},"PeriodicalIF":6.2,"publicationDate":"2024-10-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://pubs.rsc.org/en/content/articlepdf/2024/dd/d4dd00183d?page=search","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142595182","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
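The abstract does not spell out its parameter-specific adaptive step size scheme, so the sketch below swaps in a well-known, much simpler relative of the idea to show why per-parameter step sizes matter: random-walk Metropolis-within-Gibbs in which each parameter keeps its own proposal scale, tuned during burn-in toward a target acceptance rate. This is not the authors' sampler. The function name `adaptive_mwg`, the 0.44 target (a value commonly used for one-dimensional random-walk updates), the learning rate and the toy Gaussian target are all assumptions.

```python
import math
import random
from typing import Callable, List, Tuple


def adaptive_mwg(log_prob: Callable[[List[float]], float], x0: List[float],
                 n_burn: int, n_samples: int, rng: random.Random,
                 target_accept: float = 0.44, lr: float = 0.05
                 ) -> Tuple[List[List[float]], List[float]]:
    """Random-walk Metropolis-within-Gibbs with one adaptive step size per parameter.

    During burn-in each log step size follows a Robbins-Monro update that pushes
    its own acceptance rate toward target_accept; afterwards the step sizes are
    frozen so the retained samples come from a fixed Metropolis kernel.
    """
    x = list(x0)
    lp = log_prob(x)
    log_step = [0.0] * len(x)  # one log step size per parameter
    samples: List[List[float]] = []
    for it in range(n_burn + n_samples):
        for i in range(len(x)):
            proposal = list(x)
            proposal[i] += math.exp(log_step[i]) * rng.gauss(0.0, 1.0)
            lp_new = log_prob(proposal)
            accepted = rng.random() < math.exp(min(0.0, lp_new - lp))
            if accepted:
                x, lp = proposal, lp_new
            if it < n_burn:
                # Larger steps after acceptances, smaller after rejections.
                log_step[i] += lr * (float(accepted) - target_accept)
        if it >= n_burn:
            samples.append(list(x))
    return samples, [math.exp(s) for s in log_step]


def log_prob(x: List[float]) -> float:
    """Toy target: independent Gaussians with standard deviations 1 and 10."""
    return -0.5 * (x[0] ** 2 + (x[1] / 10.0) ** 2)


samples, steps = adaptive_mwg(log_prob, [0.0, 0.0], n_burn=2000,
                              n_samples=5000, rng=random.Random(0))
```

On this toy target the adapted step for the second parameter ends up roughly an order of magnitude larger than for the first, which a single shared step size could not provide; neural network weights with very different posterior scales pose the same problem at much larger scale.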
S. Shayan Mousavi Masouleh, Corey A. Sanz, Ryan P. Jansonius, Samuel Shi, Maria J. Gendron Romero, Jason E. Hein and Jason Hattrick-Simpers
{"title":"Artificial intelligence-enabled optimization of battery-grade lithium carbonate production†","authors":"S. Shayan Mousavi Masouleh, Corey A. Sanz, Ryan P. Jansonius, Samuel Shi, Maria J. Gendron Romero, Jason E. Hein and Jason Hattrick-Simpers","doi":"10.1039/D4DD00159A","DOIUrl":"https://doi.org/10.1039/D4DD00159A","url":null,"abstract":"<p >By 2035, the need for battery-grade lithium is expected to quadruple. About half of this lithium is currently sourced from brines and must be converted from lithium chloride into lithium carbonate (Li<small><sub>2</sub></small>CO<small><sub>3</sub></small>) through a process called softening. Conventional softening methods using sodium or potassium salts contribute to carbon emissions during reagent mining and battery manufacturing, exacerbating global warming. This study introduces an alternative approach using carbon dioxide (CO<small><sub>2(g)</sub></small>) as the carbonating reagent in the lithium softening process, offering a carbon capture solution. We employed an active learning-driven high-throughput method to rapidly capture CO<small><sub>2(g)</sub></small> and convert it to lithium carbonate. The model was simplified by focusing on the elemental concentrations of C, Li, and N for practical measurement and tracking, avoiding the complexities of ion speciation equilibria. This approach led to an optimized lithium carbonate process that capitalizes on CO<small><sub>2(g)</sub></small> capture and improves the battery metal supply chain's carbon efficiency.</p>","PeriodicalId":72816,"journal":{"name":"Digital discovery","volume":" 11","pages":" 2320-2326"},"PeriodicalIF":6.2,"publicationDate":"2024-10-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://pubs.rsc.org/en/content/articlepdf/2024/dd/d4dd00159a?page=search","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142595179","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Benedikt Winter, Clemens Winter, Johannes Schilling and André Bardow
{"title":"Correction: A smile is all you need: predicting limiting activity coefficients from SMILES with natural language processing","authors":"Benedikt Winter, Clemens Winter, Johannes Schilling and André Bardow","doi":"10.1039/D4DD90045F","DOIUrl":"https://doi.org/10.1039/D4DD90045F","url":null,"abstract":"<p >Correction for ‘A smile is all you need: predicting limiting activity coefficients from SMILES with natural language processing’ by Benedikt Winter <em>et al.</em>, <em>Digital Discovery</em>, 2022, <strong>1</strong>, 859–869, https://doi.org/10.1039/D2DD00058J.</p>","PeriodicalId":72816,"journal":{"name":"Digital discovery","volume":" 11","pages":" 2384-2384"},"PeriodicalIF":6.2,"publicationDate":"2024-10-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11472119/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142482280","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Heqian Zhang, Jiaquan Huang, Xiaoyu Wang, Zhizeng Gao, Song Meng, Hang Li, Shanshan Zhou, Shang Wang, Shan Wang, Xunyou Yan, Xinwei Yang, Xiaoluo Huang and Zhiwei Qin
{"title":"Embedding DNA-based natural language in microbes for the benefit of future researchers†","authors":"Heqian Zhang, Jiaquan Huang, Xiaoyu Wang, Zhizeng Gao, Song Meng, Hang Li, Shanshan Zhou, Shang Wang, Shan Wang, Xunyou Yan, Xinwei Yang, Xiaoluo Huang and Zhiwei Qin","doi":"10.1039/D4DD00251B","DOIUrl":"https://doi.org/10.1039/D4DD00251B","url":null,"abstract":"<p >Microorganisms are valuable resources as antibiotic producers, biocontrol agents, and symbiotic agents in various ecosystems and organisms. Over the past decades, there has been a notable increase in the identification and generation of both wild-type and genetically modified microbial strains from research laboratories worldwide. However, a substantial portion of the information represented in these strains remains scattered across the scientific literature. To facilitate the work of future researchers, in this perspective article, we advocate the adoption of the DNA-based natural language (DBNL) algorithm standard and then demonstrate it using a <em>Streptomyces</em> species as a proof of concept. This standard enables sophisticated genome sequencing and the subsequent extraction of valuable information encoded within a particular microbial species. In addition, it keeps such information accessible for continued research and applications even if a currently cultivated microbe can no longer be cultured in the future. Embracing the DBNL algorithm standard promises to enhance the efficiency and effectiveness of microbial research, paving the way for innovative solutions and discoveries in diverse fields.</p>","PeriodicalId":72816,"journal":{"name":"Digital discovery","volume":" 11","pages":" 2377-2383"},"PeriodicalIF":6.2,"publicationDate":"2024-10-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://pubs.rsc.org/en/content/articlepdf/2024/dd/d4dd00251b?page=search","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142595184","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}