Benedikt Winter, Philipp Rehner, Timm Esper, Johannes Schilling and André Bardow
{"title":"Understanding the language of molecules: predicting pure component parameters for the PC-SAFT equation of state from SMILES†","authors":"Benedikt Winter, Philipp Rehner, Timm Esper, Johannes Schilling and André Bardow","doi":"10.1039/D4DD00077C","DOIUrl":"https://doi.org/10.1039/D4DD00077C","url":null,"abstract":"<p >A major bottleneck in developing sustainable processes and materials is a lack of property data. Recently, machine learning approaches have vastly improved previous methods for predicting molecular properties. However, these machine learning models are often not able to handle thermodynamic constraints adequately. In this work, we present a machine learning model based on natural language processing to predict pure-component parameters for the perturbed-chain statistical associating fluid theory (PC-SAFT) equation of state. The model is based on our previously proposed SMILES-to-Properties-Transformer (SPT). By incorporating PC-SAFT into the neural network architecture, the machine learning model is trained directly on experimental vapor pressure and liquid density data. Combining established physical modeling approaches with state-of-the-art machine learning methods enables high-accuracy predictions across a wide range of pressures and temperatures, while keeping the thermodynamic consistency of an equation of state like PC-SAFT. SPT<small><sub>PC-SAFT</sub></small> demonstrates exceptional prediction accuracy even for complex molecules with various functional groups, outperforming traditional group contribution methods by a factor of four in the mean average percentage deviation. Moreover, SPT<small><sub>PC-SAFT</sub></small> captures the behavior of stereoisomers without any special consideration. 
To facilitate the application of our model, we provide predicted PC-SAFT parameters of 13 279 components, making PC-SAFT accessible to all researchers.</p>","PeriodicalId":72816,"journal":{"name":"Digital discovery","volume":" 5","pages":" 1142-1157"},"PeriodicalIF":6.2,"publicationDate":"2025-01-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://pubs.rsc.org/en/content/articlepdf/2025/dd/d4dd00077c?page=search","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143943994","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Michael Moran, Michael W. Gaultois, Vladimir V. Gusev, Dmytro Antypov and Matthew J. Rosseinsky
{"title":"Establishing Deep InfoMax as an effective self-supervised learning methodology in materials informatics†","authors":"Michael Moran, Michael W. Gaultois, Vladimir V. Gusev, Dmytro Antypov and Matthew J. Rosseinsky","doi":"10.1039/D4DD00202D","DOIUrl":"https://doi.org/10.1039/D4DD00202D","url":null,"abstract":"<p >The scarcity of property labels remains a key challenge in materials informatics, whereas materials data without property labels are abundant in comparison. By pre-training supervised property prediction models on self-supervised tasks that depend only on the “intrinsic information” available in any Crystallographic Information File (CIF), there is potential to leverage the large amount of crystal data without property labels to improve property prediction results on small datasets. We apply Deep InfoMax as a self-supervised machine learning framework for materials informatics that explicitly maximises the mutual information between a point set (or graph) representation of a crystal and a vector representation suitable for downstream learning. This allows the pre-training of supervised models on large materials datasets without the need for property labels and without requiring the model to reconstruct the crystal from a representation vector. We investigate the benefits of Deep InfoMax pre-training implemented on the Site-Net architecture to improve the performance of downstream property prediction models with small amounts (<10<small><sup>3</sup></small>) of data, a situation relevant to experimentally measured materials property databases. Using a property label masking methodology, where we perform self-supervised learning on larger supervised datasets and then train supervised models on a small subset of the labels, we isolate Deep InfoMax pre-training from the effects of distributional shift. 
We demonstrate performance improvements in the contexts of representation learning and transfer learning on the tasks of band gap and formation energy prediction. Having established the effectiveness of Deep InfoMax pre-training in a controlled environment, our findings provide a foundation for extending the approach to address practical challenges in materials informatics.</p>","PeriodicalId":72816,"journal":{"name":"Digital discovery","volume":" 3","pages":" 790-811"},"PeriodicalIF":6.2,"publicationDate":"2025-01-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://pubs.rsc.org/en/content/articlepdf/2025/dd/d4dd00202d?page=search","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143602003","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Lowering the exponential wall: accelerating high-entropy alloy catalysts screening using local surface energy descriptors from neural network potentials","authors":"Tomoya Shiota, Kenji Ishihara and Wataru Mizukami","doi":"10.1039/D4DD00303A","DOIUrl":"https://doi.org/10.1039/D4DD00303A","url":null,"abstract":"<p >Computational screening is indispensable for the efficient design of high-entropy alloys (HEAs), which hold considerable potential for catalytic applications. However, the chemical space of HEAs is exponentially vast with respect to the number of constituent elements, making even machine learning-based screening calculations time-intensive. To address this challenge, we propose a rapid method for predicting HEA properties using data from monometallic systems (or few-component alloys). Central to our approach is the newly introduced local surface energy (LSE) descriptor, which captures local surface reactivity at atomic resolution. We established a correlation between LSE and adsorption energies using monometallic systems. Using this correlation in a linear regression model, we successfully estimated molecular adsorption energies on HEAs with significantly higher accuracy than a conventional descriptor (<em>i.e.</em>, generalized coordination numbers). Furthermore, we developed high-precision models by employing both classical and quantum machine learning. Our method enabled CO adsorption-energy calculations for 1000 quinary nanoparticles, comprising 201 atoms each, within a few days, considerably faster than density functional theory, which would require hundreds of years, or neural network potentials, which would take hundreds of days. 
The proposed approach accelerates the exploration of the vast HEA chemical space, facilitating the design of novel catalysts.</p>","PeriodicalId":72816,"journal":{"name":"Digital discovery","volume":" 3","pages":" 738-751"},"PeriodicalIF":6.2,"publicationDate":"2025-01-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://pubs.rsc.org/en/content/articlepdf/2025/dd/d4dd00303a?page=search","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143601999","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
James R. Deneault, Woojae Kim, Jiseob Kim, Yuzhe Gu, Jorge Chang, Benji Maruyama, Jay I. Myung and Mark A. Pitt
{"title":"Preferential Bayesian optimization improves the efficiency of printing objects with subjective qualities†‡","authors":"James R. Deneault, Woojae Kim, Jiseob Kim, Yuzhe Gu, Jorge Chang, Benji Maruyama, Jay I. Myung and Mark A. Pitt","doi":"10.1039/D4DD00320A","DOIUrl":"https://doi.org/10.1039/D4DD00320A","url":null,"abstract":"<p >Despite recent advances in closed-loop 3D printing, optimizing subjective and difficult-to-quantify qualities—such as surface finish and clarity of fine detail—remains a significant challenge, often relying on the traditional time-consuming and inefficient trial-and-error process. Preferential Bayesian optimization (PBO) is a machine learning technique that uses human preference judgements to efficiently guide the search for such abstract optimums in a high-dimensional space. We evaluated PBO's ability to identify optimal parameter values in printing profiles of vases and pairs of 3D cones. In semi-autonomous printing campaigns, a human observer ranked triplets of images of these objects with a target object in mind, preferring slender/bulbous vases and cone pairs that were smooth and well-formed. Results show that PBO consistently and quickly identified an optimal parameter combination across repeated testing. Modeling was then used to identify object dimensions responsible for preference judgements and to mimic preference behavior. 
Findings suggest that PBO is a promising tool for expanding the range of 3D objects that can be printed efficiently.</p>","PeriodicalId":72816,"journal":{"name":"Digital discovery","volume":" 3","pages":" 723-737"},"PeriodicalIF":6.2,"publicationDate":"2025-01-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://pubs.rsc.org/en/content/articlepdf/2025/dd/d4dd00320a?page=search","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143602059","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Benchmarking study of deep generative models for inverse polymer design†","authors":"Tianle Yue, Lei Tao, Vikas Varshney and Ying Li","doi":"10.1039/D4DD00395K","DOIUrl":"https://doi.org/10.1039/D4DD00395K","url":null,"abstract":"<p >Molecular generative models based on deep learning have increasingly gained attention for their ability in <em>de novo</em> polymer design. However, there remains a knowledge gap in the thorough evaluation of these models. This benchmark study explores <em>de novo</em> polymer design using six popular deep generative models: Variational Autoencoder (VAE), Adversarial Autoencoder (AAE), Objective-Reinforced Generative Adversarial Networks (ORGAN), Character-level Recurrent Neural Network (CharRNN), REINVENT, and GraphINVENT. Various metrics highlighted the excellent performance of CharRNN, REINVENT, and GraphINVENT, particularly when applied to the real polymer dataset, while VAE and AAE show more advantages in generating hypothetical polymers. The CharRNN, REINVENT, and GraphINVENT models were successfully further trained on real polymers using reinforcement learning methods, targeting the generation of hypothetical high-temperature polymers for extreme environments. 
The findings of this study provide critical insights into the capabilities and limitations of each generative model, offering valuable guidance for future endeavors in polymer design and discovery.</p>","PeriodicalId":72816,"journal":{"name":"Digital discovery","volume":" 4","pages":" 910-926"},"PeriodicalIF":6.2,"publicationDate":"2025-01-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://pubs.rsc.org/en/content/articlepdf/2025/dd/d4dd00395k?page=search","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143809078","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Zhengru Liu, Long Bian, Wenting Shao, Sean I. Hwang and Alexander Star
{"title":"An automated electrolyte-gate field-effect transistor test system for rapid screening of multiple sensors†","authors":"Zhengru Liu, Long Bian, Wenting Shao, Sean I. Hwang and Alexander Star","doi":"10.1039/D4DD00301B","DOIUrl":"https://doi.org/10.1039/D4DD00301B","url":null,"abstract":"<p >Automation of laboratory processes is crucial in analytical chemistry, as it enhances experimental reproducibility by eliminating repetitive tasks and reducing human errors. In this context, the integration of laboratory automation techniques into chemical analysis, particularly utilizing electrochemical field-effect transistor (FET)-based sensors, is highly desirable for high-throughput testing. In this study, we developed an automated electrolyte-gate FET test system designed for rapid screening of multiple sensors. Comprising five key components – printed circuit board, pipetting robot, source meter unit, system switch, and computer – the automated system achieves precision control through individual programming of each instrument, followed by the synergistic integration of the instruments using Python scripts. The automated system could perform FET measurements of 96 sensors in a single run, and different operations such as liquid transfer and waste removal were optimized. 
The automated system was evaluated by running a pH sensing test successfully and finally applied for opioid drug testing with high working efficiency and good accuracy, demonstrating that it could be an excellent tool for different sensing applications based on electrolyte-gate FET sensors.</p>","PeriodicalId":72816,"journal":{"name":"Digital discovery","volume":" 3","pages":" 752-761"},"PeriodicalIF":6.2,"publicationDate":"2025-01-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://pubs.rsc.org/en/content/articlepdf/2025/dd/d4dd00301b?page=search","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143602000","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Kyungwon Kim, Hyejeong Song, Sanghun Lee, Hyeongkyu Cho, Hyung Mi Lim and Hyunseok Ko
{"title":"Accentuating the ambient curing behavior of geopolymers: metamodel-guided optimization for fast-curing geopolymers with high flexural strength†","authors":"Kyungwon Kim, Hyejeong Song, Sanghun Lee, Hyeongkyu Cho, Hyung Mi Lim and Hyunseok Ko","doi":"10.1039/D4DD00217B","DOIUrl":"https://doi.org/10.1039/D4DD00217B","url":null,"abstract":"<p >A geopolymer, consisting of –Si–O–Al– covalent bonds in a polymeric network, has a simple manufacturing process with low CO<small><sub>2</sub></small> emissions and excellent high-temperature performance, making it a promising modern refractory material. In particular, owing to its low-temperature and fast-curing conditions, geopolymers can be used for practical on-site applications. However, the properties of geopolymers are significantly dependent on the composition and content of various additives, and this complexity limits our understanding of the composition to a narrow scope. In this study, we investigated the optimal composition designed for fast and low-temperature curing geopolymers with additives, including Ca(OH)<small><sub>2</sub></small>, fumed silica, and chopped carbon fiber. A multivariate compositional optimization was systematically conducted using design of experiments and metamodeling. By utilizing the metamodel, we successfully developed an optimized geopolymer composition with only 45 sets of experiments. The flexural strength obtained was 27.83 MPa, the highest recorded value for a bulk fast-curing geopolymer to date. Furthermore, the curing speed was modulated to be swift at ambient conditions, achieving 98% of the full strength in 6 days at 20 °C (whereas it typically takes 1 to 4 weeks at 40 °C). We also investigated how superior strength could be achieved while curing at low temperatures for a short duration. It turned out that fumed silica slowed down the growth of the Ca compound, balancing two different effects stemming from Ca ions: strength degradation and rapid curing. 
The developed geopolymer is expected to be widely used in applications that require rapid curing at room temperature, such as external cement replacements for fire spread prevention structures, acid-exposed environments, or repair and finishing materials.</p>","PeriodicalId":72816,"journal":{"name":"Digital discovery","volume":" 3","pages":" 653-665"},"PeriodicalIF":6.2,"publicationDate":"2025-01-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://pubs.rsc.org/en/content/articlepdf/2025/dd/d4dd00217b?page=search","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143602055","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Àlex Solé, Albert Mosella-Montoro, Joan Cardona, Silvia Gómez-Coca, Daniel Aravena, Eliseo Ruiz and Javier Ruiz-Hidalgo
{"title":"A Cartesian encoding graph neural network for crystal structure property prediction: application to thermal ellipsoid estimation†","authors":"Àlex Solé, Albert Mosella-Montoro, Joan Cardona, Silvia Gómez-Coca, Daniel Aravena, Eliseo Ruiz and Javier Ruiz-Hidalgo","doi":"10.1039/D4DD00352G","DOIUrl":"https://doi.org/10.1039/D4DD00352G","url":null,"abstract":"<p >In the diffraction resolution of crystal structures, thermal ellipsoids are a critical parameter that is usually more difficult to determine than atomic positions. These ellipsoids are quantified through Anisotropic Displacement Parameters (ADPs), which provide critical insights into atomic vibrations within crystalline structures. ADPs reflect the thermal behaviour and structural properties of crystal structures. However, traditional methods to compute ADPs are computationally intensive. This paper presents CartNet, a novel graph neural network (GNN) architecture designed to predict properties of crystal structures efficiently by encoding the atomic structural geometry to the Cartesian axes and the temperature of the crystal structure. Additionally, CartNet employs a neighbour equalization technique for message passing to help emphasise the covalent and contact interactions and a novel Cholesky-based head to ensure valid ADP predictions. Furthermore, a rotational SO(3) data augmentation technique has been proposed during the training phase to generalize unseen rotations. To corroborate this procedure, an ADP dataset with over 200 000 experimental crystal structures from the Cambridge Structural Database (CSD) has been curated. The model significantly reduces computational costs and outperforms existing previously reported methods for ADP prediction by 10.87%, while demonstrating a 34.77% improvement over the tested theoretical computation methods. 
Moreover, we have employed CartNet for other already known datasets that included different material properties, such as formation energy, band gap, total energy, energy above the convex hull, bulk moduli, and shear moduli. The proposed architecture outperformed previously reported methods by 7.71% in the JARVIS dataset and 13.16% in the Materials Project dataset, proving CartNet's capability to achieve state-of-the-art results in several tasks. The project website with an online demo is available at: https://www.ee.ub.edu/cartnet.</p>","PeriodicalId":72816,"journal":{"name":"Digital discovery","volume":" 3","pages":" 694-710"},"PeriodicalIF":6.2,"publicationDate":"2025-01-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://pubs.rsc.org/en/content/articlepdf/2025/dd/d4dd00352g?page=search","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143602057","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Abdulelah S. Alshehri, Michael T. Bergman, Fengqi You and Carol K. Hall
{"title":"Biophysics-guided uncertainty-aware deep learning uncovers high-affinity plastic-binding peptides","authors":"Abdulelah S. Alshehri, Michael T. Bergman, Fengqi You and Carol K. Hall","doi":"10.1039/D4DD00219A","DOIUrl":"10.1039/D4DD00219A","url":null,"abstract":"<p >Plastic pollution, particularly microplastics (MPs), poses a significant global threat to ecosystems and human health, necessitating innovative remediation strategies. Biocompatible and biodegradable plastic-binding peptides (PBPs) offer a potential solution through targeted adsorption and subsequent MP detection or removal from the environment. A challenge in discovering plastic-binding peptides is the vast combinatorial space of possible peptides (<em>i.e.</em>, over 10<small><sup>15</sup></small> for 12-mer peptides), which far exceeds the sample sizes typically reachable by experiments or biophysics-based computational methods. One step towards addressing this issue is to train deep learning models on experimental or biophysical datasets, permitting faster and cheaper evaluations of peptides. However, deep learning predictions are not always accurate, which could waste time and money due to synthesizing and evaluating false positives. Here, we resolve this issue by combining biophysical modeling data from the Peptide Binder Design (PepBD) algorithm, the predictive power and uncertainty quantification of evidential deep learning, and metaheuristic search methods to identify high-affinity PBPs for several common plastics. Molecular dynamics simulations show that the discovered PBPs have greater median adsorption free energies for polyethylene (5%), polypropylene (18%), and polystyrene (34%) relative to PBPs previously designed by PepBD. The impact of including uncertainty quantification in peptide design is demonstrated by the increasing improvement in the median adsorption free energy with decreasing uncertainty. 
This robust framework accelerates peptide discovery, paving the way for effective, bio-inspired solutions to MP remediation.</p>","PeriodicalId":72816,"journal":{"name":"Digital discovery","volume":" 2","pages":" 561-571"},"PeriodicalIF":6.2,"publicationDate":"2025-01-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11771220/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143070057","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Rocco Cancelliere, Mario Molinara, Antonio Licheri, Antonio Maffucci and Laura Micheli
{"title":"Artificial intelligence-assisted electrochemical sensors for qualitative and semi-quantitative multiplexed analyses†","authors":"Rocco Cancelliere, Mario Molinara, Antonio Licheri, Antonio Maffucci and Laura Micheli","doi":"10.1039/D4DD00318G","DOIUrl":"https://doi.org/10.1039/D4DD00318G","url":null,"abstract":"<p >This research utilises Artificial Intelligence (AI) to enhance electrochemical peak resolution and lower detection limits in voltammetric analysis, focusing on complex, multiplex real matrices analyses. The study investigated the quinone family, hydroquinone, benzoquinone, and catechol analysed individually and in mixtures using cyclic and square wave voltammetry. The ferrocyanide/ferricyanide redox couple was included as a standard redox probe to provide a reference for method validation.</p>","PeriodicalId":72816,"journal":{"name":"Digital discovery","volume":" 2","pages":" 338-342"},"PeriodicalIF":6.2,"publicationDate":"2025-01-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://pubs.rsc.org/en/content/articlepdf/2025/dd/d4dd00318g?page=search","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143396422","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}