{"title":"Nano Trees: nanopore signal processing and sublevel fitting using decision trees†","authors":"Deekshant Wadhwa, Philipp Mensing, James Harden, Paula Branco, Vincent Tabard-Cossa and Kyle Briggs","doi":"10.1039/D5DD00060B","DOIUrl":"https://doi.org/10.1039/D5DD00060B","url":null,"abstract":"<p >As the complexity of solid-state nanopore experiments increases, analysis of the resulting electrical signals to determine biomolecular details becomes a challenge. State-of-the-art techniques for this task perform poorly when transient signal characteristics approach the bandwidth limitations of the measurement electronics. In this work, we address this challenge through an algorithm, called Nano Trees, for fitting piecewise constant functions. Nano Trees leverages machine learning algorithms to provide fits to the noisy piecewise constant data that is characteristic of nanopore ionic current signals, producing accurate fits on transients as short as twice the rise time of the measurement system. We demonstrate the performance of our algorithm on several real and synthetic datasets. These findings underscore the generalizability and accuracy of this approach in the regime of fast molecular translocations.</p>","PeriodicalId":72816,"journal":{"name":"Digital discovery","volume":" 7","pages":" 1743-1750"},"PeriodicalIF":6.2,"publicationDate":"2025-05-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://pubs.rsc.org/en/content/articlepdf/2025/dd/d5dd00060b?page=search","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144589458","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Self-driving laboratories in Japan","authors":"Naruki Yoshikawa, Yuki Asano, Don N. Futaba, Kanako Harada, Taro Hitosugi, Genki N. Kanda, Shoichi Matsuda, Yuuya Nagata, Keisuke Nagato, Masanobu Naito, Tohru Natsume, Kazunori Nishio, Kanta Ono, Haruka Ozaki, Woosuck Shin, Junichiro Shiomi, Kunihiko Shizume, Koichi Takahashi, Seiji Takeda, Ichiro Takeuchi, Ryo Tamura, Koji Tsuda and Yoshitaka Ushiku","doi":"10.1039/D4DD00387J","DOIUrl":"https://doi.org/10.1039/D4DD00387J","url":null,"abstract":"<p >Self-driving laboratories (SDLs) are transforming the scientific discovery process worldwide by integrating automated experimentation with data-driven decision-making. Japan, known for its automation industry, is actively contributing to this field. This perspective introduces Japan's efforts in SDL development, including diverse applications across materials science, biology, chemistry, and software. In addition, it covers national funding programs, research communities, and Japanese industries supporting progress in this field. It also highlights the importance of education, standardization, and benchmarking for the future growth of SDL research.</p>","PeriodicalId":72816,"journal":{"name":"Digital discovery","volume":" 6","pages":" 1384-1403"},"PeriodicalIF":6.2,"publicationDate":"2025-05-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://pubs.rsc.org/en/content/articlepdf/2025/dd/d4dd00387j?page=search","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144264298","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Exploring noncollinear magnetic energy landscapes with Bayesian optimization","authors":"Jakob Baumsteiger, Lorenzo Celiberti, Patrick Rinke, Milica Todorović and Cesare Franchini","doi":"10.1039/D4DD00402G","DOIUrl":"https://doi.org/10.1039/D4DD00402G","url":null,"abstract":"<p >The investigation of magnetic energy landscapes and the search for ground states of magnetic materials using <em>ab initio</em> methods like density functional theory (DFT) is a challenging task. Complex interactions, such as superexchange and spin–orbit coupling, make these calculations computationally expensive and often lead to non-trivial energy landscapes. Consequently, a comprehensive and systematic investigation of large magnetic configuration spaces is often impractical. We approach this problem by utilizing Bayesian optimization, an active machine learning scheme that has proven to be efficient in modeling unknown functions and finding global minima. Using this approach we can obtain the magnetic contribution to the energy as a function of one or more spin canting angles with relatively small numbers of DFT calculations. To assess the capabilities and the efficiency of the approach we investigate the noncollinear magnetic energy landscapes of selected materials containing 3d, 5d and 5f magnetic ions: Ba<small><sub>3</sub></small>MnNb<small><sub>2</sub></small>O<small><sub>9</sub></small>, LaMn<small><sub>2</sub></small>Si<small><sub>2</sub></small>, β-MnO<small><sub>2</sub></small>, Sr<small><sub>2</sub></small>IrO<small><sub>4</sub></small>, UO<small><sub>2</sub></small>, Ba<small><sub>2</sub></small>NaOsO<small><sub>6</sub></small> and kagome RhMn<small><sub>3</sub></small>. By comparing our results to previous <em>ab initio</em> studies that followed more conventional approaches, we observe significant improvements in efficiency.</p>","PeriodicalId":72816,"journal":{"name":"Digital discovery","volume":" 6","pages":" 1639-1650"},"PeriodicalIF":6.2,"publicationDate":"2025-05-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://pubs.rsc.org/en/content/articlepdf/2025/dd/d4dd00402g?page=search","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144264337","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Reprogramming pretrained language models for protein sequence representation learning†","authors":"Ria Vinod, Pin-Yu Chen and Payel Das","doi":"10.1039/D4DD00195H","DOIUrl":"https://doi.org/10.1039/D4DD00195H","url":null,"abstract":"<p >Machine learning-guided solutions for protein learning tasks have made significant headway in recent years. However, success in scientific discovery tasks is limited by the accessibility of well-defined and labeled in-domain data. To tackle the low-data constraint, recent adaptions of deep learning models pretrained on millions of protein sequences have shown promise; however, the construction of such domain-specific large-scale models is computationally expensive. Herein, we propose representation reprogramming <em>via</em> dictionary learning (R2DL), an end-to-end representation learning framework in which we reprogram deep models for alternate-domain tasks that can perform well on protein property prediction with significantly fewer training samples. R2DL reprograms a pretrained English language model to learn the embeddings of protein sequences, by learning a sparse linear mapping between English and protein sequence vocabulary embeddings. Our model can attain better accuracy and significantly improve the data efficiency by up to 10<small><sup>4</sup></small> times over the baselines set by pretrained and standard supervised methods. To this end, we reprogram several recent state-of-the-art pretrained English language classification models (BERT, TinyBERT, T5, and roBERTa) and benchmark on a set of protein physicochemical prediction tasks (secondary structure, stability, homology, and solubility) as well as on a biomedically relevant set of protein function prediction tasks (antimicrobial, toxicity, antibody affinity, and protein–protein interaction).</p>","PeriodicalId":72816,"journal":{"name":"Digital discovery","volume":" 6","pages":" 1591-1601"},"PeriodicalIF":6.2,"publicationDate":"2025-05-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://pubs.rsc.org/en/content/articlepdf/2025/dd/d4dd00195h?page=search","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144264334","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Democratizing self-driving labs: advances in low-cost 3D printing for laboratory automation","authors":"Sayan Doloi, Maloy Das, Yujia Li, Zen Han Cho, Xingchi Xiao, John V. Hanna, Matthew Osvaldo and Leonard Ng Wei Tat","doi":"10.1039/D4DD00411F","DOIUrl":"https://doi.org/10.1039/D4DD00411F","url":null,"abstract":"<p >Laboratory automation through self-driving labs represents a transformative approach to accelerating scientific discovery, particularly in chemical sciences, biological sciences, materials science, and high-throughput experimentation. However, widespread adoption of these technologies faces a significant barrier: the prohibitive costs of commercial automation systems, which can range from tens to hundreds of thousands of dollars. This financial hurdle has created a technological divide, limiting access primarily to well-funded institutions and leaving many research facilities unable to leverage the benefits of automated experimentation. 3D printing technology emerges as a democratizing force in this landscape, offering a revolutionary solution to the accessibility challenge. By enabling the production of customizable laboratory equipment at a fraction of the cost of commercial alternatives, 3D printing is transforming how researchers approach laboratory automation. This approach not only reduces financial barriers but also promotes innovation through open-source designs, allowing researchers to share, modify, and improve upon existing solutions. This review addresses a critical gap in the current literature by exploring both the transformation of low-cost Fused Deposition Modelling (FDM) 3D printers into sophisticated automation platforms and the use of FDM 3D-printed components to develop a broad range of affordable laboratory automation systems. Furthermore, we explore how strategic modifications enable these systems to serve as automatic liquid handlers, robotic arms, automated sample preparation and detection systems, chemical reactionware, automated imaging systems and bioprinting units. The integration of these modified 3D-printed components with machine learning and artificial intelligence algorithms creates unprecedented opportunities for developing accessible, highly flexible self-driving laboratories.</p>","PeriodicalId":72816,"journal":{"name":"Digital discovery","volume":" 7","pages":" 1685-1721"},"PeriodicalIF":6.2,"publicationDate":"2025-05-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://pubs.rsc.org/en/content/articlepdf/2025/dd/d4dd00411f?page=search","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144589451","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Machine learning-assisted profiling of a kinked ladder polymer structure using scattering†","authors":"Lijie Ding, Chi-Huan Tung, Zhiqiang Cao, Zekun Ye, Xiaodan Gu, Yan Xia, Wei-Ren Chen and Changwoo Do","doi":"10.1039/D5DD00051C","DOIUrl":"https://doi.org/10.1039/D5DD00051C","url":null,"abstract":"<p >Ladder polymers consisting of fused rings in the backbone have very limited conformational freedom, which results in very different properties from traditional linear polymers. However, accurately determining their size and chain conformations from solution scattering remains a challenge. The chain conformations of kinked ladder polymers are largely governed by the structures and relative orientations or configurations of the repeat units, unlike conventional polymer chains whose bending angles between repeat units follow a unimodal Gaussian distribution. Meanwhile, traditional scattering models for polymer chains do not account for these unique structural features. This work introduces a novel approach that integrates machine learning with Monte Carlo simulations to construct a model that can describe the geometry of a type of kinked CANAL ladder polymer. We first develop a Monte Carlo simulation model for sampling the configuration space of CANAL ladder polymers, where each repeat unit is modeled as a biaxial segment. Then, we establish a machine learning-assisted scattering analysis framework based on Gaussian Process Regression. Finally, we conduct small-angle neutron scattering experiments on a CANAL ladder polymer solution to apply our approach. Our method uncovers structural features of such ladder polymers that conventional methods fail to capture.</p>","PeriodicalId":72816,"journal":{"name":"Digital discovery","volume":" 6","pages":" 1570-1577"},"PeriodicalIF":6.2,"publicationDate":"2025-05-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://pubs.rsc.org/en/content/articlepdf/2025/dd/d5dd00051c?page=search","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144264328","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Large language models for material property predictions: elastic constant tensor prediction and materials design†","authors":"Siyu Liu, Tongqi Wen, Beilin Ye, Zhuoyuan Li, Han Liu, Yang Ren and David J. Srolovitz","doi":"10.1039/D5DD00061K","DOIUrl":"https://doi.org/10.1039/D5DD00061K","url":null,"abstract":"<p >Efficient and accurate prediction of material properties is critical for advancing materials design and applications. Leveraging the rapid progress of large language models (LLMs), we introduce ElaTBot, a domain-specific LLM for predicting elastic constant tensors and enabling materials discovery as a case study. The proposed ElaTBot LLM enables simultaneous prediction of elastic constant tensors, bulk modulus at finite temperatures, and the generation of new materials with targeted properties. Integrating general LLMs (GPT-4o) and Retrieval-Augmented Generation (RAG) further enhances its predictive capabilities. A specialized variant, ElaTBot-DFT, designed for 0 K elastic constant tensor prediction, reduces the prediction errors by 33.1% compared with a domain-specific, materials science LLM (Darwin) trained on the same dataset. This natural language-based approach highlights the broader potential of LLMs for material property predictions and inverse design. Their multitask capabilities lay the foundation for multimodal materials design, enabling more integrated and versatile exploration of material systems.</p>","PeriodicalId":72816,"journal":{"name":"Digital discovery","volume":" 6","pages":" 1625-1638"},"PeriodicalIF":6.2,"publicationDate":"2025-05-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://pubs.rsc.org/en/content/articlepdf/2025/dd/d5dd00061k?page=search","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144264336","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Composition-property extrapolation for compositionally complex solid solutions based on word embeddings†","authors":"Lei Zhang, Lars Banko, Wolfgang Schuhmann, Alfred Ludwig and Markus Stricker","doi":"10.1039/D5DD00169B","DOIUrl":"https://doi.org/10.1039/D5DD00169B","url":null,"abstract":"<p >Mastering the challenge of predicting properties of unknown materials with multiple principal elements (high entropy alloys/compositionally complex solid solutions) is crucial for speeding up materials discovery. We show and discuss three models, using experimentally measured electrocatalytic performance data from two ternary systems (Ag–Pd–Ru; Ag–Pd–Pt), to predict electrocatalytic performance in the shared quaternary system (Ag–Pd–Pt–Ru). As a starting point, we apply Gaussian Process Regression (GPR) based on composition as the feature, which includes both Ag and Pd, achieving an initial correlation coefficient for the prediction (<em>r</em>) of 0.63 and a determination coefficient (<em>r</em><small><sup>2</sup></small>) of 0.08. Second, we present a version of the GPR model using word embedding-derived materials vectors as features. Using materials-specific embedding vectors significantly improves the predictions, evident from an improved <em>r</em><small><sup>2</sup></small> of 0.65. The third model is based on a ‘standard vector method’ which synthesizes weighted vector representations of material properties as features and then creates a reference vector that results in a very good correlation with the quaternary system's material performance (resulting <em>r</em> of 0.94). Our approach demonstrates that existing experimental data combined with the latent knowledge of word embedding-derived representations of materials can be used effectively for materials discovery where data is typically scarce.</p>","PeriodicalId":72816,"journal":{"name":"Digital discovery","volume":" 6","pages":" 1578-1590"},"PeriodicalIF":6.2,"publicationDate":"2025-05-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://pubs.rsc.org/en/content/articlepdf/2025/dd/d5dd00169b?page=search","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144264333","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A digital laboratory with a modular measurement system and standardized data format†","authors":"Kazunori Nishio, Akira Aiba, Kei Takihara, Yota Suzuki, Ryo Nakayama, Shigeru Kobayashi, Akira Abe, Haruki Baba, Shinichi Katagiri, Kazuki Omoto, Kazuki Ito, Ryota Shimizu and Taro Hitosugi","doi":"10.1039/D4DD00326H","DOIUrl":"https://doi.org/10.1039/D4DD00326H","url":null,"abstract":"<p >Machine learning, robotics, and data are the keys for accelerating the discovery of new materials. While collecting more data is essential, the experimental processes remain a bottleneck. In this study, we constructed a digital laboratory by interconnecting apparatuses using robots to collect experimental data (synthesis processes and measured physical properties, including measurement conditions) for solid materials research. A variety of modular experimental instruments are physically interconnected, enabling fully automated processes from material synthesis to measurement and analysis. The data from each measurement instrument are output in an XML format, namely MaiML, and collected in a cloud-based database. In addition, the data are analyzed by software and utilized on the cloud. Using this system, we demonstrate an autonomous synthesis of high-quality LiCoO<small><sub>2</sub></small> (001) thin films. The system maximized the X-ray diffraction peak-intensity ratio of LiCoO<small><sub>2</sub></small> (001) thin films using Bayesian optimization. This system demonstrates advanced automatic and autonomous material synthesis for data- and robot-driven materials science.</p>","PeriodicalId":72816,"journal":{"name":"Digital discovery","volume":" 7","pages":" 1734-1742"},"PeriodicalIF":6.2,"publicationDate":"2025-05-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://pubs.rsc.org/en/content/articlepdf/2025/dd/d4dd00326h?page=search","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144589457","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Artificial intelligence approaches for anti-addiction drug discovery","authors":"Dong Chen, Jian Jiang, Nicole Hayes, Zhe Su and Guo-Wei Wei","doi":"10.1039/D5DD00032G","DOIUrl":"https://doi.org/10.1039/D5DD00032G","url":null,"abstract":"<p >Drug addiction remains a complex global public health challenge, with traditional anti-addiction drug discovery hindered by limited efficacy and slow progress in targeting intricate neurochemical systems. Advanced algorithms within artificial intelligence (AI) present a transformative solution that boosts both speed and precision in therapeutic development. This review examines how artificial intelligence serves as a crucial element in developing anti-addiction medications by targeting the opioid system along with dopaminergic and GABAergic systems, which are essential in addiction pathology. It identifies promising upcoming trends in studying less-researched addiction-linked systems through innovative general-purpose drug discovery techniques. AI holds the potential to transform anti-addiction research by breaking down conventional limitations, which will enable the development of superior treatment methods.</p>","PeriodicalId":72816,"journal":{"name":"Digital discovery","volume":" 6","pages":" 1404-1416"},"PeriodicalIF":6.2,"publicationDate":"2025-05-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12086782/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144121519","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}