Digital Discovery: latest articles

BitBIRCH: efficient clustering of large molecular libraries.
IF 6.2
Digital discovery Pub Date : 2025-03-13 DOI: 10.1039/d5dd00030k
Kenneth López Pérez, Vicky Jung, Lexin Chen, Kate Huddleston, Ramón Alain Miranda-Quintana
Abstract: The widespread use of machine learning (ML) techniques in chemical applications has come with the pressing need to analyze extremely large molecular libraries. In particular, clustering remains one of the most common tools to dissect the chemical space. Unfortunately, most current approaches present unfavorable time and memory scaling, which makes them unsuitable to handle million- and billion-sized sets. Here, we propose to bypass these problems with a time- and memory-efficient clustering algorithm, BitBIRCH. This method uses a tree structure similar to the one found in the Balanced Iterative Reducing and Clustering using Hierarchies (BIRCH) algorithm to ensure O(N) time scaling. BitBIRCH leverages the instant similarity (iSIM) formalism to process binary fingerprints, allowing the use of Tanimoto similarity, and reducing memory requirements. Our tests show that BitBIRCH is already >1000 times faster than standard implementations of the Taylor-Butina clustering for libraries with 1 500 000 molecules. BitBIRCH increases efficiency without compromising the quality of the resulting clusters. We explore strategies to handle large sets, which we applied in the clustering of one billion molecules under 5 hours using a parallel/iterative BitBIRCH approximation.
PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11912344/pdf/
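The abstract above relies on Tanimoto similarity over binary fingerprints and on the idea that an average similarity can be obtained from column sums rather than from all pairs. The following is a minimal sketch of both ideas in plain Python, not the BitBIRCH code. The function names are hypothetical, and the column-sum formula (coincidences divided by coincidences plus mismatches) is an assumption about how an iSIM-style average is formed; check it against the original iSIM publication before relying on it.

```python
def tanimoto(a, b):
    """Tanimoto similarity of two equal-length 0/1 fingerprints."""
    intersection = sum(1 for x, y in zip(a, b) if x and y)
    union = sum(1 for x, y in zip(a, b) if x or y)
    # Convention chosen here: two all-zero fingerprints score 0.0.
    return intersection / union if union else 0.0


def column_sum_tanimoto(fingerprints):
    """Set-level Tanimoto-like average from column sums (assumed iSIM-style).

    For each bit position with k 'on' bits among n fingerprints, there are
    k*(k-1)/2 pairs sharing the bit and k*(n-k) pairs that disagree on it.
    The cost is linear in the number of fingerprints.
    """
    n = len(fingerprints)
    column_sums = [sum(column) for column in zip(*fingerprints)]
    coincidences = sum(k * (k - 1) // 2 for k in column_sums)
    mismatches = sum(k * (n - k) for k in column_sums)
    total = coincidences + mismatches
    return coincidences / total if total else 0.0
```

For two fingerprints the column-sum value coincides with the ordinary pairwise Tanimoto similarity, which gives a quick sanity check on the formula.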
Citations: 0
Unveiling CO2 reactivity with data-driven methods†
IF 6.2
Digital discovery Pub Date : 2025-02-26 DOI: 10.1039/D5DD00020C
Maike Eckhoff, Kerstin L. Bublitz and Jonny Proppe
Abstract: Carbon dioxide is a versatile C1 building block in organic synthesis. Understanding its reactivity is crucial for predicting reaction outcomes and identifying suitable substrates for the creation of value-added chemicals and drugs. A recent study [Li et al., J. Am. Chem. Soc., 2020, 142, 8383] estimated the reactivity of CO2 in the form of Mayr's electrophilicity parameter E on the basis of a single carboxylation reaction. The disagreement between experiment (E = −16.3) and computation (E = −11.4) corresponds to a deviation of up to ten orders of magnitude in bimolecular rate constants of carboxylation reactions according to the Mayr–Patz equation, log k = s_N(E + N). Here, we introduce a data-driven approach incorporating supervised learning, quantum chemistry, and uncertainty quantification to resolve this discrepancy. The dataset used for reducing the uncertainty in E(CO2) represents 15 carboxylation reactions in DMSO. However, experimental data is only available for one of these reactions. To ensure reliable predictions, we selected a training set composed of this and 19 additional reactions comprising heteroallenes other than CO2 for which experimental data is available. With the new data-driven protocol, we can narrow down the electrophilicity of carbon dioxide to E(CO2) = −14.6(5) with 95% confidence, and suggest an electrophile-specific sensitivity parameter s_E(CO2) = 0.81(6), resulting in an extended reactivity equation, log k = s_E s_N(E + N) [Mayr, Tetrahedron, 2015, 71, 5095].
Pages: 868-878
PDF: https://pubs.rsc.org/en/content/articlepdf/2025/dd/d5dd00020c?page=search
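To make the two reactivity equations in this abstract concrete, here is a small worked sketch in Python. The E and s_E values are taken from the abstract; the nucleophile parameters `N_HYPOTHETICAL` and `S_N_HYPOTHETICAL` are invented for illustration and do not describe any real nucleophile.

```python
def log_k(E, N, s_N, s_E=1.0):
    """log10 of the rate constant: log k = s_E * s_N * (E + N).

    With s_E = 1.0 this is the Mayr-Patz equation, log k = s_N * (E + N).
    """
    return s_E * s_N * (E + N)


# Electrophilicity values quoted in the abstract.
E_EXPERIMENT = -16.3
E_COMPUTATION = -11.4
E_DATA_DRIVEN = -14.6
S_E_CO2 = 0.81

# Hypothetical nucleophile, chosen only to have numbers to plug in.
N_HYPOTHETICAL = 20.0
S_N_HYPOTHETICAL = 1.0

# The disagreement in log k is s_N * (E_computation - E_experiment),
# so it grows linearly with the nucleophile's sensitivity s_N.
gap = (log_k(E_COMPUTATION, N_HYPOTHETICAL, S_N_HYPOTHETICAL)
       - log_k(E_EXPERIMENT, N_HYPOTHETICAL, S_N_HYPOTHETICAL))

# Extended equation with the electrophile-specific sensitivity s_E.
extended = log_k(E_DATA_DRIVEN, N_HYPOTHETICAL, S_N_HYPOTHETICAL, s_E=S_E_CO2)
```

Because N cancels in the difference, the gap depends only on s_N and on the 4.9-unit spread between the two E estimates.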
Citations: 0
SANE: strategic autonomous non-smooth exploration for multiple optima discovery in multi-modal and non-differentiable black-box functions†
IF 6.2
Digital discovery Pub Date : 2025-02-18 DOI: 10.1039/D4DD00299G
Arpan Biswas, Rama Vasudevan, Rohit Pant, Ichiro Takeuchi, Hiroshi Funakubo and Yongtao Liu
Abstract: Both computational and experimental material discovery bring forth the challenge of exploring multidimensional and multimodal parameter spaces, such as phase diagrams of Hamiltonians with multiple interactions, composition spaces of combinatorial libraries, material structure image spaces, and molecular embedding spaces. Often these systems are black boxes and time-consuming to evaluate, which has resulted in strong interest in active learning methods such as Bayesian optimization (BO). However, these systems are often noisy, which makes the black-box function severely multi-modal and non-differentiable, so that a vanilla BO can get overly focused near a single or faux optimum, deviating from the broader goal of scientific discovery. To address these limitations, here we developed Strategic Autonomous Non-Smooth Exploration (SANE) to facilitate an intelligent Bayesian optimized navigation with a proposed cost-driven probabilistic acquisition function to find multiple global and local optimal regions, avoiding the tendency to become trapped in a single optimum. To distinguish between a true and false optimal region due to noisy experimental measurements, a human (domain) knowledge driven dynamic surrogate gate is integrated with SANE. We applied gate-SANE to pre-acquired piezoresponse spectroscopy data of a ferroelectric combinatorial library with high noise levels in specific regions, and to piezoresponse force microscopy (PFM) hyperspectral data. SANE demonstrated better performance than classical BO in facilitating the exploration of multiple optimal regions and thereby prioritized learning with higher coverage of scientific values in autonomous experiments. Our work showcases the potential application of this method to real-world experiments, where such combined strategic and human intervening approaches can be critical to unlocking new discoveries in autonomous research.
Pages: 853-867
PDF: https://pubs.rsc.org/en/content/articlepdf/2025/dd/d4dd00299g?page=search
Citations: 0
Dissecting errors in machine learning for retrosynthesis: a granular metric framework and a transformer-based model for more informative predictions
IF 6.2
Digital discovery Pub Date : 2025-02-18 DOI: 10.1039/D4DD00263F
Arihanth Srikar Tadanki, H. Surya Prakash Rao and U. Deva Priyakumar
Abstract: Chemical reaction prediction, encompassing forward synthesis and retrosynthesis, stands as a fundamental challenge in organic synthesis. A widely adopted computational approach frames synthesis prediction as a sequence-to-sequence translation task, using the common SMILES representation for molecules. The current evaluation of machine learning methods for retrosynthesis assumes perfect training data, overlooking imperfections in reaction equations in popular datasets, such as missing reactants and products and omitted physical and practical constraints such as temperature and cost, primarily due to a focus on the target molecule. This limitation leads to an incomplete representation of viable synthetic routes, especially when multiple sets of reactants can yield a given desired product. In response to these shortcomings, this study examines the prevailing evaluation methods and introduces comprehensive metrics designed to address imperfections in the dataset. Our novel metrics not only assess absolute accuracy by comparing predicted outputs with ground truth but also introduce a nuanced evaluation approach. We provide scores for partial correctness and compute adjusted accuracy through graph matching, acknowledging the inherent complexities of retrosynthetic pathways. Additionally, we explore the impact of small molecular augmentations while preserving chemical properties and employ similarity matching to enhance the assessment of prediction quality. We introduce SynFormer, a sequence-to-sequence model tailored for SMILES representation. It incorporates architectural enhancements to the original transformer, effectively tackling the challenges of chemical reaction prediction. SynFormer achieves a Top-1 accuracy of 53.2% on the USPTO-50k dataset, matching the performance of widely accepted models like Chemformer, but with greater efficiency by eliminating the need for pre-training.
Pages: 831-845
PDF: https://pubs.rsc.org/en/content/articlepdf/2025/dd/d4dd00263f?page=search
Citations: 0
Active learning high coverage sets of complementary reaction conditions†
IF 6.2
Digital discovery Pub Date : 2025-02-17 DOI: 10.1039/D4DD00365A
Sofia L. Sivilotti, David M. Friday and Nicholas E. Jackson
Abstract: Chemical reaction conditions capable of producing high yields over diverse reactants are a desired component of nearly all chemical and materials discovery campaigns. While much work has been done to discover individual general reaction conditions, any single set of conditions is necessarily limited over increasingly diverse chemical spaces. A potential solution to this problem is to identify small sets of complementary reaction conditions that, when combined, cover a larger chemical space than any one general reaction condition. In this work, we analyze experimentally derived datasets to assess the relative performance of individual general reaction conditions vs. sets of complementary reaction conditions. We then propose and benchmark active learning methods to efficiently discover these complementary sets of conditions. The results show the value of active learning in identifying complementary sets of reaction conditions and provide an avenue for improving synthetic hit rates in high-throughput synthesis campaigns.
Pages: 846-852
PDF: https://pubs.rsc.org/en/content/articlepdf/2025/dd/d4dd00365a?page=search
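The notion of coverage by a set of complementary conditions can be illustrated with a short Python sketch. This is a greedy baseline over a fully known yield table, not the active learning method of the paper; the function names, the yield values, and the 50% success threshold are all hypothetical.

```python
def coverage(yields, conditions, threshold):
    """Fraction of reactants for which at least one chosen condition
    reaches the yield threshold. yields[i][c] is reactant i under condition c."""
    covered = sum(
        1 for row in yields
        if any(row[c] >= threshold for c in conditions)
    )
    return covered / len(yields)


def greedy_complementary_set(yields, set_size, threshold):
    """Pick conditions one at a time, each time adding the condition that
    raises the joint coverage the most (ties go to the lowest index)."""
    chosen = []
    n_conditions = len(yields[0])
    for _ in range(set_size):
        candidates = [c for c in range(n_conditions) if c not in chosen]
        if not candidates:
            break
        best = max(candidates,
                   key=lambda c: coverage(yields, chosen + [c], threshold))
        chosen.append(best)
    return chosen


# Hypothetical yields (%) for 4 reactants under 3 conditions.
example_yields = [
    [80, 10, 5],
    [75, 20, 10],
    [5, 90, 70],
    [10, 15, 85],
]
example_set = greedy_complementary_set(example_yields, set_size=2, threshold=50)
```

In this toy table no single condition covers more than half of the reactants, while the pair selected greedily covers all four, which is the effect the abstract describes.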
Citations: 0
AI-powered exploration of molecular vibrations, phonons, and spectroscopy
IF 6.2
Digital discovery Pub Date : 2025-02-14 DOI: 10.1039/D4DD00353E
Bowen Han, Ryotaro Okabe, Abhijatmedhi Chotrattanapituk, Mouyang Cheng, Mingda Li and Yongqiang Cheng
Abstract: The vibrational dynamics of molecules and solids play a critical role in defining material properties, particularly their thermal behaviors. However, theoretical calculations of these dynamics are often computationally intensive, while experimental approaches can be technically complex and resource-demanding. Recent advancements in data-driven artificial intelligence (AI) methodologies have substantially enhanced the efficiency of these studies. This review explores the latest progress in AI-driven methods for investigating atomic vibrations, emphasizing their role in accelerating computations and enabling rapid predictions of lattice dynamics, phonon behaviors, molecular dynamics, and vibrational spectra. Key developments are discussed, including advancements in databases, structural representations, machine-learning interatomic potentials, graph neural networks, and other emerging approaches. Compared to traditional techniques, AI methods exhibit transformative potential, dramatically improving the efficiency and scope of research in materials science. The review concludes by highlighting the promising future of AI-driven innovations in the study of atomic vibrations.
Pages: 584-624
PDF: https://pubs.rsc.org/en/content/articlepdf/2025/dd/d4dd00353e?page=search
Citations: 0
Quantitative evaluation of anharmonic bond potentials for molecular simulations†
IF 6.2
Digital discovery Pub Date : 2025-02-13 DOI: 10.1039/D4DD00344F
Paul J. van Maaren and David van der Spoel
Abstract: Most general force fields only implement a harmonic potential to model covalent bonds. In addition, in some force fields, all or a selection of the covalent bonds are constrained in molecular dynamics simulations. Nevertheless, it is possible to implement accurate bond potentials for a relatively small computational cost. Such potentials may be important for spectroscopic applications, free energy perturbation calculations or for studying reactions using empirical valence bond theory. Here, we evaluate different bond potentials for diatomic molecules. Based on quantum-chemical scans around the equilibrium distance of 71 molecules using the MP2/aug-cc-pVTZ level of theory as well as CCSD(T) with the same basis set, we determine the quality of fit to the data of 28 model potentials. As expected, a large spread in accuracies of the potentials is found and more complex potentials generally provide a better fit. As a second and more challenging test, five spectroscopic parameters (ω_e, ω_e x_e, α_e, B_e and D_e) predicted based on quantum chemistry as well as the fitted potentials are compared to experimental data. A handful of the 28 potentials tested are found to be accurate. Of these, we suggest that the potential due to Hua (Phys. Rev. A, 42 (1990), 2524) could be a suitable choice for implementation in molecular simulation codes, since it is considerably more accurate than the well-known Morse potential (Phys. Rev., 34 (1929), 57) at a very similar computational cost.
Pages: 824-830
PDF: https://pubs.rsc.org/en/content/articlepdf/2025/dd/d4dd00344f?page=search
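The contrast between a harmonic bond and an anharmonic one can be sketched in a few lines of Python using the textbook Morse form, V(r) = D_e (1 − exp(−a (r − r_e)))^2, whose curvature at the minimum gives a matching harmonic force constant k = 2 D_e a^2. This is an illustration only: the Hua potential recommended in the abstract is not implemented here, and the parameter values are hypothetical rather than fitted to any molecule.

```python
import math


def morse(r, d_e, a, r_e):
    """Morse potential: zero at r_e, approaching the well depth d_e at large r."""
    return d_e * (1.0 - math.exp(-a * (r - r_e))) ** 2


def harmonic(r, k, r_e):
    """Harmonic potential, which grows without bound and so cannot dissociate."""
    return 0.5 * k * (r - r_e) ** 2


def harmonic_k_from_morse(d_e, a):
    """Force constant that matches the Morse curvature at the minimum."""
    return 2.0 * d_e * a ** 2


# Hypothetical parameters (energy units for D_E, 1/length for A, length for R_E).
D_E, A, R_E = 4.5, 2.0, 0.74
K = harmonic_k_from_morse(D_E, A)

# The two curves agree close to R_E and diverge as the bond is stretched.
near = (morse(R_E + 0.001, D_E, A, R_E), harmonic(R_E + 0.001, K, R_E))
far = (morse(R_E + 2.0, D_E, A, R_E), harmonic(R_E + 2.0, K, R_E))
```

Comparing `near` and `far` shows why a harmonic term is adequate for small vibrations but not for properties that probe the stretched bond, such as anharmonic spectroscopic constants.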
Citations: 0
Correction: Distortion/interaction analysis via machine learning
IF 6.2
Digital discovery Pub Date : 2025-02-06 DOI: 10.1039/D5DD90005K
Samuel G. Espley, Samuel S. Allsop, David Buttar, Simone Tomasi and Matthew N. Grayson
Abstract: Correction for 'Distortion/interaction analysis via machine learning' by Samuel G. Espley et al., Digital Discovery, 2024, 3, 2479–2486, https://doi.org/10.1039/D4DD00224E.
Page: 879
PDF: https://pubs.rsc.org/en/content/articlepdf/2025/dd/d5dd90005k?page=search
Citations: 0
Active learning-guided exploration of thermally conductive polymers under strain†
IF 6.2
Digital discovery Pub Date : 2025-02-06 DOI: 10.1039/D4DD00267A
Renzheng Zhang, Jiaxin Xu, Hanfeng Zhang, Guoyue Xu and Tengfei Luo
Abstract: Finding amorphous polymers with higher thermal conductivity (TC) is technologically important, as they are ubiquitous in applications where heat transfer is crucial. While TC is generally low in amorphous polymers, it can be enhanced by mechanical strain, which facilitates the alignment of polymer chains. However, using the conventional Edisonian approach, the discovery of polymers that may have high TC after strain can be time-consuming, with no guarantee of success. In this work, we employ an active learning scheme to speed up the discovery of amorphous polymers with high TC under strain. Polymers under 2× strain are simulated using molecular dynamics (MD), and their TCs are calculated using non-equilibrium MD. A Gaussian process regression (GPR) model is then built using these MD data as the training set. The GPR model is used to screen the PoLyInfo database, and the predicted mean TC and uncertainty are fed into an acquisition function to recommend new polymers for labeling via Bayesian optimization. The TCs of these selected polymers are then labeled using MD simulations, and the obtained data are incorporated to rebuild the GPR model, initiating a new iteration of the active learning cycle. Over a few cycles, we identified ten strained polymers with significantly higher TC (>1 W m−1 K−1) than the original dataset, and the results offer valuable insights into the structural characteristics favorable for achieving high TC of polymers subject to strain.
Pages: 812-823
PDF: https://pubs.rsc.org/en/content/articlepdf/2025/dd/d4dd00267a?page=search
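The step in which a predicted mean and an uncertainty are combined to choose the next candidates can be sketched as follows. The abstract does not name the acquisition function, so the upper confidence bound (mean + kappa × standard deviation) used here is an assumption, and the candidate names and predictions are hypothetical stand-ins for the output of a fitted GPR model.

```python
def upper_confidence_bound(mean, std, kappa):
    """Score that rewards both a high predicted value and a high uncertainty."""
    return mean + kappa * std


def recommend(predictions, n_pick, kappa):
    """Return the n_pick candidate names with the highest acquisition score.

    predictions: list of (name, predicted_mean, predicted_std) tuples.
    """
    ranked = sorted(
        predictions,
        key=lambda item: upper_confidence_bound(item[1], item[2], kappa),
        reverse=True,
    )
    return [name for name, _, _ in ranked[:n_pick]]


# Hypothetical surrogate predictions of TC for three unlabeled polymers.
example_predictions = [
    ("polymer_A", 0.40, 0.05),
    ("polymer_B", 0.30, 0.20),
    ("polymer_C", 0.50, 0.01),
]
exploit_only = recommend(example_predictions, n_pick=2, kappa=0.0)
with_exploration = recommend(example_predictions, n_pick=2, kappa=1.0)
```

With kappa set to zero the ranking is purely exploitative; raising kappa promotes polymer_B, whose prediction is lower but far less certain. The selected candidates would then be labeled by simulation and added to the training set before the model is rebuilt.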
Citations: 0
Developing large language models for quantum chemistry simulation input generation†
IF 6.2
Digital discovery Pub Date : 2025-02-05 DOI: 10.1039/D4DD00366G
Pieter Floris Jacobs and Robert Pollice
Abstract: Scientists across domains are often challenged to master domain-specific languages (DSLs) for their research, which are merely a means to an end but are pervasive in fields like computational chemistry. Automated code generation promises to overcome this barrier, allowing researchers to focus on their core expertise. While large language models (LLMs) have shown impressive capabilities in synthesizing code from natural language prompts, they often struggle with DSLs, likely due to their limited exposure during training. In this work, we investigate the potential of foundational LLMs for generating input files for the quantum chemistry package ORCA by establishing a general framework that can be adapted to other DSLs. To improve upon [model name shown as an image in the source] as our base model, we explore the impact of prompt engineering, retrieval-augmented generation, and finetuning via synthetically generated datasets. We find that finetuning, even with synthetic datasets as small as 500 samples, significantly improves performance. Additionally, we observe that finetuning shows synergism with advanced prompt engineering such as chain-of-thought prompting. Consequently, our best finetuned models outperform the formally much more powerful [model name shown as an image in the source] model. In turn, finetuning GPT-4o with the same small synthetic dataset leads to a further substantial performance improvement, suggesting our approach to be more general rather than limited to LLMs with poor base proficiency. All tools and datasets are made openly available for future research. We believe that this research lays the groundwork for a wider adoption of LLMs for DSLs in chemistry and beyond.
Pages: 762-775
PDF: https://pubs.rsc.org/en/content/articlepdf/2025/dd/d4dd00366g?page=search
Citations: 0