Challenging Reaction Prediction Models to Generalize to Novel Chemistry
John Bradshaw, Anji Zhang, Babak Mahjour, David E. Graff, Marwin H. S. Segler and Connor W. Coley*
ACS Central Science, volume 11, issue 4, pages 539–549. Published 2025-03-12.
DOI: 10.1021/acscentsci.5c00055
https://pubs.acs.org/doi/10.1021/acscentsci.5c00055
Abstract
Deep learning models for anticipating the products of organic reactions have found many use cases, including validating retrosynthetic pathways and constraining synthesis-based molecular design tools. Despite compelling performance on popular benchmark tasks, strange and erroneous predictions sometimes ensue when using these models in practice. The core issue is that common benchmarks test models in an in-distribution setting, whereas many real-world uses for these models are in out-of-distribution settings and require a greater degree of extrapolation. To better understand how current reaction predictors work in out-of-distribution domains, we report a series of more challenging evaluations of a prototypical SMILES-based deep learning model. First, we illustrate how performance on randomly sampled data sets is overly optimistic compared to performance when generalizing to new patents or new authors. Second, we conduct time splits that evaluate how models perform when tested on reactions published years after those in their training set, mimicking real-world deployment. Finally, we consider extrapolation across reaction classes to reflect what would be required for the discovery of novel reaction types. This panel of tasks can reveal the capabilities and limitations of today’s reaction predictors, acting as a crucial first step in the development of tomorrow’s next-generation models capable of reaction discovery.
Despite excellent benchmark performance, ML models for reaction prediction can struggle on real-world data; we evaluate these limitations by challenging a model on different out-of-distribution tasks.
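The evaluation settings named in the abstract (random sampling, holding out whole patents or authors, time splits, and held-out reaction classes) can be illustrated with a minimal sketch. This is not the authors' code: the record fields (`patent_id`, `year`, `reaction_class`), the function names, and the toy data are all hypothetical and chosen only to show how each split differs.

```python
import random

# Hypothetical toy records; real data sets would carry reaction SMILES too.
CLASSES = ["amide_coupling", "suzuki", "reduction"]
records = [
    {"id": i, "patent_id": f"P{i % 5}", "year": 2010 + i % 8,
     "reaction_class": CLASSES[i % 3]}
    for i in range(30)
]


def random_split(recs, test_frac=0.2, seed=0):
    """In-distribution baseline: shuffle individual reactions."""
    shuffled = list(recs)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_frac)
    return shuffled[n_test:], shuffled[:n_test]


def group_split(recs, key, test_frac=0.2, seed=0):
    """Hold out whole groups (e.g. patents or authors), so no group
    contributes reactions to both train and test."""
    groups = sorted({r[key] for r in recs})
    random.Random(seed).shuffle(groups)
    n_test = max(1, int(len(groups) * test_frac))
    held_out = set(groups[:n_test])
    train = [r for r in recs if r[key] not in held_out]
    test = [r for r in recs if r[key] in held_out]
    return train, test


def time_split(recs, cutoff_year):
    """Train on reactions up to the cutoff year, test on later ones."""
    train = [r for r in recs if r["year"] <= cutoff_year]
    test = [r for r in recs if r["year"] > cutoff_year]
    return train, test


def class_split(recs, held_out_classes):
    """Test only on reaction classes never seen during training."""
    held_out = set(held_out_classes)
    train = [r for r in recs if r["reaction_class"] not in held_out]
    test = [r for r in recs if r["reaction_class"] in held_out]
    return train, test
```

The point of the contrast is that `random_split` can place reactions from the same patent, year, or class on both sides of the split, whereas the other three force the test set to differ from the training set along one chosen axis.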
About the journal:
ACS Central Science publishes significant primary reports on research in chemistry and allied fields where chemical approaches are pivotal. As the first fully open-access journal by the American Chemical Society, it covers compelling and important contributions to the broad chemistry and scientific community. "Central science," a term popularized nearly 40 years ago, emphasizes chemistry's central role in connecting physical and life sciences, and fundamental sciences with applied disciplines like medicine and engineering. The journal focuses on exceptional quality articles, addressing advances in fundamental chemistry and interdisciplinary research.