A simple similarity metric for comparing synthetic routes

Journal impact factor: 6.2 (JCR Q1, Chemistry, Multidisciplinary)
Samuel Genheden and Jason D. Shields
Digital Discovery, vol. 1, pp. 46-53. Published 2024-11-06. DOI: 10.1039/D4DD00292J
Article page: https://pubs.rsc.org/en/content/articlelanding/2025/dd/d4dd00292j
Citations: 0

Abstract

Experimentally validated routes to synthetic compounds can be compared to each other by quantitative metrics (step count, yield, atom economy), or by qualitative assessments (strategy, novelty). AI-predicted routes are typically compared to experimental syntheses to check for an exact match among the top-ranked predictions (top-N accuracy). This method is ideal for the evaluation of retrosynthetic algorithms on large datasets (>10⁶ routes), but it cannot assess a degree of similarity between routes, which would be desirable for small datasets (<10² routes). Here, we present a simple method to calculate a similarity score between any two synthetic routes to a given molecule. The score is based on two concepts: which bonds are formed during the synthesis; and how the atoms of the final compound are grouped together throughout the synthesis. As a result, the similarity score overlaps well with chemists' intuition and provides a finer assessment of prediction accuracy.
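The two ingredients named in the abstract (bonds formed, and how the target's atoms are grouped) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes each route has already been reduced to (i) the set of target-molecule bonds formed, written as pairs of atom-map numbers, and (ii) a list of atom groups, i.e. which target atoms sit together in each molecule of the route. The Jaccard overlap, the best-match averaging of groups, and the geometric mean used to combine the two parts are illustrative choices, not details taken from the paper; all function names are hypothetical.

```python
from math import sqrt
from typing import AbstractSet, FrozenSet, List, Set

# Hypothetical route representation (an assumption, not from the paper):
#   Bond  = pair of target atom-map numbers joined by a bond formed in the route
#   Group = target atom-map numbers that sit together in one molecule of the route
Bond = FrozenSet[int]
Group = FrozenSet[int]


def jaccard(a: AbstractSet, b: AbstractSet) -> float:
    """Intersection size divided by union size (1.0 when both sets are empty)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def bond_similarity(bonds1: Set[Bond], bonds2: Set[Bond]) -> float:
    """Overlap of the target bonds formed in each route (assumed Jaccard)."""
    return jaccard(bonds1, bonds2)


def _best_match_mean(groups: List[Group], others: List[Group]) -> float:
    """For each group, take its best Jaccard match among `others`, then average."""
    if not groups:
        return 1.0 if not others else 0.0
    total = sum(max((jaccard(g, o) for o in others), default=0.0) for g in groups)
    return total / len(groups)


def atom_similarity(groups1: List[Group], groups2: List[Group]) -> float:
    """Symmetrised best-match overlap of how target atoms are grouped (assumed scheme)."""
    return 0.5 * (_best_match_mean(groups1, groups2) + _best_match_mean(groups2, groups1))


def route_similarity(bonds1: Set[Bond], groups1: List[Group],
                     bonds2: Set[Bond], groups2: List[Group]) -> float:
    """Geometric mean of the bond and atom scores (an assumed way to combine them)."""
    return sqrt(bond_similarity(bonds1, bonds2) * atom_similarity(groups1, groups2))


def bond(i: int, j: int) -> Bond:
    """Helper: an unordered bond between two target atoms."""
    return frozenset((i, j))


if __name__ == "__main__":
    # Toy six-atom target. Route A is convergent: fragments {1,2,3} and {4,5,6}
    # are joined by forming bond 3-4.
    a_bonds = {bond(3, 4)}
    a_groups = [frozenset({1, 2, 3}), frozenset({4, 5, 6}), frozenset(range(1, 7))]
    # Route B is linear: {1,2,3} + {4} -> {1,2,3,4}, then + {5,6} via bond 4-5.
    b_bonds = {bond(3, 4), bond(4, 5)}
    b_groups = [frozenset({1, 2, 3}), frozenset({4}), frozenset({1, 2, 3, 4}),
                frozenset({5, 6}), frozenset(range(1, 7))]

    print(f"bond similarity:  {bond_similarity(a_bonds, b_bonds):.3f}")
    print(f"atom similarity:  {atom_similarity(a_groups, b_groups):.3f}")
    print(f"route similarity: {route_similarity(a_bonds, a_groups, b_bonds, b_groups):.3f}")
```

Unlike an exact-match check, which would score these two routes as simply "different", the sketch yields a graded value between 0 and 1, and two identical routes score exactly 1.0. In practice the bond and group sets would have to be derived from atom-mapped reactions, which this sketch takes as given.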

