A simple similarity metric for comparing synthetic routes†

IF 6.2 Q1 CHEMISTRY, MULTIDISCIPLINARY

Digital Discovery | Pub Date: 2024-11-06 | DOI: 10.1039/D4DD00292J

Samuel Genheden and Jason D. Shields

Citations: 0

Abstract

Experimentally validated routes to synthetic compounds can be compared to each other by quantitative metrics (step count, yield, atom economy) or by qualitative assessments (strategy, novelty). AI-predicted routes are typically compared to experimental syntheses to check for an exact match among the top-ranked predictions (top-N accuracy). This method is ideal for the evaluation of retrosynthetic algorithms on large datasets (>10⁶ routes), but it cannot quantify the degree of similarity between routes, which would be desirable for small datasets (<10² routes). Here, we present a simple method to calculate a similarity score between any two synthetic routes to a given molecule. The score is based on two concepts: which bonds are formed during the synthesis, and how the atoms of the final compound are grouped together throughout the synthesis. As a result, the similarity score overlaps well with chemists' intuition and provides a finer assessment of prediction accuracy.
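To make the two concepts concrete, the following is a minimal Python sketch of how such a score could be assembled. It is not the authors' implementation. The abstract does not give formulas, so every detail here is an assumption made for illustration: the `Route` container, the use of atom-map numbers of the target molecule to label atoms, the Jaccard overlap for formed bonds, the best-match overlap for atom groupings, and the geometric mean used to combine the two parts.

```python
from math import sqrt
from typing import FrozenSet, List, NamedTuple, Set


class Route(NamedTuple):
    """Hypothetical route description, using atom-map numbers of the target.

    bonds:  bonds of the target formed during the synthesis, each stored as a
            frozenset holding the two atom-map numbers it connects.
    groups: one frozenset per molecule in the route (starting materials and
            intermediates), listing which target atoms that molecule carries.
    """
    bonds: Set[FrozenSet[int]]
    groups: List[FrozenSet[int]]


def bond_similarity(a: Route, b: Route) -> float:
    """Jaccard overlap of the formed-bond sets (assumed form)."""
    union = a.bonds | b.bonds
    if not union:
        return 1.0  # neither route forms a bond: treat as identical
    return len(a.bonds & b.bonds) / len(union)


def _overlap(g: FrozenSet[int], h: FrozenSet[int]) -> float:
    """Shared atoms relative to the larger of the two groupings."""
    return len(g & h) / max(len(g), len(h))


def grouping_similarity(a: Route, b: Route) -> float:
    """Average best-match overlap between atom groupings (assumed form)."""
    if not a.groups or not b.groups:
        return 1.0 if not a.groups and not b.groups else 0.0

    def directed(src: List[FrozenSet[int]], dst: List[FrozenSet[int]]) -> float:
        # For each grouping in src, take its best counterpart in dst.
        return sum(max(_overlap(g, h) for h in dst) for g in src) / len(src)

    # Average both directions so the result does not depend on argument order.
    return 0.5 * (directed(a.groups, b.groups) + directed(b.groups, a.groups))


def route_similarity(a: Route, b: Route) -> float:
    """Combine both parts with a geometric mean (an assumption)."""
    return sqrt(bond_similarity(a, b) * grouping_similarity(a, b))


def fs(*atoms: int) -> FrozenSet[int]:
    """Shorthand for building a frozenset of atom-map numbers."""
    return frozenset(atoms)


# Two made-up routes to the same four-atom target.
route_a = Route(bonds={fs(1, 2), fs(3, 4)},
                groups=[fs(1), fs(2, 3), fs(1, 2, 3), fs(4)])
route_b = Route(bonds={fs(1, 2), fs(2, 3)},
                groups=[fs(1), fs(2), fs(3, 4), fs(1, 2)])

print(f"similarity(a, b) = {route_similarity(route_a, route_b):.3f}")
print(f"similarity(a, a) = {route_similarity(route_a, route_a):.3f}")
```

In this sketch a score of 1 means the two routes form the same bonds and pass through the same atom groupings, and a score of 0 means they share no formed bond. Values in between give the graded comparison that an exact-match check such as top-N accuracy cannot provide.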



