Tensor databases empower AI for science: A case study on retrosynthetic analysis

Xueya Zhang, Guoxin Kang, Boyang Xiao, Jianfeng Zhan
BenchCouncil Transactions on Benchmarks, Standards and Evaluations, 5(1), Article 100216. Published 2025-03-01.
DOI: 10.1016/j.tbench.2025.100216
https://www.sciencedirect.com/science/article/pii/S2772485925000298

Abstract

Retrosynthetic analysis plays a central role in chemistry, biology, and materials science, supporting the rational design, synthesis, and optimization of compounds across diverse Artificial Intelligence for Science (AI4S) applications. It explores pathways from products back to reactants, typically using deep learning-based generative models. However, existing approaches often overlook the significant impact of reaction conditions on chemical reactions. As a result, existing work lacks unified models that can provide full-cycle services for retrosynthetic analysis, and overall prediction accuracy is greatly limited. These two issues force users to depend on various independent models and tools, which incurs high labor, time, and cost overhead.
To solve these issues, we define the boundary conditions of chemical reactions based on the Evaluatology theory and propose BigTensorDB, the first tensor database that integrates storage, prediction generation, search, and analysis functions. BigTensorDB designs a tensor schema for efficiently storing all key information related to chemical reactions, including reaction conditions. BigTensorDB supports a full-cycle retrosynthetic analysis pipeline: it begins by generating predicted reaction paths, then searches for approximate real reactions based on the tensor schema, and concludes with feasibility analysis, which enhances the interpretability of prediction results. BigTensorDB can effectively reduce usage costs and improve efficiency for users throughout the full-cycle retrosynthetic analysis process. It also provides a potential solution to the low-accuracy issue, encouraging researchers to focus on improving full-cycle accuracy.
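
The search step described above, in which a predicted reaction is matched against stored real reactions, can be illustrated with a minimal sketch. This is not BigTensorDB's schema or API, which the abstract does not specify. The record fields, the temperature scaling, and the use of cosine similarity are all assumptions chosen only to show how reaction conditions might sit alongside structural features in one searchable vector.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class ReactionRecord:
    """One stored reaction. All field names here are hypothetical."""
    reaction_id: str
    product_vec: List[float]      # toy embedding of the product structure
    temperature_c: float          # reaction condition (assumed field)
    catalyst_onehot: List[float]  # reaction condition (assumed field)

    def as_vector(self) -> List[float]:
        # Concatenate structure and condition features into one flat vector.
        # Temperature is divided by an assumed 200 C range so it does not dominate.
        return self.product_vec + [self.temperature_c / 200.0] + self.catalyst_onehot


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def search_similar(query: ReactionRecord, store: List[ReactionRecord], k: int = 1) -> List[ReactionRecord]:
    """Return the k stored reactions most similar to the query (brute-force scan)."""
    q = query.as_vector()
    ranked = sorted(store, key=lambda r: cosine(q, r.as_vector()), reverse=True)
    return ranked[:k]


# Toy data: two "real" reactions and one predicted reaction to look up.
store = [
    ReactionRecord("rxn-A", [1.0, 0.0, 0.0], 25.0, [1.0, 0.0]),
    ReactionRecord("rxn-B", [0.0, 1.0, 0.0], 150.0, [0.0, 1.0]),
]
predicted = ReactionRecord("predicted", [0.9, 0.1, 0.0], 30.0, [1.0, 0.0])
nearest = search_similar(predicted, store, k=1)
print(nearest[0].reaction_id)
```

Because conditions are part of the vector, two reactions with similar products but very different temperatures or catalysts rank further apart than they would on structure alone. A real system would presumably replace the brute-force scan with an approximate nearest-neighbor index.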
