{"title":"Tensor databases empower AI for science: A case study on retrosynthetic analysis","authors":"Xueya Zhang , Guoxin Kang , Boyang Xiao , Jianfeng Zhan","doi":"10.1016/j.tbench.2025.100216","DOIUrl":null,"url":null,"abstract":"<div><div>Retrosynthetic analysis is highly significant in chemistry, biology, and materials science, providing essential support for the rational design, synthesis, and optimization of compounds across diverse Artificial Intelligence for Science (AI4S) applications. Retrosynthetic analysis focuses on exploring pathways from products to reactants, and this is typically conducted using deep learning-based generative models. However, existing retrosynthetic analysis often overlooks how reaction conditions significantly impact chemical reactions. This causes existing work to lack unified models that can provide full-cycle services for retrosynthetic analysis, and also greatly limits the overall prediction accuracy of retrosynthetic analysis. These two issues cause users to depend on various independent models and tools, leading to high labor time and cost overhead.</div><div>To solve these issues, we define the boundary conditions of chemical reactions based on the Evaluatology theory and propose BigTensorDB, the first tensor database which integrates storage, prediction generation, search, and analysis functions. BigTensorDB designs the tensor schema for efficiently storing all the key information related to chemical reactions, including reaction conditions. BigTensorDB supports a full-cycle retrosynthetic analysis pipeline. It begins with predicting generation reaction paths, searching for approximate real reactions based on the tensor schema, and concludes with feasibility analysis, which enhances the interpretability of prediction results. BigTensorDB can effectively reduce usage costs and improve efficiency for users during the full-cycle retrosynthetic analysis process. Meanwhile, it provides a potential solution to the low accuracy issue, encouraging researchers to focus on improving full-cycle accuracy.</div></div>","PeriodicalId":100155,"journal":{"name":"BenchCouncil Transactions on Benchmarks, Standards and Evaluations","volume":"5 1","pages":"Article 100216"},"PeriodicalIF":0.0000,"publicationDate":"2025-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"BenchCouncil Transactions on Benchmarks, Standards and Evaluations","FirstCategoryId":"1085","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S2772485925000298","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0
Abstract
Retrosynthetic analysis is highly significant in chemistry, biology, and materials science, providing essential support for the rational design, synthesis, and optimization of compounds across diverse Artificial Intelligence for Science (AI4S) applications. Retrosynthetic analysis focuses on exploring pathways from products to reactants, and this is typically conducted using deep learning-based generative models. However, existing retrosynthetic analysis often overlooks how reaction conditions significantly impact chemical reactions. This causes existing work to lack unified models that can provide full-cycle services for retrosynthetic analysis, and also greatly limits the overall prediction accuracy of retrosynthetic analysis. These two issues cause users to depend on various independent models and tools, leading to high labor time and cost overhead.
To solve these issues, we define the boundary conditions of chemical reactions based on the Evaluatology theory and propose BigTensorDB, the first tensor database which integrates storage, prediction generation, search, and analysis functions. BigTensorDB designs the tensor schema for efficiently storing all the key information related to chemical reactions, including reaction conditions. BigTensorDB supports a full-cycle retrosynthetic analysis pipeline. It begins with predicting generation reaction paths, searching for approximate real reactions based on the tensor schema, and concludes with feasibility analysis, which enhances the interpretability of prediction results. BigTensorDB can effectively reduce usage costs and improve efficiency for users during the full-cycle retrosynthetic analysis process. Meanwhile, it provides a potential solution to the low accuracy issue, encouraging researchers to focus on improving full-cycle accuracy.