Accurate property predictions and reliability quantification in molecular design based on molecular similarity
Authors: Youquan Xu, Zhijiang Shao, Anjan K. Tula
Journal: Computers & Chemical Engineering, Volume 201, Article 109241
Publication date: 2025-06-12
DOI: 10.1016/j.compchemeng.2025.109241
URL: https://www.sciencedirect.com/science/article/pii/S0098135425002455
Impact factor: 3.9 (JCR Q2, Computer Science, Interdisciplinary Applications)
Citation count: 0
Abstract
A crucial step in developing high-performance chemical products is the design of their constituent molecules. Computer-aided molecular design (CAMD) has gained significant attention for its potential to accelerate and enhance this design process. The typical approach involves using machine learning models trained on existing molecular databases to predict the properties of potential molecules. From these predictions, the best candidates are selected. However, prediction errors can occur, leading to unreliability in the design and limiting the effectiveness of molecular discovery. To tackle this issue, this paper presents a novel framework for modeling molecular properties based on a similarity coefficient. This framework introduces a new formula for assessing molecular similarity. By calculating the similarity between a target molecule and those in an existing database, the framework selects the most similar molecules, creating a tailored dataset for model training. This significantly enhances the accuracy of property predictions. Additionally, a quantitative reliability index is proposed based on the similarity coefficient, which allows for more informed decision-making during the molecular selection process.
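The workflow described in the abstract (score the similarity between a target molecule and each database entry, keep the most similar entries, predict from that tailored subset, and report a reliability value) can be illustrated with a minimal sketch. The abstract does not give the paper's similarity formula, its model, or the definition of its reliability index, so everything here is an assumption made for illustration: the standard Tanimoto coefficient on sets of feature indices stands in for the new similarity formula, a similarity-weighted average stands in for model training, and the mean similarity of the selected neighbours stands in for the reliability index. The molecule names, feature indices, and property values are hypothetical toy data.

```python
from typing import Dict, FrozenSet, List, Tuple

Fingerprint = FrozenSet[int]
# Hypothetical database layout: name -> (fingerprint, property value).
Database = Dict[str, Tuple[Fingerprint, float]]


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Tanimoto coefficient: |a & b| / |a | b|.

    Used here only as a stand-in; the paper proposes its own formula.
    """
    if not a and not b:
        return 0.0
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def select_similar(target: Fingerprint, database: Database, k: int) -> List[Tuple[float, str]]:
    """Return the k database entries most similar to the target."""
    scored = [(tanimoto(target, fp), name) for name, (fp, _) in database.items()]
    # Highest similarity first; ties are broken by name for determinism.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return scored[:k]


def predict_with_reliability(target: Fingerprint, database: Database, k: int = 3) -> Tuple[float, float]:
    """Predict a property from the tailored subset and score its reliability.

    Assumptions: the prediction is a similarity-weighted average of the
    neighbours' property values, and the reliability score is the mean
    similarity of those neighbours. Neither is taken from the paper.
    """
    neighbours = select_similar(target, database, k)
    total = sum(score for score, _ in neighbours)
    if total == 0.0:
        raise ValueError("no database molecule shares any feature with the target")
    prediction = sum(score * database[name][1] for score, name in neighbours) / total
    reliability = total / len(neighbours)
    return prediction, reliability


# Hypothetical toy data: feature indices and property values are made up.
database: Database = {
    "mol_a": (frozenset({1, 2, 3, 4}), 10.0),
    "mol_b": (frozenset({1, 2, 3, 5}), 12.0),
    "mol_c": (frozenset({7, 8, 9}), 50.0),
    "mol_d": (frozenset({1, 2, 6}), 20.0),
}
target: Fingerprint = frozenset({1, 2, 3})

prediction, reliability = predict_with_reliability(target, database, k=2)
print(f"prediction={prediction:.2f}, reliability={reliability:.2f}")
```

In this sketch a low reliability score means the database holds no close analogues of the target, which is the kind of signal a designer could use to treat a candidate's predicted property with caution.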
About the journal:
Computers & Chemical Engineering is primarily a journal of record for new developments in the application of computing and systems technology to chemical engineering problems.