Generating a chemical database of organic nanomers and applying active learning to predict HOMO, LUMO and band gap: Accelerating optoelectronic nanopolymer materials discovery.
Authors: Qin Zhu, Yanwei Tang, Xinyao Ge, Chong Zhang, Xun Fu, Yongxia Wang, Dong Jin, Lizhu Dong, Jinyi Zhang, Qiang Zhao, Ying Wei, Xiaogang Cheng, Linghai Xie
Journal: Talanta, vol. 298 (Pt A), article 128939 (journal article; JCR Q1, Chemistry, Analytical; impact factor 6.1)
Published: 2025-10-04
DOI: 10.1016/j.talanta.2025.128939
Citations: 0
Abstract
Organic nanogrids are versatile molecular cornerstones and nanoplatforms for organic high-dimensional, low-entropy materials. Virtual databases of organic nanomers are urgently needed to accelerate the discovery and performance optimization of novel zero- to three-dimensional (0D/1D/2D/3D) nanopolymer optoelectronic materials. In this study, we generated a comprehensive dataset of 11,224 ladder-type gridarenes covering a wide range of chemical compositions and structural variations. A small sample set of 220 randomly selected compounds was assembled, and fragment-level constrained density functional theory (CDFT) was used to extract molecular descriptors. These descriptors were then used to train machine learning models that predict the band gap and the highest occupied molecular orbital (HOMO) and lowest unoccupied molecular orbital (LUMO) energies with high accuracy (coefficients of determination, R², of 0.94, 0.92, and 0.87, respectively). During active learning, 3112 representative gridarenes were iteratively selected from the 11,224-compound library, reducing the mean absolute error (MAE) of band-gap predictions to below 0.11 eV. This process identified top candidates for blue-light emission and demonstrates an accelerated, data-driven route to next-generation organic optoelectronic nanomaterials.
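The abstract does not state which regression model or acquisition rule drives the active-learning loop. The following is a minimal sketch only. It assumes a linear ridge model and bootstrap-ensemble uncertainty sampling, which are our illustrative choices and not necessarily the authors'. The loop starts from 220 randomly chosen compounds, repeatedly labels the pool members on which the ensemble disagrees most, and then reports R² and MAE on compounds that were never labeled. The descriptors, the `oracle` standing in for quantum-chemical calculations, the batch sizes, and all function names are hypothetical. The synthetic data are not the paper's dataset.

```python
import numpy as np


def fit_ridge(X, y, alpha=1e-2):
    """Closed-form ridge regression; the appended bias column is not penalized."""
    Xb = np.hstack([X, np.ones((len(X), 1))])
    reg = alpha * np.eye(Xb.shape[1])
    reg[-1, -1] = 0.0
    return np.linalg.solve(Xb.T @ Xb + reg, Xb.T @ y)


def predict(w, X):
    return np.hstack([X, np.ones((len(X), 1))]) @ w


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)


def mean_abs_error(y_true, y_pred):
    return float(np.mean(np.abs(y_true - y_pred)))


def active_learning(X_pool, oracle, n_init=220, n_rounds=10, batch=50,
                    n_models=8, seed=0):
    """Pool-based active learning with bootstrap-ensemble uncertainty sampling
    (hypothetical setup, not the paper's exact protocol)."""
    rng = np.random.default_rng(seed)
    # Random initial training set, then "compute" its labels via the oracle.
    labeled = [int(i) for i in rng.choice(len(X_pool), size=n_init, replace=False)]
    labels = {i: oracle(i) for i in labeled}
    for _ in range(n_rounds):
        idx = np.array(labeled)
        Xl = X_pool[idx]
        yl = np.array([labels[i] for i in labeled])
        # Each ensemble member is fit on a bootstrap resample of the labeled set.
        preds = []
        for _ in range(n_models):
            boot = rng.integers(0, len(idx), size=len(idx))
            preds.append(predict(fit_ridge(Xl[boot], yl[boot]), X_pool))
        # Ensemble spread is the uncertainty; already-labeled points are excluded.
        spread = np.std(preds, axis=0)
        spread[idx] = -np.inf
        for i in np.argsort(spread)[-batch:]:
            labeled.append(int(i))
            labels[int(i)] = oracle(int(i))
    yl = np.array([labels[i] for i in labeled])
    return labeled, fit_ridge(X_pool[np.array(labeled)], yl)


# Synthetic demo: 2000 hypothetical compounds with 8 made-up descriptors.
data_rng = np.random.default_rng(42)
X = data_rng.normal(size=(2000, 8))
true_w = np.array([0.5, -0.3, 0.2, 0.1, -0.4, 0.25, 0.15, -0.2])
gap = 2.8 + X @ true_w + data_rng.normal(scale=0.05, size=2000)  # stand-in band gaps, eV

labeled, w = active_learning(X, oracle=lambda i: gap[i])
held_out = np.setdiff1d(np.arange(len(X)), labeled)
pred = predict(w, X[held_out])
r2 = r2_score(gap[held_out], pred)
mae = mean_abs_error(gap[held_out], pred)
print(f"labeled={len(labeled)}  R2={r2:.3f}  MAE={mae:.3f} eV")
```

In a real workflow, `oracle` would launch a quantum-chemical calculation rather than look up a precomputed value. The loop would also more plausibly stop once the validation MAE plateaus, rather than after a fixed number of rounds. For comparison, the abstract reports a band-gap MAE below 0.11 eV after 3112 compounds had been selected.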
About the journal:
Talanta provides a forum for the publication of original research papers, short communications, and critical reviews in all branches of pure and applied analytical chemistry. Papers are evaluated based on established guidelines, including the fundamental nature of the study, scientific novelty, substantial improvement or advantage over existing technology or methods, and demonstrated analytical applicability. Original research papers on fundamental studies, and on novel sensor and instrumentation developments, are encouraged. Novel or improved applications in areas such as clinical and biological chemistry, environmental analysis, geochemistry, materials science and engineering, and analytical platforms for omics development are welcome.
Analytical performance of methods should be determined, including interference and matrix effects, and methods should be validated by comparison with a standard method or by analysis of a certified reference material. Simple spiking recoveries may not be sufficient. In particular, the description of a developed method should include information on selectivity, sensitivity, detection limits, accuracy, and reliability. However, applying official validation or robustness studies to a routine method or technique does not necessarily constitute novelty. Proper statistical treatment of the data should be provided. Relevant literature should be cited, including related publications by the authors, and authors should discuss how their proposed methodology compares with previously reported methods.