QMGBP-DL: a deep learning and machine learning approach for quantum molecular graph band-gap prediction.
Authors: Outhman Abbassi, Soumia Ziti
Journal: Molecular Diversity (JCR Q2, Chemistry, Applied; impact factor 3.9)
Published: 2025-04-19
DOI: 10.1007/s11030-025-11178-7 (https://doi.org/10.1007/s11030-025-11178-7)
Predicting molecular and quantum material properties, especially the band gap, is crucial for accelerating discoveries in drug design and materials science. Although graph neural networks and probabilistic encoders are well established in molecular data analysis, their targeted integration and application for band-gap prediction remain an active research area. This paper introduces QMGBP-DL, a deep learning approach that combines a molecular graph encoder with machine learning models to improve the prediction accuracy of molecular and material band-gap energies. The encoder uses graph convolutional networks (GCNs) to derive latent representations of chemical structures from SMILES strings, optimized via a Kullback-Leibler divergence loss. These representations serve as inputs for training various machine learning models to predict properties. QMGBP-DL is evaluated on the QM9, PCQM4M, and OPV datasets and shows significant improvements, particularly when a random forest model is used for property prediction. A comparative analysis against the established approaches DenseGNN, MEGNet, and ALIGNN indicates that QMGBP-DL achieves notably lower mean absolute error (MAE) values when predicting HOMO, LUMO, and band-gap energies. The results show that combining GCN-derived latent spaces with traditional machine learning models, particularly random forest, is highly effective for accurate band-gap prediction, thereby facilitating material discovery and design.
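The pipeline described in the abstract (graph convolution over a molecular graph, a latent code regularized by a Kullback-Leibler term, and MAE as the evaluation metric) can be illustrated with a small standard-library sketch. This is not the authors' implementation: the abstract does not give layer counts, dimensions, or loss weighting. The symmetric normalization with self-loops is the common GCN form and is assumed here. The toy graph, the one-hot atom features, and all weight matrices (`w_hidden`, `w_mu`, `w_log_var`) are hypothetical values chosen only for illustration. SMILES parsing and the downstream random forest regressor are left out.

```python
import math

def gcn_layer(adj, feats, weight):
    """One graph-convolution step: ReLU(D^-1/2 (A + I) D^-1/2 X W)."""
    n = len(adj)
    # Add self-loops so each atom keeps its own features.
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
             for i in range(n)]
    deg = [sum(row) for row in a_hat]
    # Symmetric degree normalization of the adjacency matrix.
    norm = [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]
    f_in, f_out = len(weight), len(weight[0])
    # Aggregate neighbour features, then apply the linear map and ReLU.
    agg = [[sum(norm[i][k] * feats[k][f] for k in range(n))
            for f in range(f_in)] for i in range(n)]
    return [[max(0.0, sum(agg[i][f] * weight[f][h] for f in range(f_in)))
             for h in range(f_out)] for i in range(n)]

def mean_pool(node_feats):
    """Average node embeddings into one graph-level vector."""
    n = len(node_feats)
    return [sum(col) / n for col in zip(*node_feats)]

def linear(vec, weight):
    """Plain linear map (no bias) from a vector to a vector."""
    return [sum(v * weight[i][j] for i, v in enumerate(vec))
            for j in range(len(weight[0]))]

def kl_to_standard_normal(mu, log_var):
    """KL( N(mu, diag(exp(log_var))) || N(0, I) ) for a diagonal Gaussian."""
    return -0.5 * sum(1.0 + lv - m * m - math.exp(lv)
                      for m, lv in zip(mu, log_var))

def mean_absolute_error(y_true, y_pred):
    """MAE, the metric the abstract uses to compare models."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

# Toy 3-atom chain (for example the heavy atoms C-C-O); values are illustrative.
adj = [[0.0, 1.0, 0.0], [1.0, 0.0, 1.0], [0.0, 1.0, 0.0]]
feats = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]  # one-hot: [is_C, is_O]
w_hidden = [[0.5, -0.2], [0.1, 0.3]]          # hypothetical weights
w_mu = [[0.4, 0.1], [-0.3, 0.2]]              # hypothetical weights
w_log_var = [[-0.1, 0.2], [0.3, -0.4]]        # hypothetical weights

hidden = gcn_layer(adj, feats, w_hidden)
pooled = mean_pool(hidden)
mu = linear(pooled, w_mu)                # latent mean
log_var = linear(pooled, w_log_var)      # latent log-variance
kl = kl_to_standard_normal(mu, log_var)  # regularization term
```

In a full pipeline, the latent vector `mu` for each molecule would be the feature row passed to a separate regressor such as a random forest, and `mean_absolute_error` would compare its predictions with reference band-gap values.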
About the journal:
Molecular Diversity is a new publication forum for the rapid publication of refereed papers dedicated to describing the development, application and theory of molecular diversity and combinatorial chemistry in basic and applied research and drug discovery. The journal publishes both short and full papers, perspectives, news and reviews dealing with all aspects of the generation of molecular diversity, application of diversity for screening against alternative targets of all types (biological, biophysical, technological), analysis of results obtained and their application in various scientific disciplines/approaches including:
combinatorial chemistry and parallel synthesis;
small molecule libraries;
microwave synthesis;
flow synthesis;
fluorous synthesis;
diversity oriented synthesis (DOS);
nanoreactors;
click chemistry;
multiplex technologies;
fragment- and ligand-based design;
structure/function/SAR;
computational chemistry and molecular design;
chemoinformatics;
screening techniques and screening interfaces;
analytical and purification methods;
robotics, automation and miniaturization;
targeted libraries;
display libraries;
peptides and peptoids;
proteins;
oligonucleotides;
carbohydrates;
natural diversity;
new methods of library formulation and deconvolution;
directed evolution, origin of life and recombination;
search techniques, landscapes, random chemistry and more.