A machine learning protocol for geometric information retrieval from molecular spectra
Shijie Tao, Yi Feng, Wenmin Wang, Tiantian Han, Pieter E.S. Smith, Jun Jiang
Artificial Intelligence Chemistry, Volume 2, Issue 1, Article 100031. Published 2023-11-28.
DOI: 10.1016/j.aichem.2023.100031
https://www.sciencedirect.com/science/article/pii/S2949747723000313
Citations: 0
Abstract
The geometry of a molecule is closely tied to its properties, and vibrational spectroscopy, a common and powerful analytical tool for determining molecular structure, can help recover precise geometric information. Traditional methods for delineating spectrum-structure correlations are often expensive and time-consuming, and they require extensive professional expertise. In this work, we use a machine learning protocol to construct a map from spectra to molecular geometric structures, and we employ Grad-CAM, a technique for interpreting convolutional networks, to analyze which kinds of chemical information matter most to the model's results. Results for six small molecules of differing structures demonstrate that the model can (1) extract the spectral features crucial to downstream tasks without any manual preprocessing and (2) retrieve molecular structural information with high precision.
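To make the Grad-CAM step concrete, here is a minimal sketch of how a saliency map over a one-dimensional spectrum could be computed. It is not the authors' model. The toy spectrum, the two fixed kernels, the head weights, and the helper names `conv1d_valid` and `grad_cam_1d` are all hypothetical and chosen only for illustration. The head is assumed to be global average pooling followed by a linear map to one scalar (standing in for a single geometric parameter), so the gradients can be written down analytically; a real network would obtain them from an automatic differentiation framework.

```python
import numpy as np


def conv1d_valid(signal, kernels):
    """Cross-correlate a 1D signal with each kernel ('valid' mode) -> (K, L_out)."""
    return np.stack([np.correlate(signal, k, mode="valid") for k in kernels])


def grad_cam_1d(activations, gradients):
    """Grad-CAM for 1D feature maps.

    activations: (K, L) feature maps of the chosen convolutional layer.
    gradients:   (K, L) derivative of the scalar output w.r.t. those maps.
    Returns a length-L map scaled so that its maximum is 1 (if non-zero).
    """
    alphas = gradients.mean(axis=1)              # one importance weight per channel
    cam = np.maximum(alphas @ activations, 0.0)  # ReLU of the weighted channel sum
    peak = cam.max()
    return cam / peak if peak > 0 else cam


# Hypothetical toy spectrum: a single Gaussian band centred at index 120.
positions = np.arange(200)
spectrum = np.exp(-0.5 * ((positions - 120) / 3.0) ** 2)

# Hypothetical convolutional layer with two fixed kernels of width 9.
offsets = np.arange(-4, 5)
kernels = np.stack([
    np.exp(-0.5 * (offsets / 2.0) ** 2),  # band-shaped detector
    np.full(9, 1.0 / 9.0),                # flat local average
])
activations = np.maximum(conv1d_valid(spectrum, kernels), 0.0)  # ReLU

# Hypothetical head: global average pooling, then a linear map to one scalar.
head_weights = np.array([0.8, -0.3])
n_positions = activations.shape[1]
prediction = head_weights @ activations.mean(axis=1)

# For this head, d(prediction)/d(activations[k, i]) = head_weights[k] / n_positions.
gradients = np.repeat(head_weights[:, None] / n_positions, n_positions, axis=1)

cam = grad_cam_1d(activations, gradients)

# A 'valid' output index i is centred on input index i + 4 for a width-9 kernel.
most_salient_input_index = int(np.argmax(cam)) + 4
print(most_salient_input_index)
```

In this toy setting the map highlights the region around the single band, which is the kind of readout one would inspect to see which spectral regions drive a predicted geometric quantity.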