Molecular identification via molecular fingerprint extraction from atomic force microscopy images

Impact factor 7.1; Zone 2 (Chemistry); JCR Q1 (Chemistry, Multidisciplinary)
Manuel González Lastre, Pablo Pou, Miguel Wiche, Daniel Ebeling, Andre Schirmeisen, Rubén Pérez
Journal of Cheminformatics, vol. 16(1). Published 2024-11-25. DOI: 10.1186/s13321-024-00921-1
Full text: https://link.springer.com/article/10.1186/s13321-024-00921-1

Abstract

Non-contact atomic force microscopy with CO-functionalized metal tips (referred to as HR-AFM) provides access to the internal structure of individual molecules adsorbed on a surface with unprecedented resolution. Previous works have shown that deep learning (DL) models can retrieve the chemical and structural information encoded in a 3D stack of constant-height HR-AFM images, leading to molecular identification. In this work, we overcome the limitations of those models by describing the molecular structure with a well-established topological fingerprint, the 1024-bit Extended Connectivity Chemical Fingerprint of radius 2 (ECFP4), which was developed for substructure and similarity searching. ECFPs provide local structural information about the molecule, with each bit correlating with a particular substructure. Our DL model extracts this optimized structural descriptor from the 3D HR-AFM stacks and uses it, through virtual screening, to identify molecules from their predicted ECFP4 with a retrieval accuracy of 95.4% on theoretical images. Furthermore, unlike previous DL models, this approach assigns a confidence score, the Tanimoto similarity, to each candidate molecule, thus providing information on the reliability of the identification. By construction, the hashing process that makes the fingerprints useful for machine learning applications discards the number of times a given substructure is present in the molecule. We show that the fingerprint-based virtual screening can be complemented with global information from a second DL model that predicts the chemical formula from the same HR-AFM stacks, boosting the identification accuracy to 97.6%. Finally, we perform a limited test with experimental images, obtaining promising results towards the application of this pipeline under real conditions.

Scientific contribution

Previous works on molecular identification from AFM images used chemical descriptors that were intuitive for humans but suboptimal for neural networks. We propose a novel method to extract the ECFP4 from AFM images and identify the molecule via virtual screening, outperforming previous state-of-the-art models.
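
The screening step described in the abstract can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the fingerprints are tiny hand-written sets of on-bit indices rather than 1024-bit ECFP4 vectors predicted by a DL model, the library entries (`mol_A`, `mol_B`, `mol_C`) and the function names are hypothetical, and using the predicted chemical formula as a hard filter is an assumption, since the abstract does not say how the two predictions are combined.

```python
from typing import Dict, FrozenSet, List, Optional, Tuple

Bits = FrozenSet[int]      # indices of the "on" bits of a binary fingerprint
Formula = Dict[str, int]   # element symbol -> atom count


def tanimoto(a: Bits, b: Bits) -> float:
    """Tanimoto similarity: shared on-bits divided by on-bits in either set."""
    if not a and not b:
        return 1.0
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def screen(
    predicted_bits: Bits,
    library: List[dict],
    predicted_formula: Optional[Formula] = None,
    top_k: int = 3,
) -> List[Tuple[str, float]]:
    """Rank library molecules by similarity to the predicted fingerprint.

    If a predicted formula is supplied, only candidates with that exact
    formula are ranked (assumed combination rule); if none match, the
    full library is ranked instead.
    """
    candidates = library
    if predicted_formula is not None:
        matching = [e for e in library if e["formula"] == predicted_formula]
        candidates = matching or library
    scored = [
        (e["name"], round(tanimoto(predicted_bits, e["bits"]), 3))
        for e in candidates
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)  # stable sort
    return scored[:top_k]


# Toy library. mol_A and mol_B share the same on-bits because a binary
# fingerprint records whether a substructure occurs, not how many times.
LIBRARY = [
    {"name": "mol_A", "bits": frozenset({3, 17, 256, 900}), "formula": {"C": 6, "H": 6}},
    {"name": "mol_B", "bits": frozenset({3, 17, 256, 900}), "formula": {"C": 12, "H": 12}},
    {"name": "mol_C", "bits": frozenset({3, 44, 512}), "formula": {"C": 4, "H": 4, "O": 1}},
]

# A predicted fingerprint that misses one bit of the true molecule.
predicted_bits = frozenset({3, 17, 256})

print(screen(predicted_bits, LIBRARY))
print(screen(predicted_bits, LIBRARY, predicted_formula={"C": 12, "H": 12}))
```

In the first call `mol_A` and `mol_B` tie at a Tanimoto similarity of 0.75, which is the ambiguity that the loss of substructure counts can produce; in the second call the formula resolves the tie in favor of `mol_B`. The similarity value doubles as the confidence score attached to each candidate.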

Source journal: Journal of Cheminformatics
Categories: Chemistry, Multidisciplinary; Computer Science, Information Systems
CiteScore: 14.10
Self-citation rate: 7.00%
Articles published: 82
Review time: 3 months
About the journal: Journal of Cheminformatics is an open access journal publishing original peer-reviewed research in all aspects of cheminformatics and molecular modelling. Coverage includes, but is not limited to: chemical information systems, software and databases, and molecular modelling; chemical structure representations and their use in structure, substructure, and similarity searching of chemical substance and chemical reaction databases; computer and molecular graphics; computer-aided molecular design; expert systems; QSAR; and data mining techniques.