SubGrapher: visual fingerprinting of chemical structures

Impact factor 5.7; Zone 2 (Chemistry); JCR Q1 (Chemistry, Multidisciplinary)
Lucas Morin, Gerhard Ingmar Meijer, Valéry Weber, Luc Van Gool, Peter W. J. Staar
Journal of Cheminformatics, volume 17, issue 1. Published 2025-09-29.
DOI: 10.1186/s13321-025-01091-4
Article: https://link.springer.com/article/10.1186/s13321-025-01091-4
PDF: https://jcheminf.biomedcentral.com/counter/pdf/10.1186/s13321-025-01091-4

Abstract

Automatic extraction of molecules from scientific literature plays a crucial role in accelerating research across fields ranging from drug discovery to materials science. Patent documents, in particular, contain molecular information in visual form, which is often inaccessible through traditional text-based searches. In this work, we introduce SubGrapher, a method for the visual fingerprinting of molecule and Markush structure images. Unlike conventional Optical Chemical Structure Recognition (OCSR) models that attempt to reconstruct full molecular graphs, SubGrapher focuses on extracting fingerprints directly from images. Using learning-based instance segmentation, SubGrapher identifies functional groups and carbon backbones, constructing a substructure-based fingerprint that enables the retrieval of molecules and Markush structures. Our approach is evaluated against state-of-the-art OCSR and fingerprinting methods, demonstrating superior retrieval performance and robustness across diverse molecule and Markush structure depictions. The benchmark datasets, models, and inference code are publicly available.

SubGrapher introduces a new approach that converts molecule and Markush structure images directly into fingerprints in a single step, bypassing conventional SMILES or graph reconstruction. It outperforms existing OCSR and fingerprinting methods in substructure detection and structure retrieval across diverse datasets, including Markush structure images.
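
To make the idea of a substructure-based fingerprint concrete, the sketch below shows one way detected substructures could be turned into a count vector and used for retrieval. It is an illustration only, not the authors' implementation: the vocabulary `VOCAB`, the label names, the helper functions, and the choice of a min-max (count-based Tanimoto) similarity are all assumptions made for this example, and the detection step (instance segmentation on the image) is replaced by hand-written label lists.

```python
from collections import Counter

# Hypothetical substructure vocabulary (assumption; the real one is not given here).
VOCAB = ["carboxylic_acid", "amine", "hydroxyl", "benzene_ring", "alkyl_chain"]


def fingerprint(detections):
    """Count detected substructure labels into a fixed-length vector."""
    counts = Counter(label for label in detections if label in VOCAB)
    return [counts[label] for label in VOCAB]


def tanimoto(a, b):
    """Min-max (count-based Tanimoto) similarity between two count vectors."""
    num = sum(min(x, y) for x, y in zip(a, b))
    den = sum(max(x, y) for x, y in zip(a, b))
    return num / den if den else 1.0  # two empty fingerprints count as identical


def retrieve(query_fp, database, top_k=3):
    """Rank database entries (name -> fingerprint) by similarity to the query."""
    scored = [(tanimoto(query_fp, fp), name) for name, fp in database.items()]
    return sorted(scored, reverse=True)[:top_k]


# Stand-ins for detector output on three database images (assumed labels).
database = {
    "benzoic_acid": fingerprint(["benzene_ring", "carboxylic_acid"]),
    "aniline": fingerprint(["benzene_ring", "amine"]),
    "ethanol": fingerprint(["alkyl_chain", "hydroxyl"]),
}

# Stand-in for detector output on a query image.
query = fingerprint(["benzene_ring", "carboxylic_acid", "hydroxyl"])
ranking = retrieve(query, database)
print(ranking)
```

Because the fingerprint only records which substructures were found and how often, two depictions of the same molecule can match without a full molecular graph ever being reconstructed, which is the property the abstract emphasizes.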
Source journal: Journal of Cheminformatics
Categories: Chemistry, Multidisciplinary; Computer Science, Information Systems
CiteScore: 14.10
Self-citation rate: 7.00%
Publication volume: 82
Review time: 3 months
About the journal: Journal of Cheminformatics is an open access journal publishing original peer-reviewed research in all aspects of cheminformatics and molecular modelling. Coverage includes, but is not limited to: chemical information systems, software, and databases; molecular modelling; chemical structure representations and their use in structure, substructure, and similarity searching of chemical substance and chemical reaction databases; computer and molecular graphics; computer-aided molecular design; expert systems; QSAR; and data mining techniques.