ST-KeyS:自监督变压器关键字发现在历史手写文件

IF 7.6 1区 计算机科学 Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE
Sana Khamekhem Jemni , Sourour Ammar , Mohamed Ali Souibgui , Yousri Kessentini , Abbas Cheddad
{"title":"ST-KeyS:自监督变压器关键字发现在历史手写文件","authors":"Sana Khamekhem Jemni ,&nbsp;Sourour Ammar ,&nbsp;Mohamed Ali Souibgui ,&nbsp;Yousri Kessentini ,&nbsp;Abbas Cheddad","doi":"10.1016/j.patcog.2025.112036","DOIUrl":null,"url":null,"abstract":"<div><div>Keyword spotting (KWS) in historical documents is an important tool for the initial exploration of digitized collections. Nowadays, the most efficient KWS methods rely on machine learning techniques, which typically require a large amount of annotated training data. However, in the case of historical manuscripts, there is a lack of annotated corpora for training. To handle the data scarcity issue, we investigate the merits of self-supervised learning to extract useful representations of the input data without relying on human annotations and then use these representations in the downstream task. We propose ST-KeyS, a masked auto-encoder model based on vision transformers where the pretraining stage is based on the mask-and-predict paradigm without the need for labeled data. In the fine-tuning stage, the pre-trained encoder is integrated into a fine-tuned Siamese neural network model to improve feature embedding from the input images. We further improve the image representation using pyramidal histogram of characters (PHOC) embedding to create and exploit an intermediate representation of images based on text attributes. The proposed approach outperforms state-of-the-art methods trained on the same datasets in an exhaustive experimental evaluation of five widely used benchmark datasets (Botany, Alvermann Konzilsprotokolle, George Washington, Esposalles, and RIMES).</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":"170 ","pages":"Article 112036"},"PeriodicalIF":7.6000,"publicationDate":"2025-07-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"ST-KeyS: Self-supervised Transformer for Keyword Spotting in historical handwritten documents\",\"authors\":\"Sana Khamekhem Jemni ,&nbsp;Sourour Ammar ,&nbsp;Mohamed Ali Souibgui ,&nbsp;Yousri Kessentini ,&nbsp;Abbas Cheddad\",\"doi\":\"10.1016/j.patcog.2025.112036\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><div>Keyword spotting (KWS) in historical documents is an important tool for the initial exploration of digitized collections. Nowadays, the most efficient KWS methods rely on machine learning techniques, which typically require a large amount of annotated training data. However, in the case of historical manuscripts, there is a lack of annotated corpora for training. To handle the data scarcity issue, we investigate the merits of self-supervised learning to extract useful representations of the input data without relying on human annotations and then use these representations in the downstream task. We propose ST-KeyS, a masked auto-encoder model based on vision transformers where the pretraining stage is based on the mask-and-predict paradigm without the need for labeled data. In the fine-tuning stage, the pre-trained encoder is integrated into a fine-tuned Siamese neural network model to improve feature embedding from the input images. We further improve the image representation using pyramidal histogram of characters (PHOC) embedding to create and exploit an intermediate representation of images based on text attributes. The proposed approach outperforms state-of-the-art methods trained on the same datasets in an exhaustive experimental evaluation of five widely used benchmark datasets (Botany, Alvermann Konzilsprotokolle, George Washington, Esposalles, and RIMES).</div></div>\",\"PeriodicalId\":49713,\"journal\":{\"name\":\"Pattern Recognition\",\"volume\":\"170 \",\"pages\":\"Article 112036\"},\"PeriodicalIF\":7.6000,\"publicationDate\":\"2025-07-02\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Pattern Recognition\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S003132032500696X\",\"RegionNum\":1,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Pattern Recognition","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S003132032500696X","RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
引用次数: 0

摘要

历史文献关键词定位是数字化馆藏初步探索的重要工具。目前,最有效的KWS方法依赖于机器学习技术,这通常需要大量带注释的训练数据。然而,在历史手稿的情况下,缺乏用于培训的注释语料库。为了解决数据稀缺问题,我们研究了自监督学习的优点,在不依赖于人工注释的情况下提取输入数据的有用表示,然后在下游任务中使用这些表示。我们提出ST-KeyS,一种基于视觉转换器的掩码自编码器模型,其中预训练阶段基于掩码和预测范式,而不需要标记数据。在微调阶段,将预训练好的编码器集成到一个微调的Siamese神经网络模型中,以改进输入图像的特征嵌入。我们使用字符金字塔直方图(PHOC)嵌入进一步改进图像表示,以创建和利用基于文本属性的图像中间表示。在对五个广泛使用的基准数据集(Botany、Alvermann Konzilsprotokolle、George Washington、Esposalles和RIMES)进行详尽的实验评估后,所提出的方法优于在相同数据集上训练的最先进的方法。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
ST-KeyS: Self-supervised Transformer for Keyword Spotting in historical handwritten documents
Keyword spotting (KWS) in historical documents is an important tool for the initial exploration of digitized collections. Nowadays, the most efficient KWS methods rely on machine learning techniques, which typically require a large amount of annotated training data. However, in the case of historical manuscripts, there is a lack of annotated corpora for training. To handle the data scarcity issue, we investigate the merits of self-supervised learning to extract useful representations of the input data without relying on human annotations and then use these representations in the downstream task. We propose ST-KeyS, a masked auto-encoder model based on vision transformers where the pretraining stage is based on the mask-and-predict paradigm without the need for labeled data. In the fine-tuning stage, the pre-trained encoder is integrated into a fine-tuned Siamese neural network model to improve feature embedding from the input images. We further improve the image representation using pyramidal histogram of characters (PHOC) embedding to create and exploit an intermediate representation of images based on text attributes. The proposed approach outperforms state-of-the-art methods trained on the same datasets in an exhaustive experimental evaluation of five widely used benchmark datasets (Botany, Alvermann Konzilsprotokolle, George Washington, Esposalles, and RIMES).
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
Pattern Recognition
Pattern Recognition 工程技术-工程:电子与电气
CiteScore
14.40
自引率
16.20%
发文量
683
审稿时长
5.6 months
期刊介绍: The field of Pattern Recognition is both mature and rapidly evolving, playing a crucial role in various related fields such as computer vision, image processing, text analysis, and neural networks. It closely intersects with machine learning and is being applied in emerging areas like biometrics, bioinformatics, multimedia data analysis, and data science. The journal Pattern Recognition, established half a century ago during the early days of computer science, has since grown significantly in scope and influence.
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:604180095
Book学术官方微信