AENet: Image Retrieval of Kazakh Handwritten Documents Based on Attention Mechanism and Feature Aggregation

IF 1.0 · CAS Zone 4 (Engineering & Technology) · JCR Q4 (ENGINEERING, ELECTRICAL & ELECTRONIC)
Gang Chen, Xuebin Xu, Jiaoyan Wang, Hornisa Mamat, Kurban Ubul
IEEJ Transactions on Electrical and Electronic Engineering, vol. 19, no. 10, pp. 1640-1651. Published 2024-07-02. DOI: 10.1002/tee.24122 (https://onlinelibrary.wiley.com/doi/10.1002/tee.24122)
Citations: 0

Abstract


Kazakh is one of the many languages of China and is widely spoken in parts of Xinjiang. In written Kazakh, several characters are joined together to form a continuous word, so each word has a distinctive shape and a complex combination of structural relationships. This paper explores offline image retrieval of handwritten Kazakh words. The task is challenging: relevant datasets are scarce, and because of the special written morphology of Kazakh, traditional text image retrieval algorithms often fail to achieve satisfactory results on its varied, connected writing styles. This paper therefore builds a dataset of offline Kazakh handwritten document images containing 300 pages and 20 500 words, and then proposes a new model, AENet. AENet uses an attention mechanism to focus more precisely on key regions of handwritten word images, such as centers, inflection points, and contours, and to capture important local features at different scales. Fused spatial pyramid pooling, feature aggregation, encoding operations, and feature dimensionality reduction and reconstruction are used to extract more representative features, progressing from local to global, that capture the overall information in a word image. Experiments on the Kazak-80, Zilla-64, and HWDB1.1-375 datasets show that the method significantly improves mAP for handwritten word image retrieval and is especially suited to languages with connected (adhesive) scripts such as Kazakh. © 2024 Institute of Electrical Engineers of Japan and Wiley Periodicals LLC.
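The abstract names spatial pyramid pooling and mAP but does not give AENet's exact configuration. As a rough illustration only, the sketch below shows two generic building blocks of a word-image retrieval pipeline: spatial pyramid max-pooling, which turns a variable-size (C, H, W) feature map into a fixed-length descriptor, and query-by-example mAP computed by ranking descriptors with cosine similarity. The pyramid levels (1, 2, 4), the use of max-pooling, cosine similarity, and skipping queries that have no other relevant item are assumptions, not details taken from the paper, and the function names are hypothetical.

```python
import numpy as np


def spatial_pyramid_pool(fmap: np.ndarray, levels=(1, 2, 4)) -> np.ndarray:
    """Max-pool a (C, H, W) feature map over an n x n grid for each pyramid
    level n and concatenate the results into a vector of length C * sum(n*n)."""
    c, h, w = fmap.shape
    parts = []
    for n in levels:
        # Integer bin edges; the max(...) guard keeps every cell at least one pixel wide.
        ys = np.linspace(0, h, n + 1).astype(int)
        xs = np.linspace(0, w, n + 1).astype(int)
        for i in range(n):
            for j in range(n):
                cell = fmap[:, ys[i]:max(ys[i + 1], ys[i] + 1),
                            xs[j]:max(xs[j + 1], xs[j] + 1)]
                parts.append(cell.max(axis=(1, 2)))  # one value per channel
    return np.concatenate(parts)


def average_precision(relevant) -> float:
    """AP of one ranked list, given a boolean relevance vector in rank order."""
    relevant = np.asarray(relevant, dtype=bool)
    if not relevant.any():
        return 0.0
    hits = np.cumsum(relevant)
    precision_at_k = hits / np.arange(1, len(relevant) + 1)
    # Average the precision values at the ranks where a relevant item appears.
    return float(precision_at_k[relevant].sum() / hits[-1])


def query_by_example_map(descriptors: np.ndarray, labels: np.ndarray) -> float:
    """Use every item as a query against all the others and return the mean AP.
    Assumes no descriptor is the zero vector."""
    # L2-normalise so that the dot product equals cosine similarity.
    d = descriptors / np.linalg.norm(descriptors, axis=1, keepdims=True)
    sims = d @ d.T
    aps = []
    for q in range(len(labels)):
        others = np.arange(len(labels)) != q  # exclude the query itself
        rel = labels[others] == labels[q]
        if not rel.any():  # skip queries with no other instance of the same word
            continue
        order = np.argsort(-sims[q, others], kind="stable")  # most similar first
        aps.append(average_precision(rel[order]))
    return float(np.mean(aps)) if aps else 0.0


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical stand-ins for backbone feature maps of word images of different widths.
    maps = [rng.random((8, 6, w)) for w in (10, 12, 9, 15)]
    descs = np.stack([spatial_pyramid_pool(m) for m in maps])
    print(descs.shape)  # (4, 168): 8 channels * (1 + 4 + 16) cells
    print(query_by_example_map(descs, np.array([0, 0, 1, 1])))
```

Because the pooling grid is fixed per level, word images of different widths yield descriptors of the same length, which is what makes direct similarity ranking possible; random arrays are used here only so the example runs, so the printed mAP carries no meaning.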

Source journal: IEEJ Transactions on Electrical and Electronic Engineering (Engineering & Technology: Electrical & Electronic Engineering)
CiteScore: 2.70
Self-citation rate: 10.00%
Articles published: 199
Review time: 4.3 months
About the journal: IEEJ Transactions on Electrical and Electronic Engineering (hereinafter TEEE) is published six times per year as an official journal of the Institute of Electrical Engineers of Japan (IEEJ). This peer-reviewed journal publishes original research papers and review articles on the most important and latest technological advances in the core areas of electrical and electronic engineering and in related disciplines. It also publishes short communications reporting the results of the latest research activities. TEEE aims to provide a new forum where IEEJ members in Japan and fellow researchers in electrical and electronic engineering around the world can exchange ideas and research findings.