Few-shot learning for word-level scene text script identification

IF 1.8 · Zone 4, Computer Science · JCR Q3, COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE
Veronica Naosekpam, Nilkanta Sahu
Journal: Computational Intelligence (journal article)
Published: 2023-11-21
DOI: 10.1111/coin.12612
URL: https://onlinelibrary.wiley.com/doi/10.1111/coin.12612
Citations: 0

Abstract


Script identification of text in scene images has attracted considerable attention recently. However, existing techniques focus primarily on scripts for which data are abundant, such as English, European, or East Asian scripts. Although these methods are robust on high-resource data, how they perform on low-resource scripts has yet to be established. In India, for example, there is a disparity among the text scripts used across the country's population. To bridge this research gap in resource-constrained script identification, we present a few-shot learning network called TextScriptFSLNet. The network does not require large amounts of training data, yet achieves state-of-the-art performance on benchmark datasets. The proposed method follows a $C$-way $K$-shot paradigm, splitting the training set into a support set and a query set. The support set is used to learn representative knowledge of each class and to create its prototype. We use a 2-layer convolutional neural network fused with multi-kernel spatial attention, together with an averaging technique, to generate the prototype of each class. The spatial attention helps capture important information in an image and enriches the feature representation. To the best of our knowledge, this is the first work of its kind in the scene text understanding domain. Additionally, we created a dataset called Indic-FSL2023 comprising 10 of the 22 officially recognized Indian scripts. The proposed method achieves the highest accuracy among the tested methods on the newly created Indic-FSL2023. Experiments are also conducted on MLe2e to demonstrate the method's versatility. Furthermore, we show how the proposed model performs under illumination changes and blur in scene text script images.
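
The episode structure described in the abstract can be illustrated with a minimal sketch. It is not the authors' TextScriptFSLNet: the attention-fused CNN is replaced by made-up 2-D feature vectors, the function names (`sample_episode`, `class_prototypes`, `nearest_prototype`) and the labels `script_a`/`script_b` are hypothetical, and the nearest-prototype decision rule with squared Euclidean distance is an assumption borrowed from common prototype-based few-shot practice, since the abstract only states that prototypes are obtained by averaging.

```python
import random
from collections import defaultdict


def sample_episode(labelled, c_way, k_shot, n_query, rng):
    """Draw one C-way K-shot episode from (feature_vector, label) pairs.

    Returns (support, query), each a list of (feature_vector, label) pairs.
    """
    by_class = defaultdict(list)
    for vec, label in labelled:
        by_class[label].append(vec)
    # Pick C classes, then K support and n_query query samples per class.
    classes = rng.sample(sorted(by_class), c_way)
    support, query = [], []
    for label in classes:
        picked = rng.sample(by_class[label], k_shot + n_query)
        support += [(v, label) for v in picked[:k_shot]]
        query += [(v, label) for v in picked[k_shot:]]
    return support, query


def class_prototypes(support):
    """Average the support vectors of each class into one prototype."""
    sums, counts = {}, defaultdict(int)
    for vec, label in support:
        if label not in sums:
            sums[label] = list(vec)
        else:
            sums[label] = [a + b for a, b in zip(sums[label], vec)]
        counts[label] += 1
    return {
        label: [s / counts[label] for s in total]
        for label, total in sums.items()
    }


def nearest_prototype(vec, prototypes):
    """Assumed decision rule: choose the class with the closest prototype."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(prototypes, key=lambda label: sq_dist(vec, prototypes[label]))


# Toy demo: made-up 2-D "features" standing in for CNN embeddings.
rng = random.Random(0)
toy = [([float(i), 0.0], "script_a") for i in range(5)]
toy += [([float(i), 9.0], "script_b") for i in range(5)]
support, query = sample_episode(toy, c_way=2, k_shot=3, n_query=2, rng=rng)
prototypes = class_prototypes(support)
predictions = [nearest_prototype(vec, prototypes) for vec, _ in query]
```

In a real pipeline the toy vectors would be replaced by features from a trained encoder, and a fresh episode would be sampled at each training step so that the model sees many different class subsets.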

Source journal: Computational Intelligence (Engineering and Technology, Computer Science: Artificial Intelligence)
CiteScore: 6.90
Self-citation rate: 3.60%
Articles published: 65
Review time: >12 weeks
About the journal: This leading international journal promotes and stimulates research in the field of artificial intelligence (AI). Covering a wide range of issues, from the tools and languages of AI to its philosophical implications, Computational Intelligence provides a vigorous forum for the publication of both experimental and theoretical research, as well as surveys and impact studies. The journal is designed to meet the needs of a wide range of AI workers in academic and industrial research.