Position-Encoding Convolutional Network to Solving Connected Text Captcha

IF 2.4 · Zone 3, Computer Science · JCR Q2, COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Journal of Artificial Intelligence and Soft Computing Research Pub Date : 2021-04-01 DOI:10.2478/jaiscr-2022-0008

Ke Qing, Rongsheng Zhang

Volume/issue: 12 1 · Pages: 121-133

Citations: 1


Position-Encoding Convolutional Network to Solving Connected Text Captcha

Abstract: Text-based CAPTCHA is a convenient and effective security mechanism that has been widely deployed across websites. Efficient end-to-end models for scene text recognition, which combine a CNN with an attention-based RNN, show limited performance in solving text-based CAPTCHAs. In contrast to street-view images and documents, the character sequence in a CAPTCHA is non-semantic. The RNN therefore loses its ability to learn semantic context and only implicitly encodes the relative position of the extracted features. Meanwhile, the security features that hinder character segmentation and recognition greatly increase the complexity of CAPTCHAs, and the performance of this kind of model is sensitive to the CAPTCHA scheme. In this paper, we analyze the properties of text-based CAPTCHAs and accordingly treat solving them as a highly position-relative character sequence recognition task. We propose a network named PosConv that leverages the position information in the character sequence without an RNN. PosConv uses a novel padding strategy and a modified convolution to explicitly encode relative position into the local features of characters. This mechanism makes the features extracted from CAPTCHAs more informative and robust. We validate PosConv on six text-based CAPTCHA schemes, where it achieves state-of-the-art or competitive recognition accuracy with significantly fewer parameters and faster convergence.
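The abstract does not spell out PosConv's padding strategy or its modified convolution, so the following is only a minimal sketch of the general idea of encoding position explicitly into local features. It swaps in a generic coordinate-channel technique (appending a normalized horizontal-position channel before convolving) and should not be read as the authors' method. All function names and kernel values here are hypothetical.

```python
from typing import List


def position_channel(width: int) -> List[float]:
    """Normalized horizontal position in [0, 1] for each column (assumed encoding)."""
    if width == 1:
        return [0.0]
    return [i / (width - 1) for i in range(width)]


def conv1d_same(channels: List[List[float]], kernel: List[List[float]]) -> List[float]:
    """Zero-padded 'same' 1D cross-correlation over a multi-channel sequence.

    channels: C sequences of length W; kernel: C sequences of odd length K.
    """
    width = len(channels[0])
    k = len(kernel[0])
    half = k // 2
    out = []
    for x in range(width):
        acc = 0.0
        for c, row in enumerate(channels):
            for j in range(k):
                src = x + j - half
                if 0 <= src < width:  # positions outside the input act as zeros
                    acc += kernel[c][j] * row[src]
        out.append(acc)
    return out


def position_aware_conv(features: List[float],
                        feat_kernel: List[float],
                        pos_kernel: List[float]) -> List[float]:
    """Convolve the features together with an explicit position channel."""
    channels = [features, position_channel(len(features))]
    return conv1d_same(channels, [feat_kernel, pos_kernel])


# Two identical "strokes" placed at different horizontal positions.
features = [0, 1, 1, 0, 0, 0, 1, 1, 0]

# A plain convolution responds identically to both strokes.
plain = conv1d_same([features], [[1.0, 1.0, 1.0]])

# With the position channel, the same stroke yields a position-dependent response.
aware = position_aware_conv(features, [1.0, 1.0, 1.0], [0.0, 1.0, 0.0])
```

In this toy setting, `plain[1]` and `plain[6]` are equal because an ordinary convolution is translation-equivariant, while `aware[1]` and `aware[6]` differ. That difference is the kind of explicit position signal the abstract says an RNN would otherwise supply only implicitly.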


Source journal

Journal of Artificial Intelligence and Soft Computing Research (COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE)

CiteScore: 7.00

Self-citation rate: 25.00%

Articles published:

Review time: 24 weeks

About the journal: Journal of Artificial Intelligence and Soft Computing Research (available also at Sciendo (De Gruyter)) is a dynamically developing international journal focused on the latest scientific results and methods constituting traditional artificial intelligence methods and soft computing techniques. Our goal is to bring together scientists representing both approaches and various research communities.