Latest Publications from the 2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR)

Weakly Supervised Text Attention Network for Generating Text Proposals in Scene Images
Li Rong, En MengYi, Liang Jianqiang, Zhang Haibin
DOI: https://doi.org/10.1109/ICDAR.2017.61
Abstract: Detecting and recognizing textual information in scene images is useful but challenging, and numerous methods have been proposed to solve the problem. Recently the best results have been attained by deep-neural-network-based methods, but training such networks needs large amounts of bounding-box-level or pixel-level annotated data, and producing that data requires labor that can be expensive and time consuming. In this paper we explore a weakly supervised deep neural network for generating text proposals in natural scene images. The network accepts multi-scale inputs and is trained on whole-image binary classification: does the image contain text or not? After training, the network has learned discriminative features capable of distinguishing text from other objects. To obtain text locations, a text confidence score map is generated from the feature maps of the top two convolutional layers by extracting a class activation map; the value of each pixel in the score map is the confidence that the pixel belongs to text. Thresholding converts the score map to a binary mask whose foreground marks probable text areas. Maximally Stable Extremal Regions (MSERs) are then extracted from these areas and aggregated into groups, and processing these groups yields the text proposals. Experimental results show that, without using any bounding-box or pixel-level annotation, the algorithm achieves a recall rate comparable to some fully supervised methods on the ICDAR 2013 focused text dataset and the ICDAR 2015 incidental text dataset.
Citations: 4
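The score-map-to-proposal step described in this abstract can be sketched as follows. This is a minimal numpy sketch under stated assumptions: the paper's MSER extraction and grouping are replaced here by simple connected-component bounding boxes, and the threshold value is an assumption, not the paper's.

```python
import numpy as np

def score_map_to_proposals(score_map, threshold=0.5):
    """Binarize a text confidence score map and return bounding boxes
    (x0, y0, x1, y1) of the foreground components. A stand-in for the
    paper's MSER extraction and grouping step."""
    mask = score_map >= threshold
    h, w = mask.shape
    labels = np.zeros((h, w), dtype=int)
    boxes = []
    next_label = 0
    for i in range(h):
        for j in range(w):
            if mask[i, j] and labels[i, j] == 0:
                next_label += 1
                # iterative flood fill, 4-connectivity
                stack = [(i, j)]
                labels[i, j] = next_label
                x0, y0, x1, y1 = j, i, j, i
                while stack:
                    y, x = stack.pop()
                    x0, y0 = min(x0, x), min(y0, y)
                    x1, y1 = max(x1, x), max(y1, y)
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = y + dy, x + dx
                        if 0 <= ny < h and 0 <= nx < w and mask[ny, nx] and labels[ny, nx] == 0:
                            labels[ny, nx] = next_label
                            stack.append((ny, nx))
                boxes.append((x0, y0, x1, y1))
    return boxes

# toy score map with two high-confidence blobs
sm = np.zeros((8, 8))
sm[1:3, 1:4] = 0.9
sm[5:7, 4:7] = 0.8
print(score_map_to_proposals(sm))  # → [(1, 1, 3, 2), (4, 5, 6, 6)]
```

In the paper the score map comes from a class activation map over the top convolutional layers; here it is a toy array, since only the thresholding and grouping logic is being illustrated.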
Deep Strip-Based Network with Cascade Learning for Scene Text Localization
Dao Wu, Rui Wang, Pengwen Dai, Yueying Zhang, Xiaochun Cao
DOI: https://doi.org/10.1109/ICDAR.2017.140
Abstract: Scene text detection is currently a popular research topic in the computer vision community, but it remains challenging because of text variations and cluttered backgrounds. In this paper, we propose a novel framework for scene text localization. Built on the region proposal network, a Strip-based Text Detection Network (STDN) with a vertical anchor mechanism predicts text/non-text strip-shaped proposals, and recurrent neural network layers are incorporated to refine the predicted results. Hard example mining is performed to train the STDN with cascade learning, which yields a remarkable improvement in precision. We also exploit a clustering algorithm to generate anchor dimensions automatically without hand-picking, which is portable and time-saving. The framework achieves state-of-the-art performance on ICDAR 2013 with a 0.89 F-measure.
Citations: 12
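The anchor-clustering idea (generating anchor dimensions without hand-picking) can be sketched as a 1-D k-means over ground-truth box heights, which suits a vertical anchor mechanism where widths are fixed. The deterministic initialization and the toy heights below are assumptions for illustration, not the paper's exact algorithm.

```python
def kmeans_anchor_heights(heights, k=3, iters=100):
    """Cluster ground-truth box heights with 1-D k-means to pick anchor
    heights automatically instead of hand-picking them."""
    hs = sorted(heights)
    # deterministic init: evenly spaced points over the sorted heights
    centers = [float(hs[i * (len(hs) - 1) // (k - 1)]) for i in range(k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for h in hs:
            idx = min(range(k), key=lambda c: abs(h - centers[c]))
            clusters[idx].append(h)
        new_centers = [sum(c) / len(c) if c else centers[i]
                       for i, c in enumerate(clusters)]
        if new_centers == centers:
            break
        centers = new_centers
    return centers

# three obvious height groups in the toy data
print(kmeans_anchor_heights([11, 12, 13, 30, 31, 32, 70, 72, 74]))
# → [12.0, 31.0, 72.0]
```

A production version would cluster on an IoU-based distance rather than absolute height difference, but the recover-anchors-from-data idea is the same.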
A Robust Symmetry-Based Method for Scene/Video Text Detection through Neural Network
Yirui Wu, Wenhai Wang, P. Shivakumara, Tong Lu
DOI: https://doi.org/10.1109/ICDAR.2017.206
Abstract: Text detection in video and scene images has gained significant attention in image processing and document analysis because of the inherent challenges caused by variations in contrast, orientation, background, text type, font type, non-uniform illumination and so on. In this paper, we propose a novel text detection method that exploits the symmetry property and appearance features of text for improved accuracy and robustness. First, Extremal Regions (ERs) are explored to detect text candidates in images. Then, for each text candidate, we compute a novel feature named the Multi-domain Strokes Symmetry Histogram (MSSH), which describes the inherent symmetry of stroke pixel pairs in the gray, gradient and frequency domains. Deep convolutional features are also extracted to describe the appearance of each candidate, and the two are fused by an Auto-Encoder network into a more discriminative text descriptor for classification. Finally, text lines are constructed from the classification results. We demonstrate the effectiveness and robustness of the proposed method on four benchmark databases.
Citations: 9
Scene Text Relocation with Guidance
Anna Zhu, S. Uchida
DOI: https://doi.org/10.1109/ICDAR.2017.212
Abstract: Applying object proposal techniques to scene text detection has become popular for the significant improvements in speed and accuracy they bring to object detection. However, some of the text regions remaining after proposal classification overlap and are hard to remove or merge. In this paper, we present a scene text relocation system that refines the detection from text proposals to text. An object-proposal-based deep neural network produces the text proposals; to tackle the overlapping problem, a refinement deep neural network relocates the overlapped regions by estimating the text probability inside them and locating accurate text regions by thresholding. Since the spaces between words vary across text lines, a guidance mechanism in the relocation step indicates where to extract text regions at the word level. This refinement boosts precision by removing multiple overlapped text regions and joining cracked ones. Experimental results on the standard ICDAR 2013 benchmark demonstrate the effectiveness of the proposed approach.
Citations: 7
Gated Convolutional Recurrent Neural Networks for Multilingual Handwriting Recognition
Théodore Bluche, Ronaldo O. Messina
DOI: https://doi.org/10.1109/ICDAR.2017.111
Abstract: In this paper, we propose a new neural network architecture for state-of-the-art handwriting recognition, an alternative to multi-dimensional long short-term memory (MD-LSTM) recurrent neural networks. The model is based on a convolutional encoder of the input images and a bidirectional LSTM decoder predicting character sequences. In this paradigm, we aim to produce generic, multilingual and reusable features with the convolutional encoder, leveraging more data for transfer learning. The architecture is also motivated by the need for fast training on GPUs and fast decoding on CPUs. The main contribution of this paper lies in the convolutional gates in the encoder, which enable hierarchical context-sensitive feature extraction. Experiments on a large benchmark covering seven languages show a consistent and significant improvement of the proposed approach over our previous production systems. We also report state-of-the-art results for line- and paragraph-level recognition on the IAM and Rimes databases.
Citations: 100
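The convolutional gating mentioned in the abstract can be illustrated with a minimal numpy sketch: a feature branch passed through tanh, modulated elementwise by a sigmoid gate branch computed from the same input. The single-channel setting, kernel sizes and activation choices here are simplifying assumptions, not the paper's exact encoder.

```python
import numpy as np

def conv2d(x, w):
    """'Valid' 2-D cross-correlation of a single-channel map."""
    kh, kw = w.shape
    H, W = x.shape
    out = np.empty((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * w)
    return out

def gated_conv(x, w_feat, w_gate):
    """Gated convolution: a tanh feature branch modulated elementwise by
    a sigmoid gate branch, letting the network learn where context
    should flow through the encoder."""
    feat = np.tanh(conv2d(x, w_feat))
    gate = 1.0 / (1.0 + np.exp(-conv2d(x, w_gate)))
    return feat * gate

rng = np.random.default_rng(0)
x = rng.standard_normal((5, 5))
y = gated_conv(x, rng.standard_normal((3, 3)), rng.standard_normal((3, 3)))
print(y.shape)  # (3, 3): a 3x3 kernel over a 5x5 map, 'valid' padding
```

Because tanh lies in [-1, 1] and the sigmoid gate in (0, 1), every output value is bounded in magnitude by 1, so the gate can only attenuate the feature response, never amplify it.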
Histogram of Exclamation Marks and Its Application for Comics Analysis
Sotaro Hiroe, S. Hotta
DOI: https://doi.org/10.1109/ICDAR.2017.294
Abstract: This paper proposes a histogram formed by counting the exclamation marks in comic books, for comics analysis. Exclamation marks in comics express characters' emotions and appear frequently in excited scenes, and they are easy to detect across comic books written by various authors. We therefore represent each comic book by the distribution of its exclamation marks as a histogram, and use that histogram for topic-change detection and for visualizing relationships between comic books. Experimental results on real comic books show that this bold approach has potential for approximating the content of comic books.
Citations: 1
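The proposed histogram is simple enough to sketch directly. The jump-based topic-change rule and the toy pages below are hypothetical stand-ins for the paper's analysis (which works on detected marks in page images, not plain text), shown only to make the idea concrete.

```python
def exclamation_histogram(pages):
    """Count exclamation marks per page to form the book's histogram."""
    return [page.count("!") for page in pages]

def topic_changes(hist, jump=3):
    """Flag page indices where the exclamation count jumps sharply,
    a crude stand-in for the paper's topic-change detection."""
    return [i for i in range(1, len(hist))
            if abs(hist[i] - hist[i - 1]) >= jump]

# toy 'pages' as text; the paper detects marks in the drawn page images
pages = ["calm day", "still calm!", "attack!!!! run!!!", "quiet again"]
h = exclamation_histogram(pages)
print(h, topic_changes(h))  # → [0, 1, 7, 0] [2, 3]
```

The spike at page 2 and the drop at page 3 are exactly the kind of excitement transitions the histogram is meant to expose.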
A Machine Learning System for Assisting Neophyte Researchers in Digital Libraries
Bissan Audeh, M. Beigbeder, C. Largeron
DOI: https://doi.org/10.1109/ICDAR.2017.60
Abstract: Although existing digital libraries such as Google Scholar and CiteSeerX offer advanced search functionality, they do not consider whether the user is new to or specialized in the research domain of the query. As a result, neophytes can spend a lot of time checking documents that are not adapted to their initial information need. In this paper, we propose NeoTex, a machine-learning-based approach that combines content-based retrieval with citation-graph measures to propose documents adapted to new researchers. The contributions of our work are: a model for scientific retrieval suited to neophytes, an evaluation protocol with realistic ground truths, and tests of the model on a large real collection from a national digital library.
Citations: 3
Comic Characters Detection Using Deep Learning
Nhu-Van Nguyen, Christophe Rigaud, J. Burie
DOI: https://doi.org/10.1109/ICDAR.2017.290
Abstract: Comic character detection is an interesting problem in comics analysis: it enables more efficient indexing and retrieval of comic books and supports the level of understanding needed to create their digital form. In recent years, several methods have been proposed to extract or detect characters from comics with reasonable performance, but they evaluate on their own datasets without comparing against other work or experimenting on a standard dataset. In this work, we take advantage of recent advances in deep learning, training comic character detectors with the latest object detection networks on our proposed dataset. Experiments on our dataset and on available datasets from previous works show that this method significantly outperforms existing methods. We believe this state-of-the-art approach can serve as a reliable baseline for comparing and understanding future detection techniques.
Citations: 30
Semi-Supervised Transfer Learning for Convolutional Neural Network Based Chinese Character Recognition
Yejun Tang, Bing Wu, Liangrui Peng, Changsong Liu
DOI: https://doi.org/10.1109/ICDAR.2017.79
Abstract: Although transfer learning has aroused great interest among researchers, how to utilize unlabeled data is still an open and important problem in this area. We propose a novel semi-supervised transfer learning (STL) method that incorporates a Multi-Kernel Maximum Mean Discrepancy (MK-MMD) loss into the traditional fine-tuned Convolutional Neural Network (CNN) transfer learning framework for Chinese character recognition. The proposed method has three steps. First, a CNN model is trained on massive labeled samples in the source domain. Then the CNN model is fine-tuned on a few labeled samples in the target domain. Finally, the CNN model is trained on both a large number of unlabeled samples and the limited labeled samples in the target domain to minimize the MK-MMD loss. Experiments investigate detailed configurations and parameters of the proposed STL method with several frequently used CNN structures, including AlexNet, GoogLeNet, and ResNet. Experimental results on practical Chinese character transfer learning tasks, such as Dunhuang historical Chinese character recognition, indicate that the proposed method significantly improves recognition accuracy in the target domain.
Citations: 12
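The MK-MMD loss minimized in the final training step can be sketched as a biased multi-kernel MMD² estimate between source-like and target feature batches. The RBF bandwidths and toy features below are assumed values; a real implementation would backpropagate this quantity through the CNN's feature extractor rather than compute it on fixed arrays.

```python
import numpy as np

def mk_mmd(x, y, sigmas=(1.0, 2.0, 4.0)):
    """Biased multi-kernel MMD^2 estimate between feature batches
    x (n, d) and y (m, d), summing RBF kernels over several bandwidths.
    A stand-in for the MK-MMD domain-alignment loss term."""
    def kernel(a, b):
        # pairwise squared distances, then a sum of RBF kernels
        d2 = np.sum((a[:, None, :] - b[None, :, :]) ** 2, axis=-1)
        return sum(np.exp(-d2 / (2.0 * s ** 2)) for s in sigmas)
    return kernel(x, x).mean() + kernel(y, y).mean() - 2.0 * kernel(x, y).mean()

rng = np.random.default_rng(0)
same = mk_mmd(rng.standard_normal((50, 4)), rng.standard_normal((50, 4)))
far = mk_mmd(rng.standard_normal((50, 4)), rng.standard_normal((50, 4)) + 5.0)
print(same < far)  # True: matching distributions give a smaller discrepancy
```

Minimizing this quantity pulls the target-domain feature distribution toward the source-domain one, which is why it helps when the target domain has mostly unlabeled samples.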
Radical-Based Chinese Character Recognition via Multi-Labeled Learning of Deep Residual Networks
Tie-Qiang Wang, Fei Yin, Cheng-Lin Liu
DOI: https://doi.org/10.1109/ICDAR.2017.100
Abstract: The digitization of Chinese historical documents poses a new challenge: within the huge set of character categories, most characters are no longer in common use and have few samples for training character classifiers. To settle this problem, we consider the radical-level composition of Chinese characters and propose to detect position-dependent radicals using a deep residual network with multi-labeled learning. This enables the recognition of novel characters without training samples, provided the characters are composed of radicals appearing in the training samples. In multi-labeled learning, each training character sample is labeled positive for every radical it contains, so that after training all the radicals appearing in a character can be detected. Experimental results on a large-category-set database of printed Chinese characters demonstrate that the proposed method detects radicals accurately. Moreover, based on radical configurations, our model can credibly recognize novel characters as well as trained characters.
Citations: 30
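The multi-labeled learning scheme can be illustrated with a toy target encoding and the binary cross-entropy loss usually paired with it. The five-radical inventory below is a tiny hypothetical sample, and the position-dependence of radicals is omitted for brevity; a real model outputs one probability per position-dependent radical.

```python
import math

# hypothetical miniature radical inventory for illustration
RADICALS = ["口", "木", "氵", "亻", "心"]

def radical_target(char_radicals):
    """Multi-label target: 1 for every radical the character contains."""
    return [1.0 if r in char_radicals else 0.0 for r in RADICALS]

def bce_loss(pred, target):
    """Mean binary cross-entropy over the radical labels, the usual
    loss for this kind of multi-labeled learning."""
    eps = 1e-7
    return -sum(t * math.log(p + eps) + (1 - t) * math.log(1 - p + eps)
                for p, t in zip(pred, target)) / len(target)

target = radical_target({"木", "口"})  # a character built from 木 and 口
good = bce_loss([0.9, 0.9, 0.1, 0.1, 0.1], target)
bad = bce_loss([0.1, 0.1, 0.9, 0.9, 0.9], target)
print(good < bad)  # True: confident correct predictions give lower loss
```

Because the target is a set of radicals rather than a single class index, a character never seen in training can still be recognized when its radicals were all seen, which is the zero-sample property the abstract describes.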