IEEE International Conference on Document Analysis and Recognition — Latest Publications

Semantic Graph Representation Learning for Handwritten Mathematical Expression Recognition
IEEE International Conference on Document Analysis and Recognition | Pub Date: 2023-08-21 | DOI: 10.1007/978-3-031-41676-7_9
Zhuang Liu, Ye Yuan, Zhilong Ji, Jingfeng Bai, X. Bai
Citations: 0
I-WAS: a Data Augmentation Method with GPT-2 for Simile Detection
IEEE International Conference on Document Analysis and Recognition | Pub Date: 2023-08-08 | DOI: 10.48550/arXiv.2308.04109
Yongzhu Chang, Rongsheng Zhang, Jiashu Pu
Abstract: Simile detection is a valuable task for many natural language processing (NLP)-based applications, particularly in the field of literature. However, existing research on simile detection often relies on corpora that are limited in size and do not adequately represent the full range of simile forms. To address this issue, we propose a simile data augmentation method based on Word replacement And Sentence completion using the GPT-2 language model. Our iterative process, called I-WAS, is designed to improve the quality of the augmented sentences. To better evaluate the performance of our method in real-world applications, we have compiled a corpus containing a more diverse set of simile forms for experimentation. Our experimental results demonstrate the effectiveness of our proposed data augmentation method for simile detection.
Citations: 0
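
The abstract above describes augmentation by replacing a word in a simile and letting GPT-2 complete the sentence. Below is a minimal sketch of that idea using the Hugging Face transformers GPT-2 API; the function name, prompt construction, and sampling settings are illustrative assumptions, not the authors' implementation.

```python
# A minimal sketch of the word-replacement-and-completion idea behind I-WAS,
# using the Hugging Face transformers GPT-2 model. Names and settings are
# illustrative, not taken from the paper's code.
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

def augment_simile(sentence: str, vehicle: str, replacement: str) -> str:
    """Replace the simile's vehicle word, truncate after it, and let GPT-2
    complete the sentence to produce a new candidate simile."""
    # Keep the text up to and including the replaced word as the prompt.
    prefix = sentence[: sentence.index(vehicle)] + replacement
    inputs = tokenizer(prefix, return_tensors="pt")
    outputs = model.generate(
        **inputs,
        max_new_tokens=20,
        do_sample=True,  # sampling yields diverse completions across iterations
        top_p=0.9,
        pad_token_id=tokenizer.eos_token_id,
    )
    return tokenizer.decode(outputs[0], skip_special_tokens=True)

print(augment_simile("Her smile was as bright as the sun", "sun", "morning star"))
```

The paper's iterative variant would then filter or re-rank such completions for quality before adding them to the training corpus.
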
A Graphical Approach to Document Layout Analysis
IEEE International Conference on Document Analysis and Recognition | Pub Date: 2023-08-03 | DOI: 10.48550/arXiv.2308.02051
Jilin Wang, Michael Krumdick, Baojia Tong, Hamima Halim, M. Sokolov, Vadym Barda, Delphine Vendryes, Christy Tanner
Abstract: Document layout analysis (DLA) is the task of detecting the distinct, semantic content within a document and correctly classifying these items into an appropriate category (e.g., text, title, figure). DLA pipelines enable users to convert documents into structured machine-readable formats that can then be used for many useful downstream tasks. Most existing state-of-the-art (SOTA) DLA models represent documents as images, discarding the rich metadata available in electronically generated PDFs. Directly leveraging this metadata, we represent each PDF page as a structured graph and frame the DLA problem as a graph segmentation and classification problem. We introduce the Graph-based Layout Analysis Model (GLAM), a lightweight graph neural network competitive with SOTA models on two challenging DLA datasets, while being an order of magnitude smaller than existing models. In particular, the 4-million parameter GLAM model outperforms the leading 140M+ parameter computer vision-based model on 5 of the 11 classes on the DocLayNet dataset. A simple ensemble of these two models achieves a new state-of-the-art on DocLayNet, increasing mAP from 76.8 to 80.8. Overall, GLAM is over 5 times more efficient than SOTA models, making GLAM a favorable engineering choice for DLA tasks.
Citations: 0
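
GLAM's key framing is to treat a PDF page as a graph of text boxes built from the PDF's own metadata. Below is a toy sketch of such a construction under an assumed adjacency rule (horizontal overlap plus a small vertical gap); the paper's actual node features and graph definition may differ.

```python
# A toy sketch of GLAM's input representation: a PDF page as a graph whose
# nodes are text boxes (from the PDF's own metadata) and whose edges connect
# spatially adjacent boxes. The adjacency rule is an illustrative assumption.
from itertools import combinations

# (x0, y0, x1, y1, text) boxes, as a PDF parser might report them.
boxes = [
    (72, 700, 540, 720, "A Graphical Approach to Document Layout Analysis"),
    (72, 660, 300, 675, "Jilin Wang, Michael Krumdick, ..."),
    (72, 600, 540, 650, "Document layout analysis (DLA) is the task of ..."),
]

def are_adjacent(a, b, gap=30):
    """Connect two boxes if they overlap horizontally and sit close vertically."""
    ax0, ay0, ax1, ay1, _ = a
    bx0, by0, bx1, by1, _ = b
    horizontal_overlap = ax0 < bx1 and bx0 < ax1
    vertical_gap = min(abs(ay0 - by1), abs(by0 - ay1))
    return horizontal_overlap and vertical_gap <= gap

nodes = list(range(len(boxes)))
edges = [(i, j) for i, j in combinations(nodes, 2)
         if are_adjacent(boxes[i], boxes[j])]
# A GNN would then classify each node (text, title, figure, ...) and each
# edge (same segment vs. different segment) to segment the page.
print(edges)  # [(0, 1), (1, 2)]
```
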
RealCQA: Scientific Chart Question Answering as a Test-bed for First-Order Logic
IEEE International Conference on Document Analysis and Recognition | Pub Date: 2023-08-03 | DOI: 10.48550/arXiv.2308.01979
Saleem Ahmed, Bhavin Jawade, Shubham Pandey, S. Setlur, Venugopal Govindaraju
Abstract: We present a comprehensive study of the chart visual question-answering (QA) task, to address the challenges faced in comprehending and extracting data from chart visualizations within documents. Despite efforts to tackle this problem using synthetic charts, solutions are limited by the shortage of annotated real-world data. To fill this gap, we introduce a benchmark and dataset for chart visual QA on real-world charts, offering a systematic analysis of the task and a novel taxonomy for template-based chart question creation. Our contribution includes the introduction of a new answer type, 'list', with both ranked and unranked variations. Our study is conducted on a real-world chart dataset from scientific literature, showcasing higher visual complexity compared to other works. Our focus is on template-based QA and how it can serve as a standard for evaluating the first-order logic capabilities of models. The results of our experiments, conducted on a real-world out-of-distribution dataset, provide a robust evaluation of large-scale pre-trained models and advance the field of chart visual QA and formal logic verification for neural networks in general.
Citations: 0
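
To make template-based question creation and the ranked 'list' answer type concrete, here is a hypothetical example; the template wording and chart data are invented for illustration and are not drawn from the RealCQA taxonomy itself.

```python
# A hypothetical template in the spirit of RealCQA's template-based question
# creation, showing the ranked 'list' answer type. Data and wording invented.
chart_data = {"ResNet": 76.1, "ViT": 81.8, "ConvNeXt": 83.1}

template = "Which {entity} are shown, ordered by {metric} from highest to lowest?"
question = template.format(entity="models", metric="accuracy")

# Ranked-list answer: order matters, unlike an unranked 'list' answer.
ranked_answer = sorted(chart_data, key=chart_data.get, reverse=True)
print(question)
print(ranked_answer)  # ['ConvNeXt', 'ViT', 'ResNet']
```
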
SpaDen: Sparse and Dense Keypoint Estimation for Real-World Chart Understanding
IEEE International Conference on Document Analysis and Recognition | Pub Date: 2023-08-03 | DOI: 10.48550/arXiv.2308.01971
Saleem Ahmed, Pengyu Yan, D. Doermann, S. Setlur, Venugopal Govindaraju
Abstract: We introduce a novel bottom-up approach for the extraction of chart data. Our model utilizes images of charts as inputs and learns to detect keypoints (KP), which are used to reconstruct the components within the plot area. Our novelty lies in detecting a fusion of continuous and discrete KP as predicted heatmaps. A combination of sparse and dense per-pixel objectives coupled with a uni-modal self-attention-based feature-fusion layer is applied to learn KP embeddings. Further leveraging deep metric learning for unsupervised clustering allows us to segment the chart plot area into various objects. By further matching the chart components to the legend, we are able to obtain the data series names. A post-processing threshold is applied to the KP embeddings to refine the object reconstructions and improve accuracy. Our extensive experiments include an evaluation of different modules for KP estimation and the combination of deep layer aggregation and corner pooling approaches. The results of our experiments provide extensive evaluation for the task of real-world chart data extraction.
Citations: 0
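
A standard step in bottom-up keypoint pipelines such as the one described above is converting a predicted heatmap into discrete keypoints via local-maximum suppression. The sketch below shows that generic step; the window size and threshold are illustrative, and SpaDen's own post-processing (on KP embeddings) is more involved.

```python
# A generic sketch of turning a predicted keypoint heatmap into discrete
# keypoints by local-maximum suppression. Window size and threshold are
# illustrative; not SpaDen's exact post-processing.
import numpy as np
from scipy.ndimage import maximum_filter

def heatmap_to_keypoints(heatmap: np.ndarray, threshold: float = 0.5):
    """Return (row, col, score) for every local peak above `threshold`."""
    # A pixel is a peak if it equals the max in its 3x3 neighborhood.
    peaks = (heatmap == maximum_filter(heatmap, size=3)) & (heatmap > threshold)
    rows, cols = np.nonzero(peaks)
    return [(r, c, float(heatmap[r, c])) for r, c in zip(rows, cols)]

heatmap = np.zeros((8, 8))
heatmap[2, 3] = 0.9  # a confident keypoint
heatmap[6, 6] = 0.4  # below threshold, suppressed
print(heatmap_to_keypoints(heatmap))  # [(2, 3, 0.9)]
```
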
Reading Between the Lanes: Text VideoQA on the Road
IEEE International Conference on Document Analysis and Recognition | Pub Date: 2023-07-08 | DOI: 10.48550/arXiv.2307.03948
George Tom, Minesh Mathew, Sergi Garcia, Dimosthenis Karatzas, C.V. Jawahar
Abstract: Text and signs around roads provide crucial information for drivers, vital for safe navigation and situational awareness. Scene text recognition in motion is a challenging problem: textual cues typically appear only for a short time span, and early detection at a distance is necessary. Systems that exploit such information to assist the driver should not only extract and incorporate visual and textual cues from the video stream but also reason over time. To address this issue, we introduce RoadTextVQA, a new dataset for the task of video question answering (VideoQA) in the context of driver assistance. RoadTextVQA consists of 3,222 driving videos collected from multiple countries, annotated with 10,500 questions, all based on text or road signs present in the driving videos. We assess the performance of state-of-the-art video question answering models on our RoadTextVQA dataset, highlighting the significant potential for improvement in this domain and the usefulness of the dataset in advancing research on in-vehicle support systems and text-aware multimodal question answering. The dataset is available at http://cvit.iiit.ac.in/research/projects/cvit-projects/roadtextvqa
Citations: 1
Line Graphics Digitization: A Step Towards Full Automation
IEEE International Conference on Document Analysis and Recognition | Pub Date: 2023-07-05 | DOI: 10.48550/arXiv.2307.02065
Omar Moured, Jiaming Zhang, Alina Roitberg, Thorsten Schwarz, R. Stiefelhagen
Abstract: The digitization of documents allows for wider accessibility and reproducibility. While automatic digitization of document layout and text content has been a long-standing focus of research, this problem in regard to graphical elements, such as statistical plots, has been under-explored. In this paper, we introduce the task of fine-grained visual understanding of mathematical graphics and present the Line Graphics (LG) dataset, which includes pixel-wise annotations of 5 coarse and 10 fine-grained categories. Our dataset covers 520 images of mathematical graphics collected from 450 documents from different disciplines. Our proposed dataset can support two different computer vision tasks, i.e., semantic segmentation and object detection. To benchmark our LG dataset, we explore 7 state-of-the-art models. To foster further research on the digitization of statistical graphs, we will make the dataset, code, and models publicly available to the community.
Citations: 0
UTRNet: High-Resolution Urdu Text Recognition In Printed Documents
IEEE International Conference on Document Analysis and Recognition | Pub Date: 2023-06-27 | DOI: 10.1007/978-3-031-41734-4_19
Abdur Rahman, Arjun Ghosh, Chetan Arora
Citations: 0
Ambigram Generation by A Diffusion Model
IEEE International Conference on Document Analysis and Recognition | Pub Date: 2023-06-21 | DOI: 10.48550/arXiv.2306.12049
T. Shirakawa, Seiichi Uchida
Abstract: Ambigrams are graphical letter designs that can be read not only from the original direction but also from a rotated direction (especially by 180 degrees). Designing ambigrams is difficult even for human experts, because maintaining dual readability from both directions is hard. This paper proposes an ambigram generation model. As its generation module, we use a diffusion model, which has recently been used to generate high-quality photographic images. By specifying a pair of letter classes, such as 'A' and 'B', the proposed model generates various ambigram images which can be read as 'A' from the original direction and 'B' from a direction rotated 180 degrees. Quantitative and qualitative analyses of experimental results show that the proposed model can generate high-quality and diverse ambigrams. In addition, we define ambigramability, an objective measure of how easy it is to generate ambigrams for each letter pair. For example, the pair of 'A' and 'V' shows a high ambigramability (that is, it is easy to generate their ambigrams), and the pair of 'D' and 'K' shows a lower ambigramability. The ambigramability gives various hints for ambigram generation, not only for computers but also for human experts. The code can be found at https://github.com/univ-esuty/ambifusion
Citations: 1
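
The dual-readability requirement can be phrased as a simple check: a candidate image should be classified as one letter upright and as another letter after a 180-degree rotation, which is also the intuition behind the ambigramability measure. The sketch below uses a stub classifier purely to show the shape of the check; the paper's measure is computed with a trained letter classifier.

```python
# A schematic sketch of the dual-readability check behind "ambigramability".
# The classifier is a stand-in stub; a real check would use a trained
# letter classifier over generated ambigram images.
import numpy as np

def rotate_180(image: np.ndarray) -> np.ndarray:
    """Rotate an image array by 180 degrees."""
    return np.rot90(image, k=2)

def is_readable_ambigram(image, classify, letter_a: str, letter_b: str) -> bool:
    """True if the image reads as `letter_a` upright and `letter_b` rotated."""
    return classify(image) == letter_a and classify(rotate_180(image)) == letter_b

# Stub classifier for demonstration only: keys off a single corner pixel.
classify = lambda img: "A" if img[0, 0] > 0 else "V"
canvas = np.zeros((64, 64))
canvas[0, 0] = 1.0
print(is_readable_ambigram(canvas, classify, "A", "V"))  # True
```

Averaging this pass/fail check over many generated samples for a letter pair would give a rough per-pair score in the spirit of ambigramability.
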
ICDAR 2023 Competition on Structured Text Extraction from Visually-Rich Document Images
IEEE International Conference on Document Analysis and Recognition | Pub Date: 2023-06-05 | DOI: 10.48550/arXiv.2306.03287
Wenwen Yu, Chengquan Zhang, H. Cao, W. Hua, Bohan Li, Huang-wei Chen, Ming Liu, Mingrui Chen, Jianfeng Kuang, Mengjun Cheng, Yuning Du, Shikun Feng, Xiaoguang Hu, Pengyuan Lyu, Kun Yao, Yu Yu, Yuliang Liu, Wanxiang Che, Errui Ding, Chengxi Liu, Jiebo Luo, Shuicheng Yan, M. Zhang, Dimosthenis Karatzas, Xingchao Sun, Jingdong Wang, Xiang Bai
Abstract: Structured text extraction is one of the most valuable and challenging application directions in the field of Document AI. However, the scenarios of past benchmarks are limited, and the corresponding evaluation protocols usually focus on the submodules of the structured text extraction scheme. In order to eliminate these problems, we organized the ICDAR 2023 competition on Structured text extraction from Visually-Rich Document images (SVRD). We set up two tracks for SVRD: Track 1, HUST-CELL, which aims to evaluate the end-to-end performance of Complex Entity Linking and Labeling; and Track 2, Baidu-FEST, which focuses on evaluating the performance and generalization of zero-shot / few-shot structured text extraction from an end-to-end perspective. Compared to current document benchmarks, our two competition tracks greatly enrich the scenarios and contain more than 50 types of visually-rich document images (mainly from actual enterprise applications). The competition opened on 30th December, 2022 and closed on 24th March, 2023. Track 1 received 35 participants and 91 valid submissions, and Track 2 received 15 participants and 26 valid submissions. In this report we present the motivation, competition datasets, task definition, evaluation protocol, and submission summaries. According to the performance of the submissions, we believe a large gap remains to the expected information extraction performance for complex and zero-shot scenarios. It is hoped that this competition will attract many researchers in the fields of CV and NLP, and bring some new thoughts to the field of Document AI.
Citations: 1