C3E: A framework for chart classification and content extraction

IF 4.0 · Zone 3 (Computer Science) · JCR Q1 (Computer Science, Hardware & Architecture)
Muhammad Suhaib Kanroo, Hadia Showkat Kawoosa, Kapil Rana, Puneet Goyal
Journal: Computers & Electrical Engineering, vol. 121, Article 109861
Published: 2024-11-22
DOI: 10.1016/j.compeleceng.2024.109861
URL: https://www.sciencedirect.com/science/article/pii/S0045790624007882
Citations: 0

Abstract

Incorporating charts into technical documents enriches them by simplifying complex data representation and improving comprehension. However, automated chart content extraction (CCE) remains a significant challenge in document analysis and understanding. The CCE problem can be viewed as a series of six sub-tasks: chart classification (CC), text detection and recognition (TDR), text role classification (TRC), axis analysis, legend analysis, and data extraction. Improving these sub-tasks is important for enhancing the effectiveness of CCE. This paper introduces the chart classification and content extraction (C3E) framework, which focuses on the first three sub-tasks of CCE: CC, TDR, and TRC. For CC, we propose ChartVision, an EfficientNet-based model coupled with a dual-branch architecture incorporating a novel hybrid convolutional and dilated attention module. For text detection and TRC, we introduce CCE-YOLO, a novel method based on YOLOv5 designed for localizing and classifying textual components of varying sizes. For text recognition, we employ a convolutional recurrent neural network with connectionist temporal classification loss. We evaluated the CC, TDR, and TRC methods on the UB-PMC 2020 and UB-PMC 2022 benchmark datasets from the ICPR2020 and ICPR2022 CHART-Infographics competitions. The C3E framework achieved F1-scores of 94.26%, 92.44%, and 80.64% for CC, TDR, and TRC, respectively, on the UB-PMC 2020 dataset, and 94.0%, 91.98%, and 84.48%, respectively, on the UB-PMC 2022 dataset.
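The abstract mentions connectionist temporal classification (CTC) only as the training loss of the text recognizer. At inference time, the per-frame outputs of such a recognizer are commonly turned into a string by greedy (best-path) CTC decoding: take the highest-scoring class in each frame, merge consecutive repeats, then drop the blank symbol. The sketch below illustrates that general technique only. It is not the authors' implementation, and the function name `ctc_greedy_decode`, the blank index `0`, and the alphabet are all assumptions made for illustration.

```python
from itertools import groupby
from typing import List, Sequence

# Assumptions for illustration: index 0 is the CTC blank, and the
# recognizer's classes are lowercase letters followed by digits.
BLANK = 0
ALPHABET = "-abcdefghijklmnopqrstuvwxyz0123456789"  # "-" stands in for blank


def ctc_greedy_decode(
    frame_scores: Sequence[Sequence[float]],
    alphabet: str = ALPHABET,
    blank: int = BLANK,
) -> str:
    """Greedy (best-path) CTC decoding of per-frame class scores."""
    # Step 1: pick the highest-scoring class index in every frame.
    best_path = [
        max(range(len(frame)), key=lambda k: frame[k]) for frame in frame_scores
    ]
    # Step 2: merge runs of identical consecutive indices into one.
    collapsed = [index for index, _ in groupby(best_path)]
    # Step 3: remove blanks and map the remaining indices to characters.
    return "".join(alphabet[index] for index in collapsed if index != blank)


def one_hot_frames(path: Sequence[int], num_classes: int) -> List[List[float]]:
    """Hypothetical helper: build toy score frames from a class-index path."""
    return [[1.0 if k == i else 0.0 for k in range(num_classes)] for i in path]


if __name__ == "__main__":
    # Frames spelling a, a, blank, x, x, i, blank, s
    toy = one_hot_frames([1, 1, 0, 24, 24, 9, 0, 19], len(ALPHABET))
    print(ctc_greedy_decode(toy))  # prints "axis"
```

Note that a blank between two identical indices keeps both characters, so a doubled letter in a chart label survives decoding only if the recognizer emits a blank between the two occurrences.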
Source journal
Computers & Electrical Engineering (Engineering & Technology: Electrical and Electronic Engineering)
CiteScore: 9.20
Self-citation rate: 7.00%
Articles published: 661
Review time: 47 days
Journal description: The impact of computers has nowhere been more revolutionary than in electrical engineering. The design, analysis, and operation of electrical and electronic systems are now dominated by computers, a transformation that has been motivated by the natural ease of interface between computers and electrical systems, and the promise of spectacular improvements in speed and efficiency. Published since 1973, Computers & Electrical Engineering provides rapid publication of topical research into the integration of computer technology and computational techniques with electrical and electronic systems. The journal publishes papers featuring novel implementations of computers and computational techniques in areas like signal and image processing, high-performance computing, parallel processing, and communications. Special attention will be paid to papers describing innovative architectures, algorithms, and software tools.