C3E: A framework for chart classification and content extraction

IF 4.9 | Zone 3 (Computer Science) | JCR Q1: COMPUTER SCIENCE, HARDWARE & ARCHITECTURE

Computers & Electrical Engineering | Pub Date: 2024-11-22 | DOI: 10.1016/j.compeleceng.2024.109861

Muhammad Suhaib Kanroo, Hadia Showkat Kawoosa, Kapil Rana, Puneet Goyal

Volume 121, Article 109861 | Journal Article | https://www.sciencedirect.com/science/article/pii/S0045790624007882

Citations: 0

Abstract

Incorporating charts into technical documents simplifies complex data representation and improves comprehension. However, automated chart content extraction (CCE) remains a significant challenge in document analysis and understanding. The CCE problem can be viewed as a series of six sub-tasks: chart classification (CC), text detection and recognition (TDR), text role classification (TRC), axis analysis, legend analysis, and data extraction. Improving these sub-tasks is important for enhancing the effectiveness of CCE. This paper introduces the chart classification and content extraction (C3E) framework, which focuses on the first three sub-tasks of CCE: CC, TDR, and TRC. For CC, we propose ChartVision, an EfficientNet-based model coupled with a dual-branch architecture incorporating a novel hybrid convolutional and dilated attention module. For text detection and TRC, we introduce CCE-YOLO, a novel method based on YOLOv5 designed to localize and classify textual components of varying sizes. For text recognition, we employ a convolutional recurrent neural network with connectionist temporal classification loss. We evaluated the CC, TDR, and TRC methods on the UB-PMC 2020 and UB-PMC 2022 benchmark datasets from the ICPR2020 and ICPR2022 CHART-Infographics competitions. The C3E framework achieved F1-scores of 94.26%, 92.44%, and 80.64% for CC, TDR, and TRC, respectively, on the UB-PMC 2020 dataset, and 94.0%, 91.98%, and 84.48%, respectively, on the UB-PMC 2022 dataset.
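As an illustration of the text recognition step, the sketch below shows greedy (best-path) decoding of per-frame scores from a recognizer trained with connectionist temporal classification loss. The abstract does not state how the authors decode the network output, so greedy decoding is an assumption here, as are the function name `ctc_greedy_decode`, the toy alphabet, and the convention that index 0 is the blank symbol.

```python
from itertools import groupby

BLANK = 0  # assumption: class index 0 is the CTC blank symbol


def ctc_greedy_decode(frame_scores, alphabet, blank=BLANK):
    """Best-path CTC decoding: argmax per frame, merge repeats, drop blanks."""
    # Pick the highest-scoring class index at every time step.
    best_path = [
        max(range(len(scores)), key=scores.__getitem__)
        for scores in frame_scores
    ]
    # Collapse runs of identical indices into a single index.
    collapsed = [idx for idx, _ in groupby(best_path)]
    # Remove blanks; a blank between two equal labels keeps both of them.
    return "".join(alphabet[idx] for idx in collapsed if idx != blank)


# Hypothetical alphabet; position 0 is reserved for the blank.
ALPHABET = ["-", "a", "b", "c"]

# Hypothetical per-frame scores (one row per time step, one column per class).
EXAMPLE_FRAMES = [
    [0.10, 0.80, 0.05, 0.05],
    [0.10, 0.70, 0.10, 0.10],
    [0.90, 0.05, 0.03, 0.02],
    [0.20, 0.60, 0.10, 0.10],
    [0.10, 0.10, 0.70, 0.10],
    [0.10, 0.20, 0.60, 0.10],
    [0.80, 0.10, 0.05, 0.05],
]

if __name__ == "__main__":
    print(ctc_greedy_decode(EXAMPLE_FRAMES, ALPHABET))
```

The blank symbol is what lets the decoder keep a genuinely doubled character: the two runs of `a` above are separated by a blank frame, so they are not merged into one.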


Source journal

Computers & Electrical Engineering (Engineering & Technology: Engineering, Electrical & Electronic)

CiteScore

9.20

Self-citation rate

7.00%

Publication count

661

Review time

47 days

Journal description: The impact of computers has nowhere been more revolutionary than in electrical engineering. The design, analysis, and operation of electrical and electronic systems are now dominated by computers, a transformation that has been motivated by the natural ease of interface between computers and electrical systems, and the promise of spectacular improvements in speed and efficiency. Published since 1973, Computers & Electrical Engineering provides rapid publication of topical research into the integration of computer technology and computational techniques with electrical and electronic systems. The journal publishes papers featuring novel implementations of computers and computational techniques in areas like signal and image processing, high-performance computing, parallel processing, and communications. Special attention will be paid to papers describing innovative architectures, algorithms, and software tools.