Chart-to-text generation using a hybrid deep network

Nontaporn Wonglek, Siriwalai Maneesinthu, Sivakorn Srichaiyaperk, Teerapon Saengmuang, Thitirat Siriborvornratanakul
{"title":"Chart-to-text generation using a hybrid deep network","authors":"Nontaporn Wonglek,&nbsp;Siriwalai Maneesinthu,&nbsp;Sivakorn Srichaiyaperk,&nbsp;Teerapon Saengmuang,&nbsp;Thitirat Siriborvornratanakul","doi":"10.1007/s43674-023-00066-y","DOIUrl":null,"url":null,"abstract":"<div><p>Text generation from charts is a task that involves automatically generating natural language text descriptions of data presented in chart form. This is a useful capability for tasks such as summarizing data for presentation or providing alternative representations of data for accessibility. In this work, we propose a hybrid deep network approach for text generation from table images in an academic format. The input to the model is a table image, which is first processed using Tesseract OCR (optical character recognition) to extract the data. The data are then passed through a Transformer (i.e., T5, K2T) model to generate the final text output. We evaluate the performance of our model on a dataset of academic papers. Results show that our network is able to generate high-quality text descriptions of charts. Specifically, the average BLEU scores are 0.072355 for T5 and 0.037907 for K2T. Our results demonstrate the effectiveness of the hybrid deep network approach for text generation from table images in an academic format.</p></div>","PeriodicalId":72089,"journal":{"name":"Advances in computational intelligence","volume":"3 5","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2023-11-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Advances in computational intelligence","FirstCategoryId":"1085","ListUrlMain":"https://link.springer.com/article/10.1007/s43674-023-00066-y","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

Abstract

Text generation from charts is a task that involves automatically generating natural language text descriptions of data presented in chart form. This is a useful capability for tasks such as summarizing data for presentation or providing alternative representations of data for accessibility. In this work, we propose a hybrid deep network approach for text generation from table images in an academic format. The input to the model is a table image, which is first processed using Tesseract OCR (optical character recognition) to extract the data. The data are then passed through a Transformer (i.e., T5, K2T) model to generate the final text output. We evaluate the performance of our model on a dataset of academic papers. Results show that our network is able to generate high-quality text descriptions of charts. Specifically, the average BLEU scores are 0.072355 for T5 and 0.037907 for K2T. Our results demonstrate the effectiveness of the hybrid deep network approach for text generation from table images in an academic format.
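The pipeline described above (Tesseract OCR on a table image, followed by a Transformer such as T5 to generate a description, scored with BLEU) can be outlined with a minimal sketch. The checkpoint name ("t5-small"), the "summarize:" prompt prefix, the input file name, and the BLEU smoothing choice below are illustrative assumptions, not details taken from the paper.

# Minimal sketch of the described pipeline, not the authors' exact setup.
import pytesseract
from PIL import Image
from transformers import T5ForConditionalGeneration, T5Tokenizer
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

def table_image_to_text(image_path: str) -> str:
    # Step 1: run Tesseract OCR on the table image to recover its cell text.
    table_text = pytesseract.image_to_string(Image.open(image_path))

    # Step 2: pass the extracted data through a T5 model to generate text.
    # "t5-small" and the "summarize:" prefix are placeholder choices.
    tokenizer = T5Tokenizer.from_pretrained("t5-small")
    model = T5ForConditionalGeneration.from_pretrained("t5-small")
    inputs = tokenizer("summarize: " + table_text,
                       return_tensors="pt", truncation=True, max_length=512)
    output_ids = model.generate(**inputs, max_length=128, num_beams=4)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)

def bleu(reference: str, candidate: str) -> float:
    # Step 3: score a generated description against a reference with BLEU
    # (the paper reports BLEU averaged over a dataset of academic papers).
    smooth = SmoothingFunction().method1
    return sentence_bleu([reference.split()], candidate.split(),
                         smoothing_function=smooth)

if __name__ == "__main__":
    generated = table_image_to_text("table.png")  # hypothetical input file
    print(generated)
    print(bleu("reference description of the table", generated))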

