DataVisT5: A Pre-trained Language Model for Jointly Understanding Text and Data Visualization

Zhuoyue Wan, Yuanfeng Song, Shuaimin Li, Chen Jason Zhang, Raymond Chi-Wing Wong
arXiv - CS - Databases, published 2024-08-14, arXiv:2408.07401 (https://doi.org/arxiv-2408.07401)

Abstract

Data visualization (DV) is a fundamental tool for efficiently conveying the insights behind big data, and it is widely accepted in today's data-driven world. Automating DV tasks, such as converting natural language queries to visualizations (text-to-vis), generating explanations from visualizations (vis-to-text), answering DV-related questions in free form (FeVisQA), and explicating tabular data (table-to-text), is vital for advancing the field. Despite their potential, pre-trained language models (PLMs) such as T5 and BERT have seen limited application in DV because of high costs and the difficulty of handling cross-modal information, so few studies address PLMs for DV. We introduce DataVisT5, a novel PLM tailored for DV that enhances the T5 architecture through hybrid-objective pre-training and a multi-task fine-tuning strategy, integrating text and DV datasets to effectively interpret cross-modal semantics. Extensive evaluations on public datasets show that DataVisT5 consistently outperforms current state-of-the-art models on various DV-related tasks. We anticipate that DataVisT5 will not only inspire further research on vertical PLMs but also expand the range of applications for PLMs.
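As a rough illustration of what multi-task fine-tuning over the four DV tasks can look like in a T5-style text-to-text setup, the sketch below prefixes each input with a task tag and interleaves the tasks into one training mixture. The prefix strings, the `Example` type, and the round-robin mixing are assumptions made for illustration only; the abstract does not specify DataVisT5's actual input format or sampling scheme.

```python
from dataclasses import dataclass

# Hypothetical task prefixes (not taken from the paper): T5-style models
# commonly mark the task with a short text prefix on the input sequence.
TASK_PREFIXES = {
    "text-to-vis": "translate question to visualization query: ",
    "vis-to-text": "describe visualization query: ",
    "fevisqa": "answer visualization question: ",
    "table-to-text": "describe table: ",
}


@dataclass(frozen=True)
class Example:
    source: str  # prefixed input text fed to the encoder
    target: str  # text the decoder is trained to produce


def make_example(task, source, target):
    """Build one text-to-text training pair for the given task."""
    if task not in TASK_PREFIXES:
        raise ValueError(f"unknown task: {task}")
    return Example(TASK_PREFIXES[task] + source, target)


def build_mixture(examples_by_task):
    """Interleave tasks round-robin until every task's examples are used.

    Round-robin is an illustrative choice; real systems often use
    size-proportional or temperature-based sampling instead.
    """
    mixture = []
    iterators = {task: iter(pairs) for task, pairs in examples_by_task.items()}
    while iterators:
        for task in list(iterators):
            try:
                src, tgt = next(iterators[task])
            except StopIteration:
                del iterators[task]  # this task is exhausted
                continue
            mixture.append(make_example(task, src, tgt))
    return mixture
```

Calling `build_mixture` with a dictionary that maps each task name to a list of `(source, target)` string pairs yields a single list of prefixed examples that could be passed to any sequence-to-sequence training loop.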