The Story Behind the Lines: Line Charts as a Gateway to Dataset Discovery

Daomin Ji, Hui Luo, Zhifeng Bao, J. Shane Culpepper
arXiv - CS - Databases, published 2024-08-18
DOI: arxiv-2408.09506 (https://doi.org/arxiv-2408.09506)

Abstract

Line charts are a valuable tool for data analysis and exploration because they distill essential insights from a dataset. However, the dataset behind a line chart is rarely available. In this paper, we explore a novel dataset discovery problem, dataset discovery via line charts: using a line chart as a query to find datasets in a large data repository that are capable of generating similar line charts. To solve this problem, we propose the Fine-grained Cross-modal Relevance Learning Model (FCM), which estimates the relevance between a line chart and a candidate dataset. FCM first employs a visual element extractor to extract informative visual elements, i.e., lines and y-ticks, from a line chart. Two novel segment-level encoders then learn representations for the line chart and the dataset while preserving fine-grained information, and a cross-modal matcher matches the learned representations in a fine-grained way. Furthermore, we extend FCM to support line chart queries generated through data aggregation. Finally, because no benchmark exists for this problem, we propose one tailored to it. Extensive evaluation on the new benchmark verifies the effectiveness of our method: it surpasses the best baseline by 30.1% and 41.0% in terms of prec@50 and ndcg@50, respectively.
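The abstract reports results as prec@50 and ndcg@50 without defining them. The sketch below shows how such ranking metrics are commonly computed for a single query; it is an illustration under stated assumptions, not the paper's evaluation code. The function names, the binary or graded relevance labels, and the choice to compute the ideal ranking from the retrieved list only are all assumptions made for this example.

```python
import math
from typing import Sequence


def precision_at_k(relevance: Sequence[float], k: int) -> float:
    """Fraction of the top-k ranked datasets that are relevant.

    `relevance` holds one label per ranked result, best-ranked first;
    any label greater than 0 counts as relevant (assumed convention).
    """
    if k <= 0:
        raise ValueError("k must be positive")
    return sum(1 for r in relevance[:k] if r > 0) / k


def dcg_at_k(relevance: Sequence[float], k: int) -> float:
    """Discounted cumulative gain with the common log2(rank + 1) discount."""
    # enumerate() is 0-based, so the 1-based rank is i + 1 and the
    # discount log2(rank + 1) becomes log2(i + 2).
    return sum(rel / math.log2(i + 2) for i, rel in enumerate(relevance[:k]))


def ndcg_at_k(relevance: Sequence[float], k: int) -> float:
    """DCG normalised by the DCG of the best possible ordering.

    Assumption: the ideal ordering is built from the labels of the
    retrieved list itself, not from every relevant dataset in the repository.
    """
    ideal = dcg_at_k(sorted(relevance, reverse=True), k)
    if ideal == 0:
        return 0.0
    return dcg_at_k(relevance, k) / ideal


if __name__ == "__main__":
    # Hypothetical relevance labels for the datasets returned for one chart query.
    ranked_labels = [1, 0, 1, 0, 0]
    print(precision_at_k(ranked_labels, 5))
    print(ndcg_at_k(ranked_labels, 5))
```

In an evaluation of this kind, each line chart query would yield one ranked list of candidate datasets, and the per-query scores would typically be averaged over all queries with k = 50.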