Visualization question answering using introspective program synthesis

Yanju Chen, Xifeng Yan, Yu Feng
{"title":"Visualization question answering using introspective program synthesis","authors":"Yanju Chen, Xifeng Yan, Yu Feng","doi":"10.1145/3519939.3523709","DOIUrl":null,"url":null,"abstract":"While data visualization plays a crucial role in gaining insights from data, generating answers over complex visualizations from natural language questions is far from an easy task. Mainstream approaches reduce data visualization queries to a semantic parsing problem, which either relies on expensive-to-annotate supervised training data that pairs natural language questions with logical forms, or weakly supervised models that incorporate a larger corpus but fail on long-tailed queries without explanations. This paper aims to answer data visualization queries by automatically synthesizing the corresponding program from natural language. At the core of our technique is an abstract synthesis engine that is bootstrapped by an off-the-shelf weakly supervised model and an optimal synthesis algorithm guided by triangle alignment constraints, which represent consistency among natural language, visualization, and the synthesized program. Starting with a few tentative answers obtained from an off-the-shelf statistical model, our approach first involves an abstract synthesizer that generates a set of sketches that are consistent with the answers. Then we design an instance of optimal synthesis to complete one of the candidate sketches by satisfying common type constraints and maximizing the consistency among three parties, i.e., natural language, the visualization, and the candidate program. We implement the proposed idea in a system called Poe that can answer visualization queries from natural language. Our method is fully automated and does not require users to know the underlying schema of the visualizations. We evaluate Poe on 629 visualization queries and our experiment shows that Poe outperforms state-of-the-arts by improving the accuracy from 44% to 59%.","PeriodicalId":140942,"journal":{"name":"Proceedings of the 43rd ACM SIGPLAN International Conference on Programming Language Design and Implementation","volume":"32 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-06-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 43rd ACM SIGPLAN International Conference on Programming Language Design and Implementation","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3519939.3523709","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

Abstract

While data visualization plays a crucial role in gaining insights from data, generating answers over complex visualizations from natural language questions is far from an easy task. Mainstream approaches reduce data visualization queries to a semantic parsing problem, which either relies on expensive-to-annotate supervised training data that pairs natural language questions with logical forms, or weakly supervised models that incorporate a larger corpus but fail on long-tailed queries without explanations. This paper aims to answer data visualization queries by automatically synthesizing the corresponding program from natural language. At the core of our technique is an abstract synthesis engine that is bootstrapped by an off-the-shelf weakly supervised model and an optimal synthesis algorithm guided by triangle alignment constraints, which represent consistency among natural language, visualization, and the synthesized program. Starting with a few tentative answers obtained from an off-the-shelf statistical model, our approach first involves an abstract synthesizer that generates a set of sketches that are consistent with the answers. Then we design an instance of optimal synthesis to complete one of the candidate sketches by satisfying common type constraints and maximizing the consistency among three parties, i.e., natural language, the visualization, and the candidate program. We implement the proposed idea in a system called Poe that can answer visualization queries from natural language. Our method is fully automated and does not require users to know the underlying schema of the visualizations. We evaluate Poe on 629 visualization queries and our experiment shows that Poe outperforms state-of-the-arts by improving the accuracy from 44% to 59%.
使用内省程序合成的可视化问题回答
虽然数据可视化在从数据中获得洞察力方面起着至关重要的作用,但从自然语言问题中生成复杂可视化的答案远非一件容易的事情。主流方法将数据可视化查询简化为语义解析问题,这要么依赖于昂贵的注释监督训练数据,将自然语言问题与逻辑形式配对,要么依赖于包含更大语料库的弱监督模型,但在没有解释的长尾查询上失败。本文旨在通过从自然语言中自动合成相应的程序来回答数据可视化查询。我们技术的核心是一个抽象的合成引擎,它由一个现成的弱监督模型和一个由三角形对齐约束引导的最优合成算法引导,这代表了自然语言、可视化和合成程序之间的一致性。从从现成的统计模型中获得的一些试探性答案开始,我们的方法首先涉及一个抽象合成器,该合成器生成一组与答案一致的草图。然后,我们设计了一个最优综合实例,通过满足公共类型约束和最大化三方(即自然语言、可视化和候选程序)之间的一致性来完成候选草图之一。我们在一个名为Poe的系统中实现了所提出的想法,该系统可以回答来自自然语言的可视化查询。我们的方法是完全自动化的,不需要用户知道可视化的底层模式。我们在629个可视化查询中对Poe进行了评估,我们的实验表明Poe的准确率从44%提高到59%,超过了目前的水平。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信