An Empirical Investigation into the Use of Image Captioning for Automated Software Documentation

2022 IEEE International Conference on Software Analysis, Evolution and Reengineering (SANER). Pub Date: 2022-03-01. DOI: 10.1109/SANER53432.2022.00069

Kevin Moran, Ali Yachnes, George Purnell, Juanyed Mahmud, Michele Tufano, Carlos Bernal Cardenas, D. Poshyvanyk, Zach H’Doubler


Citations: 4

Abstract

Existing automated techniques for software documentation typically attempt to reason between two main sources of information: code and natural language. However, this reasoning process is often complicated by the lexical gap between more abstract natural language and more structured programming languages. One potential bridge for this gap is the Graphical User Interface (GUI), as GUIs inherently encode salient information about underlying program functionality into rich, pixel-based data representations. This paper offers one of the first comprehensive empirical investigations into the connection between GUIs and functional, natural language descriptions of software. First, we collect, analyze, and open source a large dataset of functional GUI descriptions consisting of 45,998 descriptions for 10,204 screenshots from popular Android applications. The descriptions were obtained from human labelers and underwent several quality control mechanisms. To gain insight into the representational potential of GUIs, we investigate the ability of four Neural Image Captioning models to predict natural language descriptions of varying granularity when provided a screenshot as input. We evaluate these models quantitatively, using common machine translation metrics, and qualitatively through a large-scale user study. Finally, we offer learned lessons and a discussion of the potential shown by multimodal models to enhance future techniques for automated software documentation.
