{"title":"Learning Semantically Rich Network-based Multi-modal Mobile User Interface Embeddings","authors":"Gary Ang, Ee-Peng Lim","doi":"https://dl.acm.org/doi/10.1145/3533856","DOIUrl":null,"url":null,"abstract":"<p>Semantically rich information from multiple modalities—text, code, images, categorical and numerical data—co-exist in the user interface (UI) design of mobile applications. Moreover, each UI design is composed of inter-linked UI entities that support different functions of an application, e.g., a UI screen comprising a UI taskbar, a menu, and multiple button elements. Existing UI representation learning methods unfortunately are not designed to capture multi-modal and linkage structure between UI entities. To support effective search and recommendation applications over mobile UIs, we need UI representations that integrate latent semantics present in both multi-modal information and linkages between UI entities. In this article, we present a novel self-supervised model—Multi-modal Attention-based Attributed Network Embedding (MAAN) model. MAAN is designed to capture structural network information present within the linkages between UI entities, as well as multi-modal attributes of the UI entity nodes. Based on the variational autoencoder framework, MAAN learns semantically rich UI embeddings in a self-supervised manner by reconstructing the attributes of UI entities and the linkages between them. The generated embeddings can be applied to a variety of downstream tasks: predicting UI elements associated with UI screens, inferring missing UI screen and element attributes, predicting UI user ratings, and retrieving UIs. Extensive experiments, including user evaluations, conducted on datasets from RICO, a rich real-world mobile UI repository, demonstrate that MAAN out-performs other state-of-the-art models. The number of linkages between UI entities can provide further information on the role of different UI entities in UI designs. However, MAAN does not capture edge attributes. To extend and generalize MAAN to learn even richer UI embeddings, we further propose EMAAN to capture edge attributes. We conduct additional extensive experiments on EMAAN, which show that it improves the performance of MAAN and similarly out-performs state-of-the-art models.</p>","PeriodicalId":3,"journal":{"name":"ACS Applied Electronic Materials","volume":null,"pages":null},"PeriodicalIF":4.3000,"publicationDate":"2022-11-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"ACS Applied Electronic Materials","FirstCategoryId":"94","ListUrlMain":"https://doi.org/https://dl.acm.org/doi/10.1145/3533856","RegionNum":3,"RegionCategory":"材料科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"ENGINEERING, ELECTRICAL & ELECTRONIC","Score":null,"Total":0}
Citations: 0
Abstract
Semantically rich information from multiple modalities (text, code, images, and categorical and numerical data) co-exists in the user interface (UI) designs of mobile applications. Moreover, each UI design is composed of inter-linked UI entities that support different functions of an application, e.g., a UI screen comprising a UI taskbar, a menu, and multiple button elements. Unfortunately, existing UI representation learning methods are not designed to capture both the multi-modal attributes of UI entities and the linkage structure between them. To support effective search and recommendation applications over mobile UIs, we need UI representations that integrate the latent semantics present in both the multi-modal information and the linkages between UI entities. In this article, we present a novel self-supervised model: the Multi-modal Attention-based Attributed Network Embedding (MAAN) model. MAAN is designed to capture the structural network information present in the linkages between UI entities, as well as the multi-modal attributes of the UI entity nodes. Based on the variational autoencoder framework, MAAN learns semantically rich UI embeddings in a self-supervised manner by reconstructing the attributes of UI entities and the linkages between them. The generated embeddings can be applied to a variety of downstream tasks: predicting UI elements associated with UI screens, inferring missing UI screen and element attributes, predicting UI user ratings, and retrieving UIs. Extensive experiments, including user evaluations, conducted on datasets from RICO, a rich real-world mobile UI repository, demonstrate that MAAN outperforms other state-of-the-art models. The number of linkages between UI entities can provide further information on the roles of different UI entities in UI designs; however, MAAN does not capture such edge attributes. To extend and generalize MAAN to learn even richer UI embeddings, we further propose EMAAN, which captures edge attributes. We conduct additional extensive experiments on EMAAN, which show that it improves on the performance of MAAN and similarly outperforms state-of-the-art models.
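To make the modeling idea concrete, below is a minimal sketch in PyTorch of the general approach the abstract describes: a variational autoencoder over an attributed network that learns node embeddings by reconstructing both node attributes and linkages. This is not the authors' MAAN implementation; the class name AttributedNetworkVAE, the mean-aggregation encoder, the inner-product link decoder, and all dimensions are illustrative assumptions, and MAAN's multi-modal attention fusion and EMAAN's edge attributes are omitted.

# Sketch only: a generic variational autoencoder over an attributed network,
# reconstructing node attributes and the adjacency (linkage) structure.
# All layer choices, names, and dimensions are assumptions for illustration.
import torch
import torch.nn as nn
import torch.nn.functional as F

class AttributedNetworkVAE(nn.Module):
    def __init__(self, attr_dim, hidden_dim=64, latent_dim=32):
        super().__init__()
        # Encoder: project node attributes mixed with neighbour attributes.
        self.enc = nn.Linear(attr_dim, hidden_dim)
        self.mu = nn.Linear(hidden_dim, latent_dim)
        self.logvar = nn.Linear(hidden_dim, latent_dim)
        # Decoder for node attributes; links are decoded via an inner product.
        self.attr_dec = nn.Linear(latent_dim, attr_dim)

    def encode(self, x, adj):
        # Mean-aggregate neighbour attributes (stand-in for a graph encoder).
        deg = adj.sum(dim=1, keepdim=True).clamp(min=1.0)
        h = F.relu(self.enc((x + adj @ x) / (deg + 1.0)))
        return self.mu(h), self.logvar(h)

    def reparameterize(self, mu, logvar):
        std = torch.exp(0.5 * logvar)
        return mu + std * torch.randn_like(std)

    def forward(self, x, adj):
        mu, logvar = self.encode(x, adj)
        z = self.reparameterize(mu, logvar)
        x_hat = self.attr_dec(z)              # reconstruct node attributes
        adj_hat = torch.sigmoid(z @ z.t())    # reconstruct linkages
        return x_hat, adj_hat, mu, logvar

def loss_fn(x, adj, x_hat, adj_hat, mu, logvar):
    # Self-supervised objective: attribute + link reconstruction plus a KL term.
    attr_loss = F.mse_loss(x_hat, x)
    link_loss = F.binary_cross_entropy(adj_hat, adj)
    kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
    return attr_loss + link_loss + kl

# Toy usage: 10 UI-entity nodes with 16-dimensional (already fused) attributes.
x = torch.randn(10, 16)
adj = (torch.rand(10, 10) > 0.7).float()
adj = ((adj + adj.t()) > 0).float()           # symmetrise the toy graph
model = AttributedNetworkVAE(attr_dim=16)
x_hat, adj_hat, mu, logvar = model(x, adj)
print(loss_fn(x, adj, x_hat, adj_hat, mu, logvar))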