Seeing the world from its words: All-embracing Transformers for fingerprint-based indoor localization

IF 3 3区计算机科学 Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS

Pervasive and Mobile Computing Pub Date : 2024-03-11 DOI:10.1016/j.pmcj.2024.101912

Son Minh Nguyen , Duc Viet Le , Paul J.M. Havinga

{"title":"Seeing the world from its words: All-embracing Transformers for fingerprint-based indoor localization","authors":"Son Minh Nguyen , Duc Viet Le , Paul J.M. Havinga","doi":"10.1016/j.pmcj.2024.101912","DOIUrl":null,"url":null,"abstract":"<div><p>In this paper, we present all-embracing Transformers (AaTs) that are capable of deftly manipulating attention mechanism for Received Signal Strength (RSS) fingerprints in order to invigorate localizing performance. Since most machine learning models applied to the RSS modality do not possess any attention mechanism, they can merely capture superficial representations. Moreover, compared to textual and visual modalities, the RSS modality is inherently notorious for its sensitivity to environmental dynamics. Such adversities inhibit their access to subtle but distinct representations that characterize the corresponding location, ultimately resulting in significant degradation in the testing phase. In contrast, a major appeal of AaTs is the ability to focus exclusively on relevant anchors in RSS sequences, allowing full rein to the exploitation of subtle and distinct representations for specific locations. This also facilitates disregarding redundant clues formed by noisy ambient conditions, thus enhancing accuracy in localization. Apart from that, explicitly resolving the representation collapse (<em>i.e.</em>, none-informative or homogeneous features, and gradient vanishing) can further invigorate the self-attention process in transformer blocks, by which subtle but distinct representations to specific locations are radically captured with ease. For that purpose, we first enhance our proposed model with two sub-constraints, namely covariance and variance losses at the <em>Anchor2Vec</em>. The proposed constraints are automatically mediated with the primary task towards a novel multi-task learning manner. In an advanced manner, we present further the ultimate in design with a few simple tweaks carefully crafted for transformer encoder blocks. This effort aims to promote representation augmentation via stabilizing the inflow of gradients to these blocks. Thus, the problems of representation collapse in regular Transformers can be tackled. To evaluate our AaTs, we compare the models with the state-of-the-art (SoTA) methods on three benchmark indoor localization datasets. The experimental results confirm our hypothesis and show that our proposed models could deliver much higher and more stable accuracy.</p></div>","PeriodicalId":49005,"journal":{"name":"Pervasive and Mobile Computing","volume":"100 ","pages":"Article 101912"},"PeriodicalIF":3.0000,"publicationDate":"2024-03-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S1574119224000385/pdfft?md5=119bfe7ecffea68a2c6b1240cc6ebda1&pid=1-s2.0-S1574119224000385-main.pdf","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Pervasive and Mobile Computing","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S1574119224000385","RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"COMPUTER SCIENCE, INFORMATION SYSTEMS","Score":null,"Total":0}

引用次数: 0

Abstract

In this paper, we present all-embracing Transformers (AaTs) that are capable of deftly manipulating attention mechanism for Received Signal Strength (RSS) fingerprints in order to invigorate localizing performance. Since most machine learning models applied to the RSS modality do not possess any attention mechanism, they can merely capture superficial representations. Moreover, compared to textual and visual modalities, the RSS modality is inherently notorious for its sensitivity to environmental dynamics. Such adversities inhibit their access to subtle but distinct representations that characterize the corresponding location, ultimately resulting in significant degradation in the testing phase. In contrast, a major appeal of AaTs is the ability to focus exclusively on relevant anchors in RSS sequences, allowing full rein to the exploitation of subtle and distinct representations for specific locations. This also facilitates disregarding redundant clues formed by noisy ambient conditions, thus enhancing accuracy in localization. Apart from that, explicitly resolving the representation collapse (i.e., none-informative or homogeneous features, and gradient vanishing) can further invigorate the self-attention process in transformer blocks, by which subtle but distinct representations to specific locations are radically captured with ease. For that purpose, we first enhance our proposed model with two sub-constraints, namely covariance and variance losses at the Anchor2Vec. The proposed constraints are automatically mediated with the primary task towards a novel multi-task learning manner. In an advanced manner, we present further the ultimate in design with a few simple tweaks carefully crafted for transformer encoder blocks. This effort aims to promote representation augmentation via stabilizing the inflow of gradients to these blocks. Thus, the problems of representation collapse in regular Transformers can be tackled. To evaluate our AaTs, we compare the models with the state-of-the-art (SoTA) methods on three benchmark indoor localization datasets. The experimental results confirm our hypothesis and show that our proposed models could deliver much higher and more stable accuracy.

查看原文本刊更多论文

从文字看世界基于指纹的室内定位的全能变形金刚

在本文中，我们介绍了能够巧妙地操纵接收信号强度（RSS）指纹的注意力机制，以提高定位性能的全方位变换器（AaTs）。由于大多数应用于 RSS 模式的机器学习模型都不具备任何注意力机制，因此它们只能捕捉肤浅的表征。此外，与文本和视觉模态相比，RSS 模态对环境动态的敏感性与生俱来。这些不利因素抑制了它们对相应位置的微妙而独特的表征的获取，最终导致测试阶段的效果大打折扣。与此相反，AaTs 的一个主要优点是能够只关注 RSS 序列中的相关锚点，从而能够充分利用特定位置的微妙而独特的表征。这也有利于忽略嘈杂环境条件下形成的冗余线索，从而提高定位的准确性。除此以外，明确解决表征坍塌（即无信息或同质特征以及梯度消失）还能进一步激活转换块中的自我注意过程，从而从根本上轻松捕捉特定位置的微妙而独特的表征。为此，我们首先用两个子约束条件来增强我们提出的模型，即 Anchor2Vec 的协方差和方差损失。所提出的约束条件会自动与主要任务相结合，从而实现一种新颖的多任务学习方式。在高级方面，我们通过对变压器编码器模块精心设计的一些简单调整，进一步实现了设计的终极目标。这一努力旨在通过稳定梯度流入这些区块来促进表征增强。因此，普通变压器中的表示崩溃问题可以得到解决。为了评估我们的 AaTs，我们在三个基准室内定位数据集上将这些模型与最先进的 SoTA 方法进行了比较。实验结果证实了我们的假设，并表明我们提出的模型可以提供更高更稳定的精度。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

求助全文

约1分钟内获得全文求助全文

来源期刊

Pervasive and Mobile Computing COMPUTER SCIENCE, INFORMATION SYSTEMS-TELECOMMUNICATIONS

CiteScore

7.70

自引率

2.30%

发文量

审稿时长

68 days

期刊介绍： As envisioned by Mark Weiser as early as 1991, pervasive computing systems and services have truly become integral parts of our daily lives. Tremendous developments in a multitude of technologies ranging from personalized and embedded smart devices (e.g., smartphones, sensors, wearables, IoTs, etc.) to ubiquitous connectivity, via a variety of wireless mobile communications and cognitive networking infrastructures, to advanced computing techniques (including edge, fog and cloud) and user-friendly middleware services and platforms have significantly contributed to the unprecedented advances in pervasive and mobile computing. Cutting-edge applications and paradigms have evolved, such as cyber-physical systems and smart environments (e.g., smart city, smart energy, smart transportation, smart healthcare, etc.) that also involve human in the loop through social interactions and participatory and/or mobile crowd sensing, for example. The goal of pervasive computing systems is to improve human experience and quality of life, without explicit awareness of the underlying communications and computing technologies. The Pervasive and Mobile Computing Journal (PMC) is a high-impact, peer-reviewed technical journal that publishes high-quality scientific articles spanning theory and practice, and covering all aspects of pervasive and mobile computing and systems.