{"title":"Household Appliance Identification Using Vision Transformers and Multimodal Data Fusion","authors":"Mohammed Ayub;El-Sayed M. El-Alfy","doi":"10.1109/TCE.2025.3565850","DOIUrl":null,"url":null,"abstract":"Accurately identifying household appliances from power consumption data collected via smart meters opens up new possibilities for improving energy management in smart homes and providing substantial benefits to both utilities and consumers. It enables real-time optimization of energy use, offers personalized savings recommendations, enhances demand forecasting, provides detailed appliance load profiling, and supports the promotion of energy-efficient technologies. While low-resolution consumption data are preferred due to the limited processing capabilities of residential smart meters, they lack the granularity needed to capture detailed consumption patterns, resulting in performance degradation in many cases. This paper explores a novel approach based on a revised version of a vision transformer for household appliance identification using low-resolution and low-volume data. To maintain superior algorithmic performance, we first fuse different time-series imaging to augment and compensate for features that might be missed by a single technique, enabling efficient and robust feature representation. Next, real-time data augmentation and pretrained weights from Hugging Face transformers are leveraged and fine-tuned through transfer learning to enhance model performance with limited data, accelerate the training process, and improve model generalization. We compare three variants of our proposed solution: (i) multi-class classification problem, (ii) multi-label classification problem, and (iii) multi-target appliance-specific classification problem. Extensive experiments on four public datasets (ENERTALK, UK-DALE, iWAE, and REFIT) demonstrate that our proposed multimodal data fusion vision transformer outperforms non-fusion baseline models. 
It can achieve near-perfect results across multi-class, multi-label, and multi-target tasks, with overall F1 scores above 97% and perfect scores for several appliances. Several cross-house and cross-dataset experiments are also conducted to assess the generalization capability of the models on data from previously unseen households and datasets. Additionally, an ablation study demonstrates the model’s scalability, as well as its computational and energy efficiency under different appliance combinations.","PeriodicalId":13208,"journal":{"name":"IEEE Transactions on Consumer Electronics","volume":"71 2","pages":"2774-2792"},"PeriodicalIF":10.9000,"publicationDate":"2025-04-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Transactions on Consumer Electronics","FirstCategoryId":"94","ListUrlMain":"https://ieeexplore.ieee.org/document/10980365/","RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"ENGINEERING, ELECTRICAL & ELECTRONIC","Score":null,"Total":0}
Citations: 0
Abstract
Accurately identifying household appliances from power consumption data collected via smart meters opens up new possibilities for improving energy management in smart homes and provides substantial benefits to both utilities and consumers. It enables real-time optimization of energy use, offers personalized savings recommendations, enhances demand forecasting, provides detailed appliance load profiling, and supports the promotion of energy-efficient technologies. While low-resolution consumption data are preferred due to the limited processing capabilities of residential smart meters, they lack the granularity needed to capture detailed consumption patterns, often resulting in performance degradation. This paper explores a novel approach based on a revised vision transformer for household appliance identification using low-resolution, low-volume data. To maintain superior algorithmic performance, we first fuse images generated by different time-series imaging techniques, augmenting and compensating for features that any single technique might miss and enabling an efficient, robust feature representation. Next, real-time data augmentation and pretrained weights from Hugging Face transformers are leveraged and fine-tuned through transfer learning to enhance model performance with limited data, accelerate training, and improve generalization. We compare three variants of our proposed solution: (i) a multi-class classification problem, (ii) a multi-label classification problem, and (iii) a multi-target appliance-specific classification problem. Extensive experiments on four public datasets (ENERTALK, UK-DALE, iAWE, and REFIT) demonstrate that our proposed multimodal data fusion vision transformer outperforms non-fusion baseline models. It achieves near-perfect results across multi-class, multi-label, and multi-target tasks, with overall F1 scores above 97% and perfect scores for several appliances.
Several cross-house and cross-dataset experiments are also conducted to assess the generalization capability of the models on data from previously unseen households and datasets. Additionally, an ablation study demonstrates the model’s scalability, as well as its computational and energy efficiency under different appliance combinations.
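The abstract's fusion step rests on a standard idea: encode a 1-D consumption series as two-dimensional images and stack the results as channels of one input for a vision model. As an illustrative sketch only (the paper's exact imaging techniques and fusion scheme are not specified here), the widely used Gramian Angular Summation/Difference Fields can be computed and channel-fused like this; the function names and the synthetic power trace are hypothetical:

```python
import numpy as np

def gramian_angular_fields(series):
    """Encode a 1-D power-consumption series as two image channels
    (GASF and GADF), a common way to 'image' time series before
    feeding them to a vision model. Illustrative sketch only, not
    the authors' exact pipeline."""
    x = np.asarray(series, dtype=float)
    # Rescale to [-1, 1] so the polar-angle encoding arccos(x) is defined.
    x = 2.0 * (x - x.min()) / (x.max() - x.min()) - 1.0
    x = np.clip(x, -1.0, 1.0)  # guard against floating-point rounding
    phi = np.arccos(x)
    gasf = np.cos(phi[:, None] + phi[None, :])  # summation field (symmetric)
    gadf = np.sin(phi[:, None] - phi[None, :])  # difference field (zero diagonal)
    return gasf, gadf

def fuse_channels(series):
    """Stack both fields into an (H, W, 2) image, mimicking the
    channel-level fusion of multiple time-series imaging outputs."""
    gasf, gadf = gramian_angular_fields(series)
    return np.stack([gasf, gadf], axis=-1)

# Example: a short synthetic appliance power trace (watts).
trace = np.array([5, 5, 120, 118, 121, 60, 5, 5], dtype=float)
img = fuse_channels(trace)
print(img.shape)  # (8, 8, 2)
```

The fused array can then be passed (after resizing and channel projection) to a pretrained vision transformer for fine-tuning, as the abstract describes.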
About the journal:
The main focus of the IEEE Transactions on Consumer Electronics is the engineering and research aspects of the theory, design, construction, manufacture, and end use of mass-market electronics, systems, software, and services for consumers.