{"title":"Household Appliance Identification Using Vision Transformers and Multimodal Data Fusion","authors":"Mohammed Ayub;El-Sayed M. El-Alfy","doi":"10.1109/TCE.2025.3565850","DOIUrl":null,"url":null,"abstract":"Accurately identifying household appliances from power consumption data collected via smart meters opens up new possibilities for improving energy management in smart homes and providing substantial benefits to both utilities and consumers. It enables real-time optimization of energy use, offers personalized savings recommendations, enhances demand forecasting, provides detailed appliance load profiling, and supports the promotion of energy-efficient technologies. While low-resolution consumption data are preferred due to the limited processing capabilities of residential smart meters, they lack the granularity needed to capture detailed consumption patterns, resulting in performance degradation in many cases. This paper explores a novel approach based on a revised version of a vision transformer for household appliance identification using low-resolution and low-volume data. To maintain superior algorithmic performance, we first fuse different time-series imaging to augment and compensate for features that might be missed by a single technique, enabling efficient and robust feature representation. Next, real-time data augmentation and pretrained weights from Hugging Face transformers are leveraged and fine-tuned through transfer learning to enhance model performance with limited data, accelerate the training process, and improve model generalization. We compare three variants of our proposed solution: (i) multi-class classification problem, (ii) multi-label classification problem, and (iii) multi-target appliance-specific classification problem. Extensive experiments on four public datasets (ENERTALK, UK-DALE, iWAE, and REFIT) demonstrate that our proposed multimodal data fusion vision transformer outperforms non-fusion baseline models. 
It can achieve near-perfect results across multi-class, multi-label, and multi-target tasks, with overall F1 scores above 97% and perfect scores for several appliances. Several cross-house and cross-dataset experiments are also conducted to assess the generalization capability of the models on data from previously unseen households and datasets. Additionally, an ablation study demonstrates the model’s scalability, as well as its computational and energy efficiency under different appliance combinations.","PeriodicalId":13208,"journal":{"name":"IEEE Transactions on Consumer Electronics","volume":"71 2","pages":"2774-2792"},"PeriodicalIF":10.9000,"publicationDate":"2025-04-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Transactions on Consumer Electronics","FirstCategoryId":"94","ListUrlMain":"https://ieeexplore.ieee.org/document/10980365/","RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"ENGINEERING, ELECTRICAL & ELECTRONIC","Score":null,"Total":0}
Citations: 0
Abstract
Accurately identifying household appliances from power consumption data collected via smart meters opens up new possibilities for improving energy management in smart homes and provides substantial benefits to both utilities and consumers. It enables real-time optimization of energy use, offers personalized savings recommendations, enhances demand forecasting, provides detailed appliance load profiling, and supports the promotion of energy-efficient technologies. While low-resolution consumption data are preferred due to the limited processing capabilities of residential smart meters, they lack the granularity needed to capture detailed consumption patterns, often resulting in performance degradation. This paper explores a novel approach based on a revised vision transformer for household appliance identification using low-resolution, low-volume data. To maintain superior algorithmic performance, we first fuse images generated by different time-series imaging techniques, augmenting and compensating for features that any single technique might miss and enabling an efficient, robust feature representation. Next, real-time data augmentation and pretrained weights from Hugging Face transformers are leveraged and fine-tuned through transfer learning to enhance model performance with limited data, accelerate training, and improve generalization. We compare three variants of our proposed solution: (i) a multi-class classification problem, (ii) a multi-label classification problem, and (iii) a multi-target appliance-specific classification problem. Extensive experiments on four public datasets (ENERTALK, UK-DALE, iAWE, and REFIT) demonstrate that our proposed multimodal data fusion vision transformer outperforms non-fusion baseline models. It achieves near-perfect results across multi-class, multi-label, and multi-target tasks, with overall F1 scores above 97% and perfect scores for several appliances.
Several cross-house and cross-dataset experiments are also conducted to assess the generalization capability of the models on data from previously unseen households and datasets. Additionally, an ablation study demonstrates the model’s scalability, as well as its computational and energy efficiency under different appliance combinations.
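The abstract's fusion step rests on a standard idea: encode a 1-D consumption series as two-dimensional images and stack the results as channels of one input for a vision model. As an illustrative sketch only (the paper's exact imaging techniques and fusion scheme are not specified here), the widely used Gramian Angular Summation/Difference Fields can be computed and channel-fused like this; the function names and the synthetic power trace are hypothetical:

```python
import numpy as np

def gramian_angular_fields(series):
    """Encode a 1-D power-consumption series as two image channels
    (GASF and GADF), a common way to 'image' time series before
    feeding them to a vision model. Illustrative sketch only, not
    the authors' exact pipeline."""
    x = np.asarray(series, dtype=float)
    # Rescale to [-1, 1] so the polar-angle encoding arccos(x) is defined.
    x = 2.0 * (x - x.min()) / (x.max() - x.min()) - 1.0
    x = np.clip(x, -1.0, 1.0)  # guard against floating-point rounding
    phi = np.arccos(x)
    gasf = np.cos(phi[:, None] + phi[None, :])  # summation field (symmetric)
    gadf = np.sin(phi[:, None] - phi[None, :])  # difference field (zero diagonal)
    return gasf, gadf

def fuse_channels(series):
    """Stack both fields into an (H, W, 2) image, mimicking the
    channel-level fusion of multiple time-series imaging outputs."""
    gasf, gadf = gramian_angular_fields(series)
    return np.stack([gasf, gadf], axis=-1)

# Example: a short synthetic appliance power trace (watts).
trace = np.array([5, 5, 120, 118, 121, 60, 5, 5], dtype=float)
img = fuse_channels(trace)
print(img.shape)  # (8, 8, 2)
```

The fused array can then be passed (after resizing and channel projection) to a pretrained vision transformer for fine-tuning, as the abstract describes.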
About the journal:
The main focus of the IEEE Transactions on Consumer Electronics is the engineering and research aspects of the theory, design, construction, manufacture, and end use of mass-market electronics, systems, software, and services for consumers.