Zoubir Barraz, Imane Sebari, Hicham Oufettoul, Kenza Ait el kadi, Nassim Lamrini, Ibtihal Ait Abdelmoula
Energy and AI, volume 21, Article 100525. Published 2025-05-09. DOI: 10.1016/j.egyai.2025.100525. URL: https://www.sciencedirect.com/science/article/pii/S2666546825000576
A holistic multimodal approach for real-time anomaly detection and classification in large-scale photovoltaic plants
This paper presents a holistic multimodal approach for real-time anomaly detection and classification in large-scale photovoltaic plants. The approach covers segmentation, geolocation, and classification of individual photovoltaic modules. A fine-tuned YOLOv7 model was trained to segment individual modules in both modalities, RGB and IR images. Individual solar panels are localized using photogrammetric measurements, which facilitates maintenance operations. The localization process also links extracted images of the same panel through their geographical coordinates and preprocesses them as input to the multimodal model. The study further uses Bayesian search to optimize pre-trained models and fine-tune them on our dataset. The dataset was collected from different systems and technologies within our research platform, curated into 1841 images, and classified into five anomaly classes. Grad-CAM, an explainable AI tool, is used to compare multimodal input against a single modality. Finally, the model was converted to the ONNX format to optimize it further for real-time deployment. The improved ConvNeXt-Tiny model performed well in both modalities, with 99% precision, recall, and F1-score for binary classification and 85% for multi-class classification. In terms of latency, the segmentation models have inference times of 14 ms and 12 ms for RGB and IR images, respectively, and detection and classification take 24 ms. The proposed holistic approach includes a built-in feedback loop to ensure the model's robustness against domain shifts in the production environment.
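The abstract states that RGB and IR crops of the same panel are linked through their geographical coordinates, but it does not give the matching rule. The sketch below is one plausible illustration, not the authors' method: it assumes each crop already has a centroid in latitude and longitude, and it pairs crops greedily by nearest great-circle distance. The function names, the crop identifiers, and the 0.5 m threshold (`max_dist_m`) are all hypothetical.

```python
import math


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two (lat, lon) points in degrees."""
    earth_radius_m = 6371000.0  # mean Earth radius
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * earth_radius_m * math.asin(math.sqrt(a))


def link_panels(rgb, ir, max_dist_m=0.5):
    """Pair RGB and IR crops one-to-one by nearest centroid.

    rgb, ir: dicts mapping a crop id to its (lat, lon) centroid.
    max_dist_m: hypothetical rejection threshold; pairs farther apart are dropped.
    Returns a dict mapping each matched RGB crop id to an IR crop id.
    """
    # Collect every RGB/IR candidate pair that falls within the threshold.
    candidates = []
    for rgb_id, (rlat, rlon) in rgb.items():
        for ir_id, (ilat, ilon) in ir.items():
            dist = haversine_m(rlat, rlon, ilat, ilon)
            if dist <= max_dist_m:
                candidates.append((dist, rgb_id, ir_id))

    # Greedy assignment: accept the closest pairs first, each crop used once.
    candidates.sort()
    used_rgb, used_ir, pairs = set(), set(), {}
    for _, rgb_id, ir_id in candidates:
        if rgb_id not in used_rgb and ir_id not in used_ir:
            pairs[rgb_id] = ir_id
            used_rgb.add(rgb_id)
            used_ir.add(ir_id)
    return pairs
```

The all-pairs comparison is quadratic in the number of crops, which is acceptable for a sketch; a plant-scale pipeline would more likely use a spatial index, and a projected coordinate system would avoid the spherical approximation.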