Privacy-Oriented Feature Selection for Multivariate Time Series Classification
Adrian-Silviu Roman, Béla Genge, Roland Bolboacă
Procedia Computer Science, Volume 246, Pages 500-509, 2024
DOI: 10.1016/j.procs.2024.09.430
URL: https://www.sciencedirect.com/science/article/pii/S1877050924024700
Abstract
The widespread use of sensors in domains such as automotive and healthcare has greatly increased the amount of collected data, raising significant privacy concerns. Although sensor data does not directly identify individuals, it can still reveal sensitive information. Addressing privacy in the context of Time Series Classification (TSC) is challenging because data utility must be balanced against privacy protection. This study introduces a novel privacy-oriented feature selection methodology for TSC that aims to improve data privacy while preserving utility. We propose a dual-model approach that leverages two classifiers with opposing objectives: a Utility-Focused Classifier (UFC) and a Privacy-Breaking Classifier (PBC). The methodology introduces the Importance Difference Score (IDS) for feature ranking, with the objective of selecting features that are important for the UFC while removing features that are essential for the PBC. The approach includes two feature clustering techniques, one based on IDS and the other on K-means clustering, to optimize the feature selection process. Experiments on two driving datasets and two Human Activity Recognition (HAR) datasets evaluate the method's effectiveness in reducing the accuracy of potential adversarial classifiers while maintaining appropriate levels of utility. We contribute to the state of the art by offering a configurable framework for feature selection that balances the privacy and utility of the data in TSC.
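
The ranking idea behind the IDS can be illustrated with a minimal Python sketch. The abstract does not give the exact formula, so this sketch assumes the IDS of a feature is its normalized importance for the UFC minus its normalized importance for the PBC. The function names, the feature names, the importance values, and the selection threshold are all hypothetical and serve only as an illustration, not as the authors' implementation.

```python
from typing import Dict, List


def normalize(importances: Dict[str, float]) -> Dict[str, float]:
    """Scale non-negative importances so that they sum to 1 (assumed convention)."""
    total = sum(importances.values())
    if total <= 0:
        raise ValueError("importances must have a positive sum")
    return {name: value / total for name, value in importances.items()}


def importance_difference_scores(
    ufc_importance: Dict[str, float],
    pbc_importance: Dict[str, float],
) -> Dict[str, float]:
    """Assumed IDS: normalized UFC importance minus normalized PBC importance."""
    if ufc_importance.keys() != pbc_importance.keys():
        raise ValueError("both classifiers must score the same features")
    ufc = normalize(ufc_importance)
    pbc = normalize(pbc_importance)
    return {name: ufc[name] - pbc[name] for name in ufc}


def select_features(ids: Dict[str, float], threshold: float = 0.0) -> List[str]:
    """Keep features whose score exceeds the threshold, highest score first."""
    ranked = sorted(ids, key=ids.get, reverse=True)
    return [name for name in ranked if ids[name] > threshold]


# Hypothetical per-feature importances, e.g. taken from two trained classifiers.
ufc_importance = {"speed": 4.0, "steering_angle": 3.0, "brake_pressure": 2.0, "throttle": 1.0}
pbc_importance = {"speed": 1.0, "steering_angle": 2.0, "brake_pressure": 3.0, "throttle": 4.0}

ids = importance_difference_scores(ufc_importance, pbc_importance)
selected = select_features(ids)
print(selected)
```

In this sketch, features with a positive score matter more to the utility task than to the privacy-breaking task and are kept; raising the threshold trades utility for stronger privacy, which mirrors the configurable balance the abstract describes.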