Abhinay Pandya, Panos Kostakos, Hassan Mehmood, Marta Cortés, Ekaterina Gilman, M. Oussalah, S. Pirttikangas
{"title":"Privacy preserving sentiment analysis on multiple edge data streams with Apache NiFi","authors":"Abhinay Pandya, Panos Kostakos, Hassan Mehmood, Marta Cortés, Ekaterina Gilman, M. Oussalah, S. Pirttikangas","doi":"10.1109/EISIC49498.2019.9108851","DOIUrl":null,"url":null,"abstract":"Sentiment analysis, also known as opinion mining, plays a big role in both private and public sector Business Intelligence (BI); it attempts to improve public and customer experience. Nevertheless, de-identified sentiment scores from public social media posts can compromise individual privacy due to their vulnerability to record linkage attacks. Established privacy-preserving methods like k-anonymity, l-diversity and t-closeness are offline models exclusively designed for data at rest. Recently, a number of online anonymization algorithms (CASTLE, SKY, SWAF) have been proposed to complement the functional requirements of streaming applications, but without open-source implementation. In this paper, we present a reusable Apache NiFi dataflow that buffers tweets from multiple edge devices and performs anonymized sentiment analysis in real-time, using randomization. The solution can be easily adapted to suit different scenarios, enabling researchers to deploy custom anonymization algorithms.","PeriodicalId":117256,"journal":{"name":"2019 European Intelligence and Security Informatics Conference (EISIC)","volume":"10 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2019-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"11","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2019 European Intelligence and Security Informatics Conference (EISIC)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/EISIC49498.2019.9108851","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 11
Abstract
Sentiment analysis, also known as opinion mining, plays a big role in both private and public sector Business Intelligence (BI); it attempts to improve public and customer experience. Nevertheless, de-identified sentiment scores from public social media posts can compromise individual privacy due to their vulnerability to record linkage attacks. Established privacy-preserving methods like k-anonymity, l-diversity and t-closeness are offline models exclusively designed for data at rest. Recently, a number of online anonymization algorithms (CASTLE, SKY, SWAF) have been proposed to complement the functional requirements of streaming applications, but without open-source implementation. In this paper, we present a reusable Apache NiFi dataflow that buffers tweets from multiple edge devices and performs anonymized sentiment analysis in real-time, using randomization. The solution can be easily adapted to suit different scenarios, enabling researchers to deploy custom anonymization algorithms.