Privacy Text Clustering Method Based on Burst Feature of Words
Xia Wu, Zehan Li, Yong Wang, Qing Zhao, Ke Wang, Hangyu Hu
Concurrency and Computation: Practice and Experience, 37(23-24), published 2025-09-03
DOI: 10.1002/cpe.70269
https://onlinelibrary.wiley.com/doi/10.1002/cpe.70269
Citations: 0
Abstract
Real-time detection of privacy-relevant events in social media faces two fundamental challenges: (1) cluster instability caused by sparse and noisy text data, which leads to center drift; and (2) poor event discernibility in traditional online clustering methods. These limitations severely impair privacy monitoring in dynamic social media environments. To address them, we propose an edge intelligence-driven framework that integrates three components: adaptive burst word detection using wavelet-based signal analysis; spectral clustering of the identified burst words to establish stable event anchors; and real-time incremental text clustering centered on these fixed anchors. We evaluate the framework on a dataset of 116 million COVID-19-related tweets and obtain the following results: burst word identification accuracy of 86.28%; cluster purity of 0.875 (a 37% improvement over the baseline method); throughput of 3000 tweets per minute; and a 78% reduction in irrelevant content through noise filtering. The key advantages of our approach are: addressing the persistent cluster drift problem via burst anchoring centers; enabling efficient distributed processing via an edge intelligence architecture; providing a practical and scalable solution for real-time social media monitoring; and establishing a new paradigm for privacy-aware event detection systems.
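The third component, incremental clustering around fixed anchors, can be illustrated with a minimal sketch. This is not the authors' implementation: the anchor names, the burst word sets, the Jaccard overlap score, and the `min_score` threshold are all assumptions chosen for illustration, and the burst words are hard-coded rather than produced by wavelet analysis and spectral clustering. The sketch only shows why fixed anchors avoid center drift (the anchors never move as texts arrive) and how a similarity threshold can act as a noise filter.

```python
from collections import defaultdict

# Hypothetical event anchors: each event is a fixed set of burst words.
# In the described framework these would come from burst word detection
# followed by spectral clustering; here they are hard-coded assumptions.
ANCHORS = {
    "lockdown": {"lockdown", "curfew", "quarantine"},
    "vaccine": {"vaccine", "dose", "booster"},
}


def assign(tokens, anchors, min_score=0.2):
    """Return the best-matching anchor name, or None if the text is noise."""
    words = set(tokens)
    if not words:
        return None
    best_name, best_score = None, 0.0
    for name, burst_words in anchors.items():
        # Jaccard overlap between the text and the anchor's burst words
        # (an assumed similarity measure, not taken from the paper).
        score = len(words & burst_words) / len(words | burst_words)
        if score > best_score:
            best_name, best_score = name, score
    return best_name if best_score >= min_score else None


def cluster_stream(texts, anchors, min_score=0.2):
    """Assign each incoming text to a fixed anchor; count filtered texts."""
    clusters = defaultdict(list)
    dropped = 0
    for text in texts:
        label = assign(text.lower().split(), anchors, min_score)
        if label is None:
            dropped += 1  # treated as irrelevant content
        else:
            clusters[label].append(text)
    return dict(clusters), dropped


stream = [
    "Curfew and lockdown extended",
    "Second vaccine dose booked",
    "What a lovely sunny morning",
    "Booster vaccine available now",
]
clusters, dropped = cluster_stream(stream, ANCHORS)
```

Because the anchors are never updated from the assigned texts, each text is processed in a single pass and the cluster centers cannot drift, at the cost of missing events whose burst words were not detected in advance.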
About the journal:
Concurrency and Computation: Practice and Experience (CCPE) publishes high-quality, original research papers, and authoritative research review papers, in the overlapping fields of:
Parallel and distributed computing;
High-performance computing;
Computational and data science;
Artificial intelligence and machine learning;
Big data applications, algorithms, and systems;
Network science;
Ontologies and semantics;
Security and privacy;
Cloud/edge/fog computing;
Green computing; and
Quantum computing.