Zhanbo Liang, Jie Guo, Weidong Qiu, Zheng Huang, Shujun Li
{"title":"When graph convolution meets double attention: online privacy disclosure detection with multi-label text classification","authors":"Zhanbo Liang, Jie Guo, Weidong Qiu, Zheng Huang, Shujun Li","doi":"10.1007/s10618-023-00992-y","DOIUrl":null,"url":null,"abstract":"<p>With the rise of Web 2.0 platforms such as online social media, people’s private information, such as their location, occupation and even family information, is often inadvertently disclosed through online discussions. Therefore, it is important to detect such unwanted privacy disclosures to help alert people affected and the online platform. In this paper, privacy disclosure detection is modeled as a multi-label text classification (MLTC) problem, and a new privacy disclosure detection model is proposed to construct an MLTC classifier for detecting online privacy disclosures. This classifier takes an online post as the input and outputs multiple labels, each reflecting a possible privacy disclosure. The proposed presentation method combines three different sources of information, the input text itself, the label-to-text correlation and the label-to-label correlation. A double-attention mechanism is used to combine the first two sources of information, and a graph convolutional network is employed to extract the third source of information that is then used to help fuse features extracted from the first two sources of information. Our extensive experimental results, obtained on a public dataset of privacy-disclosing posts on Twitter, demonstrated that our proposed privacy disclosure detection method significantly and consistently outperformed other state-of-the-art methods in terms of all key performance indicators.</p>","PeriodicalId":55183,"journal":{"name":"Data Mining and Knowledge Discovery","volume":"80 1","pages":""},"PeriodicalIF":2.8000,"publicationDate":"2024-01-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Data Mining and Knowledge Discovery","FirstCategoryId":"94","ListUrlMain":"https://doi.org/10.1007/s10618-023-00992-y","RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
引用次数: 0
Abstract
With the rise of Web 2.0 platforms such as online social media, people’s private information, such as their location, occupation and even family information, is often inadvertently disclosed through online discussions. Therefore, it is important to detect such unwanted privacy disclosures to help alert people affected and the online platform. In this paper, privacy disclosure detection is modeled as a multi-label text classification (MLTC) problem, and a new privacy disclosure detection model is proposed to construct an MLTC classifier for detecting online privacy disclosures. This classifier takes an online post as the input and outputs multiple labels, each reflecting a possible privacy disclosure. The proposed presentation method combines three different sources of information, the input text itself, the label-to-text correlation and the label-to-label correlation. A double-attention mechanism is used to combine the first two sources of information, and a graph convolutional network is employed to extract the third source of information that is then used to help fuse features extracted from the first two sources of information. Our extensive experimental results, obtained on a public dataset of privacy-disclosing posts on Twitter, demonstrated that our proposed privacy disclosure detection method significantly and consistently outperformed other state-of-the-art methods in terms of all key performance indicators.
期刊介绍:
Advances in data gathering, storage, and distribution have created a need for computational tools and techniques to aid in data analysis. Data Mining and Knowledge Discovery in Databases (KDD) is a rapidly growing area of research and application that builds on techniques and theories from many fields, including statistics, databases, pattern recognition and learning, data visualization, uncertainty modelling, data warehousing and OLAP, optimization, and high performance computing.