Multi-granularity attribute similarity model for user alignment across social platforms under pre-aligned data sparsity
Authors: Yongqiang Peng, Xiaoliang Chen, Duoqian Miao, Xiaolin Qin, Xu Gu, Peng Lu
Journal: Information Processing & Management, vol. 61, no. 6, Article 103866 (impact factor 7.4; JCR Q1, Computer Science, Information Systems)
Publication date: 2024-08-23
DOI: 10.1016/j.ipm.2024.103866
URL: https://www.sciencedirect.com/science/article/pii/S0306457324002255
Citations: 0
Abstract
Cross-platform User Alignment (UA) aims to identify accounts belonging to the same individual across multiple social network platforms. This study seeks to improve UA performance while reducing the amount of sample data required. Previous research has concentrated on model design and neglected optimization of the process as a whole, which makes it difficult to achieve strong performance without heavy reliance on labeled data. This paper proposes a semi-supervised Multi-Granularity Attribute Similarity Model (MGASM). First, MGASM optimizes the embedding process through multi-granularity modeling at the character, word, article, structure, and label levels, and compensates for missing data by leveraging adjacent text attributes. Next, MGASM quantifies the correlation between attributes of the same granularity by constructing Multi-Granularity Attribute Cosine Distance Distribution Vectors (MA-CDDVs). These vectors form the basis for a binary classification similarity model trained to calculate similarity scores for user pairs. Additionally, an attribute reappearance score correction (ARSC) mechanism is introduced to further refine the ranking of candidate users. Extensive experiments on the Weibo-Douban and DBLP17-DBLP19 datasets demonstrate that, compared to state-of-the-art methods, the hit-precision of the MGASM series improves by 68.15% and 27.02%, almost reaching 100%, and the F1 score increases by 37.6% and 21.4%.
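The abstract does not spell out how MA-CDDVs or hit-precision are computed, so the following Python sketch is only an illustration under stated assumptions, not the authors' implementation. It assumes each user already has one embedding vector per granularity, uses a single cosine distance per granularity as a simplified stand-in for a distance distribution vector, and uses a hit-precision form that is common in the user alignment literature. The names `cosine_distance`, `pair_features`, and `hit_precision` are hypothetical.

```python
import math


def cosine_distance(a, b):
    """Return 1 - cos(a, b); defined as 1.0 when either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norm == 0.0:
        return 1.0
    return 1.0 - dot / norm


def pair_features(user_a, user_b, granularities):
    """Build a feature vector for a user pair: one cosine distance per granularity.

    Assumption: user_a and user_b map a granularity name to an embedding.
    This is a simplified stand-in for an MA-CDDV, whose exact construction
    is not given in the abstract.
    """
    return [cosine_distance(user_a[g], user_b[g]) for g in granularities]


def hit_precision(rank, k):
    """Hit-precision of one query, assuming the form (k - (rank - 1)) / k.

    rank is the 1-based position of the true match in the candidate list;
    the score is 0.0 when the match is not in the top k.
    """
    if rank < 1 or rank > k:
        return 0.0
    return (k - (rank - 1)) / k


if __name__ == "__main__":
    # Toy embeddings (made up for illustration only).
    alice_weibo = {"char": [1.0, 0.0], "word": [0.2, 0.8]}
    alice_douban = {"char": [0.9, 0.1], "word": [0.3, 0.7]}
    print(pair_features(alice_weibo, alice_douban, ["char", "word"]))
    print(hit_precision(rank=2, k=5))
```

A feature vector of this kind could be passed to any binary classifier that outputs a match probability for a user pair. The ARSC re-ranking step is omitted because the abstract gives no details about it.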
About the journal:
Information Processing and Management is dedicated to publishing cutting-edge original research at the convergence of computing and information science. Our scope encompasses theory, methods, and applications across various domains, including advertising, business, health, information science, information technology marketing, and social computing.
We aim to cater to the interests of both primary researchers and practitioners by offering an effective platform for the timely dissemination of advanced and topical issues in this interdisciplinary field. The journal places particular emphasis on original research articles, research survey articles, research method articles, and articles addressing critical applications of research. Join us in advancing knowledge and innovation at the intersection of computing and information science.