Multi-granularity attribute similarity model for user alignment across social platforms under pre-aligned data sparsity

IF 7.4, Zone 1 (Management), JCR Q1: COMPUTER SCIENCE, INFORMATION SYSTEMS

Information Processing & Management Pub Date : 2024-08-23 DOI:10.1016/j.ipm.2024.103866

Yongqiang Peng, Xiaoliang Chen, Duoqian Miao, Xiaolin Qin, Xu Gu, Peng Lu

Information Processing & Management, Volume 61, Issue 6, Article 103866. https://www.sciencedirect.com/science/article/pii/S0306457324002255

Citations: 0

Abstract

Cross-platform User Alignment (UA) aims to identify accounts belonging to the same individual across multiple social network platforms. This study seeks to improve the performance of UA tasks while reducing the amount of sample data required. Previous research has focused excessively on model design and has not optimized the process as a whole, which makes it difficult to achieve strong performance without heavy reliance on labeled data. This paper proposes a semi-supervised Multi-Granularity Attribute Similarity Model (MGASM). First, MGASM optimizes the embedding process through multi-granularity modeling at the levels of characters, words, articles, structures, and labels, and augments missing data by leveraging adjacent text attributes. Next, MGASM quantifies the correlation between attributes of the same granularity by constructing Multi-Granularity Attribute Cosine Distance Distribution Vectors (MA-CDDVs). These vectors form the basis for a binary classification similarity model trained to calculate similarity scores for user pairs. Additionally, an attribute reappearance score correction (ARSC) mechanism is introduced to further refine the ranking of candidate users. Extensive experiments on the Weibo-Douban and DBLP17-DBLP19 datasets demonstrate that, compared to state-of-the-art methods, the hit-precision of the MGASM series improves by 68.15% and 27.02%, respectively, almost reaching 100%, and the F1 score increases by 37.6% and 21.4%, respectively.
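The abstract does not specify how an MA-CDDV is built, so the following is only a minimal sketch of one plausible reading: for each granularity, collect the cosine distances between every attribute embedding of one account and every attribute embedding of the other, then summarize them as a normalized histogram. The function names, the bin count, the dictionary layout of a user, and the handling of missing attributes are all assumptions made for illustration, not the authors' implementation.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


def cosine_distance(u: Vector, v: Vector) -> float:
    """Cosine distance 1 - cos(u, v), which lies in [0, 2]."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        # Assumption: a zero embedding carries no signal, so treat it as orthogonal.
        return 1.0
    return 1.0 - dot / (norm_u * norm_v)


def distance_distribution(attrs_a: Sequence[Vector],
                          attrs_b: Sequence[Vector],
                          bins: int = 4) -> List[float]:
    """Normalized histogram of all pairwise cosine distances (hypothetical CDDV)."""
    counts = [0] * bins
    total = 0
    for u in attrs_a:
        for v in attrs_b:
            # Clamp to [0, 2] to absorb floating-point rounding.
            d = min(max(cosine_distance(u, v), 0.0), 2.0)
            index = min(int(d / 2.0 * bins), bins - 1)
            counts[index] += 1
            total += 1
    if total == 0:
        # Assumption: a missing attribute on either side yields an all-zero vector.
        return [0.0] * bins
    return [c / total for c in counts]


def pair_features(user_a: Dict[str, List[Vector]],
                  user_b: Dict[str, List[Vector]],
                  granularities: Sequence[str],
                  bins: int = 4) -> List[float]:
    """Concatenate one distribution vector per granularity into a feature vector."""
    features: List[float] = []
    for g in granularities:
        features.extend(distance_distribution(user_a.get(g, []), user_b.get(g, []), bins))
    return features


# Toy embeddings for two accounts at two hypothetical granularities.
alice_weibo = {"char": [[1.0, 0.0]], "word": [[0.5, 0.5], [1.0, 0.0]]}
alice_douban = {"char": [[1.0, 0.0]], "word": [[0.0, 1.0]]}
features = pair_features(alice_weibo, alice_douban, ["char", "word"])
print(features)
```

Under this reading, the resulting feature vector would be the input to a binary classifier that predicts whether the two accounts belong to the same person. The ARSC re-ranking step is not sketched here because the abstract gives no detail on how it is computed.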

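Source journal

Information Processing & Management (Engineering & Technology, Computer Science: Information Systems)

CiteScore: 17.00

Self-citation rate: 11.60%

Articles published: 276

Review time: 39 days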

About the journal: Information Processing and Management is dedicated to publishing cutting-edge original research at the convergence of computing and information science. Our scope encompasses theory, methods, and applications across various domains, including advertising, business, health, information science, information technology marketing, and social computing. We aim to cater to the interests of both primary researchers and practitioners by offering an effective platform for the timely dissemination of advanced and topical issues in this interdisciplinary field. The journal places particular emphasis on original research articles, research survey articles, research method articles, and articles addressing critical applications of research. Join us in advancing knowledge and innovation at the intersection of computing and information science.