Identity Linkage Across Diverse Social Networks
Authors: Youcef Benkhedda, F. Azouaou, Sofiane Abbar
Published in: 2020 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM)
Publication date: 2020-12-07
DOI: https://doi.org/10.1109/ASONAM49781.2020.9381445
Citation count: 2
Abstract
User identity linkage across online social networks has attracted significant interest in recent years, with applications in data fusion, de-duplication, personalized advertising, user profiling, and expert recommendation. Existing techniques rely on discrete personal attributes such as user name, gender, location, and email, which are not always available. Other techniques exploit network relations. We propose a generic framework for user identity linkage across diverse social networks that relies exclusively on widely available textual user-generated content. We intentionally selected two social networks, Twitter and Quora, which have different contribution models and serve different purposes. We explore supervised and unsupervised techniques for matching profiles, as well as language models ranging from simple TF-IDF vectorization to more sophisticated BERT embeddings. We discuss the limits of the different choices and present encouraging preliminary results. For example, we find that prolific users can be identified with 84% accuracy. We also present a framework we designed to create the largest publicly available annotated dataset for profile linkage in social networks.
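To make the unsupervised, text-only matching idea concrete, the following is a minimal sketch of one possible baseline: each profile's concatenated text is turned into a TF-IDF vector, and every profile on one network is linked to its most similar profile on the other by cosine similarity. This is an illustration under stated assumptions, not the authors' pipeline. The function names (`tokenize`, `tfidf_vectors`, `cosine`, `link_profiles`), the smoothed IDF formula, the nearest-neighbor decision rule, and the toy profiles are all hypothetical and do not come from the paper.

```python
import math
from collections import Counter


def tokenize(text):
    # Lowercase the text and split on any non-alphanumeric character.
    cleaned = "".join(c.lower() if c.isalnum() else " " for c in text)
    return cleaned.split()


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (a dict of term -> weight) per document."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    # Smoothed IDF (an assumption of this sketch) keeps every weight positive.
    idf = {t: math.log((1 + n_docs) / (1 + c)) + 1.0 for t, c in df.items()}
    return [
        {term: freq * idf[term] for term, freq in Counter(tokens).items()}
        for tokens in tokenized
    ]


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def link_profiles(source, target):
    """Map each source profile id to the most similar target profile id.

    `source` and `target` are dicts of profile id -> concatenated user text.
    `target` must be non-empty.
    """
    src_ids, tgt_ids = list(source), list(target)
    # Fit IDF on the texts of both networks so weights are comparable.
    vectors = tfidf_vectors(
        [source[i] for i in src_ids] + [target[j] for j in tgt_ids]
    )
    src_vecs, tgt_vecs = vectors[: len(src_ids)], vectors[len(src_ids):]
    links = {}
    for src_id, src_vec in zip(src_ids, src_vecs):
        best = max(range(len(tgt_ids)), key=lambda k: cosine(src_vec, tgt_vecs[k]))
        links[src_id] = tgt_ids[best]
    return links


# Toy profiles, invented purely for illustration.
twitter = {
    "t_alice": "training neural networks on gpu clusters deep learning tips",
    "t_bob": "sourdough bread baking recipes and fermentation",
}
quora = {
    "q_1": "how to bake sourdough bread with long fermentation",
    "q_2": "what gpu should I use for deep learning neural networks",
}
print(link_profiles(twitter, quora))
```

A nearest-neighbor rule like this always returns a match, even for users who have no account on the other network, so a practical system would likely add a similarity threshold or a supervised classifier on top of the similarity scores.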