{"title":"DCDIMB: Dynamic Community-based Diversified Influence Maximization using Bridge Nodes","authors":"Sunil Meena, SHASHANK SINGH, Kuldeep Singh","doi":"10.1145/3664618","DOIUrl":"https://doi.org/10.1145/3664618","url":null,"abstract":"<p>Influence maximization (IM) is the fundamental study of social network analysis. The IM problem finds the top <i>k</i> nodes that have maximum influence in the network. Most of the studies in IM focus on maximizing the number of activated nodes in the static social network. But in real life, social networks are dynamic in nature. This work addresses the diversification of activated nodes in the dynamic social network. This work proposes an objective function that maximizes the number of communities by utilizing bridge nodes. We also propose a diffusion model that considers the role of inactive nodes in influencing a node. We prove the submodularity, and monotonicity of the objective function under the proposed diffusion model. This work analyzes the impact of different ratios of bridge nodes in the seed set on real-world and synthetic datasets. Further, we prove the NP-Hardness of the objective function under the proposed diffusion model. The experiments are conducted on various real-world and synthetic datasets with known and unknown community information. 
The proposed work experimentally shows that the objective function gives the maximum number of communities considering bridge nodes compared to the benchmark algorithms.</p>","PeriodicalId":50940,"journal":{"name":"ACM Transactions on the Web","volume":"154 1","pages":""},"PeriodicalIF":3.5,"publicationDate":"2024-05-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140925933","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Know their Customers: An Empirical Study of Online Account Enumeration Attacks","authors":"Maël Maceiras, Kavous Salehzadeh Niksirat, Gaël Bernard, Benoit Garbinato, Mauro Cherubini, Mathias Humbert, Kévin Huguenin","doi":"10.1145/3664201","DOIUrl":"https://doi.org/10.1145/3664201","url":null,"abstract":"<p>Internet users possess accounts on dozens of online services where they are often identified by one of their e-mail addresses. They often use the same address on multiple services and for communicating with their contacts. In this paper, we investigate attacks that enable an adversary (e.g., company, friend) to determine (stealthily or not) whether an individual, identified by their e-mail address, has an account on certain services (i.e., an <i>account enumeration attack</i>). Such attacks on <i>account privacy</i> have serious implications as information about one’s accounts can be used to (1) profile them and (2) improve the effectiveness of phishing. We take a multifaceted approach and study these attacks through a combination of experiments (63 services), surveys (318 respondents), and focus groups (13 participants). We demonstrate the high vulnerability of popular services (93.7%) and the concerns of users about their account privacy, as well as their increased susceptibility to phishing e-mails that impersonate services on which they have an account. We also provide findings on the challenges in implementing countermeasures for service providers and on users’ ideas for enhancing their account privacy. 
Finally, our interaction with national data protection authorities led to the inclusion of recommendations in their developers’ guide.</p>","PeriodicalId":50940,"journal":{"name":"ACM Transactions on the Web","volume":"80 1","pages":""},"PeriodicalIF":3.5,"publicationDate":"2024-05-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140885847","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Learning Dynamic Multimodal Network Slot Concepts from the Web for Forecasting Environmental, Social and Governance Ratings","authors":"Gary Ang, Ee-Peng Lim","doi":"10.1145/3663674","DOIUrl":"https://doi.org/10.1145/3663674","url":null,"abstract":"<p>Dynamic multimodal networks are networks with node attributes from different modalities where the attributes and network relationships evolve across time, i.e. both networks and multimodal attributes are dynamic. For example, dynamic relationship networks between companies that evolve across time due to changes in business strategies and alliances, which are associated with dynamic company attributes from multiple modalities such as textual online news, categorical events, and numerical financial-related data. Such information can be useful in predictive tasks involving companies. Environmental, social and governance (ESG) ratings of companies are important for assessing the sustainability risks of companies. The process of generating ESG ratings by expert analysts is however laborious and time-intensive. We thus explore the use of dynamic multimodal networks extracted from the web for forecasting ESG ratings. Learning such dynamic multimodal networks from the web for forecasting ESG ratings is however challenging due to its heterogeneity, and the low signal-to-noise ratios and non-stationary distributions of web information. Human analysts cope with such issues by learning concepts from past experience through relational thinking, and scanning for such concepts when analyzing new information about a company. In this paper, we propose the Dynamic Multimodal Slot Concept Attention-based Network (DynScan) model. DynScan utilizes slot attention mechanisms together with slot concept alignment and disentanglement loss functions to learn latent slot concepts from dynamic multimodal networks to improve performance on ESG rating forecasting tasks. 
DynScan is evaluated on forecasting tasks on six data sets, comprising three ESG ratings across two sets of companies. Our experiments show that DynScan outperforms other state-of-the-art models on these forecasting tasks. We also visualize the slot concepts learnt by DynScan on five synthetic datasets and three real-world datasets and observe distinct and meaningful slot concepts being learnt by DynScan across both synthetic and real-world datasets.</p>","PeriodicalId":50940,"journal":{"name":"ACM Transactions on the Web","volume":"18 1","pages":""},"PeriodicalIF":3.5,"publicationDate":"2024-05-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140830824","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"MuLX-QA: Classifying Multi-Labels and Extracting Rationale Spans in Social Media Posts","authors":"Soham Poddar, Rajdeep Mukherjee, Azlaan Mustafa Samad, Niloy Ganguly, Saptarshi Ghosh","doi":"10.1145/3653303","DOIUrl":"https://doi.org/10.1145/3653303","url":null,"abstract":"<p>While social media platforms play an important role in our daily lives in obtaining the latest news and trends from across the globe, they are known to be prone to widespread proliferation of harmful information in different forms leading to misconceptions among the masses. Accordingly, several prior works have attempted to tag social media posts with labels/classes reflecting their veracity, sentiments, hate content, etc. However, in order to have a convincing impact, it is important to additionally extract the post snippets on which the labelling decision is based. We call such a post snippet as the ‘rationale’. These rationales significantly improve human trust and debuggability of the predictions, especially when detecting misinformation or stigmas from social media posts. These rationale spans or snippets are also helpful in post-classification social analysis, such as for finding out the target communities in hate-speech, or for understanding the arguments or concerns against the intake of vaccines. Also it is observed that a post may express multiple notions of misinformation, hate, sentiment, etc. Thus, the task of determining (one or multiple) labels for a given piece of text, along with the <i>text snippets explaining the rationale behind each of the identified labels</i> is a challenging <i>multi-label, multi-rationale</i> classification task, which is still nascent in the literature. 
</p><p>While <i>transformer</i>-based encoder-decoder generative models such as BART and T5 are well-suited for the task, in this work we show how a relatively simpler <b>encoder-only</b> discriminative question-answering (QA) model can be effectively trained using <b>simple template-based questions</b> to accomplish the task. We thus propose <b>MuLX-QA</b> and demonstrate its utility in producing (label, rationale span) pairs in two different settings: <i>multi-class</i> (on the <i>HateXplain</i> dataset related to hate speech on social media), and <i>multi-label</i> (on the <i>CAVES</i> dataset related to COVID-19 anti-vaccine concerns). <b>MuLX-QA outperforms heavier generative models</b> in both settings. We also demonstrate the relative advantage of our proposed model MuLX-QA over strong baselines when trained with limited data. We perform several ablation studies, and experiments to better understand the effect of training MuLX-QA with different question prompts, and draw interesting inferences. Additionally, we show that MuLX-QA is effective on social media posts in resource-poor non-English languages as well. Finally, we perform a qualitative analysis of our model predictions and compare them with those of our strongest baseline.</p>","PeriodicalId":50940,"journal":{"name":"ACM Transactions on the Web","volume":"20 1","pages":""},"PeriodicalIF":3.5,"publicationDate":"2024-03-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140200422","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Heterogeneous Graph Neural Network with Personalized and Adaptive Diversity for News Recommendation","authors":"Guangping Zhang, Dongsheng Li, Hansu Gu, Tun Lu, Ning Gu","doi":"10.1145/3649886","DOIUrl":"https://doi.org/10.1145/3649886","url":null,"abstract":"<p>The emergence of online media has facilitated the dissemination of news, but has also introduced the problem of information overload. To address this issue, providing users with accurate and diverse news recommendations has become increasingly important. News possesses rich and heterogeneous content, and the factors that attract users to news reading are varied. Consequently, accurate news recommendation requires modeling of both the heterogeneous content of news and the heterogeneous user-news relationships. Furthermore, users’ news consumption is highly dynamic, which is reflected in the differences in topic concentration among different users and in the real-time changes in user interests. To this end, we propose a Heterogeneous Graph Neural Network with Personalized and Adaptive Diversity for News Recommendation (DivHGNN). DivHGNN first represents the heterogeneous content of news and the heterogeneous user-news relationships as an attributed heterogeneous graph. Then, through a heterogeneous node content adapter, it models the heterogeneous node attributes into aligned and fused node representations. With the proposed attributed heterogeneous graph neural network, DivHGNN integrates the heterogeneous relationships to enhance node representation for accurate news recommendations. We also discuss relation pruning, model deployment, and cold-start issues to further improve model efficiency. In terms of diversity, DivHGNN simultaneously models the variance of nodes through variational representation learning for providing personalized diversity. 
Additionally, a time-continuous exponentially decaying distribution cache is proposed to model the temporal dynamics of user real-time interests for providing adaptive diversity. Extensive experiments on real-world news datasets demonstrate the effectiveness of the proposed method.</p>","PeriodicalId":50940,"journal":{"name":"ACM Transactions on the Web","volume":"279 1","pages":""},"PeriodicalIF":3.5,"publicationDate":"2024-03-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140073483","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Fuzzy Influence Maximization in Social Networks","authors":"Ahmad Zareie, Rizos Sakellariou","doi":"10.1145/3650179","DOIUrl":"https://doi.org/10.1145/3650179","url":null,"abstract":"<p>Influence maximization is a fundamental problem in social network analysis. This problem refers to the identification of a set of influential users as initial spreaders to maximize the spread of a message in a network. When such a message is spread, some users may be influenced by it. A common assumption of existing work is that the impact of a message is essentially binary: a user is either influenced (activated) or not influenced (non-activated). However, how strongly a user is influenced by a message may play an important role in this user’s attempt to influence subsequent users and spread the message further; existing methods may fail to model accurately the spreading process and identify influential users. In this paper, we propose a novel approach to model a social network as a fuzzy graph where a fuzzy variable is used to represent the extent to which a user is influenced by a message (user’s activation level). By extending a diffusion model to simulate the spreading process in such a fuzzy graph we conceptually formulate the fuzzy influence maximization problem for which three methods are proposed to identify influential users. 
Experimental results demonstrate the accuracy of the proposed methods in determining influential users in social networks.</p>","PeriodicalId":50940,"journal":{"name":"ACM Transactions on the Web","volume":"13 1","pages":""},"PeriodicalIF":3.5,"publicationDate":"2024-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140003813","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Envisioning Information Access Systems: What Makes for Good Tools and a Healthy Web?","authors":"Chirag Shah, Emily M. Bender","doi":"10.1145/3649468","DOIUrl":"https://doi.org/10.1145/3649468","url":null,"abstract":"<p>We observe a recent trend towards applying large language models (LLMs) in search and positioning them as effective information access systems. While the interfaces may look appealing and the apparent breadth of applicability is exciting, we are concerned that the field is rushing ahead with a technology without sufficient study of the uses it is meant to serve, how it would be used, and what its use would mean. We argue that it is important to reassert the central research focus of the field of information retrieval, because information access is not merely an application to be solved by the so-called ‘AI’ techniques du jour. Rather, it is a key human activity, with impacts on both individuals and society. As information scientists, we should be asking what do people and society want and need from information access systems and how do we design and build systems to meet those needs? With that goal, in this conceptual paper we investigate fundamental questions concerning information access from user and societal viewpoints. We revisit foundational work related to information behavior, information seeking, information retrieval, information filtering, and information access to resurface what we know about these fundamental questions and what may be missing. We then provide our conceptual framing about how we could fill this gap, focusing on methods as well as experimental and evaluation frameworks. We consider the Web as an information ecosystem and explore the ways in which synthetic media, produced by LLMs and otherwise, endangers that ecosystem. 
The primary goal of this conceptual paper is to shed light on what we still do not know about the potential impacts of LLM-based information access systems, how to advance our understanding of user behaviors, and where the next generations of students, scholars, and developers could fruitfully invest their energies.</p>","PeriodicalId":50940,"journal":{"name":"ACM Transactions on the Web","volume":"35 1","pages":""},"PeriodicalIF":3.5,"publicationDate":"2024-02-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139968433","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Exif2Vec: A Framework to Ascertain Untrustworthy Crowdsourced Images Using Metadata","authors":"Muhammad Umair, Athman Bouguettaya, Abdallah Lakhdari, Mourad Ouzzani, Yuyun Liu","doi":"10.1145/3645094","DOIUrl":"https://doi.org/10.1145/3645094","url":null,"abstract":"<p>In the context of social media, the integrity of images is often dubious. To tackle this challenge, we introduce <i>Exif2Vec</i>, a novel framework specifically designed to discover modifications in social media images. The proposed framework leverages an image’s metadata to discover changes in an image. We use a service-oriented approach that considers <i>discovery of changes in images</i> as a <i>service</i>. A novel word-embedding based approach is proposed to discover semantic inconsistencies in an image metadata that are reflective of the changes in an image. These inconsistencies are used to measure the severity of changes. The novelty of the approach resides in that it does not require the use of images to determine the underlying changes. We use a pretrained Word2Vec model to conduct experiments. The model is validated on two different fact-checked image datasets, i.e., images related to general context and a context specific image dataset. Notably, our findings showcase the remarkable efficacy of our approach, yielding results of up to 80% accuracy. This underscores the potential of our framework.</p>","PeriodicalId":50940,"journal":{"name":"ACM Transactions on the Web","volume":"15 1","pages":""},"PeriodicalIF":3.5,"publicationDate":"2024-02-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139770919","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"DeLink: An Adversarial Framework for Defending against Cross-site User Identity Linkage","authors":"Peng Zhang, Qi Zhou, Tun Lu, Hansu Gu, Ning Gu","doi":"10.1145/3643828","DOIUrl":"https://doi.org/10.1145/3643828","url":null,"abstract":"<p>Cross-site user identity linkage (UIL) aims to link the identities of the same person across different social media platforms. Social media practitioners and service providers can construct composite user portraits based on cross-site UIL, which helps understand user behavior holistically and conduct accurate recommendations and personalization. However, many social media users expect each profile to stay within the platform where it was created and thus do not want the identities of different platforms to be linked. For this problem, we first investigate the approaches people would like to use to defend against cross-site UIL and the corresponding challenges. Based on the findings, we build an adversarial framework - DeLink based on the thoughts of adversarial text generation to help people improve their social media screen names to defend against cross-site UIL. DeLink can support both Chinese and English languages and has good generalizability to the varying numbers of social media accounts and different cross-site user identity linkage models. 
Extensive evaluations validate DeLink’s better performance, including a higher success rate, higher efficiency, less impact on human perception, and capability to defend against different cross-site UIL models.</p>","PeriodicalId":50940,"journal":{"name":"ACM Transactions on the Web","volume":"297 1","pages":""},"PeriodicalIF":3.5,"publicationDate":"2024-02-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139689969","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"“HOT” ChatGPT: The Promise of ChatGPT in Detecting and Discriminating Hateful, Offensive, and Toxic Comments on Social Media","authors":"Lingyao Li, Lizhou Fan, Shubham Atreja, Libby Hemphill","doi":"10.1145/3643829","DOIUrl":"https://doi.org/10.1145/3643829","url":null,"abstract":"<p>Harmful textual content is pervasive on social media, poisoning online communities and negatively impacting participation. A common approach to this issue is developing detection models that rely on human annotations. However, the tasks required to build such models expose annotators to harmful and offensive content and may require significant time and cost to complete. Generative AI models have the potential to understand and detect harmful textual content. We used ChatGPT to investigate this potential and compared its performance with MTurker annotations for three frequently discussed concepts related to harmful textual content on social media: Hateful, Offensive, and Toxic (HOT). We designed five prompts to interact with ChatGPT and conducted four experiments eliciting HOT classifications. Our results show that ChatGPT can achieve an accuracy of approximately 80% when compared to MTurker annotations. Specifically, the model displays a more consistent classification for non-HOT comments than HOT comments compared to human annotations. Our findings also suggest that ChatGPT classifications align with the provided HOT definitions. However, ChatGPT classifies “hateful” and “offensive” as subsets of “toxic.” Moreover, the choice of prompts used to interact with ChatGPT impacts its performance. Based on these insights, our study provides several meaningful implications for employing ChatGPT to detect HOT content, particularly regarding the reliability and consistency of its performance, its understanding and reasoning of the HOT concept, and the impact of prompts on its performance. 
Overall, our study provides guidance on the potential of using generative AI models for moderating large volumes of user-generated textual content on social media.</p>","PeriodicalId":50940,"journal":{"name":"ACM Transactions on the Web","volume":"8 1","pages":""},"PeriodicalIF":3.5,"publicationDate":"2024-02-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139689758","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}