{"title":"Explainable Hierarchical Urban Representation Learning for Commuting Flow Prediction","authors":"Mingfei Cai, Yanbo Pang, Yoshihide Sekimoto","doi":"arxiv-2408.14762","DOIUrl":"https://doi.org/arxiv-2408.14762","url":null,"abstract":"Commuting flow prediction is an essential task for municipal operations in the real world. Previous studies have revealed that it is feasible to estimate the commuting origin-destination (OD) demand within a city using multiple auxiliary data. However, most existing methods are not suitable to deal with a similar task at a large scale, namely within a prefecture or the whole nation, owing to the increased number of geographical units that need to be maintained. In addition, region representation learning is a universal approach for gaining urban knowledge for diverse metropolitan downstream tasks. Although many researchers have developed comprehensive frameworks to describe urban units from multi-source data, they have not clarified the relationship between the selected geographical elements. Furthermore, metropolitan areas naturally preserve ranked structures, like cities and their inclusive districts, which makes elucidating relations between cross-level urban units necessary. Therefore, we develop a heterogeneous graph-based model to generate meaningful region embeddings at multiple spatial resolutions for predicting different types of inter-level OD flows. To demonstrate the effectiveness of the proposed method, extensive experiments were conducted using real-world aggregated mobile phone datasets collected from Shizuoka Prefecture, Japan. The results indicate that our proposed model outperforms existing models in terms of a uniform urban structure. We extend the understanding of predicted results using reasonable explanations to enhance the credibility of the model.","PeriodicalId":501032,"journal":{"name":"arXiv - CS - Social and Information Networks","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-08-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142226904","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Towards Graph Prompt Learning: A Survey and Beyond","authors":"Qingqing Long, Yuchen Yan, Peiyan Zhang, Chen Fang, Wentao Cui, Zhiyuan Ning, Meng Xiao, Ning Cao, Xiao Luo, Lingjun Xu, Shiyue Jiang, Zheng Fang, Chong Chen, Xian-Sheng Hua, Yuanchun Zhou","doi":"arxiv-2408.14520","DOIUrl":"https://doi.org/arxiv-2408.14520","url":null,"abstract":"Large-scale \"pre-train and prompt learning\" paradigms have demonstrated remarkable adaptability, enabling broad applications across diverse domains such as question answering, image recognition, and multimodal retrieval. This approach fully leverages the potential of large-scale pre-trained models, reducing downstream data requirements and computational costs while enhancing model applicability across various tasks. Graphs, as versatile data structures that capture relationships between entities, play pivotal roles in fields such as social network analysis, recommender systems, and biological graphs. Despite the success of pre-train and prompt learning paradigms in Natural Language Processing (NLP) and Computer Vision (CV), their application in graph domains remains nascent. In graph-structured data, not only do the node and edge features often have disparate distributions, but the topological structures also differ significantly. This diversity in graph data can lead to incompatible patterns or gaps between pre-training and fine-tuning on downstream graphs. We aim to bridge this gap by summarizing methods for alleviating these disparities. This includes exploring prompt design methodologies, comparing related techniques, assessing application scenarios and datasets, and identifying unresolved problems and challenges. This survey categorizes over 100 relevant works in this field, summarizing general design principles and the latest applications, including text-attributed graphs, molecules, proteins, and recommendation systems. Through this extensive review, we provide a foundational understanding of graph prompt learning, aiming to impact not only the graph mining community but also the broader Artificial General Intelligence (AGI) community.","PeriodicalId":501032,"journal":{"name":"arXiv - CS - Social and Information Networks","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-08-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142214926","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Synthetic Networks That Preserve Edge Connectivity","authors":"Lahari Anne, The-Anh Vu-Le, Minhyuk Park, Tandy Warnow, George Chacko","doi":"arxiv-2408.13647","DOIUrl":"https://doi.org/arxiv-2408.13647","url":null,"abstract":"Since true communities within real-world networks are rarely known, synthetic networks with planted ground truths are valuable for evaluating the performance of community detection methods. Of the synthetic network generation tools available, Stochastic Block Models (SBMs) produce networks with ground truth clusters that well approximate input parameters from real-world networks and clusterings. However, we show that SBMs can produce disconnected ground truth clusters, even when given parameters from clusterings where all clusters are connected. Here we describe the REalistic Cluster Connectivity Simulator (RECCS), a technique that modifies an SBM synthetic network to improve the fit to a given clustered real-world network with respect to edge connectivity within clusters, while maintaining the good fit with respect to other network and cluster statistics. Using real-world networks up to 13.9 million nodes in size, we show that RECCS, applied to stochastic block models, results in synthetic networks that have a better fit to cluster edge connectivity than unmodified SBMs, while providing roughly the same quality fit for other network and clustering parameters as unmodified SBMs.","PeriodicalId":501032,"journal":{"name":"arXiv - CS - Social and Information Networks","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-08-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142214962","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Random Walk Diffusion for Efficient Large-Scale Graph Generation","authors":"Tobias Bernecker, Ghalia Rehawi, Francesco Paolo Casale, Janine Knauer-Arloth, Annalisa Marsico","doi":"arxiv-2408.04461","DOIUrl":"https://doi.org/arxiv-2408.04461","url":null,"abstract":"Graph generation addresses the problem of generating new graphs that have a data distribution similar to real-world graphs. While previous diffusion-based graph generation methods have shown promising results, they often struggle to scale to large graphs. In this work, we propose ARROW-Diff (AutoRegressive RandOm Walk Diffusion), a novel random walk-based diffusion approach for efficient large-scale graph generation. Our method encompasses two components in an iterative process of random walk sampling and graph pruning. We demonstrate that ARROW-Diff can scale to large graphs efficiently, surpassing other baseline methods in terms of both generation time and multiple graph statistics, reflecting the high quality of the generated graphs.","PeriodicalId":501032,"journal":{"name":"arXiv - CS - Social and Information Networks","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-08-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141946776","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Academic collaboration on large language model studies increases overall but varies across disciplines","authors":"Lingyao Li, Ly Dinh, Songhua Hu, Libby Hemphill","doi":"arxiv-2408.04163","DOIUrl":"https://doi.org/arxiv-2408.04163","url":null,"abstract":"Interdisciplinary collaboration is crucial for addressing complex scientific challenges. Recent advancements in large language models (LLMs) have shown significant potential in benefiting researchers across various fields. To explore the application of LLMs in scientific disciplines and their implications for interdisciplinary collaboration, we collect and analyze 50,391 papers from OpenAlex, an open-source platform for scholarly metadata. We first employ Shannon entropy to assess the diversity of collaboration in terms of authors' institutions and departments. Our results reveal that most fields have exhibited varying degrees of increased entropy following the release of ChatGPT, with Computer Science displaying a consistent increase. Other fields such as Social Science, Decision Science, Psychology, Engineering, Health Professions, and Business, Management & Accounting have shown minor to significant increases in entropy in 2024 compared to 2023. Statistical testing further indicates that the entropy in Computer Science, Decision Science, and Engineering is significantly lower than that in health-related fields like Medicine and Biochemistry, Genetics & Molecular Biology. In addition, our network analysis based on authors' affiliation information highlights the prominence of Computer Science, Medicine, and other Computer Science-related departments in LLM research. Regarding authors' institutions, our analysis reveals that entities such as Stanford University, Harvard University, University College London, and Google are key players, either dominating centrality measures or playing crucial roles in connecting research networks. Overall, this study provides valuable insights into the current landscape and evolving dynamics of collaboration networks in LLM research.","PeriodicalId":501032,"journal":{"name":"arXiv - CS - Social and Information Networks","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-08-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141969664","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"More than 'Left and Right': Revealing Multilevel Online Political Selective Exposure","authors":"Yuan Zhang, Laia Castro Herrero, Frank Esser, Alexandre Bovet","doi":"arxiv-2408.03828","DOIUrl":"https://doi.org/arxiv-2408.03828","url":null,"abstract":"Selective exposure, individuals' inclination to seek out information that supports their beliefs while avoiding information that contradicts them, plays an important role in the emergence of polarization. In the political domain, selective exposure is usually measured on a left-right ideology scale, ignoring finer details. Here, we combine survey and Twitter data collected during the 2022 Brazilian Presidential Election and investigate selective exposure patterns between the survey respondents and political influencers. We analyze the followship network between survey respondents and political influencers and find a multilevel community structure that reveals a hierarchical organization more complex than a simple split between left and right. Moreover, depending on the level we consider, we find different associations between network indices of exposure patterns and 189 individual attributes of the survey respondents. For example, at finer levels, the number of influencer communities a survey respondent follows is associated with several factors, such as demographics, news consumption frequency, and incivility perception. In comparison, only their political ideology is a significant factor at coarser levels. Our work demonstrates that measuring selective exposure at a single level, such as left and right, misses important information necessary to capture this phenomenon correctly.","PeriodicalId":501032,"journal":{"name":"arXiv - CS - Social and Information Networks","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-08-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141946777","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Role Identification based Method for Cyberbullying Analysis in Social Edge Computing","authors":"Runyu Wang, Tun Lu, Peng Zhang, Ning Gu","doi":"arxiv-2408.03502","DOIUrl":"https://doi.org/arxiv-2408.03502","url":null,"abstract":"Over the past few years, many efforts have been dedicated to studying cyberbullying in social edge computing devices, and most of them focus on three roles: victims, perpetrators, and bystanders. If we want to obtain a deep insight into the formation, evolution, and intervention of cyberbullying in devices at the edge of the Internet, it is necessary to explore more fine-grained roles. This paper presents a multi-level method for role feature modeling and proposes a differential evolution-assisted K-means (DEK) method to identify diverse roles. Our work aims to provide a role identification scheme for cyberbullying scenarios for social edge computing environments to alleviate the general safety issues that cyberbullying brings. The experiments on ten real-world datasets obtained from Weibo and five public datasets show that the proposed DEK outperforms the existing approaches on the method level. After clustering, we obtained nine roles and analyzed the characteristics of each role and their evolution trends under different cyberbullying scenarios. Our work in this paper can be placed in devices at the edge of the Internet, leading to better real-time identification performance and adapting to the broad geographic location and high mobility of mobile devices.","PeriodicalId":501032,"journal":{"name":"arXiv - CS - Social and Information Networks","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-08-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141946778","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Large-Scale Graphs Community Detection using Spark GraphFrames","authors":"Elena-Simona Apostol, Adrian-Cosmin Cojocaru, Ciprian-Octavian Truică","doi":"arxiv-2408.03966","DOIUrl":"https://doi.org/arxiv-2408.03966","url":null,"abstract":"With the emergence of social networks, online platforms dedicated to different use cases, and sensor networks, the emergence of large-scale graph community detection has become a steady field of research with real-world applications. Community detection algorithms have numerous practical applications, particularly due to their scalability with data size. Nonetheless, a notable drawback of community detection algorithms is their computational intensity [Apostol2014], resulting in decreasing performance as data size increases. For this purpose, new frameworks that employ distributed systems such as Apache Hadoop and Apache Spark which can seamlessly handle large-scale graphs must be developed. In this paper, we propose a novel framework for community detection algorithms, i.e., K-Cliques, Louvain, and Fast Greedy, developed using Apache Spark GraphFrames. We test their performance and scalability on two real-world datasets. The experimental results prove the feasibility of developing graph mining algorithms using Apache Spark GraphFrames.","PeriodicalId":501032,"journal":{"name":"arXiv - CS - Social and Information Networks","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-08-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141946775","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"The Dawn of Decentralized Social Media: An Exploration of Bluesky's Public Opening","authors":"Erfan Samieyan Sahneh, Gianluca Nogara, Matthew R. DeVerna, Nick Liu, Luca Luceri, Filippo Menczer, Francesco Pierri, Silvia Giordano","doi":"arxiv-2408.03146","DOIUrl":"https://doi.org/arxiv-2408.03146","url":null,"abstract":"Bluesky is a Twitter-like decentralized social media platform that has recently grown in popularity. After an invite-only period, it opened to the public worldwide on February 6th, 2024. In this paper, we provide a longitudinal analysis of user activity in the two months around the opening, studying changes in the general characteristics of the platform due to the rapid growth of the user base. We observe a broad distribution of activity similar to more established platforms, but a higher volume of original than reshared content, and very low toxicity. After opening to the public, Bluesky experienced a large surge in new users and activity, especially posting English and Japanese content. In particular, several accounts entered the discussion with suspicious behavior, like following many accounts and sharing content from low-credibility news outlets. Some of these have already been classified as spam or suspended, suggesting effective moderation.","PeriodicalId":501032,"journal":{"name":"arXiv - CS - Social and Information Networks","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-08-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141946779","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Enhancing Twitter Bot Detection via Multimodal Invariant Representations","authors":"Jibing Gong, Jiquan Peng, Jin Qu, ShuYing Du, Kaiyu Wang","doi":"arxiv-2408.03096","DOIUrl":"https://doi.org/arxiv-2408.03096","url":null,"abstract":"Detecting Twitter Bots is crucial for maintaining the integrity of online discourse, safeguarding democratic processes, and preventing the spread of malicious propaganda. However, advanced Twitter Bots today often employ sophisticated feature manipulation and account farming techniques to blend seamlessly with genuine user interactions, posing significant challenges to existing detection models. In response to these challenges, this paper proposes a novel Twitter Bot Detection framework called BotSAI. This framework enhances the consistency of multimodal user features, accurately characterizing various modalities to distinguish between real users and bots. Specifically, the architecture integrates information from users, textual content, and heterogeneous network topologies, leveraging customized encoders to obtain comprehensive user feature representations. The heterogeneous network encoder efficiently aggregates information from neighboring nodes through oversampling techniques and local relationship transformers. Subsequently, a multi-channel representation mechanism maps user representations into invariant and specific subspaces, enhancing the feature vectors. Finally, a self-attention mechanism is introduced to integrate and refine the enhanced user representations, enabling efficient information interaction. Extensive experiments demonstrate that BotSAI outperforms existing state-of-the-art methods on two major Twitter Bot Detection benchmarks, exhibiting superior performance. Additionally, systematic experiments reveal the impact of different social relationships on detection accuracy, providing novel insights for the identification of social bots.","PeriodicalId":501032,"journal":{"name":"arXiv - CS - Social and Information Networks","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-08-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141969666","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}