{"title":"Natural Language Processing for Dialects of a Language: A Survey","authors":"Aditya Joshi, Raj Dabre, Diptesh Kanojia, Zhuang Li, Haolan Zhan, Gholamreza Haffari, Doris Dippold","doi":"10.1145/3712060","DOIUrl":"https://doi.org/10.1145/3712060","url":null,"abstract":"State-of-the-art natural language processing (NLP) models are trained on massive training corpora, and report a superlative performance on evaluation datasets. This survey delves into an important attribute of these datasets: the dialect of a language. Motivated by the performance degradation of NLP models for dialectal datasets and its implications for the equity of language technologies, we survey past research in NLP for dialects in terms of datasets, and approaches. We describe a wide range of NLP tasks in terms of two categories: natural language understanding (NLU) (for tasks such as dialect classification, sentiment analysis, parsing, and NLU benchmarks) and natural language generation (NLG) (for summarisation, machine translation, and dialogue systems). The survey is also broad in its coverage of languages which include English, Arabic, German, among others. We observe that past work in NLP concerning dialects goes deeper than mere dialect classification, and extends to several NLU and NLG tasks. For these tasks, we describe classical machine learning using statistical models, along with the recent deep learning-based approaches based on pre-trained language models. We expect that this survey will be useful to NLP researchers interested in building equitable language technologies by rethinking LLM benchmarks and model architectures.","PeriodicalId":50926,"journal":{"name":"ACM Computing Surveys","volume":"87 1","pages":""},"PeriodicalIF":16.6,"publicationDate":"2025-01-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142968459","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Security and Privacy Challenges of Large Language Models: A Survey","authors":"Badhan Chandra Das, M. Hadi Amini, Yanzhao Wu","doi":"10.1145/3712001","DOIUrl":"https://doi.org/10.1145/3712001","url":null,"abstract":"Large language models (LLMs) have demonstrated extraordinary capabilities and contributed to multiple fields, such as generating and summarizing text, language translation, and question-answering. Nowadays, LLMs have become very popular tools in natural language processing (NLP) tasks, with the capability to analyze complicated linguistic patterns and provide relevant responses depending on the context. While offering significant advantages, these models are also vulnerable to security and privacy attacks, such as jailbreaking attacks, data poisoning attacks, and personally identifiable information (PII) leakage attacks. This survey provides a thorough review of the security and privacy challenges of LLMs, along with the application-based risks in various domains, such as transportation, education, and healthcare. We assess the extent of LLM vulnerabilities, investigate emerging security and privacy attacks against LLMs, and review potential defense mechanisms. Additionally, the survey outlines existing research gaps and highlights future research directions.","PeriodicalId":50926,"journal":{"name":"ACM Computing Surveys","volume":"29 1","pages":""},"PeriodicalIF":16.6,"publicationDate":"2025-01-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142968458","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Comprehensive Review on Group Re-identification in Surveillance Videos","authors":"KAMAKSHYA NAYAK, Debi Prosad Dogra","doi":"10.1145/3711126","DOIUrl":"https://doi.org/10.1145/3711126","url":null,"abstract":"Computer vision plays an important role in the automated analysis of human groups. The appearance of human groups has been studied for various reasons, including detection, identification, tracking, and re-identification. Person re-identification has been studied extensively over the last decade. Despite significant efforts by the computer vision research community, person re-identification often suffers from issues such as similar clothing appearances, occlusion, viewpoint changes, etc. In contrast, group re-identification has not received much attention. It involves identifying human groups across multiple non-overlapping camera views. It is a challenging problem that suffers from issues related to person re-identification and additional challenges like variations in the number of persons, the structural layout of groups, etc. This paper summarises the research paradigms of human group analysis. It reviews the recent advancements in group re-identification, including key challenges, datasets, and state-of-the-art methods. The paper concludes with a discussion of open research challenges and future directions in group re-identification, including the need for reliable techniques, varied datasets, and ethical considerations regarding privacy. Overall, this paper offers a thorough and up-to-date summary of the most recent findings in group re-identification. It also identifies the research gaps as placeholders for further study.","PeriodicalId":50926,"journal":{"name":"ACM Computing Surveys","volume":"16 1","pages":""},"PeriodicalIF":16.6,"publicationDate":"2025-01-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142961372","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Towards Trustworthy AI-Empowered Real-Time Bidding for Online Advertisement Auctioning","authors":"Xiaoli Tang, Han Yu","doi":"10.1145/3701741","DOIUrl":"https://doi.org/10.1145/3701741","url":null,"abstract":"Artificial intelligence-empowered Real-Time Bidding (AIRTB) is regarded as one of the most enabling technologies for online advertising. It has attracted significant research attention from diverse fields such as pattern recognition, game theory and mechanism design. Despite its remarkable development and deployment, the AIRTB system can sometimes harm the interest of its participants (e.g., depleting the advertisers’ budget with various kinds of fraud). As such, building trustworthy AIRTB auctioning systems has emerged as an important direction of research in this field in recent years. Due to the highly interdisciplinary nature of this field and a lack of a comprehensive survey, it is a challenge for researchers to enter this field and contribute towards building trustworthy AIRTB technologies. This paper bridges this important gap in trustworthy AIRTB literature. We start by analysing the key concerns of various AIRTB stakeholders and identify five main dimensions of trust building in AIRTB, namely robustness, explainability, fairness, auditability & accountability, and environmental well-being. For each of these dimensions, we propose a unique taxonomy of the state of the art, trace the root causes of possible breakdown of trust, and discuss the necessity of the given dimension. This is followed by a comprehensive review of existing strategies for fulfilling the requirements of each trust dimension. In addition, we discuss the promising future directions of research essential towards building trustworthy AIRTB systems to benefit the field of online advertising.","PeriodicalId":50926,"journal":{"name":"ACM Computing Surveys","volume":"82 1","pages":""},"PeriodicalIF":16.6,"publicationDate":"2025-01-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142961368","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Location Privacy Schemes in Vehicular Networks: Taxonomy, Comparative Analysis, Design Challenges, and Future Opportunities","authors":"Ikram Ullah, Munam Ali Shah, Abid Khan, Mohsen Guizani","doi":"10.1145/3711681","DOIUrl":"https://doi.org/10.1145/3711681","url":null,"abstract":"Vehicular ad-hoc networks (VANETs) have revolutionized the world with smart traffic management, better utilizing the road environment, and providing safety and convenience to the vehicles’ drivers. Despite the useful features of VANETs, there are some privacy issues, which hinder their way toward achieving smarter and safer traffic in the world. Location privacy is one of the critical research challenges for the efficient deployment of VANETs. This challenge can be solved using a pseudonym instead of an actual vehicle identity in the beacon messages. For this purpose, many location privacy schemes are introduced in the literature. In this paper, we thoroughly review the existing location privacy schemes and present their comprehensive taxonomy. We discuss the design challenges for the development of an efficient location privacy scheme. Moreover, the existing location privacy techniques are critically analyzed based on diverse road network environments and parameters. Various issues and challenges regarding the pseudonym-changing process are elaborated in detail. Finally, we discuss the future trends for the implementation of location privacy in a vehicular network.","PeriodicalId":50926,"journal":{"name":"ACM Computing Surveys","volume":"25 1","pages":""},"PeriodicalIF":16.6,"publicationDate":"2025-01-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142961375","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Generative AI Empowered Network Digital Twins: Architecture, Technologies, and Applications","authors":"Tong Li, Qingyue Long, Haoye Chai, Shiyuan Zhang, Fenyu Jiang, Haoqiang Liu, Wenzhen Huang, Depeng Jin, Yong Li","doi":"10.1145/3711682","DOIUrl":"https://doi.org/10.1145/3711682","url":null,"abstract":"The rapid advancement of mobile networks highlights the limitations of traditional network planning and optimization methods, particularly in modeling, evaluation, and application. Network Digital Twins, which simulate networks in the digital domain for evaluation, offer a solution to these challenges. This concept is further enhanced by generative AI technology, which promises more efficient and accurate AI-driven data generation for network simulation and optimization. This survey provides insights into generative AI-empowered network digital twins. We begin by outlining the architecture of a network digital twin, which encompasses both digital and physical domains. This architecture involves four key steps: data processing and network monitoring, digital replication and network simulation, designing and training network optimizers, Sim2Real and network control. Next, we systematically discuss the related studies in each step and make a detailed taxonomy of the problem studied, the methods used, and the key designs leveraged. Each step is examined with a focus on the role of generative AI, from estimating missing data and simulating network behaviors to designing control strategies and bridging the gap between digital and physical domains. Finally, we discuss the open issues and challenges of generative AI-based network digital twins.","PeriodicalId":50926,"journal":{"name":"ACM Computing Surveys","volume":"82 1","pages":""},"PeriodicalIF":16.6,"publicationDate":"2025-01-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142961341","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Trustworthy AI-based Performance Diagnosis Systems for Cloud Applications: A Review","authors":"Ruyue Xin, Jingye Wang, Peng Chen, Zhiming Zhao","doi":"10.1145/3701740","DOIUrl":"https://doi.org/10.1145/3701740","url":null,"abstract":"Performance diagnosis systems detect abnormal performance phenomena and play a crucial role in cloud applications. An effective performance diagnosis system is often developed based on artificial intelligence (AI) approaches, which can be summarized into a general framework from data to models. However, the AI-based framework has potential hazards that could degrade the user experience and trust. For example, a lack of data privacy may compromise the security of AI models, and models with low robustness can be hard to apply in complex cloud environments. Therefore, defining the requirements for building a trustworthy AI-based performance diagnosis system has become essential. This article systematically reviews trustworthiness requirements in AI-based performance diagnosis systems. We first introduce trustworthiness requirements and extract six key requirements from a technical perspective, including data privacy, fairness, robustness, explainability, efficiency, and human intervention. We then unify these requirements into a general performance diagnosis framework, ranging from data collection to model development. Next, we comprehensively provide related works for each component and concrete actions to improve trustworthiness in the framework. Finally, we identify possible research directions and challenges for the future development of trustworthy AI-based performance diagnosis systems.","PeriodicalId":50926,"journal":{"name":"ACM Computing Surveys","volume":"57 1","pages":""},"PeriodicalIF":16.6,"publicationDate":"2025-01-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142940440","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An In-Depth Analysis of Password Managers and Two-Factor Authentication Tools","authors":"Mohammed Jubur, Prakash Shrestha, Nitesh Saxena","doi":"10.1145/3711117","DOIUrl":"https://doi.org/10.1145/3711117","url":null,"abstract":"Passwords remain the primary authentication method in online services, a domain increasingly crucial in our digital age. However, passwords suffer from several well-documented security and usability issues. Addressing these concerns, password managers and two-factor authentication (2FA) have emerged as key solutions. This paper examines these methods with a focus on enhancing password security without compromising usability. We utilize an adapted Bonneau et al. (IEEE S&P 2012) framework tailored to the specific challenges of password managers and 2FA. This allows us to categorize and evaluate prominent solutions from both academic research and industry practice, with a focus on their security, privacy, and usability. A crucial aspect of our study involves evaluating the effectiveness of a combined PM+2FA system in balancing security and usability. This study not only examines current trends but also suggests potential areas for future research, offering valuable insights to both users and developers in the evolving landscape of digital security.","PeriodicalId":50926,"journal":{"name":"ACM Computing Surveys","volume":"12 1","pages":""},"PeriodicalIF":16.6,"publicationDate":"2025-01-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142929492","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Characterization of Android Malwares and their families","authors":"Tejpal Sharma, Dhavleesh Rattan","doi":"10.1145/3708500","DOIUrl":"https://doi.org/10.1145/3708500","url":null,"abstract":"Nowadays, smartphones have made our lives easier and have become essential gadgets for us. Apart from calling, mobiles are used for various purposes, such as banking, chatting, data storage, connecting to the internet and running apps which make life easier. Therefore, attackers are developing new methods or malware to steal smartphone data. Primarily, the study outlines various types of Android malware families, the evolution of Android malware and its effects on detection techniques over time. We report malware timelines and Android app datasets with their source web links. Data is collected from various recent studies and reported. In this study, we have reported 384 Android malware families and their year of discovery, i.e., from 2001 to 2020. According to the malfunctions they perform on the device, we categorized the families into 11 types. Information about the datasets, which are divided into three categories, is presented along with their source links. The categorization and timeline of malware will make it easy for researchers to focus on upcoming trends according to the malware category and activities they perform. Various open issues and future challenges are also addressed for future researchers.","PeriodicalId":50926,"journal":{"name":"ACM Computing Surveys","volume":"3 1","pages":""},"PeriodicalIF":16.6,"publicationDate":"2025-01-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142929587","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Deep Learning on Network Traffic Prediction: Recent Advances, Analysis, and Future Directions","authors":"Ons Aouedi, Van An Le, Kandaraj Piamrat, Yusheng Ji","doi":"10.1145/3703447","DOIUrl":"https://doi.org/10.1145/3703447","url":null,"abstract":"From the perspective of telecommunications, next-generation networks or beyond 5G will inevitably face the challenge of a growing number of users and devices. Such growth results in high-traffic generation with limited network resources. Thus, the analysis of the traffic and the precise forecast of user demands are essential for developing an intelligent network. In this line, Machine Learning (ML) and especially Deep Learning (DL) models can further benefit from the huge amount of network data. They can act in the background to analyze and predict traffic conditions more accurately than ever, and help to optimize the design and management of network services. Recently, a significant amount of research effort has been devoted to this area, greatly advancing network traffic prediction (NTP) abilities. In this paper, we bring together NTP and DL-based models and present recent advances in DL for NTP. We provide a detailed explanation of popular approaches and categorize the literature based on these approaches. Moreover, as a technical study, we conduct different data analyses and experiments with several DL-based models for traffic prediction. Finally, discussions regarding the challenges and future directions are provided.","PeriodicalId":50926,"journal":{"name":"ACM Computing Surveys","volume":"80 1","pages":""},"PeriodicalIF":16.6,"publicationDate":"2025-01-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142929449","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}