Shang Wang, Tianqing Zhu, Bo Liu, Ming Ding, Dayong Ye, Wanlei Zhou, Philip Yu
{"title":"Unique Security and Privacy Threats of Large Language Models: A Comprehensive Survey","authors":"Shang Wang, Tianqing Zhu, Bo Liu, Ming Ding, Dayong Ye, Wanlei Zhou, Philip Yu","doi":"10.1145/3764113","DOIUrl":"https://doi.org/10.1145/3764113","url":null,"abstract":"With the rapid development of artificial intelligence, large language models (LLMs) have made remarkable advancements in natural language processing. These models are trained on vast datasets to exhibit powerful language understanding and generation capabilities across various applications, including chatbots and agents. However, LLMs have revealed a variety of privacy and security issues throughout their life cycle, drawing significant academic and industrial attention. Moreover, the risks faced by LLMs differ significantly from those encountered by traditional language models. Given that current surveys lack a clear taxonomy of unique threat models across diverse scenarios, we emphasize the unique privacy and security threats associated with four specific scenarios: pre-training, fine-tuning, deployment, and LLM-based agents. Addressing the characteristics of each risk, this survey outlines and analyzes potential countermeasures. Research on attack and defense situations can offer feasible research directions, enabling more areas to benefit from LLMs.","PeriodicalId":50926,"journal":{"name":"ACM Computing Surveys","volume":"41 1","pages":""},"PeriodicalIF":16.6,"publicationDate":"2025-09-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145035280","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Ezekiel Soremekun, Mike Papadakis, Maxime Cordy, Yves Le Traon
{"title":"Software Fairness: An Analysis and Survey","authors":"Ezekiel Soremekun, Mike Papadakis, Maxime Cordy, Yves Le Traon","doi":"10.1145/3762170","DOIUrl":"https://doi.org/10.1145/3762170","url":null,"abstract":"In the last decade, researchers have studied fairness as a software property, in particular how to engineer fair software systems. This includes specifying, designing, and validating fairness properties. However, the landscape of works addressing bias as a software engineering concern is unclear, i.e., techniques and studies that analyze the fairness properties of learning-based software. In this work, we provide a clear view of the state-of-the-art in software fairness analysis. To this end, we collect, categorize and conduct in-depth analysis of 164 publications investigating the fairness of learning-based software systems. Specifically, we study the evaluated fairness measure, the studied tasks, the type of fairness analysis, the main idea of the proposed approaches and the access level (e.g., black, white or grey box). Our findings include the following: (1) Fairness concerns (such as fairness specification and requirements engineering) are under-studied; (2) Fairness measures such as conditional, sequential and intersectional fairness are under-explored; (3) Semi-structured datasets (e.g., audio, image, code and text) are barely studied for fairness analysis in the SE community; and (4) Software fairness analysis techniques hardly employ white-box, in-processing machine learning (ML) analysis methods. 
In summary, we observed several open challenges including the need to study intersectional/sequential bias, policy-based bias handling and human-in-the-loop, socio-technical bias mitigation.","PeriodicalId":50926,"journal":{"name":"ACM Computing Surveys","volume":"69 1","pages":""},"PeriodicalIF":16.6,"publicationDate":"2025-09-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145017300","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Louis Ohl, Pierre-Alexandre Mattei, Frederic Precioso
{"title":"A Tutorial on Discriminative Clustering and Mutual Information","authors":"Louis Ohl, Pierre-Alexandre Mattei, Frederic Precioso","doi":"10.1145/3748255","DOIUrl":"https://doi.org/10.1145/3748255","url":null,"abstract":"To cluster data is to separate samples into distinctive groups that should ideally have some cohesive properties. Today, numerous clustering algorithms exist, and their differences lie essentially in what can be perceived as “cohesive properties”. Therefore, hypotheses on the nature of clusters must be set: they can be either generative or discriminative. As the last decade witnessed the impressive growth of deep clustering methods that involve neural networks to handle high-dimensional data often in a discriminative manner, we concentrate mainly on the discriminative hypotheses. In this paper, our aim is to provide an accessible historical perspective on the evolution of discriminative clustering methods and notably how the nature of assumptions of the discriminative models changed over time: from decision boundaries to invariance critics. We notably highlight how mutual information has been a historical cornerstone of the progress of (deep) discriminative clustering methods. We also show some known limitations of mutual information and how discriminative clustering methods tried to circumvent those. We then discuss the challenges that discriminative clustering faces with respect to the selection of the number of clusters. 
Finally, we showcase these techniques using the dedicated Python package, GemClus, that we have developed for discriminative clustering.","PeriodicalId":50926,"journal":{"name":"ACM Computing Surveys","volume":"304 1","pages":""},"PeriodicalIF":16.6,"publicationDate":"2025-09-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145002955","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Domain Specialization as the Key to Make Large Language Models Disruptive: A Comprehensive Survey","authors":"Chen Ling, Xujiang Zhao, Jiaying Lu, Chengyuan Deng, Can Zheng, Junxiang Wang, Tanmoy Chowdhury, Yun Li, Hejie Cui, Xuchao Zhang, Tianjiao Zhao, Amit Panalkar, Dhagash Mehta, Stefano Pasquali, Wei Cheng, Haoyu Wang, Yanchi Liu, Zhengzhang Chen, Haifeng Chen, Chris White, Quanquan Gu, Jian Pei, Carl Yang, Liang Zhao","doi":"10.1145/3764579","DOIUrl":"https://doi.org/10.1145/3764579","url":null,"abstract":"Large language models (LLMs) have significantly advanced the field of natural language processing (NLP), providing a highly useful, task-agnostic foundation for a wide range of applications. However, directly applying LLMs to solve sophisticated problems in specific domains meets many hurdles, caused by the heterogeneity of domain data, the sophistication of domain knowledge, the uniqueness of domain objectives, and the diversity of the constraints (e.g., various social norms, cultural conformity, religious beliefs, and ethical standards in the domain applications). Domain specification techniques are key to making large language models disruptive in many applications. Specifically, to solve these hurdles, there has been a notable increase in research and practices conducted in recent years on the domain specialization of LLMs. This emerging field of study, with its substantial potential for impact, necessitates a comprehensive and systematic review to summarize better and guide ongoing work in this area. In this article, we present a comprehensive survey on domain specification techniques for large language models, an emerging direction critical for large language model applications. First, we propose a systematic taxonomy that categorizes the LLM domain-specialization techniques based on the accessibility to LLMs and summarizes the framework for all the subcategories as well as their relations and differences to each other. 
Second, we present an extensive taxonomy of critical application domains that can benefit dramatically from specialized LLMs, discussing their practical significance and open challenges. Last, we offer our insights into the current research status and future trends in this area.","PeriodicalId":50926,"journal":{"name":"ACM Computing Surveys","volume":"63 1","pages":""},"PeriodicalIF":16.6,"publicationDate":"2025-09-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144930904","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Minfeng Qi, Qin Wang, Zhipeng Wang, Manvir Schneider, Tianqing Zhu, Shiping Chen, William Knottenbelt, Thomas Hardjono
{"title":"SoK: Bitcoin Layer Two (L2)","authors":"Minfeng Qi, Qin Wang, Zhipeng Wang, Manvir Schneider, Tianqing Zhu, Shiping Chen, William Knottenbelt, Thomas Hardjono","doi":"10.1145/3763232","DOIUrl":"https://doi.org/10.1145/3763232","url":null,"abstract":"In this paper, we present the first Systematization of Knowledge (SoK) on constructing Layer Two (L2) solutions for Bitcoin. We carefully examine a representative subset of ongoing Bitcoin L2 solutions (40 out of 335 extensively investigated cases) and provide a concise yet impactful identification of six classic design patterns through two approaches (i.e., modifying transactions & creating proofs). Notably, we are the first to incorporate the inscription technology (emerged in mid-2023), along with a series of related innovations. We further establish a reference framework that serves as a baseline criterion ideally suited for evaluating the security aspects of Bitcoin L2 solutions, and which can also be extended to broader L2 applications. We apply this framework to evaluate each of the projects we investigated. We find that the inscription-based approaches introduce new functionality (i.e., programmability) to Bitcoin systems, whereas existing proof-based solutions primarily address scalability challenges. 
Our security analysis reveals new attack vectors targeting data/state (availability, verification), assets (withdrawal, recovery), and users (disputes, censorship).","PeriodicalId":50926,"journal":{"name":"ACM Computing Surveys","volume":"38 1","pages":""},"PeriodicalIF":16.6,"publicationDate":"2025-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144924225","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Abdullahi Kutiriko Abubakar, Lee Gillam, Nishanth Sastry
{"title":"The Role of the Internet of Things (IoT) in Achieving the United Nations (UN) Sustainable Development Goals (SDGs) - A Systematic Review","authors":"Abdullahi Kutiriko Abubakar, Lee Gillam, Nishanth Sastry","doi":"10.1145/3765516","DOIUrl":"https://doi.org/10.1145/3765516","url":null,"abstract":"As the 2030 deadline for achieving the Sustainable Development Goals (SDGs) approaches, the Internet of Things (IoT) has become a key enabler of sustainable development. This paper presents a systematic review of IoT-SDG research from 2015–2024, mapping its applications across seven macro-sectors: health, food and agriculture, energy and environment, education and employment, industry and innovation, governance and human rights, and smart cities and smart spaces. Our analysis identifies three major trends: (i) a shift from conceptual designs to real-world deployments, including grassroots innovations in developing economies tailored to local priorities; (ii) increasing reliance on enabling technologies such as cloud, edge, and machine learning, which together enhance scalability and responsiveness; and (iii) the growing use of IoT data not only for operational efficiency, but to quantify the impact of SDG interventions and identify areas for refinement. Despite this progress, barriers remain, including limited connectivity, dependence on centralised infrastructures, and challenges of interoperability, particularly in low-resource settings. These findings underscore the need for context-specific, edge-driven architectures and scalable mobile applications that can bridge digital divides. 
By synthesising achievements, gaps, and future opportunities, this review offers actionable insights for policymakers, technologists, and researchers seeking to harness IoT more effectively in support of an inclusive and sustainable SDG Vision 2030.","PeriodicalId":50926,"journal":{"name":"ACM Computing Surveys","volume":"25 1","pages":""},"PeriodicalIF":16.6,"publicationDate":"2025-08-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144919143","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Memory Analysis for Malware Detection: A Comprehensive Survey Using the OSCAR Methodology","authors":"Yasin Dehfouli, Arash Habibi Lashkari","doi":"10.1145/3764580","DOIUrl":"https://doi.org/10.1145/3764580","url":null,"abstract":"The steady growth of malware over the years has now sharply escalated, with a 30% surge in global cyberattacks in 2024. This rise demands advanced detection, as traditional methods often miss sophisticated or fileless malware. Memory analysis detects traces left by any malware in volatile memory, revealing runtime behaviors, privilege escalation attempts, and active processes. An examination of prior research shows that existing surveys on memory analysis have significant gaps, as none provide a comprehensive overview of the field. To address these gaps, this survey systematically proposes key research questions and addresses them using the OSCAR (Obtain, Strategize, Collect, Analyze, Report) methodology. Memory acquisition techniques and tools have been discussed with the most diverse taxonomy provided to the best of our knowledge. Furthermore, forensic methods, tools, and studies are categorized into four distinct approaches, with a comprehensive taxonomy at the end. We also evaluated and ranked memory dump datasets using our proposed scoring system. Finally, the survey covers malware detection methods, examining both machine learning and traditional approaches and their accuracy, benefits, drawbacks, and challenges. 
This survey aims to provide a comprehensive and up-to-date overview of the field of memory analysis, with a focus on detecting malicious activities.","PeriodicalId":50926,"journal":{"name":"ACM Computing Surveys","volume":"29 1","pages":""},"PeriodicalIF":16.6,"publicationDate":"2025-08-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144915647","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Emerging Trends in Early Dementia Diagnosis: An Analysis on Advanced Machine Learning Approaches","authors":"Badal Gami, Manav Agrawal, Rahul Katarya","doi":"10.1145/3764578","DOIUrl":"https://doi.org/10.1145/3764578","url":null,"abstract":"Dementia is the waning of cognitive abilities, which is typically seen with the natural aging process and includes issues with memory, language, and problem-solving abilities. Artificial Intelligence (AI) techniques are one viable method for the diagnosis of dementia. Despite recent advances in dementia informatics research and AI, accurate early diagnoses are still far from ideal. This study focuses on showcasing a comprehensive analysis of emerging AI approaches applied to early dementia diagnosis, highlighting trends across neuroimaging, speech, EEG, and clinical data. The proposed work’s main contributions include a summary of the potential challenges and vulnerabilities with dementia informatics research, a wide range of diagnostic issues in dementia care, a descriptive comparison of the elementary manuscripts judged on evaluation parameters such as precision, responsiveness, and definiteness and an offering of a descriptive set of data for developing Machine Learning (ML) and Deep Learning (DL) models. The manuscript also provides a valuable overview of new avenues for informatics research on dementia and advanced ML. 
The main objective is to fill a gap in the literature by offering an in-depth analysis and overview of the application of AI in dementia research, providing a foundational roadmap for accelerating impactful, data-driven dementia care solutions.","PeriodicalId":50926,"journal":{"name":"ACM Computing Surveys","volume":"29 1","pages":""},"PeriodicalIF":16.6,"publicationDate":"2025-08-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144916110","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Slicing of Probabilistic Programs: A Review of Existing Approaches","authors":"Federico Olmedo","doi":"10.1145/3764581","DOIUrl":"https://doi.org/10.1145/3764581","url":null,"abstract":"Program slicing aims to simplify programs by identifying and removing non-essential parts while preserving program behavior. It is widely used for program understanding, debugging, and software maintenance. This article provides an overview of slicing techniques for probabilistic programs, which blend traditional programming constructs with random sampling and conditioning. These programs have experienced a notable resurgence in recent years due to new applications in fields such as artificial intelligence and differential privacy. Concretely, we review the three major slicing techniques currently available for probabilistic programs: the foundational technique by Hur et al., the subsequent development by Amtoft and Banerjee based on probabilistic control flow graphs, and the more recent approach by Navarro and Olmedo based on program specifications. We provide a clear, accessible, and self-contained presentation of these techniques, and compare them across multiple dimensions to provide a deeper insight into the current state-of-the-art in probabilistic program slicing.","PeriodicalId":50926,"journal":{"name":"ACM Computing Surveys","volume":"8 1","pages":""},"PeriodicalIF":16.6,"publicationDate":"2025-08-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144916111","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pierre Nagorny, Bart Kevelham, Sylvain Chagué, Caecilia Charbonnier
{"title":"A Comprehensive Review of Real-Time Multi-View Multi-Person Markerless Motion Capture","authors":"Pierre Nagorny, Bart Kevelham, Sylvain Chagué, Caecilia Charbonnier","doi":"10.1145/3757733","DOIUrl":"https://doi.org/10.1145/3757733","url":null,"abstract":"Markerless human body motion capture promises to remove markers from capture studios, thus simplifying its diverse application fields, from life science to virtual reality. This comprehensive review examines recent advances in real-time markerless motion capture systems from 2020 to 2024, focusing on real-time multi-view, multi-person tracking solutions. Recent advancements, particularly driven by neural network-based pose estimation, have enabled real-time tracking with minimal latency, achieving at least 25 frames per second. Through systematic analysis, we evaluate these methods based on three key metrics: accuracy in pose reconstruction, end-to-end latency, and computational efficiency. Special attention is given to how architectural decisions impact system scalability regarding the number of camera viewpoints and tracked individuals. While current methods show promise for applications like sports analysis and virtual reality, challenges remain in achieving optimal performance across all metrics. Through systematic analysis of leading real-time pipelines, we identify key technical advances and persistent challenges. 
This synthesis provides critical insights for researchers and practitioners working to develop more robust markerless motion capture systems, while outlining important directions for future research.","PeriodicalId":50926,"journal":{"name":"ACM Computing Surveys","volume":"32 1","pages":""},"PeriodicalIF":16.6,"publicationDate":"2025-08-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144906146","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}