{"title":"A Survey of Reasoning with Foundation Models: Concepts, Methodologies, and Outlook","authors":"Jiankai Sun, Chuanyang Zheng, Enze Xie, Zhengying Liu, Ruihang Chu, Jianing Qiu, Jiaqi Xu, Mingyu Ding, Hongyang Li, Mengzhe Geng, Yue Wu, Wenhai Wang, Junsong Chen, Zhangyue Yin, Xiaozhe Ren, Jie Fu, Junxian He, Yuan Wu, Qi Liu, Xihui Liu, Yu Li, Hao Dong, Yu Cheng, Ming Zhang, Pheng Ann Heng, Jifeng Dai, Ping Luo, Jingdong Wang, Ji-Rong Wen, Xipeng Qiu, Yike Guo, Hui Xiong, Qun Liu, Zhenguo Li","doi":"10.1145/3729218","DOIUrl":"https://doi.org/10.1145/3729218","url":null,"abstract":"Reasoning, a crucial ability for complex problem-solving, plays a pivotal role in various real-world settings such as negotiation, medical diagnosis, and criminal investigation. It serves as a fundamental methodology in the field of Artificial General Intelligence (AGI). With the ongoing development of foundation models, there is a growing interest in exploring their abilities in reasoning tasks. In this paper, we introduce seminal foundation models proposed or adaptable for reasoning, highlighting the latest advancements in various reasoning tasks, methods, and benchmarks. We then delve into the potential future directions behind the emergence of reasoning abilities within foundation models. We also discuss the relevance of multimodal learning, autonomous agents, and super alignment in the context of reasoning. By discussing these future research directions, we hope to inspire researchers in their exploration of this field, stimulate further advancements in reasoning with foundation models, e.g. Large Language Models (LLMs), and contribute to the development of AGI.","PeriodicalId":50926,"journal":{"name":"ACM Computing Surveys","volume":"37 1","pages":""},"PeriodicalIF":16.6,"publicationDate":"2025-04-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143822819","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Aortic Vessel Tree Segmentation for Cardiovascular Diseases Treatment: Status Quo","authors":"Yuan Jin, Antonio Pepe, Jianning Li, Christina Gsaxner, Yuxuan Chen, Behrus Puladi, Fen-hua Zhao, Kelsey Pomykala, Jens Kleesiek, Alejandro Frangi, Jan Egger","doi":"10.1145/3728632","DOIUrl":"https://doi.org/10.1145/3728632","url":null,"abstract":"The aortic vessel tree, composed of the aorta and its branches, is crucial for blood supply to the body. Aortic diseases, such as aneurysms and dissections, can lead to life-threatening ruptures, often requiring open surgery. Therefore, patients commonly undergo treatment under constant monitoring, which requires regular inspections of the vessels through medical imaging techniques. Overlapping and comparing aortic vessel tree geometries from consecutive images allows for tracking changes in both the aorta and its branches. Manual reconstruction of the vessel tree is time-consuming and impractical in clinical settings. In contrast, automatic or semi-automatic segmentation algorithms can perform this task much faster, making them suitable for routine clinical use. This paper systematically reviews methods for the automatic and semi-automatic segmentation of the aortic vessel tree, concluding with a discussion on their clinical applicability, the current research landscape, and ongoing challenges.","PeriodicalId":50926,"journal":{"name":"ACM Computing Surveys","volume":"50 1","pages":""},"PeriodicalIF":16.6,"publicationDate":"2025-04-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143822820","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Geometric Constraints in Deep Learning Frameworks: A Survey","authors":"Vibhas K Vats, David Crandall","doi":"10.1145/3729221","DOIUrl":"https://doi.org/10.1145/3729221","url":null,"abstract":"Stereophotogrammetry [62] is an established technique for scene understanding. Its origins go back to at least the 1800s when people first started to investigate using photographs to measure the physical properties of the world. Since then, thousands of approaches have been explored. The classic geometric technique of Shape from Stereo is built on using geometry to define constraints on scene and camera geometry, whereas more recent approaches use deep learning without any attempt to explicitly model the geometry. In this survey, we explore geometry-inspired deep learning-based frameworks. We compare and contrast geometry enforcing constraints integrated into deep learning frameworks for depth estimation and other closely related vision tasks. We present a new taxonomy for prevalent geometry enforcing constraints used in modern deep learning frameworks. We also present insightful observations and potential future research directions.","PeriodicalId":50926,"journal":{"name":"ACM Computing Surveys","volume":"59 1","pages":""},"PeriodicalIF":16.6,"publicationDate":"2025-04-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143819278","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Recent Advances in Vision Transformer Robustness Against Adversarial Attacks in Traffic Sign Detection and Recognition: A Survey","authors":"Oluwajuwon Fawole, Danda Rawat","doi":"10.1145/3729167","DOIUrl":"https://doi.org/10.1145/3729167","url":null,"abstract":"The emergence of Vision Transformers (ViTs) has marked a significant advancement in machine learning, particularly in applications requiring robust visual recognition capabilities, such as traffic sign detection for autonomous driving systems. However, deploying these models in adversarial environments where robustness is critical remains a challenge. This survey provides a comprehensive review of the integration of ViTs in traffic sign detection and recognition, emphasizing their vulnerability to adversarial attacks and the methods developed to enhance their robustness. This paper also presents a comprehensive side-by-side comparison of ViTs in tabular form.","PeriodicalId":50926,"journal":{"name":"ACM Computing Surveys","volume":"9 1","pages":""},"PeriodicalIF":16.6,"publicationDate":"2025-04-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143819280","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Generic Taxonomy for Steganography Methods","authors":"Steffen Wendzel, Luca Caviglione, Wojciech Mazurczyk, Aleksandra Mileva, Jana Dittmann, Christian Krätzer, Kevin Lamshöft, Claus Vielhauer, Laura Hartmann, Jörg Keller, Tom Neubert, Sebastian Zillien","doi":"10.1145/3729165","DOIUrl":"https://doi.org/10.1145/3729165","url":null,"abstract":"A unified understanding of terms is essential for every scientific discipline: steganography is no exception. Being divided into several domains (e.g., network and text steganography), it is crucial to provide a unified terminology as well as a taxonomy that is not limited to a few applications or areas. A prime attempt towards a unified understanding of terms was conducted in 2015 with the introduction of a pattern-based taxonomy for network steganography. In 2021, the first work towards a pattern-based taxonomy for all domains of steganography was proposed. However, this initial attempt still faced several shortcomings, e.g., remaining inconsistencies and a lack of patterns for several steganography domains. As the consortium that published the previous studies on steganography patterns, we present the first comprehensive pattern-based taxonomy tailored to fit all known domains of steganography, including smaller and emerging areas, such as filesystem, IoT/CPS, and AI/ML steganography. To make our contribution more effective and promote the use of the taxonomy to advance research, we also provide a unified description method together with a thorough tutorial on its utilization.","PeriodicalId":50926,"journal":{"name":"ACM Computing Surveys","volume":"183 1","pages":""},"PeriodicalIF":16.6,"publicationDate":"2025-04-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143805846","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Proof Scores: A Survey","authors":"Adrián Riesco, Kazuhiro Ogata, Masaki Nakamura, Daniel Gaina, Duong Dinh Tran, Kokichi Futatsugi","doi":"10.1145/3729166","DOIUrl":"https://doi.org/10.1145/3729166","url":null,"abstract":"Proof scores can be regarded as outlines of the formal verification of system properties. They have been historically used by the OBJ family of specification languages. The main advantage of proof scores is that they follow the same syntax as the specification language they are used in, so specifiers can easily adopt them and use as many features as the particular language provides. In this way, proof scores have been successfully used to prove properties of a large number of systems and protocols. However, proof scores also present a number of disadvantages that have prevented a large audience from adopting them as a proving mechanism. In this paper we present the theoretical foundations of proof scores; the different systems where they have been adopted and their latest developments; the classes of systems successfully verified using proof scores, including the main techniques used for it; the main reasons why they have not been widely adopted; and finally we discuss some directions of future work that might solve the problems discussed previously.","PeriodicalId":50926,"journal":{"name":"ACM Computing Surveys","volume":"24 1","pages":""},"PeriodicalIF":16.6,"publicationDate":"2025-04-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143805852","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"AI-Generated Content (AIGC) for Various Data Modalities: A Survey","authors":"Lin Geng Foo, Hossein Rahmani, Jun Liu","doi":"10.1145/3728633","DOIUrl":"https://doi.org/10.1145/3728633","url":null,"abstract":"AI-generated content (AIGC) methods aim to produce text, images, videos, 3D assets, and other media using AI algorithms. Due to its wide range of applications and the potential of recent works, AIGC developments – especially in Machine Learning (ML) and Deep Learning (DL) – have been attracting significant attention, and this survey focuses on comprehensively reviewing such advancements in ML/DL. AIGC methods have been developed for various data modalities, such as image, video, text, 3D shape, 3D scene, 3D human avatar, 3D motion, and audio – each presenting unique characteristics and challenges. Furthermore, there have been significant developments in cross-modality AIGC methods, where generative methods receive conditioning input in one modality and produce outputs in another. Examples include going from various modalities to image, video, 3D, and audio. This paper provides a comprehensive review of AIGC methods across different data modalities, including both single-modality and cross-modality methods, highlighting the various challenges, representative works, and recent technical directions in each setting. We also survey the representative datasets throughout the modalities, and present comparative results for various modalities. Moreover, we discuss the typical applications of AIGC methods in various domains, challenges, and future research directions.","PeriodicalId":50926,"journal":{"name":"ACM Computing Surveys","volume":"227 1","pages":""},"PeriodicalIF":16.6,"publicationDate":"2025-04-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143797962","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Visual Question Answering: A Survey of Methods, Datasets, Evaluation, and Challenges","authors":"Byeong Su Kim, Jieun Kim, Deokwoo Lee, Beakcheol Jang","doi":"10.1145/3728635","DOIUrl":"https://doi.org/10.1145/3728635","url":null,"abstract":"Visual question answering (VQA) is a dynamic field of research that aims to generate textual answers from given visual and question information. It is a multimodal field that has garnered significant interest from the computer vision and natural language processing communities. Furthermore, recent advances in these fields have yielded numerous achievements in VQA research. In VQA research, achieving balanced learning that avoids bias towards either visual or question information is crucial. The primary challenge in VQA lies in eliminating noise, while utilizing valuable and accurate information from different modalities. Various research methodologies have been developed to address these issues. In this study, we classify these research methods into three categories: Joint Embedding, Attention Mechanism, and Model-agnostic methods. We analyze the advantages, disadvantages, and limitations of each approach. In addition, we trace the evolution of datasets in VQA research, categorizing them into three types: Real Image, Synthetic Image, and Unbiased datasets. This study also provides an overview of evaluation metrics based on future research directions. Finally, we discuss future research and application directions for VQA research. We anticipate that this survey will offer useful perspectives and essential information to researchers and practitioners seeking to address visual questions effectively.","PeriodicalId":50926,"journal":{"name":"ACM Computing Surveys","volume":"31 1","pages":""},"PeriodicalIF":16.6,"publicationDate":"2025-04-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143805574","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Similarity of Neural Network Models: A Survey of Functional and Representational Measures","authors":"Max Klabunde, Tobias Schumacher, Markus Strohmaier, Florian Lemmerich","doi":"10.1145/3728458","DOIUrl":"https://doi.org/10.1145/3728458","url":null,"abstract":"Measuring similarity of neural networks to understand and improve their behavior has become an issue of great importance and research interest. In this survey, we provide a comprehensive overview of two complementary perspectives of measuring neural network similarity: (i) representational similarity, which considers how activations of intermediate layers differ, and (ii) functional similarity, which considers how models differ in their outputs. In addition to providing detailed descriptions of existing measures, we summarize and discuss results on the properties of and relationships between these measures, and point to open research problems. We hope our work lays a foundation for more systematic research on the properties and applicability of similarity measures for neural network models.","PeriodicalId":50926,"journal":{"name":"ACM Computing Surveys","volume":"183 1","pages":""},"PeriodicalIF":16.6,"publicationDate":"2025-04-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143798357","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Hybrids of Reinforcement Learning and Evolutionary Computation in Finance: A Survey","authors":"Sandarbh Yadav, Vadlamani Ravi, Shivaram Kalyanakrishnan","doi":"10.1145/3728634","DOIUrl":"https://doi.org/10.1145/3728634","url":null,"abstract":"Many sequential decision-making problems in finance, such as trading and portfolio optimisation, have been modelled using reinforcement learning (RL) and evolutionary computation (EC). Recent studies on problems from various domains have shown that EC can be used to improve the performance of RL and vice versa. Over the years, researchers have proposed different ways of hybridising RL and EC for trading and portfolio optimisation. However, there is a lack of a thorough survey in this research area, which lies at the intersection of RL, EC, and finance. This paper surveys hybrid techniques combining EC and RL for financial applications and presents a novel taxonomy. Research gaps in existing works are identified, along with some open problems for future work. A detailed discussion about different design choices made in the existing literature is also included.","PeriodicalId":50926,"journal":{"name":"ACM Computing Surveys","volume":"60 1 1","pages":""},"PeriodicalIF":16.6,"publicationDate":"2025-04-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143805576","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}