ArXivPub Date : 2024-03-11DOI: 10.1609/aaai.v38i5.28314
Shuai Tan, Bin Ji, Yu Ding, Ye Pan
{"title":"Say Anything with Any Style","authors":"Shuai Tan, Bin Ji, Yu Ding, Ye Pan","doi":"10.1609/aaai.v38i5.28314","DOIUrl":"https://doi.org/10.1609/aaai.v38i5.28314","url":null,"abstract":"Generating stylized talking head with diverse head motions is crucial for achieving natural-looking videos but still remains challenging. Previous works either adopt a regressive method to capture the speaking style, resulting in a coarse style that is averaged across all training data, or employ a universal network to synthesize videos with different styles which causes suboptimal performance. To address these, we propose a novel dynamic-weight method, namely Say Anything with Any Style (SAAS), which queries the discrete style representation via a generative model with a learned style codebook. Specifically, we develop a multi-task VQ-VAE that incorporates three closely related tasks to learn a style codebook as a prior for style extraction. This discrete prior, along with the generative model, enhances the precision and robustness when extracting the speaking styles of the given style clips. By utilizing the extracted style, a residual architecture comprising a canonical branch and style-specific branch is employed to predict the mouth shapes conditioned on any driving audio while transferring the speaking style from the source to any desired one. To adapt to different speaking styles, we steer clear of employing a universal network by exploring an elaborate HyperStyle to produce the style-specific weights offset for the style branch. Furthermore, we construct a pose generator and a pose codebook to store the quantized pose representation, allowing us to sample diverse head motions aligned with the audio and the extracted style. Experiments demonstrate that our approach surpasses state-of-the-art methods in terms of both lip-synchronization and stylized expression. Besides, we extend our SAAS to video-driven style editing field and achieve satisfactory performance as well.","PeriodicalId":513202,"journal":{"name":"ArXiv","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-03-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140396170","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
ArXivPub Date : 2024-03-11DOI: 10.1145/3613904.3642124
Sukran Karaosmanoglu, S. Cmentowski, Lennart E. Nacke, Frank Steinicke
{"title":"Born to Run, Programmed to Play: Mapping the Extended Reality Exergames Landscape","authors":"Sukran Karaosmanoglu, S. Cmentowski, Lennart E. Nacke, Frank Steinicke","doi":"10.1145/3613904.3642124","DOIUrl":"https://doi.org/10.1145/3613904.3642124","url":null,"abstract":"Many people struggle to exercise regularly, raising the risk of serious health-related issues. Extended reality (XR) exergames address these hurdles by combining physical exercises with enjoyable, immersive gameplay. While a growing body of research explores XR exergames, no previous review has structured this rapidly expanding research landscape. We conducted a scoping review of the current state of XR exergame research to (i) provide a structured overview, (ii) highlight trends, and (iii) uncover knowledge gaps. After identifying 1318 papers in human-computer interaction and medical databases, we ultimately included 186 papers in our analysis. We provide a quantitative and qualitative summary of XR exergame research, showing current trends and potential future considerations. Finally, we provide a taxonomy of XR exergames to help future design and methodological investigation and reporting.","PeriodicalId":513202,"journal":{"name":"ArXiv","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-03-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140396069","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"What Makes Quantization for Large Language Model Hard? An Empirical Study from the Lens of Perturbation","authors":"Zhuocheng Gong, Jiahao Liu, Jingang Wang, Xunliang Cai, Dongyan Zhao, Rui Yan","doi":"10.1609/aaai.v38i16.29765","DOIUrl":"https://doi.org/10.1609/aaai.v38i16.29765","url":null,"abstract":"Quantization has emerged as a promising technique for improving the memory and computational efficiency of large language models (LLMs). Though the trade-off between performance and efficiency is well-known, there is still much to be learned about the relationship between quantization and LLM performance. To shed light on this relationship, we propose a new perspective on quantization, viewing it as perturbations added to the weights and activations of LLMs. We call this approach ``the lens of perturbation\". Using this lens, we conduct experiments with various artificial perturbations to explore their impact on LLM performance. Our findings reveal several connections between the properties of perturbations and LLM performance, providing insights into the failure cases of uniform quantization and suggesting potential solutions to improve the robustness of LLM quantization.\u0000To demonstrate the significance of our findings, we implement a simple non-uniform quantization approach based on our insights. Our experiments show that this approach achieves minimal performance degradation on both 4-bit weight quantization and 8-bit quantization for weights and activations. These results validate the correctness of our approach and highlight its potential to improve the efficiency of LLMs without sacrificing performance.","PeriodicalId":513202,"journal":{"name":"ArXiv","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-03-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140396252","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Thought Graph: Generating Thought Process for Biological Reasoning","authors":"Chi-Yang Hsu, Kyle Cox, Jiawei Xu, Zhen Tan, Tianhua Zhai, Mengzhou Hu, Dexter Pratt, Tianlong Chen, Ziniu Hu, Ying Ding","doi":"10.1145/3589335.3651572","DOIUrl":"https://doi.org/10.1145/3589335.3651572","url":null,"abstract":"We present the Thought Graph as a novel framework to support complex reasoning and use gene set analysis as an example to uncover semantic relationships between biological processes. Our framework stands out for its ability to provide a deeper understanding of gene sets, significantly surpassing GSEA by 40.28% and LLM baselines by 5.38% based on cosine similarity to human annotations. Our analysis further provides insights into future directions of biological processes naming, and implications for bioinformatics and precision medicine.","PeriodicalId":513202,"journal":{"name":"ArXiv","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-03-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140396550","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
David Binder, Ingo Skupin, Tim Süberkrüb, Klaus Ostermann
{"title":"Deriving Dependently-Typed OOP from First Principles - Extended Version with Additional Appendices","authors":"David Binder, Ingo Skupin, Tim Süberkrüb, Klaus Ostermann","doi":"10.1145/3649846","DOIUrl":"https://doi.org/10.1145/3649846","url":null,"abstract":"The expression problem describes how most types can easily be extended with new ways to produce the type or new ways to consume the type, but not both. When abstract syntax trees are defined as an algebraic data type, for example, they can easily be extended with new consumers, such as print or eval, but adding a new constructor requires the modification of all existing pattern matches. The expression problem is one way to elucidate the difference between functional or data-oriented programs (easily extendable by new consumers) and object-oriented programs (easily extendable by new producers). This difference between programs which are extensible by new producers or new consumers also exists for dependently typed programming, but with one core difference: Dependently-typed programming almost exclusively follows the functional programming model and not the object-oriented model, which leaves an interesting space in the programming language landscape unexplored. In this paper, we explore the field of dependently-typed object-oriented programming by deriving it from first principles using the principle of duality. That is, we do not extend an existing object-oriented formalism with dependent types in an ad-hoc fashion, but instead start from a familiar data-oriented language and derive its dual fragment by the systematic use of defunctionalization and refunctionalization. Our central contribution is a dependently typed calculus which contains two dual language fragments. We provide type- and semantics-preserving transformations between these two language fragments: defunctionalization and refunctionalization. We have implemented this language and these transformations and use this implementation to explain the various ways in which constructions in dependently typed programming can be explained as special instances of the phenomenon of duality.","PeriodicalId":513202,"journal":{"name":"ArXiv","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-03-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140395881","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Data Cubes in Hand: A Design Space of Tangible Cubes for Visualizing 3D Spatio-Temporal Data in Mixed Reality","authors":"Shuqi He, Haonan Yao, Luyan Jiang, Kaiwen Li, Nan Xiang, Yue Li, Hai-Ning Liang, Lingyun Yu","doi":"10.1145/3613904.3642740","DOIUrl":"https://doi.org/10.1145/3613904.3642740","url":null,"abstract":"Tangible interfaces in mixed reality (MR) environments allow for intuitive data interactions. Tangible cubes, with their rich interaction affordances, high maneuverability, and stable structure, are particularly well-suited for exploring multi-dimensional data types. However, the design potential of these cubes is underexplored. This study introduces a design space for tangible cubes in MR, focusing on interaction space, visualization space, sizes, and multiplicity. Using spatio-temporal data, we explored the interaction affordances of these cubes in a workshop (N=24). We identified unique interactions like rotating, tapping, and stacking, which are linked to augmented reality (AR) visualization commands. Integrating user-identified interactions, we created a design space for tangible-cube interactions and visualization. A prototype visualizing global health spending with small cubes was developed and evaluated, supporting both individual and combined cube manipulation. This research enhances our grasp of tangible interaction in MR, offering insights for future design and application in diverse data contexts.","PeriodicalId":513202,"journal":{"name":"ArXiv","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-03-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140396041","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
ArXivPub Date : 2024-03-11DOI: 10.1145/3613905.3650750
Abdallah El Ali, Karthikeya Puttur Venkatraj, Sophie Morosoli, Laurens Naudts, Natali Helberger, Pablo César
{"title":"Transparent AI Disclosure Obligations: Who, What, When, Where, Why, How","authors":"Abdallah El Ali, Karthikeya Puttur Venkatraj, Sophie Morosoli, Laurens Naudts, Natali Helberger, Pablo César","doi":"10.1145/3613905.3650750","DOIUrl":"https://doi.org/10.1145/3613905.3650750","url":null,"abstract":"Advances in Generative Artificial Intelligence (AI) are resulting in AI-generated media output that is (nearly) indistinguishable from human-created content. This can drastically impact users and the media sector, especially given global risks of misinformation. While the currently discussed European AI Act aims at addressing these risks through Article 52's AI transparency obligations, its interpretation and implications remain unclear. In this early work, we adopt a participatory AI approach to derive key questions based on Article 52's disclosure obligations. We ran two workshops with researchers, designers, and engineers across disciplines (N=16), where participants deconstructed Article 52's relevant clauses using the 5W1H framework. We contribute a set of 149 questions clustered into five themes and 18 sub-themes. We believe these can not only help inform future legal developments and interpretations of Article 52, but also provide a starting point for Human-Computer Interaction research to (re-)examine disclosure transparency from a human-centered AI lens.","PeriodicalId":513202,"journal":{"name":"ArXiv","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-03-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140396030","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
ArXivPub Date : 2024-03-11DOI: 10.1007/978-3-031-56992-0_26
Marvin Zammit, Antonios Liapis, Georgios N. Yannakakis
{"title":"MAP-Elites with Transverse Assessment for Multimodal Problems in Creative Domains","authors":"Marvin Zammit, Antonios Liapis, Georgios N. Yannakakis","doi":"10.1007/978-3-031-56992-0_26","DOIUrl":"https://doi.org/10.1007/978-3-031-56992-0_26","url":null,"abstract":"","PeriodicalId":513202,"journal":{"name":"ArXiv","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-03-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140395873","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Chart4Blind: An Intelligent Interface for Chart Accessibility Conversion","authors":"Omar Moured, Morris Baumgarten-Egemole, Alina Roitberg, Karin Müller, Thorsten Schwarz, Rainer Stiefelhagen","doi":"10.1145/3640543.3645175","DOIUrl":"https://doi.org/10.1145/3640543.3645175","url":null,"abstract":"In a world driven by data visualization, ensuring the inclusive accessibility of charts for Blind and Visually Impaired (BVI) individuals remains a significant challenge. Charts are usually presented as raster graphics without textual and visual metadata needed for an equivalent exploration experience for BVI people. Additionally, converting these charts into accessible formats requires considerable effort from sighted individuals. Digitizing charts with metadata extraction is just one aspect of the issue; transforming it into accessible modalities, such as tactile graphics, presents another difficulty. To address these disparities, we propose Chart4Blind, an intelligent user interface that converts bitmap image representations of line charts into universally accessible formats. Chart4Blind achieves this transformation by generating Scalable Vector Graphics (SVG), Comma-Separated Values (CSV), and alternative text exports, all comply with established accessibility standards. Through interviews and a formal user study, we demonstrate that even inexperienced sighted users can make charts accessible in an average of 4 minutes using Chart4Blind, achieving a System Usability Scale rating of 90%. In comparison to existing approaches, Chart4Blind provides a comprehensive solution, generating end-to-end accessible SVGs suitable for assistive technologies such as embossed prints (papers and laser cut), 2D tactile displays, and screen readers. For additional information, including open-source codes and demos, please visit our project page https://moured.github.io/chart4blind/.","PeriodicalId":513202,"journal":{"name":"ArXiv","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-03-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140395834","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
ArXivPub Date : 2024-03-11DOI: 10.1145/3613904.3642651
A. D'Adamo, M. Roel-Lesur, L. Turmo-Vidal, Mohammad Mahdi Dehshibi, D. D. L. Prida, J. R. Díaz-Durán, L. A. Azpicueta-Ruiz, A. Valjamae, A. T. I. Lab, Dei Interactive Systems Group, Department of Computer Science, Engineering, U. C. I. Madrid, Madrid, Spain, Department of Quantitative Theory, Communications, Johan Skytte Institute of Political Studies, University of Tartu, Tartu, Estonia., Ucl Interaction Centre, University College London, London, United Kingdom.
{"title":"SoniWeight Shoes: Investigating Effects and Personalization of a Wearable Sound Device for Altering Body Perception and Behavior","authors":"A. D'Adamo, M. Roel-Lesur, L. Turmo-Vidal, Mohammad Mahdi Dehshibi, D. D. L. Prida, J. R. Díaz-Durán, L. A. Azpicueta-Ruiz, A. Valjamae, A. T. I. Lab, Dei Interactive Systems Group, Department of Computer Science, Engineering, U. C. I. Madrid, Madrid, Spain, Department of Quantitative Theory, Communications, Johan Skytte Institute of Political Studies, University of Tartu, Tartu, Estonia., Ucl Interaction Centre, University College London, London, United Kingdom.","doi":"10.1145/3613904.3642651","DOIUrl":"https://doi.org/10.1145/3613904.3642651","url":null,"abstract":"Changes in body perception influence behavior and emotion and can be induced through multisensory feedback. Auditory feedback to one's actions can trigger such alterations; however, it is unclear which individual factors modulate these effects. We employ and evaluate SoniWeight Shoes, a wearable device based on literature for altering one's weight perception through manipulated footstep sounds. In a healthy population sample across a spectrum of individuals (n=84) with varying degrees of eating disorder symptomatology, physical activity levels, body concerns, and mental imagery capacities, we explore the effects of three sound conditions (low-frequency, high-frequency and control) on extensive body perception measures (demographic, behavioral, physiological, psychological, and subjective). Analyses revealed an impact of individual differences in each of these dimensions. Besides replicating previous findings, we reveal and highlight the role of individual differences in body perception, offering avenues for personalized sonification strategies. Datasets, technical refinements, and novel body map quantification tools are provided.","PeriodicalId":513202,"journal":{"name":"ArXiv","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-03-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140396397","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}