ArXivPub Date : 2024-03-02DOI: 10.1109/icassp48485.2024.10447384
Zeyu Xie, Baihan Li, Xuenan Xu, Mengyue Wu, Kai Yu
{"title":"Enhancing Audio Generation Diversity with Visual Information","authors":"Zeyu Xie, Baihan Li, Xuenan Xu, Mengyue Wu, Kai Yu","doi":"10.1109/icassp48485.2024.10447384","DOIUrl":"https://doi.org/10.1109/icassp48485.2024.10447384","url":null,"abstract":"Audio and sound generation has garnered significant attention in recent years, with a primary focus on improving the quality of generated audios. However, there has been limited research on enhancing the diversity of generated audio, particularly when it comes to audio generation within specific categories. Current models tend to produce homogeneous audio samples within a category. This work aims to address this limitation by improving the diversity of generated audio with visual information. We propose a clustering-based method, leveraging visual information to guide the model in generating distinct audio content within each category. Results on seven categories indicate that extra visual input can largely enhance audio generation diversity. Audio samples are available at https://zeyuxie29.github.io/DiverseAudioGeneration.","PeriodicalId":513202,"journal":{"name":"ArXiv","volume":"63 3","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-03-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140398649","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
ArXivPub Date : 2024-03-01DOI: 10.1109/icassp48485.2024.10445743
Yuxiang Lu, Shalayiding Sirejiding, Bayram Bayramli, Suizhi Huang, Yue Ding, Hongtao Lu
{"title":"Task Indicating Transformer for Task-conditional Dense Predictions","authors":"Yuxiang Lu, Shalayiding Sirejiding, Bayram Bayramli, Suizhi Huang, Yue Ding, Hongtao Lu","doi":"10.1109/icassp48485.2024.10445743","DOIUrl":"https://doi.org/10.1109/icassp48485.2024.10445743","url":null,"abstract":"The task-conditional model is a distinctive stream for efficient multi-task learning. Existing works encounter a critical limitation in learning task-agnostic and task-specific representations, primarily due to shortcomings in global context modeling arising from CNN-based architectures, as well as a deficiency in multi-scale feature interaction within the decoder. In this paper, we introduce a novel task-conditional framework called Task Indicating Transformer (TIT) to tackle this challenge. Our approach designs a Mix Task Adapter module within the transformer block, which incorporates a Task Indicating Matrix through matrix decomposition, thereby enhancing long-range dependency modeling and parameter-efficient feature adaptation by capturing intra- and inter-task features. Moreover, we propose a Task Gate Decoder module that harnesses a Task Indicating Vector and gating mechanism to facilitate adaptive multi-scale feature refinement guided by task embeddings. Experiments on two public multi-task dense prediction benchmarks, NYUD-v2 and PASCAL-Context, demonstrate that our approach surpasses state-of-the-art task-conditional methods.","PeriodicalId":513202,"journal":{"name":"ArXiv","volume":"35 23‐24","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140400748","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
ArXivPub Date : 2024-03-01DOI: 10.1145/3614419.3644023
Caroline Violot, Tuugrulcan Elmas, Igor Bilogrevic, Mathias Humbert
{"title":"Shorts vs. Regular Videos on YouTube: A Comparative Analysis of User Engagement and Content Creation Trends","authors":"Caroline Violot, Tuugrulcan Elmas, Igor Bilogrevic, Mathias Humbert","doi":"10.1145/3614419.3644023","DOIUrl":"https://doi.org/10.1145/3614419.3644023","url":null,"abstract":"YouTube introduced the Shorts video format in 2021, allowing users to upload short videos that are prominently displayed on its website and app. Despite having such a large visual footprint, there are no studies to date that have looked at the impact Shorts introduction had on the production and consumption of content on YouTube. This paper presents the first comparative analysis of YouTube Shorts versus regular videos with respect to user engagement (i.e., views, likes, and comments), content creation frequency and video categories. We collected a dataset containing information about 70k channels that posted at least one Short, and we analyzed the metadata of all the videos (9.9M Shorts and 6.9M regular videos) they uploaded between January 2021 and December 2022, spanning a two-year period including the introduction of Shorts. Our longitudinal analysis shows that content creators consistently increased the frequency of Shorts production over this period, especially for newly-created channels, which surpassed that of regular videos. We also observe that Shorts target mostly entertainment categories, while regular videos cover a wide variety of categories. In general, Shorts attract more views and likes per view than regular videos, but attract less comments per view. However, Shorts do not outperform regular videos in the education and political categories as much as they do in other categories. Our study contributes to understanding social media dynamics, to quantifying the spread of short-form content, and to motivating future research on its impact on society.","PeriodicalId":513202,"journal":{"name":"ArXiv","volume":"21 S1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140403913","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
ArXivPub Date : 2024-03-01DOI: 10.1145/3589335.3648318
Nurendra Choudhary, E-Wen Huang, Karthik Subbian, Chandan K. Reddy
{"title":"An Interpretable Ensemble of Graph and Language Models for Improving Search Relevance in E-Commerce","authors":"Nurendra Choudhary, E-Wen Huang, Karthik Subbian, Chandan K. Reddy","doi":"10.1145/3589335.3648318","DOIUrl":"https://doi.org/10.1145/3589335.3648318","url":null,"abstract":"The problem of search relevance in the E-commerce domain is a challenging one since it involves understanding the intent of a user's short nuanced query and matching it with the appropriate products in the catalog. This problem has traditionally been addressed using language models (LMs) and graph neural networks (GNNs) to capture semantic and inter-product behavior signals, respectively. However, the rapid development of new architectures has created a gap between research and the practical adoption of these techniques. Evaluating the generalizability of these models for deployment requires extensive experimentation on complex, real-world datasets, which can be non-trivial and expensive. Furthermore, such models often operate on latent space representations that are incomprehensible to humans, making it difficult to evaluate and compare the effectiveness of different models. This lack of interpretability hinders the development and adoption of new techniques in the field. To bridge this gap, we propose Plug and Play Graph LAnguage Model (PP-GLAM), an explainable ensemble of plug and play models. Our approach uses a modular framework with uniform data processing pipelines. It employs additive explanation metrics to independently decide whether to include (i) language model candidates, (ii) GNN model candidates, and (iii) inter-product behavioral signals. For the task of search relevance, we show that PP-GLAM outperforms several state-of-the-art baselines as well as a proprietary model on real-world multilingual, multi-regional e-commerce datasets. To promote better model comprehensibility and adoption, we also provide an analysis of the explainability and computational complexity of our model. We also provide the public codebase and provide a deployment strategy for practical implementation.","PeriodicalId":513202,"journal":{"name":"ArXiv","volume":"79 10","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140405858","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
ArXivPub Date : 2024-03-01DOI: 10.1007/978-3-030-54621-2_749-1
Gabriele Iommazzo, C. D’Ambrosio, A. Frangioni, Leo Liberti
{"title":"The Algorithm Configuration Problem","authors":"Gabriele Iommazzo, C. D’Ambrosio, A. Frangioni, Leo Liberti","doi":"10.1007/978-3-030-54621-2_749-1","DOIUrl":"https://doi.org/10.1007/978-3-030-54621-2_749-1","url":null,"abstract":"","PeriodicalId":513202,"journal":{"name":"ArXiv","volume":"199 3","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140403418","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
ArXivPub Date : 2024-03-01DOI: 10.1109/icassp48485.2024.10446894
Zexin Feng, Na Zeng, Jiansheng Fang, Xingyue Wang, Xiaoxi Lu, Heng Meng, Jiang Liu
{"title":"Flattening Singular Values of Factorized Convolution for Medical Images","authors":"Zexin Feng, Na Zeng, Jiansheng Fang, Xingyue Wang, Xiaoxi Lu, Heng Meng, Jiang Liu","doi":"10.1109/icassp48485.2024.10446894","DOIUrl":"https://doi.org/10.1109/icassp48485.2024.10446894","url":null,"abstract":"Convolutional neural networks (CNNs) have long been the paradigm of choice for robust medical image processing (MIP). Therefore, it is crucial to effectively and efficiently deploy CNNs on devices with different computing capabilities to support computer-aided diagnosis. Many methods employ factorized convolutional layers to alleviate the burden of limited computational resources at the expense of expressiveness. To this end, given weak medical image-driven CNN model optimization, a Singular value equalization generalizer-induced Factorized Convolution (SFConv) is proposed to improve the expressive power of factorized convolutions in MIP models. We first decompose the weight matrix of convolutional filters into two low-rank matrices to achieve model reduction. Then minimize the KL divergence between the two low-rank weight matrices and the uniform distribution, thereby reducing the number of singular value directions with significant variance. Extensive experiments on fundus and OCTA datasets demonstrate that our SFConv yields competitive expressiveness over vanilla convolutions while reducing complexity.","PeriodicalId":513202,"journal":{"name":"ArXiv","volume":"61 2","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140407188","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
ArXivPub Date : 2024-03-01DOI: 10.1609/aaai.v38i12.29217
Woo Kyung Kim, Minjong Yoo, Honguk Woo
{"title":"Robust Policy Learning via Offline Skill Diffusion","authors":"Woo Kyung Kim, Minjong Yoo, Honguk Woo","doi":"10.1609/aaai.v38i12.29217","DOIUrl":"https://doi.org/10.1609/aaai.v38i12.29217","url":null,"abstract":"Skill-based reinforcement learning (RL) approaches have shown considerable promise, especially in solving long-horizon tasks via hierarchical structures. These skills, learned task-agnostically from offline datasets, can accelerate the policy learning process for new tasks. Yet, the application of these skills in different domains remains restricted due to their inherent dependency on the datasets, which poses a challenge when attempting to learn a skill-based policy via RL for a target domain different from the datasets' domains. In this paper, we present a novel offline skill learning framework DuSkill which employs a guided Diffusion model to generate versatile skills extended from the limited skills in datasets, thereby enhancing the robustness of policy learning for tasks in different domains. Specifically, we devise a guided diffusion-based skill decoder in conjunction with the hierarchical encoding to disentangle the skill embedding space into two distinct representations, one for encapsulating domain-invariant behaviors and the other for delineating the factors that induce domain variations in the behaviors. Our DuSkill framework enhances the diversity of skills learned offline, thus enabling to accelerate the learning procedure of high-level policies for different domains.\u0000Through experiments, we show that DuSkill outperforms other skill-based imitation learning and RL algorithms for several long-horizon tasks, demonstrating its benefits in few-shot imitation and online RL.","PeriodicalId":513202,"journal":{"name":"ArXiv","volume":"14 4","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140406571","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
ArXivPub Date : 2024-03-01DOI: 10.1145/3613904.3642730
JooYoung Seo, Yilin Xia, Bongshin Lee, Sean McCurry, Yu Jun Yam
{"title":"MAIDR: Making Statistical Visualizations Accessible with Multimodal Data Representation","authors":"JooYoung Seo, Yilin Xia, Bongshin Lee, Sean McCurry, Yu Jun Yam","doi":"10.1145/3613904.3642730","DOIUrl":"https://doi.org/10.1145/3613904.3642730","url":null,"abstract":"This paper investigates new data exploration experiences that enable blind users to interact with statistical data visualizations$-$bar plots, heat maps, box plots, and scatter plots$-$leveraging multimodal data representations. In addition to sonification and textual descriptions that are commonly employed by existing accessible visualizations, our MAIDR (multimodal access and interactive data representation) system incorporates two additional modalities (braille and review) that offer complementary benefits. It also provides blind users with the autonomy and control to interactively access and understand data visualizations. In a user study involving 11 blind participants, we found the MAIDR system facilitated the accurate interpretation of statistical visualizations. Participants exhibited a range of strategies in combining multiple modalities, influenced by their past interactions and experiences with data visualizations. This work accentuates the overlooked potential of combining refreshable tactile representation with other modalities and elevates the discussion on the importance of user autonomy when designing accessible data visualizations.","PeriodicalId":513202,"journal":{"name":"ArXiv","volume":"66 5","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140406448","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
ArXivPub Date : 2024-03-01DOI: 10.1145/3613904.3642117
Xi Ding, Buse Çarik, U. Gunturi, Valerie Reyna, Eugenia H. Rho
{"title":"Leveraging Prompt-Based Large Language Models: Predicting Pandemic Health Decisions and Outcomes Through Social Media Language","authors":"Xi Ding, Buse Çarik, U. Gunturi, Valerie Reyna, Eugenia H. Rho","doi":"10.1145/3613904.3642117","DOIUrl":"https://doi.org/10.1145/3613904.3642117","url":null,"abstract":"We introduce a multi-step reasoning framework using prompt-based LLMs to examine the relationship between social media language patterns and trends in national health outcomes. Grounded in fuzzy-trace theory, which emphasizes the importance of gists of causal coherence in effective health communication, we introduce Role-Based Incremental Coaching (RBIC), a prompt-based LLM framework, to identify gists at-scale. Using RBIC, we systematically extract gists from subreddit discussions opposing COVID-19 health measures (Study 1). We then track how these gists evolve across key events (Study 2) and assess their influence on online engagement (Study 3). Finally, we investigate how the volume of gists is associated with national health trends like vaccine uptake and hospitalizations (Study 4). Our work is the first to empirically link social media linguistic patterns to real-world public health trends, highlighting the potential of prompt-based LLMs in identifying critical online discussion patterns that can form the basis of public health communication strategies.","PeriodicalId":513202,"journal":{"name":"ArXiv","volume":"40 3","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140399565","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Islem Bouzenia, Bajaj Piyush Krishan, Michael Pradel
{"title":"DyPyBench: A Benchmark of Executable Python Software","authors":"Islem Bouzenia, Bajaj Piyush Krishan, Michael Pradel","doi":"10.1145/3643742","DOIUrl":"https://doi.org/10.1145/3643742","url":null,"abstract":"Python has emerged as one of the most popular programming languages, extensively utilized in domains such as machine learning, data analysis, and web applications. Python's dynamic nature and extensive usage make it an attractive candidate for dynamic program analysis. However, unlike for other popular languages, there currently is no comprehensive benchmark suite of executable Python projects, which hinders the development of dynamic analyses. This work addresses this gap by presenting DyPyBench, the first benchmark of Python projects that is large scale, diverse, ready to run (i.e., with fully configured and prepared test suites), and ready to analyze (by integrating with the DynaPyt dynamic analysis framework). The benchmark encompasses 50 popular opensource projects from various application domains, with a total of 681k lines of Python code, and 30k test cases. DyPyBench enables various applications in testing and dynamic analysis, of which we explore three in this work: (i) Gathering dynamic call graphs and empirically comparing them to statically computed call graphs, which exposes and quantifies limitations of existing call graph construction techniques for Python. (ii) Using DyPyBench to build a training data set for LExecutor, a neural model that learns to predict values that otherwise would be missing at runtime. (iii) Using dynamically gathered execution traces to mine API usage specifications, which establishes a baseline for future work on specification mining for Python. We envision DyPyBench to provide a basis for other dynamic analyses and for studying the runtime behavior of Python code.","PeriodicalId":513202,"journal":{"name":"ArXiv","volume":"47 29","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140400405","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}