{"title":"Latent Space Interpretation for Stylistic Analysis and Explainable Authorship Attribution","authors":"Milad Alshomary, Narutatsu Ri, Marianna Apidianaki, Ajay Patel, Smaranda Muresan, Kathleen McKeown","doi":"arxiv-2409.07072","DOIUrl":"https://doi.org/arxiv-2409.07072","url":null,"abstract":"Recent state-of-the-art authorship attribution methods learn authorship\u0000representations of texts in a latent, non-interpretable space, hindering their\u0000usability in real-world applications. Our work proposes a novel approach to\u0000interpreting these learned embeddings by identifying representative points in\u0000the latent space and utilizing LLMs to generate informative natural language\u0000descriptions of the writing style of each point. We evaluate the alignment of\u0000our interpretable space with the latent one and find that it achieves the best\u0000prediction agreement compared to other baselines. Additionally, we conduct a\u0000human evaluation to assess the quality of these style descriptions, validating\u0000their utility as explanations for the latent space. Finally, we investigate\u0000whether human performance on the challenging AA task improves when aided by our\u0000system's explanations, finding an average improvement of around +20% in\u0000accuracy.","PeriodicalId":501030,"journal":{"name":"arXiv - CS - Computation and Language","volume":"27 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142184567","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"SimulBench: Evaluating Language Models with Creative Simulation Tasks","authors":"Qi Jia, Xiang Yue, Tianyu Zheng, Jie Huang, Bill Yuchen Lin","doi":"arxiv-2409.07641","DOIUrl":"https://doi.org/arxiv-2409.07641","url":null,"abstract":"We introduce SimulBench, a benchmark designed to evaluate large language\u0000models (LLMs) across a diverse collection of creative simulation scenarios,\u0000such as acting as a Linux terminal or playing text games with users. While\u0000these simulation tasks serve as effective measures of an LLM's general\u0000intelligence, they are seldom incorporated into existing benchmarks. A major\u0000challenge is to develop an evaluation framework for testing different LLMs\u0000fairly while preserving the multi-round interactive nature of simulation tasks\u0000between users and AI. To tackle this issue, we suggest using a fixed LLM as a\u0000user agent to engage with an LLM to collect dialogues first under different\u0000tasks. Then, challenging dialogue scripts are extracted for evaluating\u0000different target LLMs. To facilitate automatic assessment on DataName{}, GPT-4\u0000is employed as the evaluator, tasked with reviewing the quality of the final\u0000response generated by the target LLMs given multi-turn dialogue scripts. Our\u0000comprehensive experiments indicate that these simulation tasks continue to pose\u0000a significant challenge with their unique natures and show the gap between\u0000proprietary models and the most advanced open LLMs. For example, GPT-4-turbo\u0000outperforms LLaMA-3-70b-Chat on 18.55% more cases.","PeriodicalId":501030,"journal":{"name":"arXiv - CS - Computation and Language","volume":"11 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142184440","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Explanation, Debate, Align: A Weak-to-Strong Framework for Language Model Generalization","authors":"Mehrdad Zakershahrak, Samira Ghodratnama","doi":"arxiv-2409.07335","DOIUrl":"https://doi.org/arxiv-2409.07335","url":null,"abstract":"The rapid advancement of artificial intelligence systems has brought the\u0000challenge of AI alignment to the forefront of research, particularly in complex\u0000decision-making and task execution. As these systems surpass human-level\u0000performance in sophisticated problems, ensuring their alignment with human\u0000values, intentions, and ethical guidelines becomes crucial. Building on\u0000previous work in explanation generation for human-agent alignment, we address\u0000the more complex dynamics of multi-agent systems and human-AI teams. This paper\u0000introduces a novel approach to model alignment through weak-to-strong\u0000generalization in the context of language models. We present a framework where\u0000a strong model facilitates the improvement of a weaker model, bridging the gap\u0000between explanation generation and model alignment. Our method, formalized as a\u0000facilitation function, allows for the transfer of capabilities from advanced\u0000models to less capable ones without direct access to extensive training data.\u0000Our results suggest that this facilitation-based approach not only enhances\u0000model performance but also provides insights into the nature of model alignment\u0000and the potential for scalable oversight of AI systems.","PeriodicalId":501030,"journal":{"name":"arXiv - CS - Computation and Language","volume":"34 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142184359","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Gated Slot Attention for Efficient Linear-Time Sequence Modeling","authors":"Yu Zhang, Songlin Yang, Ruijie Zhu, Yue Zhang, Leyang Cui, Yiqiao Wang, Bolun Wang, Freda Shi, Bailin Wang, Wei Bi, Peng Zhou, Guohong Fu","doi":"arxiv-2409.07146","DOIUrl":"https://doi.org/arxiv-2409.07146","url":null,"abstract":"Linear attention Transformers and their gated variants, celebrated for\u0000enabling parallel training and efficient recurrent inference, still fall short\u0000in recall-intensive tasks compared to traditional Transformers and demand\u0000significant resources for training from scratch. This paper introduces Gated\u0000Slot Attention (GSA), which enhances Attention with Bounded-memory-Control\u0000(ABC) by incorporating a gating mechanism inspired by Gated Linear Attention\u0000(GLA). Essentially, GSA comprises a two-layer GLA linked via softmax, utilizing\u0000context-aware memory reading and adaptive forgetting to improve memory capacity\u0000while maintaining compact recurrent state size. This design greatly enhances\u0000both training and inference efficiency through GLA's hardware-efficient\u0000training algorithm and reduced state size. Additionally, retaining the softmax\u0000operation is particularly beneficial in \"finetuning pretrained Transformers to\u0000RNNs\" (T2R) settings, reducing the need for extensive training from scratch.\u0000Extensive experiments confirm GSA's superior performance in scenarios requiring\u0000in-context recall and in T2R settings.","PeriodicalId":501030,"journal":{"name":"arXiv - CS - Computation and Language","volume":"34 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142184478","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Using Generative Agents to Create Tip Sheets for Investigative Data Reporting","authors":"Joris Veerbeek, Nicholas Diakopoulos","doi":"arxiv-2409.07286","DOIUrl":"https://doi.org/arxiv-2409.07286","url":null,"abstract":"This paper introduces a system using generative AI agents to create tip\u0000sheets for investigative data reporting. Our system employs three specialized\u0000agents--an analyst, a reporter, and an editor--to collaboratively generate and\u0000refine tips from datasets. We validate this approach using real-world\u0000investigative stories, demonstrating that our agent-based system generally\u0000generates more newsworthy and accurate insights compared to a baseline model\u0000without agents, although some variability was noted between different stories.\u0000Our findings highlight the potential of generative AI to provide leads for\u0000investigative data reporting.","PeriodicalId":501030,"journal":{"name":"arXiv - CS - Computation and Language","volume":"6 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142184402","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Recent Trends of Multimodal Affective Computing: A Survey from NLP Perspective","authors":"Guimin Hu, Yi Xin, Weimin Lyu, Haojian Huang, Chang Sun, Zhihong Zhu, Lin Gui, Ruichu Cai","doi":"arxiv-2409.07388","DOIUrl":"https://doi.org/arxiv-2409.07388","url":null,"abstract":"Multimodal affective computing (MAC) has garnered increasing attention due to\u0000its broad applications in analyzing human behaviors and intentions, especially\u0000in text-dominated multimodal affective computing field. This survey presents\u0000the recent trends of multimodal affective computing from NLP perspective\u0000through four hot tasks: multimodal sentiment analysis, multimodal emotion\u0000recognition in conversation, multimodal aspect-based sentiment analysis and\u0000multimodal multi-label emotion recognition. The goal of this survey is to\u0000explore the current landscape of multimodal affective research, identify\u0000development trends, and highlight the similarities and differences across\u0000various tasks, offering a comprehensive report on the recent progress in\u0000multimodal affective computing from an NLP perspective. This survey covers the\u0000formalization of tasks, provides an overview of relevant works, describes\u0000benchmark datasets, and details the evaluation metrics for each task.\u0000Additionally, it briefly discusses research in multimodal affective computing\u0000involving facial expressions, acoustic signals, physiological signals, and\u0000emotion causes. Additionally, we discuss the technical approaches, challenges,\u0000and future directions in multimodal affective computing. To support further\u0000research, we released a repository that compiles related works in multimodal\u0000affective computing, providing detailed resources and references for the\u0000community.","PeriodicalId":501030,"journal":{"name":"arXiv - CS - Computation and Language","volume":"23 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142184448","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"You Have Thirteen Hours in Which to Solve the Labyrinth: Enhancing AI Game Masters with Function Calling","authors":"Jaewoo Song, Andrew Zhu, Chris Callison-Burch","doi":"arxiv-2409.06949","DOIUrl":"https://doi.org/arxiv-2409.06949","url":null,"abstract":"Developing a consistent and reliable AI game master for text-based games is a\u0000challenging task due to the limitations of large language models (LLMs) and the\u0000complexity of the game master's role. This paper presents a novel approach to\u0000enhance AI game masters by leveraging function calling in the context of the\u0000table-top role-playing game \"Jim Henson's Labyrinth: The Adventure Game.\" Our\u0000methodology involves integrating game-specific controls through functions,\u0000which we show improves the narrative quality and state update consistency of\u0000the AI game master. The experimental results, based on human evaluations and\u0000unit tests, demonstrate the effectiveness of our approach in enhancing gameplay\u0000experience and maintaining coherence with the game state. This work contributes\u0000to the advancement of game AI and interactive storytelling, offering insights\u0000into the design of more engaging and consistent AI-driven game masters.","PeriodicalId":501030,"journal":{"name":"arXiv - CS - Computation and Language","volume":"157 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142184357","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Questioning Internal Knowledge Structure of Large Language Models Through the Lens of the Olympic Games","authors":"Juhwan Choi, YoungBin Kim","doi":"arxiv-2409.06518","DOIUrl":"https://doi.org/arxiv-2409.06518","url":null,"abstract":"Large language models (LLMs) have become a dominant approach in natural\u0000language processing, yet their internal knowledge structures remain largely\u0000unexplored. In this paper, we analyze the internal knowledge structures of LLMs\u0000using historical medal tallies from the Olympic Games. We task the models with\u0000providing the medal counts for each team and identifying which teams achieved\u0000specific rankings. Our results reveal that while state-of-the-art LLMs perform\u0000remarkably well in reporting medal counts for individual teams, they struggle\u0000significantly with questions about specific rankings. This suggests that the\u0000internal knowledge structures of LLMs are fundamentally different from those of\u0000humans, who can easily infer rankings from known medal counts. To support\u0000further research, we publicly release our code, dataset, and model outputs.","PeriodicalId":501030,"journal":{"name":"arXiv - CS - Computation and Language","volume":"68 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142184367","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"PingPong: A Benchmark for Role-Playing Language Models with User Emulation and Multi-Model Evaluation","authors":"Ilya Gusev","doi":"arxiv-2409.06820","DOIUrl":"https://doi.org/arxiv-2409.06820","url":null,"abstract":"We introduce a novel benchmark for evaluating the role-playing capabilities\u0000of language models. Our approach leverages language models themselves to\u0000emulate users in dynamic, multi-turn conversations and to assess the resulting\u0000dialogues. The framework consists of three main components: a player model\u0000assuming a specific character role, an interrogator model simulating user\u0000behavior, and a judge model evaluating conversation quality. We conducted\u0000experiments comparing automated evaluations with human annotations to validate\u0000our approach, demonstrating strong correlations across multiple criteria. This\u0000work provides a foundation for a robust and dynamic evaluation of model\u0000capabilities in interactive scenarios.","PeriodicalId":501030,"journal":{"name":"arXiv - CS - Computation and Language","volume":"11 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142184358","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Practice of Post-Training on Llama-3 70B with Optimal Selection of Additional Language Mixture Ratio","authors":"Ningyuan Xi, Yetao Wu, Kun Fan, Teng Chen, Qingqing Gu, Peng Yu, Jinxian Qu, Chenxi Liu, Zhonglin Jiang, Yong Chen, Luo Ji","doi":"arxiv-2409.06624","DOIUrl":"https://doi.org/arxiv-2409.06624","url":null,"abstract":"Large Language Models (LLM) often needs to be Continual Pre-Trained (CPT) to\u0000obtain the unfamiliar language skill or adapt into new domains. The huge\u0000training cost of CPT often asks for cautious choice of key hyper-parameters\u0000such as the mixture ratio of extra language or domain corpus. However, there is\u0000no systematic study which bridge the gap between the optimal mixture ratio and\u0000the actual model performance, and the gap between experimental scaling law and\u0000the actual deployment in the full model size. In this paper, we perform CPT on\u0000Llama-3 8B and 70B to enhance its Chinese ability. We study the optimal\u0000correlation between the Additional Language Mixture Ratio (ALMR) and the\u0000Learning Rate (LR) on the 8B size which directly indicate the optimal\u0000experimental set up. By thorough choice of hyper-parameter, and subsequent\u0000fine-tuning, the model capability is improved not only on the Chinese-related\u0000benchmark, but also some specific domains including math, coding and emotional\u0000intelligence. We deploy the final 70B version of LLM on an real-life chat\u0000system which obtain satisfying performance.","PeriodicalId":501030,"journal":{"name":"arXiv - CS - Computation and Language","volume":"67 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142184362","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}