arXiv - CS - Computation and Language: Latest Articles

Latent Space Interpretation for Stylistic Analysis and Explainable Authorship Attribution
arXiv - CS - Computation and Language | Pub Date: 2024-09-11 | DOI: arxiv-2409.07072
Milad Alshomary, Narutatsu Ri, Marianna Apidianaki, Ajay Patel, Smaranda Muresan, Kathleen McKeown
Abstract: Recent state-of-the-art authorship attribution methods learn authorship representations of texts in a latent, non-interpretable space, hindering their usability in real-world applications. Our work proposes a novel approach to interpreting these learned embeddings by identifying representative points in the latent space and utilizing LLMs to generate informative natural language descriptions of the writing style of each point. We evaluate the alignment of our interpretable space with the latent one and find that it achieves the best prediction agreement compared to other baselines. Additionally, we conduct a human evaluation to assess the quality of these style descriptions, validating their utility as explanations for the latent space. Finally, we investigate whether human performance on the challenging authorship attribution (AA) task improves when aided by our system's explanations, finding an average improvement of around +20% in accuracy.
Citations: 0
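A minimal sketch of the pipeline described above, under assumptions not stated in the abstract: representative points are found by clustering the latent authorship embeddings, and a user-supplied llm_describe() callable stands in for the LLM that writes a natural-language style description for each point.

import numpy as np
from sklearn.cluster import KMeans

def interpret_latent_space(embeddings, texts, llm_describe, n_points=10):
    """Cluster authorship embeddings, then describe each representative point's style (illustrative sketch)."""
    km = KMeans(n_clusters=n_points, n_init=10, random_state=0).fit(embeddings)
    descriptions = []
    for c in range(n_points):
        # Use the texts closest to each centroid as exemplars of that style region.
        dists = np.linalg.norm(embeddings - km.cluster_centers_[c], axis=1)
        exemplars = [texts[i] for i in np.argsort(dists)[:3]]
        prompt = "Describe the shared writing style of these texts:\n" + "\n---\n".join(exemplars)
        descriptions.append(llm_describe(prompt))  # hypothetical LLM call, not the paper's API
    return km.cluster_centers_, descriptions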
SimulBench: Evaluating Language Models with Creative Simulation Tasks
arXiv - CS - Computation and Language | Pub Date: 2024-09-11 | DOI: arxiv-2409.07641
Qi Jia, Xiang Yue, Tianyu Zheng, Jie Huang, Bill Yuchen Lin
Abstract: We introduce SimulBench, a benchmark designed to evaluate large language models (LLMs) across a diverse collection of creative simulation scenarios, such as acting as a Linux terminal or playing text games with users. While these simulation tasks serve as effective measures of an LLM's general intelligence, they are seldom incorporated into existing benchmarks. A major challenge is to develop an evaluation framework for testing different LLMs fairly while preserving the multi-round interactive nature of simulation tasks between users and AI. To tackle this issue, we suggest first using a fixed LLM as a user agent to engage with an LLM and collect dialogues under the different tasks. Then, challenging dialogue scripts are extracted for evaluating different target LLMs. To facilitate automatic assessment on SimulBench, GPT-4 is employed as the evaluator, tasked with reviewing the quality of the final response generated by the target LLMs given multi-turn dialogue scripts. Our comprehensive experiments indicate that these simulation tasks continue to pose a significant challenge with their unique natures and show the gap between proprietary models and the most advanced open LLMs. For example, GPT-4-turbo outperforms LLaMA-3-70b-Chat on 18.55% more cases.
Citations: 0
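A rough sketch of the two-stage protocol the abstract describes: a fixed user-agent LLM converses with a target LLM to collect a multi-turn script, and GPT-4 then scores the target's final response. The chat() helper, prompts, and turn count are assumptions for illustration, not the paper's code.

def chat(model, messages):
    """Placeholder for an LLM chat call; swap in your provider's API."""
    raise NotImplementedError

def collect_dialogue(task_prompt, user_model, target_model, turns=4):
    history = [{"role": "user", "content": task_prompt}]
    for _ in range(turns):
        reply = chat(target_model, history)                      # target LLM acts out the simulation
        history.append({"role": "assistant", "content": reply})
        follow_up = chat(user_model, history)                     # fixed LLM emulates the user
        history.append({"role": "user", "content": follow_up})
    return history

def judge_final_response(script, judge_model="gpt-4"):
    rubric = ("Given this multi-turn simulation dialogue, rate the quality of the "
              "assistant's final response from 1 to 10 and explain briefly.")
    transcript = "\n".join(f"{m['role']}: {m['content']}" for m in script)
    return chat(judge_model, [{"role": "user", "content": rubric + "\n\n" + transcript}])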
Explanation, Debate, Align: A Weak-to-Strong Framework for Language Model Generalization
arXiv - CS - Computation and Language | Pub Date: 2024-09-11 | DOI: arxiv-2409.07335
Mehrdad Zakershahrak, Samira Ghodratnama
Abstract: The rapid advancement of artificial intelligence systems has brought the challenge of AI alignment to the forefront of research, particularly in complex decision-making and task execution. As these systems surpass human-level performance in sophisticated problems, ensuring their alignment with human values, intentions, and ethical guidelines becomes crucial. Building on previous work in explanation generation for human-agent alignment, we address the more complex dynamics of multi-agent systems and human-AI teams. This paper introduces a novel approach to model alignment through weak-to-strong generalization in the context of language models. We present a framework where a strong model facilitates the improvement of a weaker model, bridging the gap between explanation generation and model alignment. Our method, formalized as a facilitation function, allows for the transfer of capabilities from advanced models to less capable ones without direct access to extensive training data. Our results suggest that this facilitation-based approach not only enhances model performance but also provides insights into the nature of model alignment and the potential for scalable oversight of AI systems.
Citations: 0
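One hedged reading of the facilitation function mentioned above, with every name below assumed rather than taken from the paper: the strong model generates an explanation for each task instance, and the weaker model is then fine-tuned on inputs paired with those explanations instead of raw training data.

def facilitate(strong_model, weak_model, task_inputs, explain, finetune):
    """Illustrative facilitation function sketch; explain() and finetune() are assumed helpers."""
    augmented = []
    for x in task_inputs:
        explanation = explain(strong_model, x)          # strong model explains the instance
        augmented.append({"input": x, "explanation": explanation})
    # Capability transfer to the weaker model without direct access to extensive training data.
    return finetune(weak_model, augmented)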
Gated Slot Attention for Efficient Linear-Time Sequence Modeling
arXiv - CS - Computation and Language | Pub Date: 2024-09-11 | DOI: arxiv-2409.07146
Yu Zhang, Songlin Yang, Ruijie Zhu, Yue Zhang, Leyang Cui, Yiqiao Wang, Bolun Wang, Freda Shi, Bailin Wang, Wei Bi, Peng Zhou, Guohong Fu
Abstract: Linear attention Transformers and their gated variants, celebrated for enabling parallel training and efficient recurrent inference, still fall short in recall-intensive tasks compared to traditional Transformers and demand significant resources for training from scratch. This paper introduces Gated Slot Attention (GSA), which enhances Attention with Bounded-memory-Control (ABC) by incorporating a gating mechanism inspired by Gated Linear Attention (GLA). Essentially, GSA comprises a two-layer GLA linked via softmax, utilizing context-aware memory reading and adaptive forgetting to improve memory capacity while maintaining a compact recurrent state size. This design greatly enhances both training and inference efficiency through GLA's hardware-efficient training algorithm and the reduced state size. Additionally, retaining the softmax operation is particularly beneficial in "finetuning pretrained Transformers to RNNs" (T2R) settings, reducing the need for extensive training from scratch. Extensive experiments confirm GSA's superior performance in scenarios requiring in-context recall and in T2R settings.
Citations: 0
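To make the idea of a bounded slot memory with adaptive forgetting and softmax reading concrete, here is a minimal single-step recurrence in that spirit. It is an illustrative sketch only, not the paper's exact GSA equations: slot sizes, the write rule, and the gate shape are assumptions.

import torch
import torch.nn.functional as F

def gated_slot_step(K_mem, V_mem, k_t, v_t, q_t, alpha_t):
    """One recurrent step over m bounded memory slots (illustrative sketch).

    K_mem, V_mem : (m, d) slot key/value memories (fixed, compact state size)
    k_t, v_t, q_t: (d,) current key, value, query
    alpha_t      : (m, 1) per-slot forget gate in (0, 1)
    """
    # Route the new key/value to slots with a softmax assignment (ABC-style write).
    assign = F.softmax(K_mem @ k_t, dim=0).unsqueeze(-1)          # (m, 1)
    # Gated update: decay old slot contents, write the gated new content.
    K_mem = alpha_t * K_mem + (1.0 - alpha_t) * assign * k_t      # (m, d)
    V_mem = alpha_t * V_mem + (1.0 - alpha_t) * assign * v_t      # (m, d)
    # Context-aware read: softmax over slots, then mix the slot values.
    read = F.softmax(K_mem @ q_t, dim=0)                          # (m,)
    o_t = read @ V_mem                                            # (d,)
    return o_t, K_mem, V_mem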
Using Generative Agents to Create Tip Sheets for Investigative Data Reporting
arXiv - CS - Computation and Language | Pub Date: 2024-09-11 | DOI: arxiv-2409.07286
Joris Veerbeek, Nicholas Diakopoulos
Abstract: This paper introduces a system using generative AI agents to create tip sheets for investigative data reporting. Our system employs three specialized agents--an analyst, a reporter, and an editor--to collaboratively generate and refine tips from datasets. We validate this approach using real-world investigative stories, demonstrating that our agent-based system generally generates more newsworthy and accurate insights compared to a baseline model without agents, although some variability was noted between different stories. Our findings highlight the potential of generative AI to provide leads for investigative data reporting.
Citations: 0
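A sketch of how the analyst, reporter, and editor roles described above could be chained; the prompts and the complete() helper are assumptions for illustration, not the paper's implementation.

def complete(role_prompt, content, model="gpt-4"):
    """Placeholder for an LLM completion call; replace with your provider's API."""
    raise NotImplementedError

def generate_tip_sheet(dataset_summary):
    """Illustrative analyst -> reporter -> editor pipeline."""
    analysis = complete("You are a data analyst. List notable patterns, outliers, "
                        "and comparisons in this dataset.", dataset_summary)
    tips = complete("You are an investigative reporter. Turn these findings into "
                    "concrete, newsworthy story tips with suggested angles.", analysis)
    tip_sheet = complete("You are an editor. Check each tip against the findings, "
                         "cut weak ones, and rank the rest.", tips + "\n\n" + analysis)
    return tip_sheet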
Recent Trends of Multimodal Affective Computing: A Survey from NLP Perspective
arXiv - CS - Computation and Language | Pub Date: 2024-09-11 | DOI: arxiv-2409.07388
Guimin Hu, Yi Xin, Weimin Lyu, Haojian Huang, Chang Sun, Zhihong Zhu, Lin Gui, Ruichu Cai
Abstract: Multimodal affective computing (MAC) has garnered increasing attention due to its broad applications in analyzing human behaviors and intentions, especially in the text-dominated multimodal affective computing field. This survey presents the recent trends of multimodal affective computing from an NLP perspective through four hot tasks: multimodal sentiment analysis, multimodal emotion recognition in conversation, multimodal aspect-based sentiment analysis, and multimodal multi-label emotion recognition. The goal of this survey is to explore the current landscape of multimodal affective research, identify development trends, and highlight the similarities and differences across various tasks, offering a comprehensive report on the recent progress in multimodal affective computing from an NLP perspective. This survey covers the formalization of tasks, provides an overview of relevant works, describes benchmark datasets, and details the evaluation metrics for each task. It also briefly discusses research in multimodal affective computing involving facial expressions, acoustic signals, physiological signals, and emotion causes, as well as the technical approaches, challenges, and future directions in the field. To support further research, we released a repository that compiles related works in multimodal affective computing, providing detailed resources and references for the community.
Citations: 0
You Have Thirteen Hours in Which to Solve the Labyrinth: Enhancing AI Game Masters with Function Calling
arXiv - CS - Computation and Language | Pub Date: 2024-09-11 | DOI: arxiv-2409.06949
Jaewoo Song, Andrew Zhu, Chris Callison-Burch
Abstract: Developing a consistent and reliable AI game master for text-based games is a challenging task due to the limitations of large language models (LLMs) and the complexity of the game master's role. This paper presents a novel approach to enhance AI game masters by leveraging function calling in the context of the table-top role-playing game "Jim Henson's Labyrinth: The Adventure Game." Our methodology involves integrating game-specific controls through functions, which we show improves the narrative quality and state update consistency of the AI game master. The experimental results, based on human evaluations and unit tests, demonstrate the effectiveness of our approach in enhancing gameplay experience and maintaining coherence with the game state. This work contributes to the advancement of game AI and interactive storytelling, offering insights into the design of more engaging and consistent AI-driven game masters.
Citations: 0
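One common way to wire the function calling the abstract refers to: declare game-specific controls as tool schemas, let the model emit a call, and apply it to explicit game state. The schema shape follows the widely used JSON-schema tool format; the specific functions and state fields below are illustrative, not taken from the paper.

import json

TOOLS = [{
    "name": "advance_clock",
    "description": "Advance the in-game clock by a number of minutes.",
    "parameters": {"type": "object",
                   "properties": {"minutes": {"type": "integer"}},
                   "required": ["minutes"]},
}, {
    "name": "move_party",
    "description": "Move the party to a named location in the Labyrinth.",
    "parameters": {"type": "object",
                   "properties": {"location": {"type": "string"}},
                   "required": ["location"]},
}]

def apply_tool_call(state, call):
    """Apply a model-emitted tool call to an explicit game-state dict."""
    args = json.loads(call["arguments"]) if isinstance(call["arguments"], str) else call["arguments"]
    if call["name"] == "advance_clock":
        state["minutes_left"] -= args["minutes"]
    elif call["name"] == "move_party":
        state["location"] = args["location"]
    return state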
Questioning Internal Knowledge Structure of Large Language Models Through the Lens of the Olympic Games
arXiv - CS - Computation and Language | Pub Date: 2024-09-10 | DOI: arxiv-2409.06518
Juhwan Choi, YoungBin Kim
Abstract: Large language models (LLMs) have become a dominant approach in natural language processing, yet their internal knowledge structures remain largely unexplored. In this paper, we analyze the internal knowledge structures of LLMs using historical medal tallies from the Olympic Games. We task the models with providing the medal counts for each team and identifying which teams achieved specific rankings. Our results reveal that while state-of-the-art LLMs perform remarkably well in reporting medal counts for individual teams, they struggle significantly with questions about specific rankings. This suggests that the internal knowledge structures of LLMs are fundamentally different from those of humans, who can easily infer rankings from known medal counts. To support further research, we publicly release our code, dataset, and model outputs.
Citations: 0
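The contrast the abstract draws, counts being easy while rankings are hard, is the kind of inference that is trivial to compute once the counts are known. A tiny sketch with made-up numbers (not the paper's data) shows the two probed question types:

# Toy medal table (illustrative numbers only).
medals = {"USA": 40, "China": 38, "Japan": 27, "Australia": 18}

# Question type 1: "How many medals did China win?" : a direct lookup.
print(medals["China"])                      # 38

# Question type 2: "Which team ranked third?" : requires sorting the known counts,
# the inference step the probed LLMs reportedly struggle with.
ranking = sorted(medals, key=medals.get, reverse=True)
print(ranking[2])                           # Japan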
PingPong: A Benchmark for Role-Playing Language Models with User Emulation and Multi-Model Evaluation
arXiv - CS - Computation and Language | Pub Date: 2024-09-10 | DOI: arxiv-2409.06820
Ilya Gusev
Abstract: We introduce a novel benchmark for evaluating the role-playing capabilities of language models. Our approach leverages language models themselves to emulate users in dynamic, multi-turn conversations and to assess the resulting dialogues. The framework consists of three main components: a player model assuming a specific character role, an interrogator model simulating user behavior, and a judge model evaluating conversation quality. We conducted experiments comparing automated evaluations with human annotations to validate our approach, demonstrating strong correlations across multiple criteria. This work provides a foundation for a robust and dynamic evaluation of model capabilities in interactive scenarios.
Citations: 0
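A minimal sketch of the player / interrogator / judge loop described above; the prompts, evaluation criteria, turn count, and the chat() helper are all assumptions for illustration, not the benchmark's code.

def chat(model, system_prompt, messages):
    """Placeholder LLM call; replace with your provider's API."""
    raise NotImplementedError

def run_episode(character_card, opening_message, player, interrogator, judge, turns=6):
    """Illustrative role-play episode followed by multi-criteria judging."""
    dialogue = [{"role": "user", "content": opening_message}]
    for _ in range(turns):
        reply = chat(player, f"Stay in character: {character_card}", dialogue)
        dialogue.append({"role": "assistant", "content": reply})
        follow_up = chat(interrogator, "Emulate a curious user continuing this chat.", dialogue)
        dialogue.append({"role": "user", "content": follow_up})
    criteria = ["staying in character", "entertainment value", "language fluency"]  # assumed criteria
    return {c: chat(judge, f"Rate the assistant's {c} from 1 to 10 with a short justification.", dialogue)
            for c in criteria}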
A Practice of Post-Training on Llama-3 70B with Optimal Selection of Additional Language Mixture Ratio
arXiv - CS - Computation and Language | Pub Date: 2024-09-10 | DOI: arxiv-2409.06624
Ningyuan Xi, Yetao Wu, Kun Fan, Teng Chen, Qingqing Gu, Peng Yu, Jinxian Qu, Chenxi Liu, Zhonglin Jiang, Yong Chen, Luo Ji
Abstract: Large Language Models (LLMs) often need Continual Pre-Training (CPT) to acquire unfamiliar language skills or adapt to new domains. The huge training cost of CPT calls for a careful choice of key hyper-parameters such as the mixture ratio of the extra language or domain corpus. However, there is no systematic study that bridges the gap between the optimal mixture ratio and the actual model performance, or between the experimental scaling law and the actual deployment at full model size. In this paper, we perform CPT on Llama-3 8B and 70B to enhance their Chinese ability. We study the optimal correlation between the Additional Language Mixture Ratio (ALMR) and the Learning Rate (LR) at the 8B size, which directly indicates the optimal experimental setup. Through careful choice of hyper-parameters and subsequent fine-tuning, the model's capability is improved not only on Chinese-related benchmarks but also in some specific domains, including math, coding, and emotional intelligence. We deploy the final 70B version of the LLM on a real-life chat system and obtain satisfying performance.
Citations: 0
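A hedged sketch of the hyper-parameter selection workflow the abstract implies: sweep the additional-language mixture ratio and learning rate at the smaller 8B size, then reuse the best pair for the 70B run. The grid values and the evaluate() helper are assumptions, not the paper's actual settings.

import itertools

def evaluate(almr, lr):
    """Placeholder: run a short 8B CPT trial with this (ALMR, LR) pair and return a validation score."""
    raise NotImplementedError

# Candidate grids for Additional Language Mixture Ratio and Learning Rate (assumed values).
ALMR_GRID = [0.1, 0.2, 0.3, 0.4]
LR_GRID = [1e-5, 3e-5, 1e-4]

results = {(a, lr): evaluate(a, lr) for a, lr in itertools.product(ALMR_GRID, LR_GRID)}
best_almr, best_lr = max(results, key=results.get)
print(f"Use ALMR={best_almr}, LR={best_lr} for the 70B continual pre-training run.")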