{"title":"Latent Space Interpretation for Stylistic Analysis and Explainable Authorship Attribution","authors":"Milad Alshomary, Narutatsu Ri, Marianna Apidianaki, Ajay Patel, Smaranda Muresan, Kathleen McKeown","doi":"arxiv-2409.07072","DOIUrl":"https://doi.org/arxiv-2409.07072","url":null,"abstract":"Recent state-of-the-art authorship attribution methods learn authorship\u0000representations of texts in a latent, non-interpretable space, hindering their\u0000usability in real-world applications. Our work proposes a novel approach to\u0000interpreting these learned embeddings by identifying representative points in\u0000the latent space and utilizing LLMs to generate informative natural language\u0000descriptions of the writing style of each point. We evaluate the alignment of\u0000our interpretable space with the latent one and find that it achieves the best\u0000prediction agreement compared to other baselines. Additionally, we conduct a\u0000human evaluation to assess the quality of these style descriptions, validating\u0000their utility as explanations for the latent space. Finally, we investigate\u0000whether human performance on the challenging AA task improves when aided by our\u0000system's explanations, finding an average improvement of around +20% in\u0000accuracy.","PeriodicalId":501030,"journal":{"name":"arXiv - CS - Computation and Language","volume":"27 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142184567","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"SimulBench: Evaluating Language Models with Creative Simulation Tasks","authors":"Qi Jia, Xiang Yue, Tianyu Zheng, Jie Huang, Bill Yuchen Lin","doi":"arxiv-2409.07641","DOIUrl":"https://doi.org/arxiv-2409.07641","url":null,"abstract":"We introduce SimulBench, a benchmark designed to evaluate large language\u0000models (LLMs) across a diverse collection of creative simulation scenarios,\u0000such as acting as a Linux terminal or playing text games with users. While\u0000these simulation tasks serve as effective measures of an LLM's general\u0000intelligence, they are seldom incorporated into existing benchmarks. A major\u0000challenge is to develop an evaluation framework for testing different LLMs\u0000fairly while preserving the multi-round interactive nature of simulation tasks\u0000between users and AI. To tackle this issue, we suggest using a fixed LLM as a\u0000user agent to engage with an LLM to collect dialogues first under different\u0000tasks. Then, challenging dialogue scripts are extracted for evaluating\u0000different target LLMs. To facilitate automatic assessment on DataName{}, GPT-4\u0000is employed as the evaluator, tasked with reviewing the quality of the final\u0000response generated by the target LLMs given multi-turn dialogue scripts. Our\u0000comprehensive experiments indicate that these simulation tasks continue to pose\u0000a significant challenge with their unique natures and show the gap between\u0000proprietary models and the most advanced open LLMs. For example, GPT-4-turbo\u0000outperforms LLaMA-3-70b-Chat on 18.55% more cases.","PeriodicalId":501030,"journal":{"name":"arXiv - CS - Computation and Language","volume":"11 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142184440","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Explanation, Debate, Align: A Weak-to-Strong Framework for Language Model Generalization","authors":"Mehrdad Zakershahrak, Samira Ghodratnama","doi":"arxiv-2409.07335","DOIUrl":"https://doi.org/arxiv-2409.07335","url":null,"abstract":"The rapid advancement of artificial intelligence systems has brought the\u0000challenge of AI alignment to the forefront of research, particularly in complex\u0000decision-making and task execution. As these systems surpass human-level\u0000performance in sophisticated problems, ensuring their alignment with human\u0000values, intentions, and ethical guidelines becomes crucial. Building on\u0000previous work in explanation generation for human-agent alignment, we address\u0000the more complex dynamics of multi-agent systems and human-AI teams. This paper\u0000introduces a novel approach to model alignment through weak-to-strong\u0000generalization in the context of language models. We present a framework where\u0000a strong model facilitates the improvement of a weaker model, bridging the gap\u0000between explanation generation and model alignment. Our method, formalized as a\u0000facilitation function, allows for the transfer of capabilities from advanced\u0000models to less capable ones without direct access to extensive training data.\u0000Our results suggest that this facilitation-based approach not only enhances\u0000model performance but also provides insights into the nature of model alignment\u0000and the potential for scalable oversight of AI systems.","PeriodicalId":501030,"journal":{"name":"arXiv - CS - Computation and Language","volume":"34 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142184359","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Gated Slot Attention for Efficient Linear-Time Sequence Modeling","authors":"Yu Zhang, Songlin Yang, Ruijie Zhu, Yue Zhang, Leyang Cui, Yiqiao Wang, Bolun Wang, Freda Shi, Bailin Wang, Wei Bi, Peng Zhou, Guohong Fu","doi":"arxiv-2409.07146","DOIUrl":"https://doi.org/arxiv-2409.07146","url":null,"abstract":"Linear attention Transformers and their gated variants, celebrated for\u0000enabling parallel training and efficient recurrent inference, still fall short\u0000in recall-intensive tasks compared to traditional Transformers and demand\u0000significant resources for training from scratch. This paper introduces Gated\u0000Slot Attention (GSA), which enhances Attention with Bounded-memory-Control\u0000(ABC) by incorporating a gating mechanism inspired by Gated Linear Attention\u0000(GLA). Essentially, GSA comprises a two-layer GLA linked via softmax, utilizing\u0000context-aware memory reading and adaptive forgetting to improve memory capacity\u0000while maintaining compact recurrent state size. This design greatly enhances\u0000both training and inference efficiency through GLA's hardware-efficient\u0000training algorithm and reduced state size. Additionally, retaining the softmax\u0000operation is particularly beneficial in \"finetuning pretrained Transformers to\u0000RNNs\" (T2R) settings, reducing the need for extensive training from scratch.\u0000Extensive experiments confirm GSA's superior performance in scenarios requiring\u0000in-context recall and in T2R settings.","PeriodicalId":501030,"journal":{"name":"arXiv - CS - Computation and Language","volume":"34 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142184478","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Using Generative Agents to Create Tip Sheets for Investigative Data Reporting","authors":"Joris Veerbeek, Nicholas Diakopoulos","doi":"arxiv-2409.07286","DOIUrl":"https://doi.org/arxiv-2409.07286","url":null,"abstract":"This paper introduces a system using generative AI agents to create tip\u0000sheets for investigative data reporting. Our system employs three specialized\u0000agents--an analyst, a reporter, and an editor--to collaboratively generate and\u0000refine tips from datasets. We validate this approach using real-world\u0000investigative stories, demonstrating that our agent-based system generally\u0000generates more newsworthy and accurate insights compared to a baseline model\u0000without agents, although some variability was noted between different stories.\u0000Our findings highlight the potential of generative AI to provide leads for\u0000investigative data reporting.","PeriodicalId":501030,"journal":{"name":"arXiv - CS - Computation and Language","volume":"6 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142184402","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Recent Trends of Multimodal Affective Computing: A Survey from NLP Perspective","authors":"Guimin Hu, Yi Xin, Weimin Lyu, Haojian Huang, Chang Sun, Zhihong Zhu, Lin Gui, Ruichu Cai","doi":"arxiv-2409.07388","DOIUrl":"https://doi.org/arxiv-2409.07388","url":null,"abstract":"Multimodal affective computing (MAC) has garnered increasing attention due to\u0000its broad applications in analyzing human behaviors and intentions, especially\u0000in text-dominated multimodal affective computing field. This survey presents\u0000the recent trends of multimodal affective computing from NLP perspective\u0000through four hot tasks: multimodal sentiment analysis, multimodal emotion\u0000recognition in conversation, multimodal aspect-based sentiment analysis and\u0000multimodal multi-label emotion recognition. The goal of this survey is to\u0000explore the current landscape of multimodal affective research, identify\u0000development trends, and highlight the similarities and differences across\u0000various tasks, offering a comprehensive report on the recent progress in\u0000multimodal affective computing from an NLP perspective. This survey covers the\u0000formalization of tasks, provides an overview of relevant works, describes\u0000benchmark datasets, and details the evaluation metrics for each task.\u0000Additionally, it briefly discusses research in multimodal affective computing\u0000involving facial expressions, acoustic signals, physiological signals, and\u0000emotion causes. Additionally, we discuss the technical approaches, challenges,\u0000and future directions in multimodal affective computing. To support further\u0000research, we released a repository that compiles related works in multimodal\u0000affective computing, providing detailed resources and references for the\u0000community.","PeriodicalId":501030,"journal":{"name":"arXiv - CS - Computation and Language","volume":"23 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142184448","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"You Have Thirteen Hours in Which to Solve the Labyrinth: Enhancing AI Game Masters with Function Calling","authors":"Jaewoo Song, Andrew Zhu, Chris Callison-Burch","doi":"arxiv-2409.06949","DOIUrl":"https://doi.org/arxiv-2409.06949","url":null,"abstract":"Developing a consistent and reliable AI game master for text-based games is a\u0000challenging task due to the limitations of large language models (LLMs) and the\u0000complexity of the game master's role. This paper presents a novel approach to\u0000enhance AI game masters by leveraging function calling in the context of the\u0000table-top role-playing game \"Jim Henson's Labyrinth: The Adventure Game.\" Our\u0000methodology involves integrating game-specific controls through functions,\u0000which we show improves the narrative quality and state update consistency of\u0000the AI game master. The experimental results, based on human evaluations and\u0000unit tests, demonstrate the effectiveness of our approach in enhancing gameplay\u0000experience and maintaining coherence with the game state. This work contributes\u0000to the advancement of game AI and interactive storytelling, offering insights\u0000into the design of more engaging and consistent AI-driven game masters.","PeriodicalId":501030,"journal":{"name":"arXiv - CS - Computation and Language","volume":"157 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142184357","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Questioning Internal Knowledge Structure of Large Language Models Through the Lens of the Olympic Games","authors":"Juhwan Choi, YoungBin Kim","doi":"arxiv-2409.06518","DOIUrl":"https://doi.org/arxiv-2409.06518","url":null,"abstract":"Large language models (LLMs) have become a dominant approach in natural\u0000language processing, yet their internal knowledge structures remain largely\u0000unexplored. In this paper, we analyze the internal knowledge structures of LLMs\u0000using historical medal tallies from the Olympic Games. We task the models with\u0000providing the medal counts for each team and identifying which teams achieved\u0000specific rankings. Our results reveal that while state-of-the-art LLMs perform\u0000remarkably well in reporting medal counts for individual teams, they struggle\u0000significantly with questions about specific rankings. This suggests that the\u0000internal knowledge structures of LLMs are fundamentally different from those of\u0000humans, who can easily infer rankings from known medal counts. To support\u0000further research, we publicly release our code, dataset, and model outputs.","PeriodicalId":501030,"journal":{"name":"arXiv - CS - Computation and Language","volume":"68 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142184367","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"PingPong: A Benchmark for Role-Playing Language Models with User Emulation and Multi-Model Evaluation","authors":"Ilya Gusev","doi":"arxiv-2409.06820","DOIUrl":"https://doi.org/arxiv-2409.06820","url":null,"abstract":"We introduce a novel benchmark for evaluating the role-playing capabilities\u0000of language models. Our approach leverages language models themselves to\u0000emulate users in dynamic, multi-turn conversations and to assess the resulting\u0000dialogues. The framework consists of three main components: a player model\u0000assuming a specific character role, an interrogator model simulating user\u0000behavior, and a judge model evaluating conversation quality. We conducted\u0000experiments comparing automated evaluations with human annotations to validate\u0000our approach, demonstrating strong correlations across multiple criteria. This\u0000work provides a foundation for a robust and dynamic evaluation of model\u0000capabilities in interactive scenarios.","PeriodicalId":501030,"journal":{"name":"arXiv - CS - Computation and Language","volume":"11 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142184358","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Practice of Post-Training on Llama-3 70B with Optimal Selection of Additional Language Mixture Ratio","authors":"Ningyuan Xi, Yetao Wu, Kun Fan, Teng Chen, Qingqing Gu, Peng Yu, Jinxian Qu, Chenxi Liu, Zhonglin Jiang, Yong Chen, Luo Ji","doi":"arxiv-2409.06624","DOIUrl":"https://doi.org/arxiv-2409.06624","url":null,"abstract":"Large Language Models (LLM) often needs to be Continual Pre-Trained (CPT) to\u0000obtain the unfamiliar language skill or adapt into new domains. The huge\u0000training cost of CPT often asks for cautious choice of key hyper-parameters\u0000such as the mixture ratio of extra language or domain corpus. However, there is\u0000no systematic study which bridge the gap between the optimal mixture ratio and\u0000the actual model performance, and the gap between experimental scaling law and\u0000the actual deployment in the full model size. In this paper, we perform CPT on\u0000Llama-3 8B and 70B to enhance its Chinese ability. We study the optimal\u0000correlation between the Additional Language Mixture Ratio (ALMR) and the\u0000Learning Rate (LR) on the 8B size which directly indicate the optimal\u0000experimental set up. By thorough choice of hyper-parameter, and subsequent\u0000fine-tuning, the model capability is improved not only on the Chinese-related\u0000benchmark, but also some specific domains including math, coding and emotional\u0000intelligence. We deploy the final 70B version of LLM on an real-life chat\u0000system which obtain satisfying performance.","PeriodicalId":501030,"journal":{"name":"arXiv - CS - Computation and Language","volume":"67 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142184362","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}