A Controlled Study on Long Context Extension and Generalization in LLMs
Yi Lu, Jing Nathan Yan, Songlin Yang, Justin T. Chiu, Siyu Ren, Fei Yuan, Wenting Zhao, Zhiyong Wu, Alexander M. Rush
arXiv:2409.12181 (arXiv - CS - Computation and Language, 2024-09-18)

Abstract: Broad textual understanding and in-context learning require language models that utilize full document contexts. Due to the implementation challenges associated with directly training long-context models, many methods have been proposed for extending models to handle long contexts. However, owing to differences in data and model classes, it has been challenging to compare these approaches, leading to uncertainty as to how to evaluate long-context performance and whether it differs from standard evaluation. We implement a controlled protocol for extension methods with a standardized evaluation, utilizing consistent base models and extension data. Our study yields several insights into long-context behavior. First, we reaffirm the critical role of perplexity as a general-purpose performance indicator even in longer-context tasks. Second, we find that current approximate attention methods systematically underperform across long-context tasks. Finally, we confirm that exact fine-tuning based methods are generally effective within the range of their extension, whereas extrapolation remains challenging. All codebases, models, and checkpoints will be made available open-source, promoting transparency and facilitating further research in this critical area of AI development.

You Only Read Once (YORO): Learning to Internalize Database Knowledge for Text-to-SQL
Hideo Kobayashi, Wuwei Lan, Peng Shi, Shuaichen Chang, Jiang Guo, Henghui Zhu, Zhiguo Wang, Patrick Ng
arXiv:2409.12172 (arXiv - CS - Computation and Language, 2024-09-18)

Abstract: While significant progress has been made on the text-to-SQL task, recent solutions repeatedly encode the same database schema for every question, resulting in unnecessarily high inference costs and often overlooking crucial database knowledge. To address these issues, we propose You Only Read Once (YORO), a novel paradigm that directly internalizes database knowledge into the parametric knowledge of a text-to-SQL model during training and eliminates the need for schema encoding during inference. YORO significantly reduces input token length by 66%-98%. Despite its shorter inputs, our empirical results demonstrate YORO's competitive performance with traditional systems on three benchmarks, as well as its significant outperformance on large databases. Furthermore, YORO excels in handling questions with challenging value retrievals such as abbreviations.

Using Large Language Models to Generate Clinical Trial Tables and Figures
Yumeng Yang, Peter Krusche, Kristyn Pantoja, Cheng Shi, Ethan Ludmir, Kirk Roberts, Gen Zhu
arXiv:2409.12046 (arXiv - CS - Computation and Language, 2024-09-18)

Abstract: Tables, figures, and listings (TFLs) are essential tools for summarizing clinical trial data. Creating TFLs for reporting activities is a time-consuming task encountered routinely during the execution of clinical trials. This study explored the use of large language models (LLMs) to automate the generation of TFLs through prompt engineering and few-shot transfer learning. Using public clinical trial data in ADaM format, our results demonstrate that LLMs can efficiently generate TFLs from prompt instructions, showcasing their potential in this domain. Furthermore, we developed a conversational agent, the Clinical Trial TFL Generation Agent: an app that matches user queries to predefined prompts, which in turn produce customized programs that generate specific predefined TFLs.

{"title":"From Lists to Emojis: How Format Bias Affects Model Alignment","authors":"Xuanchang Zhang, Wei Xiong, Lichang Chen, Tianyi Zhou, Heng Huang, Tong Zhang","doi":"arxiv-2409.11704","DOIUrl":"https://doi.org/arxiv-2409.11704","url":null,"abstract":"In this paper, we study format biases in reinforcement learning from human\u0000feedback (RLHF). We observe that many widely-used preference models, including\u0000human evaluators, GPT-4, and top-ranking models on the RewardBench benchmark,\u0000exhibit strong biases towards specific format patterns, such as lists, links,\u0000bold text, and emojis. Furthermore, large language models (LLMs) can exploit\u0000these biases to achieve higher rankings on popular benchmarks like AlpacaEval\u0000and LMSYS Chatbot Arena. One notable example of this is verbosity bias, where\u0000current preference models favor longer responses that appear more\u0000comprehensive, even when their quality is equal to or lower than shorter,\u0000competing responses. However, format biases beyond verbosity remain largely\u0000underexplored in the literature. In this work, we extend the study of biases in\u0000preference learning beyond the commonly recognized length bias, offering a\u0000comprehensive analysis of a wider range of format biases. Additionally, we show\u0000that with a small amount of biased data (less than 1%), we can inject\u0000significant bias into the reward model. Moreover, these format biases can also\u0000be easily exploited by downstream alignment algorithms, such as best-of-n\u0000sampling and online iterative DPO, as it is usually easier to manipulate the\u0000format than to improve the quality of responses. Our findings emphasize the\u0000need to disentangle format and content both for designing alignment algorithms\u0000and evaluating models.","PeriodicalId":501030,"journal":{"name":"arXiv - CS - Computation and Language","volume":"24 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142262434","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Enhancing Complex Formula Recognition with Hierarchical Detail-Focused Network
Jiale Wang, Junhui Yu, Huanyong Liu, Chenanran Kong
arXiv:2409.11677 (arXiv - CS - Computation and Language, 2024-09-18)

Abstract: Hierarchical and complex Mathematical Expression Recognition (MER) is challenging due to the multiple possible interpretations of a formula, which complicate both parsing and evaluation. In this paper, we introduce the Hierarchical Detail-Focused Recognition dataset (HDR), the first dataset specifically designed to address these issues. It consists of a large-scale training set, HDR-100M, offering unprecedented scale and diversity with one hundred million training instances, and a test set, HDR-Test, that includes multiple interpretations of complex hierarchical formulas for comprehensive model evaluation. Additionally, the parsing of complex formulas often suffers from errors in fine-grained details. To address this, we propose the Hierarchical Detail-Focused Recognition Network (HDNet), an innovative framework that incorporates a hierarchical sub-formula module focused on the precise handling of formula details, thereby significantly enhancing MER performance. Experimental results demonstrate that HDNet outperforms existing MER models across various datasets.

Linguini: A benchmark for language-agnostic linguistic reasoning
Eduardo Sánchez, Belen Alastruey, Christophe Ropers, Pontus Stenetorp, Mikel Artetxe, Marta R. Costa-jussà
arXiv:2409.12126 (arXiv - CS - Computation and Language, 2024-09-18)

Abstract: We propose a new benchmark to measure a language model's linguistic reasoning skills without relying on pre-existing language-specific knowledge. The test covers 894 questions grouped into 160 problems across 75 (mostly) extremely low-resource languages, extracted from the International Linguistic Olympiad corpus. To attain high accuracy on this benchmark, models do not need prior knowledge of the tested language, as all the information needed to solve the linguistic puzzle is presented in the context. We find that, while all analyzed models rank below 25% accuracy, there is a significant gap between open and closed models, with the best-performing proprietary model at 24.05% and the best-performing open model at 8.84%.

DocMamba: Efficient Document Pre-training with State Space Model
Pengfei Hu, Zhenrong Zhang, Jiefeng Ma, Shuhang Liu, Jun Du, Jianshu Zhang
arXiv:2409.11887 (arXiv - CS - Computation and Language, 2024-09-18)

Abstract: In recent years, visually rich document understanding has attracted increasing attention. Transformer-based pre-trained models have become the mainstream approach, yielding significant performance gains in this field. However, the self-attention mechanism's quadratic computational complexity hinders their efficiency and ability to process long documents. In this paper, we present DocMamba, a novel framework based on the state space model. It is designed to reduce computational complexity to linear while preserving global modeling capabilities. To further enhance its effectiveness in document processing, we introduce the Segment-First Bidirectional Scan (SFBS) to capture contiguous semantic information. Experimental results demonstrate that DocMamba achieves new state-of-the-art results on downstream datasets such as FUNSD, CORD, and SROIE, while significantly improving speed and reducing memory usage. Notably, experiments on HRDoc confirm DocMamba's potential for length extrapolation. The code will be available online.

{"title":"Development and bilingual evaluation of Japanese medical large language model within reasonably low computational resources","authors":"Issey Sukeda","doi":"arxiv-2409.11783","DOIUrl":"https://doi.org/arxiv-2409.11783","url":null,"abstract":"The recent success of large language models (LLMs) and the scaling law has\u0000led to a widespread adoption of larger models. Particularly in the healthcare\u0000industry, there is an increasing demand for locally operated LLMs due to\u0000security concerns. However, the majority of high quality open-source LLMs have\u0000a size of 70B parameters, imposing significant financial burdens on users for\u0000GPU preparation and operation. To overcome these issues, we present a medical\u0000adaptation based on the recent 7B models, which enables the operation in low\u0000computational resources. We compare the performance on medical\u0000question-answering benchmarks in two languages (Japanese and English),\u0000demonstrating that its scores reach parity with or surpass those of currently\u0000existing medical LLMs that are ten times larger. We find that fine-tuning an\u0000English-centric base model on Japanese medical dataset improves the score in\u0000both language, supporting the effect of cross-lingual knowledge transfer. We\u0000hope that this study will alleviate financial challenges, serving as a stepping\u0000stone for clinical institutions to practically utilize LLMs locally. Our\u0000evaluation code is available at\u0000https://huggingface.co/stardust-coder/jmedllm-7b-v1.","PeriodicalId":501030,"journal":{"name":"arXiv - CS - Computation and Language","volume":"11 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142262430","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
"A Woman is More Culturally Knowledgeable than A Man?": The Effect of Personas on Cultural Norm Interpretation in LLMs
Mahammed Kamruzzaman, Hieu Nguyen, Nazmul Hassan, Gene Louis Kim
arXiv:2409.11636 (arXiv - CS - Computation and Language, 2024-09-18)

Abstract: As the deployment of large language models (LLMs) expands, there is an increasing demand for personalized LLMs. One method to personalize and guide the outputs of these models is to assign a persona -- a role that describes the expected behavior of the LLM (e.g., a man, a woman, an engineer). This study investigates whether an LLM's understanding of social norms varies across assigned personas. Ideally, the perception of a social norm should remain consistent regardless of the persona, since the acceptability of a social norm should be determined by the region the norm originates from rather than by individual characteristics such as gender, body size, or race; a norm is universal within its cultural context. In our research, we tested 36 distinct personas from 12 sociodemographic categories (e.g., age, gender, beauty) across four different LLMs. We find that LLMs' cultural norm interpretation varies depending on the persona used, and that norm interpretation also varies within a sociodemographic category (e.g., a fat person and a thin person within the physical-appearance group), where an LLM given the more socially desirable persona (e.g., a thin person) interprets social norms more accurately than with the less socially desirable persona (e.g., a fat person). We also discuss how different types of social biases may contribute to the results we observe.

{"title":"BERT-VBD: Vietnamese Multi-Document Summarization Framework","authors":"Tuan-Cuong Vuong, Trang Mai Xuan, Thien Van Luong","doi":"arxiv-2409.12134","DOIUrl":"https://doi.org/arxiv-2409.12134","url":null,"abstract":"In tackling the challenge of Multi-Document Summarization (MDS), numerous\u0000methods have been proposed, spanning both extractive and abstractive\u0000summarization techniques. However, each approach has its own limitations,\u0000making it less effective to rely solely on either one. An emerging and\u0000promising strategy involves a synergistic fusion of extractive and abstractive\u0000summarization methods. Despite the plethora of studies in this domain, research\u0000on the combined methodology remains scarce, particularly in the context of\u0000Vietnamese language processing. This paper presents a novel Vietnamese MDS\u0000framework leveraging a two-component pipeline architecture that integrates\u0000extractive and abstractive techniques. The first component employs an\u0000extractive approach to identify key sentences within each document. This is\u0000achieved by a modification of the pre-trained BERT network, which derives\u0000semantically meaningful phrase embeddings using siamese and triplet network\u0000structures. The second component utilizes the VBD-LLaMA2-7B-50b model for\u0000abstractive summarization, ultimately generating the final summary document.\u0000Our proposed framework demonstrates a positive performance, attaining ROUGE-2\u0000scores of 39.6% on the VN-MDS dataset and outperforming the state-of-the-art\u0000baselines.","PeriodicalId":501030,"journal":{"name":"arXiv - CS - Computation and Language","volume":"27 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142262478","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}