Latest Articles: arXiv - CS - Computation and Language

LLMs + Persona-Plug = Personalized LLMs
arXiv - CS - Computation and Language. Pub Date: 2024-09-18. arXiv:2409.11901
Jiongnan Liu, Yutao Zhu, Shuting Wang, Xiaochi Wei, Erxue Min, Yu Lu, Shuaiqiang Wang, Dawei Yin, Zhicheng Dou
Abstract: Personalization plays a critical role in numerous language tasks and applications, since users with the same requirements may prefer diverse outputs based on their individual interests. This has led to the development of various personalized approaches aimed at adapting large language models (LLMs) to generate customized outputs aligned with user preferences. Some of them involve fine-tuning a unique personalized LLM for each user, which is too expensive for widespread application. Alternative approaches introduce personalization information in a plug-and-play manner by retrieving the user's relevant historical texts as demonstrations. However, this retrieval-based strategy may break the continuity of the user history and fail to capture the user's overall styles and patterns, leading to sub-optimal performance. To address these challenges, we propose a novel personalized LLM model, ours{}. It constructs a user-specific embedding for each individual by modeling all of their historical contexts through a lightweight plug-in user embedder module. By attaching this embedding to the task input, LLMs can better understand and capture user habits and preferences, thereby producing more personalized outputs without tuning their own parameters. Extensive experiments on various tasks in the language model personalization (LaMP) benchmark demonstrate that the proposed model significantly outperforms existing personalized LLM approaches.
Citations: 0
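The plug-and-play idea above can be sketched in a few lines. This is a toy illustration, not the paper's implementation: mean pooling stands in for the learned user embedder module, and the function names and list-of-floats vectors are all illustrative.

```python
def user_embedding(history_vectors):
    # Lightweight "plug-in" user embedder: pool all of a user's
    # historical context vectors into one fixed-size embedding.
    # (The paper trains this module; mean pooling is a stand-in.)
    dim = len(history_vectors[0])
    return [sum(v[i] for v in history_vectors) / len(history_vectors)
            for i in range(dim)]

def attach_persona(task_input_embeddings, history_vectors):
    # Prepend the user embedding to the task input so a frozen LLM can
    # condition on the user's overall style without any weight updates.
    return [user_embedding(history_vectors)] + task_input_embeddings
```

The key design point the abstract emphasizes is that the base LLM's parameters stay untouched; only the small embedder is user-specific.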
Enabling Real-Time Conversations with Minimal Training Costs
arXiv - CS - Computation and Language. Pub Date: 2024-09-18. arXiv:2409.11727
Wang Xu, Shuo Wang, Weilin Zhao, Xu Han, Yukun Yan, Yudi Zhang, Zhe Tao, Zhiyuan Liu, Wanxiang Che
Abstract: Large language models (LLMs) have demonstrated the ability to improve human efficiency through conversational interactions. Conventional LLM-powered dialogue systems, operating on a turn-based paradigm, preclude real-time interaction during response generation. To address this limitation, researchers have proposed duplex models. These models can dynamically adapt to user input, facilitating real-time interactive feedback. However, these methods typically require substantial computational resources to acquire this ability. To reduce overhead, this paper presents a new duplex decoding approach that enhances LLMs with duplex ability while requiring minimal additional training. Specifically, our method employs parallel decoding of queries and responses in conversations, effectively implementing a channel-division-multiplexing decoding strategy. Experimental results indicate that our proposed method significantly enhances the naturalness and human-likeness of user-AI interactions with minimal training costs.
Citations: 0
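The channel-division-multiplexing idea can be illustrated with a toy decode loop; this is a sketch of the concept only, with `generate_step` standing in for the model's per-step token generation, not the paper's actual decoding code.

```python
def duplex_decode(incoming_tokens, generate_step, max_steps=20):
    # Toy channel-division multiplexing: each decoding step first consumes
    # any newly arrived user token (input channel) and then emits one
    # response token (output channel), so listening and speaking share
    # the same decode loop instead of alternating whole turns.
    heard, spoken = [], []
    queue = list(incoming_tokens)
    for _ in range(max_steps):
        if queue:                           # input channel: absorb user speech
            heard.append(queue.pop(0))
        tok = generate_step(heard, spoken)  # output channel: one token
        if tok is None:                     # model chooses to stop
            break
        spoken.append(tok)
    return heard, spoken
```

The contrast with a turn-based system is that generation can begin, and adapt, while user tokens are still arriving.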
Measuring Human and AI Values based on Generative Psychometrics with Large Language Models
arXiv - CS - Computation and Language. Pub Date: 2024-09-18. arXiv:2409.12106
Haoran Ye, Yuhang Xie, Yuanyi Ren, Hanjun Fang, Xin Zhang, Guojie Song
Abstract: Human values and their measurement are a long-standing interdisciplinary inquiry. Recent advances in AI have sparked renewed interest in this area, with large language models (LLMs) emerging as both tools and subjects of value measurement. This work introduces Generative Psychometrics for Values (GPV), an LLM-based, data-driven value measurement paradigm, theoretically grounded in text-revealed selective perceptions. We begin by fine-tuning an LLM for accurate perception-level value measurement and verifying the capability of LLMs to parse texts into perceptions, forming the core of the GPV pipeline. Applying GPV to human-authored blogs, we demonstrate its stability, validity, and superiority over prior psychological tools. Then, extending GPV to LLM value measurement, we advance the current art with 1) a psychometric methodology that measures LLM values based on their scalable and free-form outputs, enabling context-specific measurement; 2) a comparative analysis of measurement paradigms, indicating response biases of prior methods; and 3) an attempt to bridge LLM values and their safety, revealing the predictive power of different value systems and the impacts of various values on LLM safety. Through interdisciplinary efforts, we aim to leverage AI for next-generation psychometrics and psychometrics for value-aligned AI.
Citations: 0
Harnessing LLMs for API Interactions: A Framework for Classification and Synthetic Data Generation
arXiv - CS - Computation and Language. Pub Date: 2024-09-18. arXiv:2409.11703
Chunliang Tao, Xiaojing Fan, Yahe Yang
Abstract: As Large Language Models (LLMs) advance in natural language processing, there is growing interest in leveraging their capabilities to simplify software interactions. In this paper, we propose a novel system that integrates LLMs for both classifying natural language inputs into corresponding API calls and automating the creation of sample datasets tailored to specific API functions. By classifying natural language commands, our system allows users to invoke complex software functionalities through simple inputs, improving interaction efficiency and lowering the barrier to software utilization. Our dataset generation approach also enables the efficient and systematic evaluation of different LLMs in classifying API calls, offering a practical tool for developers or business owners to assess the suitability of LLMs for customized API management. We conduct experiments on several prominent LLMs using generated sample datasets for various API functions. The results show that GPT-4 achieves a high classification accuracy of 0.996, while LLaMA-3-8B performs much worse at 0.759. These findings highlight the potential of LLMs to transform API management and validate the effectiveness of our system in guiding model testing and selection across diverse applications.
Citations: 0
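A minimal sketch of the classification half of such a system, assuming a generic prompt-in, text-out LLM callable. The API catalog, prompt wording, and the `classify` helper are illustrative, not the paper's actual framework; a real deployment would pass the prompt to GPT-4 or LLaMA, which is what the paper's experiments compare.

```python
API_CATALOG = {
    "create_user": "Register a new user account",
    "delete_user": "Remove an existing user account",
    "list_orders": "Show a customer's orders",
}

def build_prompt(command):
    # Ask the LLM to map a natural-language command onto exactly one API name.
    options = "\n".join(f"- {name}: {desc}" for name, desc in API_CATALOG.items())
    return (f"Classify the command into exactly one API.\n"
            f"APIs:\n{options}\n"
            f"Command: {command}\nAPI:")

def classify(command, llm):
    # `llm` is any callable prompt -> text. Reject answers that are not
    # valid API names so downstream code never invokes an unknown call.
    answer = llm(build_prompt(command)).strip()
    return answer if answer in API_CATALOG else None
```

Validating the model output against the catalog is the safeguard that makes accuracy measurable against a generated sample dataset.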
Qwen2.5-Math Technical Report: Toward Mathematical Expert Model via Self-Improvement
arXiv - CS - Computation and Language. Pub Date: 2024-09-18. arXiv:2409.12122
An Yang, Beichen Zhang, Binyuan Hui, Bofei Gao, Bowen Yu, Chengpeng Li, Dayiheng Liu, Jianhong Tu, Jingren Zhou, Junyang Lin, Keming Lu, Mingfeng Xue, Runji Lin, Tianyu Liu, Xingzhang Ren, Zhenru Zhang
Abstract: In this report, we present a series of math-specific large language models: Qwen2.5-Math and Qwen2.5-Math-Instruct-1.5B/7B/72B. The core innovation of the Qwen2.5 series lies in integrating the philosophy of self-improvement throughout the entire pipeline, from pre-training and post-training to inference: (1) During the pre-training phase, Qwen2-Math-Instruct is utilized to generate large-scale, high-quality mathematical data. (2) In the post-training phase, we develop a reward model (RM) by conducting massive sampling from Qwen2-Math-Instruct. This RM is then applied to the iterative evolution of data in supervised fine-tuning (SFT). With a stronger SFT model, it's possible to iteratively train and update the RM, which in turn guides the next round of SFT data iteration. On the final SFT model, we employ the ultimate RM for reinforcement learning, resulting in Qwen2.5-Math-Instruct. (3) Furthermore, during the inference stage, the RM is used to guide sampling, optimizing the model's performance. Qwen2.5-Math-Instruct supports both Chinese and English, and possesses advanced mathematical reasoning capabilities, including Chain-of-Thought (CoT) and Tool-Integrated Reasoning (TIR). We evaluate our models on 10 mathematics datasets in both English and Chinese, such as GSM8K, MATH, GaoKao, AMC23, and AIME24, covering a range of difficulties from grade school level to math competition problems.
Citations: 0
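The RM-guided data-evolution loop described in step (2) can be sketched as best-of-n rejection sampling; this is a schematic of the pattern under stated assumptions (`sample` and `reward` are placeholder callables for the policy model and reward model), not the Qwen2.5-Math training code.

```python
def best_of_n(prompt, sample, reward, n=8):
    # Rejection sampling with a reward model: draw n candidate solutions
    # for the prompt and keep the highest-scoring one as SFT data.
    candidates = [sample(prompt) for _ in range(n)]
    return max(candidates, key=lambda c: reward(prompt, c))

def self_improvement_round(prompts, sample, reward):
    # One iteration of the loop: the RM filters samples into an SFT set;
    # after fine-tuning on that set, the stronger model produces better
    # samples (and a better RM) for the next round.
    return [(p, best_of_n(p, sample, reward)) for p in prompts]
```

The same reward model is then reused at inference time to guide sampling, which is why the report describes self-improvement as spanning the entire pipeline.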
Finetuning Language Models to Emit Linguistic Expressions of Uncertainty
arXiv - CS - Computation and Language. Pub Date: 2024-09-18. arXiv:2409.12180
Arslan Chaudhry, Sridhar Thiagarajan, Dilan Gorur
Abstract: Large language models (LLMs) are increasingly employed in information-seeking and decision-making tasks. Despite their broad utility, LLMs tend to generate information that conflicts with real-world facts, and their persuasive style can make these inaccuracies appear confident and convincing. As a result, end-users struggle to consistently align the confidence expressed by LLMs with the accuracy of their predictions, often leading to either blind trust in all outputs or a complete disregard for their reliability. In this work, we explore supervised finetuning on uncertainty-augmented predictions as a method to develop models that produce linguistic expressions of uncertainty. Specifically, we measure the calibration of pre-trained models and then fine-tune language models to generate calibrated linguistic expressions of uncertainty. Through experiments on various question-answering datasets, we demonstrate that LLMs are well-calibrated in assessing their predictions, and supervised finetuning based on the model's own confidence leads to well-calibrated expressions of uncertainty, particularly for single-claim answers.
Citations: 0
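Constructing an uncertainty-augmented target can be sketched as mapping a calibrated probability onto a hedging phrase and prefixing it to the answer. The bins and phrases below are illustrative choices, not the paper's exact mapping.

```python
def verbalize_confidence(p):
    # Map a calibrated probability onto a linguistic hedge.
    # These thresholds and phrases are illustrative only.
    if p >= 0.9:
        return "almost certainly"
    if p >= 0.7:
        return "likely"
    if p >= 0.5:
        return "possibly"
    return "I am unsure, but perhaps"

def augment_target(answer, confidence):
    # Build an uncertainty-augmented finetuning target from the model's
    # own confidence in its answer; finetuning on such targets teaches
    # the model to emit the hedge itself.
    return f"{verbalize_confidence(confidence)} {answer}"
```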
MEOW: MEMOry Supervised LLM Unlearning Via Inverted Facts
arXiv - CS - Computation and Language. Pub Date: 2024-09-18. arXiv:2409.11844
Tianle Gu, Kexin Huang, Ruilin Luo, Yuanqi Yao, Yujiu Yang, Yan Teng, Yingchun Wang
Abstract: Large Language Models (LLMs) can memorize sensitive information, raising concerns about potential misuse. LLM unlearning, a post-hoc approach to remove this information from trained LLMs, offers a promising solution to mitigate these risks. However, previous practices face three key challenges: 1. Utility: successful unlearning often causes catastrophic collapse on unrelated tasks. 2. Efficiency: many methods either involve adding similarly sized models, which slows down unlearning or inference, or require retain data that are difficult to obtain. 3. Robustness: even effective methods may still leak data via extraction techniques. To address these challenges, we propose MEOW, a simple yet effective gradient descent-based unlearning method. Specifically, we use an offline LLM to generate a set of inverted facts. Then, we design a new metric, MEMO, to quantify memorization in LLMs. Finally, based on the signals provided by MEMO, we select the most appropriate set of inverted facts and finetune the model on them. We evaluate MEOW on the commonly used unlearning benchmark ToFU, with Llama2-7B-Chat and Phi-1.5B, and test it on both NLU and NLG tasks. Results demonstrate significant improvement of MEOW in forget quality without substantial loss in model utility. Meanwhile, MEOW does not exhibit significant degradation in NLU or NLG capabilities, and there is even a slight improvement in NLU performance.
Citations: 0
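The select-by-memorization step can be sketched as follows. This is a toy stand-in: the real MEMO metric is defined in the paper, while here average token probability (via a placeholder `token_prob` callable) approximates "how memorized" a candidate inverted fact is, and the selection rule shown is illustrative.

```python
def memo_score(fact, token_prob):
    # Toy stand-in for MEMO: average probability the model assigns to the
    # fact's tokens; higher means more strongly memorized.
    toks = fact.split()
    return sum(token_prob(t) for t in toks) / len(toks)

def select_inverted_facts(inverted, token_prob, k=2):
    # Pick the k inverted facts the model finds least familiar, then
    # finetune on them to overwrite the memorized original fact.
    return sorted(inverted, key=lambda f: memo_score(f, token_prob))[:k]
```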
Dual-Layer Training and Decoding of Large Language Model with Simultaneously Thinking and Speaking
arXiv - CS - Computation and Language. Pub Date: 2024-09-18. arXiv:2409.12059
Ningyuan Xi, Xiaoyu Wang, Yetao Wu, Teng Chen, Qingqing Gu, Jinxian Qu, Zhonglin Jiang, Yong Chen, Luo Ji
Abstract: Large language models can reasonably understand and generate human expressions but may lack thorough thinking and reasoning mechanisms. Recently there have been several studies that enhance the thinking ability of language models, but most of them are not data-driven or training-based. In this paper, we are motivated by the cognitive mechanism in the natural world and design a novel model architecture called TaS, which first considers the thoughts and then expresses the response based upon the query. We design several pipelines to annotate or generate the thought contents from prompt-response samples, then add language heads in a middle layer which behaves as the thinking layer. We train the language model on the thoughts-augmented data and successfully let the thinking layer automatically generate reasonable thoughts and finally output more reasonable responses. Both qualitative examples and quantitative results validate the effectiveness and performance of TaS. Our code is available at https://anonymous.4open.science/r/TadE.
Citations: 0
Extract-and-Abstract: Unifying Extractive and Abstractive Summarization within Single Encoder-Decoder Framework
arXiv - CS - Computation and Language. Pub Date: 2024-09-18. arXiv:2409.11827
Yuping Wu, Hao Li, Hongbo Zhu, Goran Nenadic, Xiao-Jun Zeng
Abstract: Extract-then-Abstract is a naturally coherent paradigm for conducting abstractive summarization with the help of salient information identified by an extractive model. Previous works that adopt this paradigm train the extractor and abstractor separately and introduce extra parameters to highlight the extracted salients to the abstractor, which results in error accumulation and additional training costs. In this paper, we first introduce a parameter-free highlight method into the encoder-decoder framework: replacing the encoder attention mask with a saliency mask in the cross-attention module to force the decoder to focus only on salient parts of the input. A preliminary analysis compares different highlight methods, demonstrating the effectiveness of our saliency mask. We further propose the novel extract-and-abstract paradigm, ExtAbs, which jointly and seamlessly performs extractive and abstractive summarization tasks within a single encoder-decoder model to reduce error accumulation. In ExtAbs, the vanilla encoder is augmented to extract salients, and the vanilla decoder is modified with the proposed saliency mask to generate summaries. Built upon BART and PEGASUS, experiments on three datasets show that ExtAbs outperforms baselines on the extractive task and performs comparably to, or even better than, the vanilla models on the abstractive task.
Citations: 0
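The parameter-free highlight method amounts to building a different cross-attention mask, which can be sketched directly; the 0/1 list representation and the `(start, end)` span format here are illustrative, while in BART or PEGASUS the mask would be a tensor over source positions.

```python
def saliency_mask(num_src_tokens, salient_spans):
    # Replace the usual all-ones encoder attention mask with a saliency
    # mask: the decoder's cross-attention may only attend to positions
    # inside the extracted salient spans. No extra parameters needed.
    mask = [0] * num_src_tokens
    for start, end in salient_spans:  # end is exclusive
        for i in range(start, min(end, num_src_tokens)):
            mask[i] = 1
    return mask
```

Because the mask is constructed rather than learned, the same encoder-decoder can switch between extractive output (the spans) and abstractive output (masked generation), which is the core of the ExtAbs unification.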
PARAPHRASUS: A Comprehensive Benchmark for Evaluating Paraphrase Detection Models
arXiv - CS - Computation and Language. Pub Date: 2024-09-18. arXiv:2409.12060
Andrianos Michail, Simon Clematide, Juri Opitz
Abstract: The task of determining whether two texts are paraphrases has long been a challenge in NLP. However, the prevailing notion of paraphrase is often quite simplistic, offering only a limited view of the vast spectrum of paraphrase phenomena. Indeed, we find that evaluating models on a paraphrase dataset can leave uncertainty about their true semantic understanding. To alleviate this, we release paraphrasus, a benchmark designed for multi-dimensional assessment of paraphrase detection models and finer model selection. We find that paraphrase detection models under a fine-grained evaluation lens exhibit trade-offs that cannot be captured through a single classification dataset.
Citations: 0