PARAPHRASUS: A Comprehensive Benchmark for Evaluating Paraphrase Detection Models
Andrianos Michail, Simon Clematide, Juri Opitz
arXiv - CS - Computation and Language, 2024-09-18 (arXiv:2409.12060)

The task of determining whether two texts are paraphrases has long been a
challenge in NLP. However, the prevailing notion of paraphrase is often quite
simplistic, offering only a limited view of the vast spectrum of paraphrase
phenomena. Indeed, we find that evaluating models on a single paraphrase dataset can
leave uncertainty about their true semantic understanding. To alleviate this,
we release PARAPHRASUS, a benchmark designed for multi-dimensional assessment
of paraphrase detection models and finer model selection. We find that
paraphrase detection models under a fine-grained evaluation lens exhibit
trade-offs that cannot be captured through a single classification dataset.
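
To make the last point concrete, here is a minimal sketch of per-dataset evaluation. Everything in it is an illustrative assumption rather than part of the benchmark: the toy sentence pairs, the dataset names, the token-overlap detector, and the helper names (`jaccard`, `make_detector`, `error_rate`, `evaluate`) are all hypothetical. It only shows how two detectors with the same average error can have opposite error profiles once results are reported per dataset.

```python
from typing import Callable, Dict, List, Tuple

# (text_a, text_b, is_paraphrase); hypothetical toy data, not from the benchmark
Pair = Tuple[str, str, bool]
Detector = Callable[[str, str], bool]


def jaccard(a: str, b: str) -> float:
    """Token-set overlap between two texts (a deliberately naive similarity)."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    union = ta | tb
    return len(ta & tb) / len(union) if union else 1.0


def make_detector(threshold: float) -> Detector:
    """Stand-in paraphrase detector: predict 'paraphrase' above a threshold."""
    return lambda a, b: jaccard(a, b) >= threshold


def error_rate(detector: Detector, pairs: List[Pair]) -> float:
    """Fraction of pairs on which the detector disagrees with the label."""
    wrong = sum(detector(a, b) != label for a, b, label in pairs)
    return wrong / len(pairs)


def evaluate(detector: Detector, datasets: Dict[str, List[Pair]]) -> Dict[str, float]:
    """Report one error rate per dataset instead of a single aggregate."""
    return {name: error_rate(detector, pairs) for name, pairs in datasets.items()}


# Two toy evaluation sets that stress different behaviours (assumed names).
TOY_DATASETS: Dict[str, List[Pair]] = {
    "lexically_close_negatives": [
        ("the cat chased the dog", "the dog chased the cat", False),
        ("he did not sign the contract", "he did sign the contract", False),
    ],
    "lexically_distant_positives": [
        ("the film was excellent", "a superb movie", True),
        ("she purchased a car", "she bought an automobile", True),
    ],
}

strict = evaluate(make_detector(0.9), TOY_DATASETS)
lenient = evaluate(make_detector(0.1), TOY_DATASETS)

for name in TOY_DATASETS:
    print(f"{name}: strict={strict[name]:.2f} lenient={lenient[name]:.2f}")
```

On this toy data both detectors average 0.75 error, yet the strict one fails mostly on distant positives and the lenient one on close negatives, which is the kind of trade-off a single classification score would hide.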