Boundary-based MWE segmentation with text partitioning

NUT@EMNLP Pub Date: 2016-08-05 DOI: 10.18653/v1/W17-4401

J. Williams

Citations: 7

Abstract

This submission describes the development of a fine-grained, text-chunking algorithm for the task of comprehensive MWE segmentation. This task notably focuses on the identification of colloquial and idiomatic language. The submission also includes a thorough model evaluation in the context of two recent shared tasks, spanning 19 different languages and many text domains, including noisy, user-generated text. Evaluations exhibit the presented model as the best overall for purposes of MWE segmentation, and open-source software is released with the submission (although links are withheld for purposes of anonymity). Additionally, the authors acknowledge the existence of a pre-print document on arxiv.org, which should be avoided to maintain anonymity in review.

Source journal

NUT@EMNLP

Self-citation rate

0.00%

Publication count