Accenting unknown words in a specialized language

ACL Workshop on Natural Language Processing in the Biomedical Domain. Pub Date: 2002-07-11. DOI: 10.3115/1118149.1118153

Pierre Zweigenbaum, N. Grabar


Citations: 8

Abstract

We propose two internal methods for accenting unknown words, which both learn on a reference set of accented words the contexts of occurrence of the various accented forms of a given letter. One method is adapted from POS tagging, the other is based on finite state transducers. We show experimental results for letter e on the French version of the Medical Subject Headings thesaurus. With the best training set, the tagging method obtains a precision-recall breakeven point of 84.2±4.4% and the transducer method 83.8±4.5% (with a baseline at 64%) for the unknown words that contain this letter. A consensus combination of both increases precision to 92.0±3.7% with a recall of 75%. We perform an error analysis and discuss further steps that might help improve over the current performance.
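To make the idea of learning accented forms from letter contexts more concrete, here is a minimal Python sketch. It is not the authors' tagger or transducer: it swaps in a simple most-frequent-form lookup with backoff from wider to narrower letter windows, and the width-0 context plays the role of a majority baseline. All function names, the window width, and the tiny reference list are assumptions made for illustration. The `consensus` helper shows how a combination that answers only when two methods agree can trade recall for precision.

```python
import unicodedata
from collections import Counter, defaultdict


def strip_accents(word):
    """Remove diacritics, e.g. 'fièvre' -> 'fievre'."""
    decomposed = unicodedata.normalize("NFD", word)
    return "".join(c for c in decomposed if not unicodedata.combining(c))


def contexts(plain, i, max_width):
    """Yield (left, right) letter contexts around position i, widest first."""
    padded = "#" * max_width + plain + "#" * max_width
    j = i + max_width
    for w in range(max_width, -1, -1):
        yield padded[j - w:j], padded[j + 1:j + 1 + w]


def train(accented_words, max_width=2):
    """Count the accented forms of 'e' seen in each context of a reference list."""
    counts = defaultdict(Counter)
    for word in accented_words:
        word = unicodedata.normalize("NFC", word.lower())
        plain = strip_accents(word)
        for i, ch in enumerate(plain):
            if ch == "e":
                for ctx in contexts(plain, i, max_width):
                    counts[ctx][word[i]] += 1
    return counts


def accent(plain, counts, max_width=2):
    """Accent each 'e' with the most frequent form of the widest known context."""
    out = list(plain)
    for i, ch in enumerate(plain):
        if ch == "e":
            for ctx in contexts(plain, i, max_width):
                if ctx in counts:
                    out[i] = counts[ctx].most_common(1)[0][0]
                    break
    return "".join(out)


def consensus(pred_a, pred_b):
    """Return the shared prediction, or None (abstain) when the two disagree."""
    return pred_a if pred_a == pred_b else None


def precision_recall(predictions, gold):
    """Precision over answered words, recall over all words (None = abstain)."""
    answered = [(p, g) for p, g in zip(predictions, gold) if p is not None]
    correct = sum(p == g for p, g in answered)
    precision = correct / len(answered) if answered else 0.0
    recall = correct / len(gold) if gold else 0.0
    return precision, recall


# Hypothetical reference set of correctly accented words.
reference = ["fièvre", "lièvre", "médecine"]
counts = train(reference)
print(accent("mievre", counts))   # an unaccented word not in the reference set
print(accent("medecin", counts))
```

With this toy reference list, the unseen word `mievre` is accented through the narrower context shared with `fièvre` and `lièvre`. A real system would need a much larger reference lexicon and a principled treatment of ties and unseen contexts.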

