Modeling the internal variability of multiword expressions through a pattern-based method

M. Nissim, Andrea Zaninello
ACM Trans. Speech Lang. Process., 9 1. Journal Article. Published 2013-06-01.
DOI: 10.1145/2483691.2483696 (https://doi.org/10.1145/2483691.2483696)
Citations: 19

Abstract

The internal variability of multiword expressions (MWEs) is crucial to their identification and extraction in running text. We present a corpus-supported, computational study of Italian MWEs, aimed at defining an automatic method for modeling internal variation that exploits frequency and part-of-speech (POS) information. We do so by deriving an XML-encoded lexicon of MWEs from a manually compiled dictionary, which is then projected onto a large corpus. Since a search for fixed forms suffers from low recall, while an unconstrained flexible search for lemmas yields a loss in precision, we suggest a procedure aimed at maximizing precision in the identification of MWEs within a flexible search. Our method builds on the idea that internal variability can be modeled via the novel introduction of variation patterns, which work over POS patterns and can be used as working tools for controlling precision. We also compare the performance of variation patterns to that of association measures, and explore the possibility of using variation patterns in MWE extraction in addition to identification. Finally, we suggest that corpus-derived, pattern-related information can be included in the original MWE lexicon by means of an enriched coding and the creation of an XML-based repository of patterns.
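The trade-off described in the abstract (fixed-form search loses recall, unconstrained lemma search loses precision, POS-based variation patterns restore precision) can be illustrated with a minimal sketch. This is not the authors' implementation: the `Token` type, the gap limit `max_gap`, the example idiom "prendere piede", the two toy sentences, the POS tag names, and the set `ALLOWED_PATTERNS` are all assumptions chosen for illustration. In the paper, variation patterns are derived from corpus data rather than listed by hand.

```python
from typing import List, NamedTuple, Set, Tuple


class Token(NamedTuple):
    form: str   # surface word form
    lemma: str  # dictionary form
    pos: str    # part-of-speech tag (tag names here are an assumption)


# Hypothetical lexicon entry, given as a sequence of lemmas.
MWE_LEMMAS: Tuple[str, ...] = ("prendere", "piede")

# Hypothetical variation patterns: POS sequences accepted for this entry.
ALLOWED_PATTERNS: Set[Tuple[str, ...]] = {
    ("VERB", "NOUN"),
    ("VERB", "ADV", "NOUN"),
}


def fixed_search(tokens: List[Token], forms: Tuple[str, ...]) -> List[int]:
    """Start indices where the exact word forms occur contiguously."""
    n = len(forms)
    return [
        i for i in range(len(tokens) - n + 1)
        if tuple(t.form.lower() for t in tokens[i:i + n]) == forms
    ]


def flexible_search(tokens: List[Token], lemmas: Tuple[str, ...],
                    max_gap: int = 2) -> List[Tuple[int, int]]:
    """(start, end) spans where the lemmas occur in order, allowing up to
    max_gap intervening tokens between consecutive components."""
    spans = []
    for i, tok in enumerate(tokens):
        if tok.lemma != lemmas[0]:
            continue
        pos, matched = i, True
        for lemma in lemmas[1:]:
            window = range(pos + 1, min(pos + 2 + max_gap, len(tokens)))
            nxt = next((j for j in window if tokens[j].lemma == lemma), None)
            if nxt is None:
                matched = False
                break
            pos = nxt
        if matched:
            spans.append((i, pos + 1))
    return spans


def pos_pattern(tokens: List[Token], span: Tuple[int, int]) -> Tuple[str, ...]:
    """POS sequence covered by a span, including any inserted material."""
    return tuple(t.pos for t in tokens[span[0]:span[1]])


def filter_by_variation(tokens: List[Token], spans: List[Tuple[int, int]],
                        allowed: Set[Tuple[str, ...]]) -> List[Tuple[int, int]]:
    """Keep only the spans whose POS sequence is an accepted pattern."""
    return [s for s in spans if pos_pattern(tokens, s) in allowed]


# Toy sentence with an idiomatic use: "La moda prende rapidamente piede".
S1 = [Token("La", "il", "DET"), Token("moda", "moda", "NOUN"),
      Token("prende", "prendere", "VERB"),
      Token("rapidamente", "rapidamente", "ADV"),
      Token("piede", "piede", "NOUN")]

# Toy sentence with a literal use: "Lui prende il piede".
S2 = [Token("Lui", "lui", "PRON"), Token("prende", "prendere", "VERB"),
      Token("il", "il", "DET"), Token("piede", "piede", "NOUN")]

if __name__ == "__main__":
    for sent in (S1, S2):
        candidates = flexible_search(sent, MWE_LEMMAS)
        kept = filter_by_variation(sent, candidates, ALLOWED_PATTERNS)
        print(fixed_search(sent, MWE_LEMMAS), candidates, kept)
```

In this toy setup the fixed search on the dictionary form finds nothing in either sentence, the lemma search returns a candidate in both, and the pattern filter keeps only the first sentence's candidate, since the second one's POS sequence (with an inserted determiner) is not among the accepted patterns.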