Identification of Nominal Multiword Expressions in Bengali using CRF

Tanmoy Chakraborty
Published in: 2012 4th International Conference on Intelligent Human Computer Interaction (IHCI), 2012-12-01
DOI: 10.1109/IHCI.2012.6481823 (https://doi.org/10.1109/IHCI.2012.6481823)
Citation count: 0

Abstract

One of the key issues in both natural language understanding and generation is the appropriate processing of Multiword Expressions (MWEs). MWEs pose a major problem for precise language processing because of their idiosyncratic nature and their diversity in lexical, syntactic, and semantic properties. The meaning of an MWE may be transparent or opaque with respect to the combined meanings of its constituents. This paper deals with the identification of Nominal Multiword Expressions in Bengali text using the Conditional Random Field (CRF) machine learning technique. Bengali is a highly agglutinative and morphologically rich language. Features such as surrounding words, POS tag, prefix, suffix, and length therefore prove very effective when running the CRF tool to identify Nominal MWEs. Compared with the earlier statistical system built for identifying compound-noun MWEs in Bengali, the proposed system achieves higher precision, recall, and F-score. We also conclude that identifying Reduplicated MWEs (RMWEs) and using that identification as a feature yields a reasonable improvement over the earlier system.
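The feature set named in the abstract (surrounding words, POS tag, prefix, suffix, length, and a reduplication flag) can be illustrated as per-token feature dictionaries, the input shape that many CRF toolkits accept. This is a minimal sketch under stated assumptions, not the paper's implementation: the function names, the affix length of 3, the `<BOS>`/`<EOS>` padding markers, the POS tags, and the romanized example tokens are all hypothetical, and reduplication is approximated here as exact repetition of an adjacent token only.

```python
from typing import Dict, List, Tuple

Token = Tuple[str, str]  # (word, POS tag); the tag set is an assumption


def token_features(sent: List[Token], i: int, affix_len: int = 3) -> Dict[str, object]:
    """Build a feature dict for token i using the feature types listed in the abstract."""
    word, pos = sent[i]
    last = len(sent) - 1
    prev_word = sent[i - 1][0] if i > 0 else "<BOS>"
    next_word = sent[i + 1][0] if i < last else "<EOS>"
    return {
        "word": word,
        "pos": pos,
        "prefix": word[:affix_len],      # first affix_len characters
        "suffix": word[-affix_len:],     # last affix_len characters
        "length": len(word),
        "prev_word": prev_word,          # surrounding-word context
        "next_word": next_word,
        # Simplified reduplication cue: the token exactly repeats a neighbour.
        "redup": prev_word == word or next_word == word,
    }


def sentence_features(sent: List[Token]) -> List[Dict[str, object]]:
    """Feature dicts for every token of one sentence."""
    return [token_features(sent, i) for i in range(len(sent))]


# Illustrative romanized tokens with hypothetical POS tags (not from the paper).
EXAMPLE: List[Token] = [("dhire", "RB"), ("dhire", "RB"), ("cholo", "VB")]

if __name__ == "__main__":
    for feats in sentence_features(EXAMPLE):
        print(feats)
```

A sequence of such dictionaries, paired with per-token labels marking MWE boundaries, is the kind of input a CRF trainer would consume. Partial or echo reduplication would need a richer check than the exact-match test used here.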