CONVEX: Conjunct Verb extraction from parallel corpus: A hybrid approach

S. Choudhury, Bibekananda Kundu
2012 4th International Conference on Intelligent Human Computer Interaction (IHCI), 2012-12-01
DOI: 10.1109/IHCI.2012.6481852 (https://doi.org/10.1109/IHCI.2012.6481852)
Citations: 0

Abstract

Conjunct Verbs (CVs) are a special form of Complex Predicate that behaves as a single verbal unit while retaining a multiword structure. CVs play an important role in Natural Language Processing applications such as speech-to-speech translation, machine translation, and lexical resource creation. Because of their distinct construction, however, detecting and extracting CVs is a challenging task. This paper presents a hybrid approach, combining rule-based and statistical methods, for mining CVs from a parallel corpus. Although the approach has been applied to a Bangla-English parallel corpus to extract Bangla CVs, the methodology is equally applicable to other Indian languages of the Indo-Aryan family, given a part-of-speech tagger and a sufficiently large parallel corpus. Evaluated on a Bangla-English parallel corpus of 50,000 sentences, the approach yields an accuracy of 76%, which can be improved by increasing the number of sentence pairs in the parallel corpus.
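
The abstract does not spell out the rules or the statistical measure used, so the following is only a generic sketch of how a rule-plus-statistics pipeline over a POS-tagged parallel corpus might look, not the authors' method. Everything in it is an assumption made for illustration: the light-verb list, the tag names, the romanized toy tokens, the adjacency rule (a noun or adjective immediately followed by a light verb), and the choice of pointwise mutual information (PMI) with co-occurring English verbs as the score.

```python
from collections import Counter
from math import log

# Hypothetical resources: a tiny light-verb list and coarse POS tags.
# Neither is taken from the paper.
LIGHT_VERBS = {"kara", "deoya", "neoya", "hoya"}
HOST_TAGS = {"NOUN", "ADJ"}


def candidates(tagged_sentence):
    """Rule step: yield (host, light_verb) pairs where a noun or adjective
    is immediately followed by a verb from the light-verb list."""
    for (w1, t1), (w2, t2) in zip(tagged_sentence, tagged_sentence[1:]):
        if t1 in HOST_TAGS and t2 == "VERB" and w2 in LIGHT_VERBS:
            yield (w1, w2)


def score_candidates(parallel_corpus):
    """Statistical step: for every candidate, find the English verb with the
    highest sentence-level PMI across the aligned sentence pairs."""
    n = len(parallel_corpus)
    cand_count, verb_count, joint = Counter(), Counter(), Counter()
    for source_tagged, english_tagged in parallel_corpus:
        cands = set(candidates(source_tagged))
        verbs = {w for w, t in english_tagged if t == "VERB"}
        cand_count.update(cands)
        verb_count.update(verbs)
        joint.update((c, v) for c in cands for v in verbs)

    best = {}
    for (c, v), k in joint.items():
        pmi = log(k * n / (cand_count[c] * verb_count[v]))
        if c not in best or pmi > best[c][1]:
            best[c] = (v, pmi)
    return best


# Toy aligned corpus with made-up romanized tokens, for demonstration only.
toy_corpus = [
    ([("se", "PRON"), ("amake", "PRON"), ("sahajya", "NOUN"), ("kara", "VERB")],
     [("he", "PRON"), ("help", "VERB"), ("me", "PRON")]),
    ([("tumi", "PRON"), ("sahajya", "NOUN"), ("kara", "VERB")],
     [("you", "PRON"), ("help", "VERB")]),
    ([("se", "PRON"), ("bhat", "NOUN"), ("khaoya", "VERB")],
     [("he", "PRON"), ("eat", "VERB"), ("rice", "NOUN")]),
]

if __name__ == "__main__":
    for cand, (verb, pmi) in score_candidates(toy_corpus).items():
        print(cand, verb, round(pmi, 3))
```

In a sketch like this, a candidate that consistently aligns with a single English verb would be kept above some PMI threshold, while candidates with weak or scattered associations would be discarded. A real system would need a proper tagger, word alignment, and a threshold tuned on held-out data.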