CONVEX: Conjunct Verb extraction from parallel corpus: A hybrid approach

2012 4th International Conference on Intelligent Human Computer Interaction (IHCI). Pub Date: 2012-12-01. DOI: 10.1109/IHCI.2012.6481852

S. Choudhury, Bibekananda Kundu

Citations: 0

Abstract

Conjunct Verbs (CVs) are a special form of Complex Predicate that behaves as a single verbal unit while retaining a multiword structure. CVs play an important role in Natural Language Processing applications such as speech-to-speech translation, machine translation, and lexical resource creation. Because of their distinctive construction, however, detecting and extracting CVs is a challenging task. This paper presents a hybrid approach, combining rule-based and statistical methods, for mining CVs from a parallel corpus. Although the approach has been applied to a Bangla-English parallel corpus to extract Bangla CVs, the methodology is equally applicable to other Indian languages of the Indo-Aryan family, given a part-of-speech tagger and a sufficiently large parallel corpus. Evaluated on a Bangla-English parallel corpus of 50,000 sentences, the proposed approach yields an accuracy of 76%, which can be improved by increasing the number of sentence pairs in the parallel corpus.
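The abstract does not spell out the rules or the statistic that CONVEX uses, so the following is only a minimal sketch of what a rule-plus-statistics pipeline over a POS-tagged parallel corpus could look like, not the authors' method. Everything in it is an assumption made for illustration: the light-verb list, the `NN`/`JJ`/`VB` tags, the use of a Dice coefficient as the statistical filter, the threshold, and the romanized, uninflected toy sentences.

```python
from collections import Counter

# Hypothetical light-verb list (romanized Bangla); not taken from the paper.
LIGHT_VERBS = {"kora", "deoya", "neoya", "hoya"}
# Assumed tagset: a CV host is a noun (NN) or adjective (JJ).
HOST_TAGS = {"NN", "JJ"}


def candidate_cvs(tagged_sentence):
    """Rule-based step: yield (host, light_verb) pairs from adjacent tokens."""
    for (w1, t1), (w2, t2) in zip(tagged_sentence, tagged_sentence[1:]):
        if t1 in HOST_TAGS and t2 == "VB" and w2 in LIGHT_VERBS:
            yield (w1, w2)


def dice_scores(parallel_corpus):
    """Statistical step: Dice coefficient between each candidate and each
    English verb that occurs in the aligned target sentence."""
    cand_count, verb_count, joint_count = Counter(), Counter(), Counter()
    for src_tagged, tgt_tagged in parallel_corpus:
        cands = set(candidate_cvs(src_tagged))
        verbs = {w for w, t in tgt_tagged if t == "VB"}
        cand_count.update(cands)
        verb_count.update(verbs)
        joint_count.update((c, v) for c in cands for v in verbs)
    return {
        (c, v): 2 * n / (cand_count[c] + verb_count[v])
        for (c, v), n in joint_count.items()
    }


def extract_cvs(parallel_corpus, threshold=0.5):
    """Keep candidates strongly associated with at least one English verb."""
    scores = dice_scores(parallel_corpus)
    return {c for (c, v), s in scores.items() if s >= threshold}


# Toy, hand-made sentence pairs (illustrative only, not from the paper's corpus).
toy_corpus = [
    ([("se", "PRP"), ("amake", "PRP"), ("sahajyo", "NN"), ("kora", "VB")],
     [("he", "PRP"), ("help", "VB"), ("me", "PRP")]),
    ([("tumi", "PRP"), ("sahajyo", "NN"), ("kora", "VB")],
     [("you", "PRP"), ("help", "VB")]),
    ([("ami", "PRP"), ("kaj", "NN"), ("kora", "VB")],
     [("i", "PRP"), ("work", "VB")]),
]

if __name__ == "__main__":
    print(extract_cvs(toy_corpus))
```

The intuition behind this sketch is that a multiword CV on the source side often corresponds to a single verb on the English side, so a candidate that consistently co-occurs with one English verb is more likely to be a genuine CV. Co-occurrence counts become more reliable as sentence pairs are added, which is consistent with the abstract's remark that a larger parallel corpus should improve accuracy.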

Source journal

2012 4th International Conference on Intelligent Human Computer Interaction (IHCI)

Self-citation rate

0.00%

Number of publications