孟加拉语中主体/客体掉落自动检测

2014 International Conference on Asian Language Processing (IALP) Pub Date : 2014-12-04 DOI:10.1109/IALP.2014.6973488

Arjun Das, Utpal Garain, Apurbalal Senapati

{"title":"孟加拉语中主体/客体掉落自动检测","authors":"Arjun Das, Utpal Garain, Apurbalal Senapati","doi":"10.1109/IALP.2014.6973488","DOIUrl":null,"url":null,"abstract":"This paper presents a pioneering attempt for automatic detection of drops in Bengali. The dominant drops in Bengali refer to subject, object and verb drops. Bengali is a pro-drop language and pro-drops fall under subject/object drops which this research concentrates on. The detection algorithm makes use of off-the-shelf Bengali NLP tools like POS tagger, chunker and a dependency parser. Simple linguistic rules are initially applied to quickly annotate a dataset of 8,455 sentences which are then manually checked. The corrected dataset is then used to train two classifiers that classify a sentence to either one with a drop or no drop. The features previously used by other researchers have been considered. Both the classifiers show comparable overall performance. As a by-product, the current study generates another (apart from the drop-annotated dataset) useful NLP resource, i.e. classification of Bengali verbs (all morphological variants of 881 root verbs) as per their transitivity which in turn used as a feature by the classifiers.","PeriodicalId":117334,"journal":{"name":"2014 International Conference on Asian Language Processing (IALP)","volume":"8 3","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2014-12-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"3","resultStr":"{\"title\":\"Automatic detection of subject/object drops in Bengali\",\"authors\":\"Arjun Das, Utpal Garain, Apurbalal Senapati\",\"doi\":\"10.1109/IALP.2014.6973488\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"This paper presents a pioneering attempt for automatic detection of drops in Bengali. The dominant drops in Bengali refer to subject, object and verb drops. Bengali is a pro-drop language and pro-drops fall under subject/object drops which this research concentrates on. The detection algorithm makes use of off-the-shelf Bengali NLP tools like POS tagger, chunker and a dependency parser. Simple linguistic rules are initially applied to quickly annotate a dataset of 8,455 sentences which are then manually checked. The corrected dataset is then used to train two classifiers that classify a sentence to either one with a drop or no drop. The features previously used by other researchers have been considered. Both the classifiers show comparable overall performance. As a by-product, the current study generates another (apart from the drop-annotated dataset) useful NLP resource, i.e. classification of Bengali verbs (all morphological variants of 881 root verbs) as per their transitivity which in turn used as a feature by the classifiers.\",\"PeriodicalId\":117334,\"journal\":{\"name\":\"2014 International Conference on Asian Language Processing (IALP)\",\"volume\":\"8 3\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2014-12-04\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"3\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2014 International Conference on Asian Language Processing (IALP)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/IALP.2014.6973488\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2014 International Conference on Asian Language Processing (IALP)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/IALP.2014.6973488","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 3

摘要

本文提出了一个开创性的尝试，自动检测滴在孟加拉语。孟加拉语中的支配语素是指主语、宾语和动词语素。孟加拉语是一种亲滴语，亲滴语属于主语/宾语滴语，这是本研究的重点。检测算法使用了现成的孟加拉语NLP工具，如POS标记器、分块器和依赖解析器。最初应用简单的语言规则来快速注释8,455个句子的数据集，然后手动检查这些句子。然后使用修正后的数据集训练两个分类器，将句子分类为有滴或没有滴。之前其他研究者使用的特征已经被考虑。这两种分类器的总体性能相当。作为副产品，目前的研究产生了另一个有用的NLP资源(除了drop-annotated数据集)，即根据及物性对孟加拉语动词(881个词根动词的所有形态变体)进行分类，这反过来又被分类器用作一个特征。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

Automatic detection of subject/object drops in Bengali

This paper presents a pioneering attempt for automatic detection of drops in Bengali. The dominant drops in Bengali refer to subject, object and verb drops. Bengali is a pro-drop language and pro-drops fall under subject/object drops which this research concentrates on. The detection algorithm makes use of off-the-shelf Bengali NLP tools like POS tagger, chunker and a dependency parser. Simple linguistic rules are initially applied to quickly annotate a dataset of 8,455 sentences which are then manually checked. The corrected dataset is then used to train two classifiers that classify a sentence to either one with a drop or no drop. The features previously used by other researchers have been considered. Both the classifiers show comparable overall performance. As a by-product, the current study generates another (apart from the drop-annotated dataset) useful NLP resource, i.e. classification of Bengali verbs (all morphological variants of 881 root verbs) as per their transitivity which in turn used as a feature by the classifiers.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

2014 International Conference on Asian Language Processing (IALP)

自引率

0.00%

发文量