Automatic detection of subject/object drops in Bengali

Arjun Das, Utpal Garain, Apurbalal Senapati
Published in: 2014 International Conference on Asian Language Processing (IALP), 2014-12-04
DOI: 10.1109/IALP.2014.6973488 (https://doi.org/10.1109/IALP.2014.6973488)
Citations: 3

Abstract

This paper presents a pioneering attempt at automatic detection of drops in Bengali. The dominant drops in Bengali are subject, object and verb drops. Bengali is a pro-drop language, and pro-drops fall under subject/object drops, which are the focus of this research. The detection algorithm uses off-the-shelf Bengali NLP tools: a POS tagger, a chunker and a dependency parser. Simple linguistic rules are first applied to quickly annotate a dataset of 8,455 sentences, which are then manually checked. The corrected dataset is then used to train two classifiers that label each sentence as either containing a drop or not. Features previously used by other researchers are considered. Both classifiers show comparable overall performance. As a by-product, the study produces another useful NLP resource besides the drop-annotated dataset: a classification of Bengali verbs (all morphological variants of 881 root verbs) by transitivity, which is in turn used as a feature by the classifiers.
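The abstract does not spell out the linguistic rules, so the following is only a minimal sketch of what a rule-based first pass over dependency-parsed sentences might look like, not the authors' method. It assumes that each verb's dependents are available from a parser, that subject and object relations carry the placeholder labels `subj` and `obj`, and that a transitivity lexicon keyed by surface verb form exists. The `Token` type, the `detect_drops` function, the labels, and the romanised example words are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Token:
    form: str    # surface form (romanised here for readability)
    pos: str     # coarse POS tag, e.g. "VERB", "NOUN", "PRON"
    head: int    # index of the head token, -1 for the root
    deprel: str  # dependency label; "subj" and "obj" are placeholder names


# Hypothetical transitivity lexicon keyed by surface verb form.
TRANSITIVITY = {"khai": "transitive", "ghumai": "intransitive"}


def detect_drops(tokens, lexicon):
    """Return the set of drop types ("subject", "object") found in a sentence."""
    drops = set()
    for i, tok in enumerate(tokens):
        if tok.pos != "VERB":
            continue
        # Collect the dependency labels of everything attached to this verb.
        labels = {t.deprel for t in tokens if t.head == i}
        if "subj" not in labels:
            drops.add("subject")
        # Only a verb known to be transitive is expected to take an object.
        if lexicon.get(tok.form) == "transitive" and "obj" not in labels:
            drops.add("object")
    return drops


# "ami bhat khai" (I eat rice): nothing dropped.
full = [
    Token("ami", "PRON", 2, "subj"),
    Token("bhat", "NOUN", 2, "obj"),
    Token("khai", "VERB", -1, "root"),
]
# "bhat khai" ((I) eat rice): the subject is dropped.
no_subject = [
    Token("bhat", "NOUN", 1, "obj"),
    Token("khai", "VERB", -1, "root"),
]

print(detect_drops(full, TRANSITIVITY))
print(detect_drops(no_subject, TRANSITIVITY))
```

In a pipeline like the one the abstract describes, output of this kind would serve only as a draft annotation to be checked by hand before any classifier is trained on it.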