POS Tagging for Improving Code-Switching Identification in Arabic

Mohammed A. Attia, Younes Samih, Ali El-Kahky, Hamdy Mubarak, Ahmed Abdelali, Kareem Darwish
{"title":"POS Tagging for Improving Code-Switching Identification in Arabic","authors":"Mohammed A. Attia, Younes Samih, Ali El-Kahky, Hamdy Mubarak, Ahmed Abdelali, Kareem Darwish","doi":"10.18653/v1/W19-4603","DOIUrl":null,"url":null,"abstract":"When speakers code-switch between their native language and a second language or language variant, they follow a syntactic pattern where words and phrases from the embedded language are inserted into the matrix language. This paper explores the possibility of utilizing this pattern in improving code-switching identification between Modern Standard Arabic (MSA) and Egyptian Arabic (EA). We try to answer the question of how strong is the POS signal in word-level code-switching identification. We build a deep learning model enriched with linguistic features (including POS tags) that outperforms the state-of-the-art results by 1.9% on the development set and 1.0% on the test set. We also show that in intra-sentential code-switching, the selection of lexical items is constrained by POS categories, where function words tend to come more often from the dialectal language while the majority of content words come from the standard language.","PeriodicalId":268163,"journal":{"name":"WANLP@ACL 2019","volume":"151 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2019-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"6","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"WANLP@ACL 2019","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.18653/v1/W19-4603","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 6

Abstract

When speakers code-switch between their native language and a second language or language variant, they follow a syntactic pattern where words and phrases from the embedded language are inserted into the matrix language. This paper explores the possibility of utilizing this pattern in improving code-switching identification between Modern Standard Arabic (MSA) and Egyptian Arabic (EA). We try to answer the question of how strong is the POS signal in word-level code-switching identification. We build a deep learning model enriched with linguistic features (including POS tags) that outperforms the state-of-the-art results by 1.9% on the development set and 1.0% on the test set. We also show that in intra-sentential code-switching, the selection of lexical items is constrained by POS categories, where function words tend to come more often from the dialectal language while the majority of content words come from the standard language.
改进阿拉伯语语码转换识别的词性标注
当使用者在母语和第二语言或语言变体之间进行代码转换时,他们遵循一种句法模式,即从嵌入语言中插入单词和短语到矩阵语言中。本文探讨了利用这一模式提高现代标准阿拉伯语(MSA)和埃及阿拉伯语(EA)之间码转换识别的可能性。我们试图回答词频信号在字级码交换识别中的强度问题。我们建立了一个丰富了语言特征(包括POS标签)的深度学习模型,在开发集和测试集上分别比最先进的结果高出1.9%和1.0%。在句内语码转换过程中,词类的选择受到词类类别的限制,虚词往往来自方言,而实词大多来自标准语言。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信