WANLP@ACL 2019: Latest Papers

Arabic Dialect Identification for Travel and Twitter Text
WANLP@ACL 2019 Pub Date : 2019-08-01 DOI: 10.18653/v1/W19-4628
Pruthwik Mishra, Vandan Mujadia
Abstract: This paper presents the results of experiments conducted as part of the MADAR Shared Task at WANLP 2019 on Arabic fine-grained dialect identification. Dialect identification is a prominent task in natural language processing, since its output can improve subsequent language modules. We explored features such as character and word n-grams and language model probabilities with different classifiers. Results show that these features help improve dialect classification accuracy, and that traditional machine learning classifiers tend to outperform neural network models on this task in a low-resource setting.
Citations: 12
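The abstract above names character n-grams among its features. As a rough illustration only (not the authors' code), a minimal Python sketch of counting character n-grams might look like the following; the helper name `char_ngrams`, the space padding, and the default n-gram range are all assumptions made for this example.

```python
from collections import Counter


def char_ngrams(text: str, n_min: int = 1, n_max: int = 3) -> Counter:
    """Count character n-grams of length n_min..n_max (hypothetical helper).

    The text is padded with one space on each side so that word-initial
    and word-final characters produce distinct n-grams (an assumption of
    this sketch, not something stated in the abstract).
    """
    padded = f" {text.strip()} "
    counts: Counter = Counter()
    for n in range(n_min, n_max + 1):
        # Slide a window of width n over the padded string.
        for i in range(len(padded) - n + 1):
            counts[padded[i:i + n]] += 1
    return counts
```

The resulting counts could then be fed to any classifier that accepts sparse bag-of-features input; which classifier the authors used is not specified here.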
JHU System Description for the MADAR Arabic Dialect Identification Shared Task
WANLP@ACL 2019 Pub Date : 2019-08-01 DOI: 10.18653/v1/W19-4634
Thomas Lippincott, Pamela Shapiro, Kevin Duh, Paul McNamee
Abstract: Our submission to the MADAR shared task on Arabic dialect identification employed a language modeling technique called Prediction by Partial Matching, an ensemble of neural architectures, and sources of additional data for training word embeddings and auxiliary language models. We found that several of these techniques provided small boosts in performance, though a simple character-level language model was a strong baseline, and a lower-order LM achieved the best performance on Subtask 2. Interestingly, word embeddings provided no consistent benefit, and ensembling struggled to outperform the best component submodel. This suggests that the various architectures are learning redundant information, and future work may focus on encouraging decorrelated learning.
Citations: 5
Homograph Disambiguation through Selective Diacritic Restoration
WANLP@ACL 2019 Pub Date : 2019-08-01 DOI: 10.18653/v1/W19-4606
Sawsan Alqahtani, Hanan Aldarmaki, Mona T. Diab
Abstract: Lexical ambiguity, a challenging phenomenon in all natural languages, is particularly prevalent for languages with diacritics that tend to be omitted in writing, such as Arabic. Omitting diacritics leads to an increase in the number of homographs: different words with the same spelling. Diacritic restoration could theoretically help disambiguate these words, but in practice, the increase in overall sparsity leads to performance degradation in NLP applications. In this paper, we propose approaches for automatically marking a subset of words for diacritic restoration, which leads to selective homograph disambiguation. Compared to full or no diacritic restoration, these approaches yield selectively-diacritized datasets that balance sparsity and lexical disambiguation. We evaluate the various selection strategies extrinsically on several downstream applications: neural machine translation, part-of-speech tagging, and semantic textual similarity. Our experiments on Arabic show promising results, where our devised strategies on selective diacritization lead to a more balanced and consistent performance in downstream applications.
Citations: 10
No Army, No Navy: BERT Semi-Supervised Learning of Arabic Dialects
WANLP@ACL 2019 Pub Date : 2019-08-01 DOI: 10.18653/v1/W19-4637
Chiyu Zhang, Muhammad Abdul-Mageed
Abstract: We present our deep learning system submitted to MADAR shared task 2, which focuses on Twitter user dialect identification. We develop tweet-level identification models based on GRUs and BERT in supervised and semi-supervised settings. We then introduce a simple yet effective method of porting tweet-level labels to the level of users. Our system ranks first in the competition, with a macro F1 score of 71.70% and an accuracy of 77.40%.
Citations: 30
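Several entries in this list report macro F1. For readers unfamiliar with the metric, here is a minimal Python sketch of the standard definition (the unweighted mean of per-class F1 scores). The function name `macro_f1` and the country codes in the usage note are illustrative assumptions; this is not the shared task's official scorer.

```python
def macro_f1(gold, pred):
    """Unweighted mean of per-class F1 over every label seen in gold or pred."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    labels = set(gold) | set(pred)
    if not labels:
        return 0.0
    scores = []
    for label in labels:
        tp = sum(1 for g, p in zip(gold, pred) if g == label and p == label)
        fp = sum(1 for g, p in zip(gold, pred) if g != label and p == label)
        fn = sum(1 for g, p in zip(gold, pred) if g == label and p != label)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the denominator is 0.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)
```

For instance, with hypothetical gold labels `["EG", "EG", "SA", "SA"]` and predictions `["EG", "SA", "SA", "SA"]`, the per-class F1 values are 2/3 and 4/5, and `macro_f1` returns their mean.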
ArbEngVec: Arabic-English Cross-Lingual Word Embedding Model
WANLP@ACL 2019 Pub Date : 2019-07-28 DOI: 10.18653/v1/W19-4605
Raki Lachraf, El Moatez Billah Nagoudi, Youcef Ayachi, Ahmed Abdelali, D. Schwab
Abstract: Word Embeddings (WE) are increasingly popular and widely applied in many Natural Language Processing (NLP) applications due to their effectiveness in capturing semantic properties of words; Machine Translation (MT), Information Retrieval (IR) and Information Extraction (IE) are among such areas. In this paper, we propose the open-source ArbEngVec, which provides several Arabic-English cross-lingual word embedding models. To train our bilingual models, we use a large dataset with more than 93 million pairs of Arabic-English parallel sentences. In addition, we perform both extrinsic and intrinsic evaluations of the different word embedding model variants. The extrinsic evaluation assesses the performance of the models on cross-language Semantic Textual Similarity (STS), while the intrinsic evaluation is based on the Word Translation (WT) task.
Citations: 11
Mazajak: An Online Arabic Sentiment Analyser
WANLP@ACL 2019 Pub Date : 2019-05-24 DOI: 10.18653/v1/W19-4621
Ibrahim Abu Farha, Walid Magdy
Abstract: Sentiment analysis (SA) is one of the most useful natural language processing applications. The literature is flooded with papers and systems addressing this task, but most of the work is focused on English. In this paper, we present "Mazajak", an online system for Arabic SA. The system is based on a deep learning model, which achieves state-of-the-art results on many Arabic dialect datasets including SemEval 2017 and ASTD. The availability of such a system should assist various applications and research that rely on sentiment analysis as a tool.
Citations: 105
A Character Level Convolutional BiLSTM for Arabic Dialect Identification
WANLP@ACL 2019 Pub Date : 2019 DOI: 10.18653/v1/W19-4636
Mohamed S. Elaraby, A. Zahran
Abstract: In this paper, we describe the CU-RAISA team's contribution to the 2019 MADAR shared task 2, which focused on fine-grained dialect identification of Twitter users. Among participating teams, our system ranked 4th, with a macro F1 of 61.54%. Our system is a character-level convolutional bidirectional long short-term memory network trained on 2k users' data. We show that training on concatenated user tweets as input is superior to training on user tweets separately and assigning each user's label as the mode of the predictions for that user's tweets.
Citations: 3
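The abstract above compares against a baseline that labels each user with the mode of that user's tweet-level predictions. A minimal Python sketch of that aggregation step follows, assuming predictions arrive as `(user_id, label)` pairs; the function name `user_labels_by_mode`, the input format, and the tie-breaking rule are assumptions for illustration, not details taken from the paper.

```python
from collections import Counter, defaultdict


def user_labels_by_mode(tweet_predictions):
    """Map each user to the most frequent label among that user's tweets.

    tweet_predictions: iterable of (user_id, predicted_label) pairs.
    Ties go to the label that appears first for that user (an arbitrary
    choice in this sketch).
    """
    by_user = defaultdict(list)
    for user_id, label in tweet_predictions:
        by_user[user_id].append(label)
    return {
        user_id: Counter(labels).most_common(1)[0][0]
        for user_id, labels in by_user.items()
    }
```

Note that this is the approach the authors report as weaker than training on concatenated tweets, so it is best read as a reference point rather than a recommended method.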
ZCU-NLP at MADAR 2019: Recognizing Arabic Dialects
WANLP@ACL 2019 Pub Date : 2019 DOI: 10.18653/v1/W19-4623
P. Přibáň, Stephen Eugene Taylor
Abstract: In this paper, we present our systems for the MADAR Shared Task: Arabic Fine-Grained Dialect Identification. The shared task consists of two subtasks. The goal of Subtask 1 (S-1) is to detect an Arabic city dialect in a given text, and the goal of Subtask 2 (S-2) is to predict the country of origin of a Twitter user from the tweets posted by that user. In S-1, our proposed systems are based on language modelling. We use language models to extract features that are later used as input for other machine learning algorithms. We also experiment with recurrent neural networks (RNNs), but these experiments showed that simpler machine learning algorithms are more successful. Our system achieves a macro F1-score of 0.658 and ranks 6th out of 19 teams in S-1, and ranks 7th in S-2 with a macro F1-score of 0.475.
Citations: 5