WANLP@ACL 2019: Latest Papers

MICHAEL: Mining Character-level Patterns for Arabic Dialect Identification (MADAR Challenge)
WANLP@ACL 2019 Pub Date : 2019-08-01 DOI: 10.18653/v1/W19-4627
Dhaou Ghoul, Gaël Lejeune
Abstract: We present MICHAEL, a simple, lightweight method for automatic Arabic Dialect Identification on the MADAR travel domain Dialect Identification (DID). MICHAEL uses simple character-level features in order to perform a pre-processing-free classification. More precisely, character n-grams extracted from the original sentences are used to train a Multinomial Naive Bayes classifier. This system achieved an official score (accuracy) of 53.25% with 1 ≤ N ≤ 3 but showed a much better result with character 4-grams (62.17% accuracy).
Citations: 5
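The MICHAEL abstract describes training a Multinomial Naive Bayes classifier on character n-grams taken from raw sentences. The following is a minimal sketch of that general idea using only the Python standard library. It is not the authors' implementation: the class name `CharNgramNB`, the add-one smoothing, and the toy ASCII training strings are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict


def char_ngrams(text, n_min=1, n_max=3):
    """Return all character n-grams of `text` with n_min <= n <= n_max."""
    return [text[i:i + n]
            for n in range(n_min, n_max + 1)
            for i in range(len(text) - n + 1)]


class CharNgramNB:
    """Multinomial Naive Bayes over character n-gram counts (add-one smoothing).

    Hypothetical helper class for illustration only.
    """

    def __init__(self, n_min=1, n_max=3):
        self.n_min, self.n_max = n_min, n_max

    def fit(self, sentences, labels):
        self.class_counts = Counter(labels)
        self.ngram_counts = defaultdict(Counter)
        for sentence, label in zip(sentences, labels):
            self.ngram_counts[label].update(
                char_ngrams(sentence, self.n_min, self.n_max))
        self.vocab = {g for c in self.ngram_counts.values() for g in c}
        return self

    def predict(self, sentence):
        # N-grams never seen in training carry no evidence and are skipped.
        grams = [g for g in char_ngrams(sentence, self.n_min, self.n_max)
                 if g in self.vocab]
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, doc_count in self.class_counts.items():
            counts = self.ngram_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            # Log prior plus summed log likelihoods of the observed n-grams.
            score = math.log(doc_count / total_docs)
            score += sum(math.log((counts[g] + 1) / denom) for g in grams)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy data, purely illustrative (not MADAR sentences).
model = CharNgramNB(n_min=1, n_max=3).fit(["aaa aab", "bbb bba"], ["A", "B"])
print(model.predict("aab"))
```

Switching to the 4-gram setting mentioned in the abstract would correspond to `CharNgramNB(n_min=4, n_max=4)` in this sketch; whether the original system used only 4-grams or a range ending at 4 is not stated in the abstract.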
Assessing Arabic Weblog Credibility via Deep Co-learning
WANLP@ACL 2019 Pub Date : 2019-08-01 DOI: 10.18653/v1/W19-4614
Chadi Helwe, Shady Elbassuoni, A. Zaatari, W. El-Hajj
Abstract: Assessing the credibility of online content has garnered a lot of attention lately. We focus on one such type of online content, namely weblogs, or blogs for short. Some recent work attempted the task of automatically assessing the credibility of blogs, typically via machine learning. However, in the case of Arabic blogs, there are hardly any datasets available that can be used to train robust machine learning models for this difficult task. To overcome the lack of sufficient training data, we propose deep co-learning, a semi-supervised end-to-end deep learning approach to assess the credibility of Arabic blogs. In deep co-learning, multiple weak deep neural network classifiers are trained using a small labeled dataset, each using a different view of the data. Each one of these classifiers is then used to classify unlabeled data, and its prediction is used to train the other classifiers in a semi-supervised fashion. We evaluate our deep co-learning approach on an Arabic blogs dataset, and we report significant improvements in performance compared to many baselines, including fully-supervised deep learning models as well as ensemble models.
Citations: 12
Incremental Domain Adaptation for Neural Machine Translation in Low-Resource Settings
WANLP@ACL 2019 Pub Date : 2019-08-01 DOI: 10.18653/v1/W19-4601
M. Kalimuthu, Michael Barz, Daniel Sonntag
Abstract: We study the problem of incremental domain adaptation of a generic neural machine translation model with limited resources (e.g., budget and time) for human translations or model training. In this paper, we propose a novel query strategy for selecting “unlabeled” samples from a new domain based on sentence embeddings for Arabic. We accelerate the fine-tuning process of the generic model to the target domain. Specifically, our approach estimates the informativeness of instances from the target domain by comparing the distance of their sentence embeddings to embeddings from the generic domain. We perform machine translation experiments (Ar-to-En direction) comparing a random sampling baseline with our new approach, similar to active learning, using two small update sets for simulating the work of human translators. For the prescribed setting we can save more than 50% of the annotation costs without loss in quality, demonstrating the effectiveness of our approach.
Citations: 6
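The query strategy in the abstract above scores target-domain sentences by how far their embeddings lie from generic-domain embeddings. One possible reading is sketched below. The abstract does not specify the distance measure or the selection rule, so the cosine distance to the generic-domain centroid, the "farthest first" ranking, and the function name `select_for_translation` are all assumptions, and the two-dimensional vectors are toy values rather than real sentence embeddings.

```python
import math


def centroid(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def cosine_distance(u, v):
    """1 - cosine similarity; assumes neither vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def select_for_translation(target_embeddings, generic_embeddings, budget):
    """Return indices of the `budget` target sentences farthest from the
    generic-domain centroid (assumed proxy for informativeness)."""
    center = centroid(generic_embeddings)
    ranked = sorted(
        range(len(target_embeddings)),
        key=lambda i: cosine_distance(target_embeddings[i], center),
        reverse=True,
    )
    return ranked[:budget]


# Toy embeddings, purely illustrative.
generic = [[1.0, 0.0], [0.9, 0.1]]
target = [[1.0, 0.05], [0.0, 1.0], [0.5, 0.5]]
picked = select_for_translation(target, generic, budget=2)
print(picked)
```

In an active-learning-style loop, the selected sentences would be sent to human translators and the resulting pairs used to fine-tune the generic model; that loop is omitted here.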
Segmentation for Domain Adaptation in Arabic
WANLP@ACL 2019 Pub Date : 2019-08-01 DOI: 10.18653/v1/W19-4613
Mohammed A. Attia, Ali El-Kahky
Abstract: Segmentation serves as an integral part of many NLP applications, including Machine Translation, Parsing, and Information Retrieval. When a model trained on the standard language is applied to dialects, the accuracy drops dramatically. However, there are more lexical items shared by the standard language and dialects than can be found by mere surface word matching. This shared lexicon is obscured by a lot of cliticization, gemination, and character repetition. In this paper, we prove that segmentation and base normalization of dialects can help in domain adaptation by reducing data sparseness. Segmentation will improve a system's performance by reducing the number of OOVs, help isolate the differences, and allow better utilization of the commonalities. We show that adding a small amount of dialectal segmentation training data reduces OOVs by 5% and improves POS tagging for dialects by 7.37% F-score, even though no dialect-specific POS training data is included.
Citations: 1
LIUM-MIRACL Participation in the MADAR Arabic Dialect Identification Shared Task
WANLP@ACL 2019 Pub Date : 2019-08-01 DOI: 10.18653/v1/W19-4625
Saméh Kchaou, Fethi Bougares, Lamia Hadrich Belguith
Abstract: This paper describes the joint participation of the LIUM and MIRACL Laboratories in the Arabic dialect identification challenge of the MADAR Shared Task (Bouamor et al., 2019), conducted during the Fourth Arabic Natural Language Processing Workshop (WANLP 2019). We participated in the Travel Domain Dialect Identification subtask. We built several systems and explored different techniques, including conventional machine learning methods and deep learning algorithms. Deep learning approaches did not perform well on this task. We experimented with several classification systems and were able to identify the dialect of an input sentence with an F1-score of 65.41% on the official test set, using only the training data supplied by the shared task organizers.
Citations: 3
En-Ar Bilingual Word Embeddings without Word Alignment: Factors Effects
WANLP@ACL 2019 Pub Date : 2019-08-01 DOI: 10.18653/v1/W19-4611
T. Alqaisi, Simon E. M. O'Keefe
Abstract: This paper introduces the first attempt to investigate morphological segmentation on En-Ar bilingual word embeddings using the bilingual word embeddings model without word alignment (BilBOWA). We investigate the effect of sentence length and embedding size on the learning process. Our experiment shows that using the D3 segmentation scheme improves the accuracy of learning bilingual word embeddings by up to 10 percentage points compared to the ATB and D0 schemes in all different training settings.
Citations: 7
Morphologically Annotated Corpora for Seven Arabic Dialects: Taizi, Sanaani, Najdi, Jordanian, Syrian, Iraqi and Moroccan
WANLP@ACL 2019 Pub Date : 2019-08-01 DOI: 10.18653/v1/W19-4615
Faisal Al-Shargi, Shahd Dibas, Sakhar B. Alkhereyfy, Reem Faraj, Basma Abdulkareem, S. Yagi, Ouafaa Kacha, Nizar Habash, Owen Rambow
Abstract: We present a collection of morphologically annotated corpora for seven Arabic dialects: Taizi Yemeni, Sanaani Yemeni, Najdi, Jordanian, Syrian, Iraqi and Moroccan Arabic. The corpora collectively cover over 200,000 words, and are all manually annotated in a common set of standards for orthography, diacritized lemmas, tokenization, morphological units and English glosses. These corpora will be publicly available to serve as benchmarks for training and evaluating systems for Arabic dialect morphological analysis and disambiguation.
Citations: 11
Arabic Named Entity Recognition: What Works and What’s Next
WANLP@ACL 2019 Pub Date : 2019-08-01 DOI: 10.18653/v1/W19-4607
Liyuan Liu, Jingbo Shang, Jiawei Han
Abstract: This paper presents the winning solution to the Arabic Named Entity Recognition challenge run by Topcoder.com. The proposed model integrates various tailored techniques, including representation learning, feature engineering, sequence labeling, and ensemble learning. The final model achieves a test F1 score of 75.82% on the AQMAR dataset and outperforms baselines by a large margin. Detailed analyses are conducted to reveal both its strengths and limitations. Specifically, we observe that (1) representation learning modules can significantly boost performance but require proper pre-processing and (2) the resulting embedding can be further enhanced with feature engineering due to the limited size of the training data. All implementations and pre-trained models are made public.
Citations: 17
Mawdoo3 AI at MADAR Shared Task: Arabic Fine-Grained Dialect Identification with Ensemble Learning
WANLP@ACL 2019 Pub Date : 2019-08-01 DOI: 10.18653/v1/W19-4630
A. Ragab, Haitham Seelawi, Mostafa Samir, Abdelrahman Mattar, Hesham Al-Bataineh, Mohammad Zaghloul, Ahmad Mustafa, Bashar Talafha, Abed Alhakim Freihat, Hussein T. Al-Natsheh
Abstract: In this paper we discuss several models we used to classify 25 city-level Arabic dialects in addition to Modern Standard Arabic (MSA) as part of the MADAR shared task (sub-task 1). We propose an ensemble model of a group of experimentally designed best-performing classifiers on various sets of features. Our system achieves a macro F1-score of 69.3%, with an improvement of 1.4% accuracy over the baseline model on the DEV dataset. Our best submitted run ranked third out of 19 participating teams on the TEST dataset, only 0.12% macro F1-score behind the top-ranked system.
Citations: 12
ArbDialectID at MADAR Shared Task 1: Language Modelling and Ensemble Learning for Fine Grained Arabic Dialect Identification
WANLP@ACL 2019 Pub Date : 2019-08-01 DOI: 10.18653/v1/W19-4632
K. Kwaik, Motaz Saad
Abstract: In this paper, we present a Dialect Identification system (ArbDialectID) that competed in Task 1 of the MADAR shared task, MADAR Travel Domain Dialect Identification. We build a coarse and a fine-grained identification model to predict the label (corresponding to a dialect of Arabic) of a given text. We build two language models by extracting features at two levels (words and characters). We first build a coarse identification model to classify each sentence into one of six dialects, then use this label as a feature for the fine-grained model that classifies the sentence among 26 dialects from different Arab cities. After that, we apply an ensemble voting classifier on both sub-systems. Our system ranked 1st, achieving an F-score of 67.32%. Both the models and our feature engineering tools are made available to the research community.
Citations: 10
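The ArbDialectID abstract outlines a coarse-to-fine pipeline: a coarse label is predicted first, passed as a feature to fine-grained models, and the fine-grained outputs are combined by voting. A minimal control-flow sketch of that pattern follows. It is not the authors' system: the function `coarse_to_fine`, the toy rule-based stand-in models, and the placeholder labels `region_*` and `city_*` are hypothetical, and real models would be trained classifiers over word and character features.

```python
from collections import Counter


def coarse_to_fine(sentence, coarse_model, fine_models):
    """Predict a coarse label, pass it to each fine-grained model as an
    extra input, then return the majority vote of the fine-grained labels."""
    coarse_label = coarse_model(sentence)
    votes = [model(sentence, coarse_label) for model in fine_models]
    return Counter(votes).most_common(1)[0][0]


# Toy stand-ins, purely illustrative; labels are placeholders, not dialects.
def toy_coarse(sentence):
    return "region_1" if sentence.startswith("x") else "region_2"


def toy_fine_a(sentence, coarse_label):
    return "city_1a" if coarse_label == "region_1" else "city_2a"


def toy_fine_b(sentence, coarse_label):
    return "city_1a" if coarse_label == "region_1" else "city_2a"


def toy_fine_c(sentence, coarse_label):
    # Ignores the coarse label, so it can disagree with the other two.
    return "city_2a"


fine_models = [toy_fine_a, toy_fine_b, toy_fine_c]
print(coarse_to_fine("xyz", toy_coarse, fine_models))
```

Passing the coarse label as an explicit argument keeps the dependency between the two stages visible; in a feature-based classifier it would more likely be encoded as an additional feature column.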