WANLP@ACL 2019: Latest Publications

Neural Models for Detecting Binary Semantic Textual Similarity for Algerian and MSA
WANLP@ACL 2019 Pub Date : 2019-08-01 DOI: 10.18653/v1/W19-4609
Wafia Adouane, Jean-Philippe Bernardy, Simon Dobnik
{"title":"Neural Models for Detecting Binary Semantic Textual Similarity for Algerian and MSA","authors":"Wafia Adouane, Jean-Philippe Bernardy, Simon Dobnik","doi":"10.18653/v1/W19-4609","DOIUrl":"https://doi.org/10.18653/v1/W19-4609","url":null,"abstract":"We explore the extent to which neural networks can learn to identify semantically equivalent sentences from a small variable dataset using an end-to-end training. We collect a new noisy non-standardised user-generated Algerian (ALG) dataset and also translate it to Modern Standard Arabic (MSA) which serves as its regularised counterpart. We compare the performance of various models on both datasets and report the best performing configurations. The results show that relatively simple models composed of 2 LSTM layers outperform by far other more sophisticated attention-based architectures, for both ALG and MSA datasets.","PeriodicalId":268163,"journal":{"name":"WANLP@ACL 2019","volume":"8 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115466512","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 6
POS Tagging for Improving Code-Switching Identification in Arabic
WANLP@ACL 2019 Pub Date : 2019-08-01 DOI: 10.18653/v1/W19-4603
Mohammed A. Attia, Younes Samih, Ali El-Kahky, Hamdy Mubarak, Ahmed Abdelali, Kareem Darwish
{"title":"POS Tagging for Improving Code-Switching Identification in Arabic","authors":"Mohammed A. Attia, Younes Samih, Ali El-Kahky, Hamdy Mubarak, Ahmed Abdelali, Kareem Darwish","doi":"10.18653/v1/W19-4603","DOIUrl":"https://doi.org/10.18653/v1/W19-4603","url":null,"abstract":"When speakers code-switch between their native language and a second language or language variant, they follow a syntactic pattern where words and phrases from the embedded language are inserted into the matrix language. This paper explores the possibility of utilizing this pattern in improving code-switching identification between Modern Standard Arabic (MSA) and Egyptian Arabic (EA). We try to answer the question of how strong is the POS signal in word-level code-switching identification. We build a deep learning model enriched with linguistic features (including POS tags) that outperforms the state-of-the-art results by 1.9% on the development set and 1.0% on the test set. We also show that in intra-sentential code-switching, the selection of lexical items is constrained by POS categories, where function words tend to come more often from the dialectal language while the majority of content words come from the standard language.","PeriodicalId":268163,"journal":{"name":"WANLP@ACL 2019","volume":"151 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116097823","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 6
The MADAR Shared Task on Arabic Fine-Grained Dialect Identification
WANLP@ACL 2019 Pub Date : 2019-08-01 DOI: 10.18653/v1/W19-4622
Houda Bouamor, Sabit Hassan, Nizar Habash
{"title":"The MADAR Shared Task on Arabic Fine-Grained Dialect Identification","authors":"Houda Bouamor, Sabit Hassan, Nizar Habash","doi":"10.18653/v1/W19-4622","DOIUrl":"https://doi.org/10.18653/v1/W19-4622","url":null,"abstract":"In this paper, we present the results and findings of the MADAR Shared Task on Arabic Fine-Grained Dialect Identification. This shared task was organized as part of The Fourth Arabic Natural Language Processing Workshop, collocated with ACL 2019. The shared task includes two subtasks: the MADAR Travel Domain Dialect Identification subtask (Subtask 1) and the MADAR Twitter User Dialect Identification subtask (Subtask 2). This shared task is the first to target a large set of dialect labels at the city and country levels. The data for the shared task was created or collected under the Multi-Arabic Dialect Applications and Resources (MADAR) project. A total of 21 teams from 15 countries participated in the shared task.","PeriodicalId":268163,"journal":{"name":"WANLP@ACL 2019","volume":"2012 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123633862","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 96
QC-GO Submission for MADAR Shared Task: Arabic Fine-Grained Dialect Identification
WANLP@ACL 2019 Pub Date : 2019-08-01 DOI: 10.18653/v1/W19-4639
Younes Samih, Hamdy Mubarak, Ahmed Abdelali, Mohammed Attia, Mohamed I. Eldesouki, Kareem Darwish
{"title":"QC-GO Submission for MADAR Shared Task: Arabic Fine-Grained Dialect Identification","authors":"Younes Samih, Hamdy Mubarak, Ahmed Abdelali, Mohammed Attia, Mohamed I. Eldesouki, Kareem Darwish","doi":"10.18653/v1/W19-4639","DOIUrl":"https://doi.org/10.18653/v1/W19-4639","url":null,"abstract":"This paper describes the QC-GO team submission to the MADAR Shared Task Subtask 1 (travel domain dialect identification) and Subtask 2 (Twitter user location identification). In our participation in both subtasks, we explored a number of approaches and system combinations to obtain the best performance for both tasks. These include deep neural nets and heuristics. Since individual approaches suffer from various shortcomings, the combination of different approaches was able to fill some of these gaps. Our system achieves F1-Scores of 66.1% and 67.0% on the development sets for Subtasks 1 and 2 respectively.","PeriodicalId":268163,"journal":{"name":"WANLP@ACL 2019","volume":"103 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128595922","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 3
Translating Between Morphologically Rich Languages: An Arabic-to-Turkish Machine Translation System
WANLP@ACL 2019 Pub Date : 2019-08-01 DOI: 10.18653/v1/W19-4617
Ilknur Durgar El-Kahlout, E. Bektas, N. S. Erdem, Hamza Kaya
{"title":"Translating Between Morphologically Rich Languages: An Arabic-to-Turkish Machine Translation System","authors":"Ilknur Durgar El-Kahlout, E. Bektas, N. S. Erdem, Hamza Kaya","doi":"10.18653/v1/W19-4617","DOIUrl":"https://doi.org/10.18653/v1/W19-4617","url":null,"abstract":"This paper introduces the work on building a machine translation system for Arabic-to-Turkish in the news domain. Our work includes collecting parallel datasets in several ways for a new and low-resourced language pair, building baseline systems with state-of-the-art architectures and developing language specific algorithms for better translation. Parallel datasets are mainly collected in three different ways: i) translating Arabic texts into Turkish by professional translators, ii) exploiting the web for open-source Arabic-Turkish parallel texts, iii) using back-translation. We performed preliminary experiments for Arabic-to-Turkish machine translation with neural (Marian) machine translation tools with a novel morphologically motivated vocabulary reduction method.","PeriodicalId":268163,"journal":{"name":"WANLP@ACL 2019","volume":"26 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126785474","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 7
Construction and Annotation of the Jordan Comprehensive Contemporary Arabic Corpus (JCCA)
WANLP@ACL 2019 Pub Date : 2019-08-01 DOI: 10.18653/v1/W19-4616
M. Sawalha, Faisal Alshargi, A. AlShdaifat, S. Yagi, Mohammad A. Qudah
{"title":"Construction and Annotation of the Jordan Comprehensive Contemporary Arabic Corpus (JCCA)","authors":"M. Sawalha, Faisal Alshargi, A. AlShdaifat, S. Yagi, Mohammad A. Qudah","doi":"10.18653/v1/W19-4616","DOIUrl":"https://doi.org/10.18653/v1/W19-4616","url":null,"abstract":"To compile a modern dictionary that catalogues the words in currency, and to study linguistic patterns in the contemporary language, it is necessary to have a corpus of authentic texts that reflect current usage of the language. Although there are numerous Arabic corpora, none claims to be representative of the language in terms of the combination of geographical region, genre, subject matter, mode, and medium. This paper describes a 100-million-word corpus that takes the British National Corpus (BNC) as a model. The aim of the corpus is to be balanced, annotated, comprehensive, and representative of contemporary Arabic as written and spoken in Arab countries today. It will be different from most others in not being heavily-dominated by the news or in mixing the classical with the modern. In this paper is an outline of the methodology adopted for the design, construction, and annotation of this corpus. DIWAN (Alshargi and Rambow, 2015) was used to annotate a one-million-word snapshot of the corpus. DIWAN is a dialectal word annotation tool, but we upgraded it by adding a new tag-set that is based on traditional Arabic grammar and by adding the roots and morphological patterns of nouns and verbs. Moreover, the corpus we constructed covers the major spoken varieties of Arabic.","PeriodicalId":268163,"journal":{"name":"WANLP@ACL 2019","volume":"28 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115621429","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 5
The SMarT Classifier for Arabic Fine-Grained Dialect Identification
WANLP@ACL 2019 Pub Date : 2019-08-01 DOI: 10.18653/v1/W19-4633
K. Meftouh, Karima Abidi, S. Harrat, K. Smaïli
{"title":"The SMarT Classifier for Arabic Fine-Grained Dialect Identification","authors":"K. Meftouh, Karima Abidi, S. Harrat, K. Smaïli","doi":"10.18653/v1/W19-4633","DOIUrl":"https://doi.org/10.18653/v1/W19-4633","url":null,"abstract":"This paper describes the approach adopted by the SMarT research group to build a dialect identification system in the framework of the Madar shared task on Arabic fine-grained dialect identification. We experimented with several approaches, but we finally decided to use a Multinomial Naive Bayes classifier based on word and character ngrams in addition to the language model probabilities. We achieved a score of 67.73% in terms of Macro accuracy and a macro-averaged F1-score of 67.31%.","PeriodicalId":268163,"journal":{"name":"WANLP@ACL 2019","volume":"10 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114931478","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 7
ST MADAR 2019 Shared Task: Arabic Fine-Grained Dialect Identification
WANLP@ACL 2019 Pub Date : 2019-08-01 DOI: 10.18653/v1/W19-4635
Mourad Abbas, Mohamed Lichouri, Abed Alhakim Freihat
{"title":"ST MADAR 2019 Shared Task: Arabic Fine-Grained Dialect Identification","authors":"Mourad Abbas, Mohamed Lichouri, Abed Alhakim Freihat","doi":"10.18653/v1/W19-4635","DOIUrl":"https://doi.org/10.18653/v1/W19-4635","url":null,"abstract":"This paper describes the solution that we propose on MADAR 2019 Arabic Fine-Grained Dialect Identification task. The proposed solution utilized a set of classifiers that we trained on character and word features. These classifiers are: Support Vector Machines (SVM), Bernoulli Naive Bayes (BNB), Multinomial Naive Bayes (MNB), Logistic Regression (LR), Stochastic Gradient Descent (SGD), Passive Aggressive (PA) and Perceptron (PC). The system achieved competitive results, with a performance of 62.87% and 62.12% on the development and test sets, respectively.","PeriodicalId":268163,"journal":{"name":"WANLP@ACL 2019","volume":"8 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121474787","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 9
Improved Generalization of Arabic Text Classifiers
WANLP@ACL 2019 Pub Date : 2019-08-01 DOI: 10.18653/v1/W19-4618
Alaa Khaddaj, Hazem M. Hajj, W. El-Hajj
{"title":"Improved Generalization of Arabic Text Classifiers","authors":"Alaa Khaddaj, Hazem M. Hajj, W. El-Hajj","doi":"10.18653/v1/W19-4618","DOIUrl":"https://doi.org/10.18653/v1/W19-4618","url":null,"abstract":"While transfer learning for text has been very active in the English language, progress in Arabic has been slow, including the use of Domain Adaptation (DA). Domain Adaptation is used to generalize the performance of any classifier by trying to balance the classifier’s accuracy for a particular task among different text domains. In this paper, we propose and evaluate two variants of a domain adaptation technique: the first is a base model called Domain Adversarial Neural Network (DANN), while the second is a variation that incorporates representational learning. Similar to previous approaches, we propose the use of proxy A-distance as a metric to assess the success of generalization. We make use of ArSentDLEV, a multi-topic dataset collected from the Levantine countries, to test the performance of the models. We show the superiority of the proposed method in accuracy and robustness when dealing with the Arabic language.","PeriodicalId":268163,"journal":{"name":"WANLP@ACL 2019","volume":"3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128185133","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 7
Simple But Not Naïve: Fine-Grained Arabic Dialect Identification Using Only N-Grams
WANLP@ACL 2019 Pub Date : 2019-08-01 DOI: 10.18653/v1/W19-4624
Sohaila Eltanbouly, May Bashendy, T. Elsayed
{"title":"Simple But Not Naïve: Fine-Grained Arabic Dialect Identification Using Only N-Grams","authors":"Sohaila Eltanbouly, May Bashendy, T. Elsayed","doi":"10.18653/v1/W19-4624","DOIUrl":"https://doi.org/10.18653/v1/W19-4624","url":null,"abstract":"This paper presents the participation of Qatar University team in MADAR shared task, which addresses the problem of sentence-level fine-grained Arabic Dialect Identification over 25 different Arabic dialects in addition to the Modern Standard Arabic. Arabic Dialect Identification is not a trivial task since different dialects share some features, e.g., utilizing the same character set and some vocabularies. We opted to adopt a very simple approach in terms of extracted features and classification models; we only utilize word and character n-grams as features, and Naïve Bayes models as classifiers. Surprisingly, the simple approach achieved non-naïve performance. The official results, reported on a held-out testing set, show that the dialect of a given sentence can be identified at an accuracy of 64.58% by our best submitted run.","PeriodicalId":268163,"journal":{"name":"WANLP@ACL 2019","volume":"40 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114825899","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1