Proceedings of the Sixth Workshop on: Latest Articles

Cross-lingual Annotation Projection Is Effective for Neural Part-of-Speech Tagging
Proceedings of the Sixth Workshop on Pub Date : 2019-06-01 DOI: 10.18653/v1/W19-1425
Matthias Huck, Diana Dutka, Alexander M. Fraser
{"title":"Cross-lingual Annotation Projection Is Effective for Neural Part-of-Speech Tagging","authors":"Matthias Huck, Diana Dutka, Alexander M. Fraser","doi":"10.18653/v1/W19-1425","DOIUrl":"https://doi.org/10.18653/v1/W19-1425","url":null,"abstract":"We tackle the important task of part-of-speech tagging using a neural model in the zero-resource scenario, where we have no access to gold-standard POS training data. We compare this scenario with the low-resource scenario, where we have access to a small amount of gold-standard POS training data. Our experiments focus on Ukrainian as a representative of under-resourced languages. Russian is highly related to Ukrainian, so we exploit gold-standard Russian POS tags. We consider four techniques to perform Ukrainian POS tagging: zero-shot tagging and cross-lingual annotation projection (for the zero-resource scenario), and compare these with self-training and multilingual learning (for the low-resource scenario). We find that cross-lingual annotation projection works particularly well in the zero-resource scenario.","PeriodicalId":344344,"journal":{"name":"Proceedings of the Sixth Workshop on","volume":"IE-33 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"120999981","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 16
SC-UPB at the VarDial 2019 Evaluation Campaign: Moldavian vs. Romanian Cross-Dialect Topic Identification
Proceedings of the Sixth Workshop on Pub Date : 2019-06-01 DOI: 10.18653/v1/W19-1418
Cristian Onose, Dumitru-Clementin Cercel, Stefan Trausan-Matu
{"title":"SC-UPB at the VarDial 2019 Evaluation Campaign: Moldavian vs. Romanian Cross-Dialect Topic Identification","authors":"Cristian Onose, Dumitru-Clementin Cercel, Stefan Trausan-Matu","doi":"10.18653/v1/W19-1418","DOIUrl":"https://doi.org/10.18653/v1/W19-1418","url":null,"abstract":"This paper describes our models for the Moldavian vs. Romanian Cross-Topic Identification (MRC) evaluation campaign, part of the VarDial 2019 workshop. We focus on the three subtasks for MRC: binary classification between the Moldavian (MD) and the Romanian (RO) dialects and two cross-dialect multi-class classification between six news topics, MD to RO and RO to MD. We propose several deep learning models based on long short-term memory cells, Bidirectional Gated Recurrent Unit (BiGRU) and Hierarchical Attention Networks (HAN). We also employ three word embedding models to represent the text as a low dimensional vector. Our official submission includes two runs of the BiGRU and HAN models for each of the three subtasks. The best submitted model obtained the following macro-averaged F1 scores: 0.708 for subtask 1, 0.481 for subtask 2 and 0.480 for the last one. Due to a read error caused by the quoting behaviour over the test file, our final submissions contained a smaller number of items than expected. More than 50% of the submission files were corrupted. Thus, we also present the results obtained with the corrected labels for which the HAN model achieves the following results: 0.930 for subtask 1, 0.590 for subtask 2 and 0.687 for the third one.","PeriodicalId":344344,"journal":{"name":"Proceedings of the Sixth Workshop on","volume":"91 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130744098","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 12
Ensemble Methods to Distinguish Mainland and Taiwan Chinese
Proceedings of the Sixth Workshop on Pub Date : 2019-06-01 DOI: 10.18653/v1/W19-1417
Hai Hu, Wen Li, He Zhou, Zuoyu Tian, Yiwen Zhang, Liang Zou
{"title":"Ensemble Methods to Distinguish Mainland and Taiwan Chinese","authors":"Hai Hu, Wen Li, He Zhou, Zuoyu Tian, Yiwen Zhang, Liang Zou","doi":"10.18653/v1/W19-1417","DOIUrl":"https://doi.org/10.18653/v1/W19-1417","url":null,"abstract":"This paper describes the IUCL system at the VarDial 2019 evaluation campaign for the task of discriminating between Mainland and Taiwan variation of Mandarin Chinese. We first build several base classifiers, including a Naive Bayes classifier with word n-grams as features, SVMs with both character and syntactic features, and neural networks with pre-trained character/word embeddings. Then we adopt ensemble methods to combine output from base classifiers to make final predictions. Our ensemble models achieve the highest F1 score (0.893) in the simplified Chinese track and the second highest (0.901) in the traditional Chinese track. Our results demonstrate the effectiveness and robustness of the ensemble methods.","PeriodicalId":344344,"journal":{"name":"Proceedings of the Sixth Workshop on","volume":"158 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122975589","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 9
Toward a deep dialectological representation of Indo-Aryan
Proceedings of the Sixth Workshop on Pub Date : 2019-06-01 DOI: 10.18653/v1/W19-1411
C. Cathcart
{"title":"Toward a deep dialectological representation of Indo-Aryan","authors":"C. Cathcart","doi":"10.18653/v1/W19-1411","DOIUrl":"https://doi.org/10.18653/v1/W19-1411","url":null,"abstract":"This paper presents a new approach to disentangling inter-dialectal and intra-dialectal relationships within one such group, the Indo-Aryan subgroup of Indo-European. We draw upon admixture models and deep generative models to tease apart historic language contact and language-specific behavior in the overall patterns of sound change displayed by Indo-Aryan languages. We show that a “deep” model of Indo-Aryan dialectology sheds some light on questions regarding inter-relationships among the Indo-Aryan languages, and performs better than a “shallow” model in terms of certain qualities of the posterior distribution (e.g., entropy of posterior distributions), and outline future pathways for model development.","PeriodicalId":344344,"journal":{"name":"Proceedings of the Sixth Workshop on","volume":"27 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131938101","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 4
A Report on the Third VarDial Evaluation Campaign
Proceedings of the Sixth Workshop on Pub Date : 2019-06-01 DOI: 10.18653/v1/W19-1401
Marcos Zampieri, S. Malmasi, Yves Scherrer, T. Samardžić, Francis M. Tyers, Miikka Silfverberg, N. Klyueva, Tung-Le Pan, Chu-Ren Huang, Radu Tudor Ionescu, Andrei M. Butnaru, T. Jauhiainen
{"title":"A Report on the Third VarDial Evaluation Campaign","authors":"Marcos Zampieri, S. Malmasi, Yves Scherrer, T. Samardžić, Francis M. Tyers, Miikka Silfverberg, N. Klyueva, Tung-Le Pan, Chu-Ren Huang, Radu Tudor Ionescu, Andrei M. Butnaru, T. Jauhiainen","doi":"10.18653/v1/W19-1401","DOIUrl":"https://doi.org/10.18653/v1/W19-1401","url":null,"abstract":"In this paper, we present the findings of the Third VarDial Evaluation Campaign organized as part of the sixth edition of the workshop on Natural Language Processing (NLP) for Similar Languages, Varieties and Dialects (VarDial), co-located with NAACL 2019. This year, the campaign included five shared tasks, including one task re-run – German Dialect Identification (GDI) – and four new tasks – Cross-lingual Morphological Analysis (CMA), Discriminating between Mainland and Taiwan variation of Mandarin Chinese (DMT), Moldavian vs. Romanian Cross-dialect Topic identification (MRC), and Cuneiform Language Identification (CLI). A total of 22 teams submitted runs across the five shared tasks. After the end of the competition, we received 14 system description papers, which are published in the VarDial workshop proceedings and referred to in this report.","PeriodicalId":344344,"journal":{"name":"Proceedings of the Sixth Workshop on","volume":"9 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127783261","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 57
Joint Approach to Deromanization of Code-mixed Texts
Proceedings of the Sixth Workshop on Pub Date : 2019-06-01 DOI: 10.18653/v1/W19-1403
Rashed Rubby Riyadh, Grzegorz Kondrak
{"title":"Joint Approach to Deromanization of Code-mixed Texts","authors":"Rashed Rubby Riyadh, Grzegorz Kondrak","doi":"10.18653/v1/W19-1403","DOIUrl":"https://doi.org/10.18653/v1/W19-1403","url":null,"abstract":"The conversion of romanized texts back to the native scripts is a challenging task because of the inconsistent romanization conventions and non-standard language use. This problem is compounded by code-mixing, i.e., using words from more than one language within the same discourse. In this paper, we propose a novel approach for handling these two problems together in a single system. Our approach combines three components: language identification, back-transliteration, and sequence prediction. The results of our experiments on Bengali and Hindi datasets establish the state of the art for the task of deromanization of code-mixed texts.","PeriodicalId":344344,"journal":{"name":"Proceedings of the Sixth Workshop on","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114159413","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 6
TwistBytes - Identification of Cuneiform Languages and German Dialects at VarDial 2019
Proceedings of the Sixth Workshop on Pub Date : 2019-06-01 DOI: 10.18653/v1/W19-1421
Fernando Benites, P. von Däniken, Mark Cieliebak
{"title":"TwistBytes - Identification of Cuneiform Languages and German Dialects at VarDial 2019","authors":"Fernando Benites, P. von Däniken, Mark Cieliebak","doi":"10.18653/v1/W19-1421","DOIUrl":"https://doi.org/10.18653/v1/W19-1421","url":null,"abstract":"We describe our approaches for the German Dialect Identification (GDI) and the Cuneiform Language Identification (CLI) tasks at the VarDial Evaluation Campaign 2019. The goal was to identify dialects of Swiss German in GDI and Sumerian and Akkadian in CLI. In GDI, the system should distinguish four dialects from the German-speaking part of Switzerland. Our system for GDI achieved third place out of 6 teams, with a macro averaged F-1 of 74.6%. In CLI, the system should distinguish seven languages written in cuneiform script. Our system achieved third place out of 8 teams, with a macro averaged F-1 of 74.7%.","PeriodicalId":344344,"journal":{"name":"Proceedings of the Sixth Workshop on","volume":"2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121343341","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 7
Discriminating between Mandarin Chinese and Swiss-German varieties using adaptive language models
Proceedings of the Sixth Workshop on Pub Date : 2019-04-30 DOI: 10.18653/v1/W19-1419
T. Jauhiainen, Krister Lindén, H. Jauhiainen
{"title":"Discriminating between Mandarin Chinese and Swiss-German varieties using adaptive language models","authors":"T. Jauhiainen, Krister Lindén, H. Jauhiainen","doi":"10.18653/v1/W19-1419","DOIUrl":"https://doi.org/10.18653/v1/W19-1419","url":null,"abstract":"This paper describes the language identification systems used by the SUKI team in the Discriminating between the Mainland and Taiwan variation of Mandarin Chinese (DMT) and the German Dialect Identification (GDI) shared tasks which were held as part of the third VarDial Evaluation Campaign. The DMT shared task included two separate tracks, one for the simplified Chinese script and one for the traditional Chinese script. We submitted three runs on both tracks of the DMT task as well as on the GDI task. We won the traditional Chinese track using Naive Bayes with language model adaptation, came second on GDI with an adaptive version of the HeLI 2.0 method, and third on the simplified Chinese track using again the adaptive Naive Bayes.","PeriodicalId":344344,"journal":{"name":"Proceedings of the Sixth Workshop on","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-04-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130398861","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 19
Language Discrimination and Transfer Learning for Similar Languages: Experiments with Feature Combinations and Adaptation
Proceedings of the Sixth Workshop on Pub Date : 1900-01-01 DOI: 10.18653/v1/W19-1406
Nianheng Wu, Eric DeMattos, Kwok Him So, Pin-zhen Chen, Çagri Çöltekin
{"title":"Language Discrimination and Transfer Learning for Similar Languages: Experiments with Feature Combinations and Adaptation","authors":"Nianheng Wu, Eric DeMattos, Kwok Him So, Pin-zhen Chen, Çagri Çöltekin","doi":"10.18653/v1/W19-1406","DOIUrl":"https://doi.org/10.18653/v1/W19-1406","url":null,"abstract":"This paper describes the work done by team tearsofjoy participating in the VarDial 2019 Evaluation Campaign. We developed two systems based on Support Vector Machines: SVM with a flat combination of features and SVM ensembles. We participated in all language/dialect identification tasks, as well as the Moldavian vs. Romanian cross-dialect topic identification (MRC) task. Our team achieved first place in German Dialect identification (GDI) and MRC subtasks 2 and 3, second place in the simplified variant of Discriminating between Mainland and Taiwan variation of Mandarin Chinese (DMT) as well as Cuneiform Language Identification (CLI), and third and fifth place in DMT traditional and MRC subtask 1 respectively. In most cases, the SVM with a flat combination of features performed better than SVM ensembles. Besides describing the systems and the results obtained by them, we provide a tentative comparison between the feature combination methods, and present additional experiments with a method of adaptation to the test set, which may indicate potential pitfalls with some of the data sets.","PeriodicalId":344344,"journal":{"name":"Proceedings of the Sixth Workshop on","volume":"8 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128645085","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 26
BAM: A combination of deep and shallow models for German Dialect Identification.
Proceedings of the Sixth Workshop on Pub Date : 1900-01-01 DOI: 10.18653/v1/W19-1413
Andrei M. Butnaru
{"title":"BAM: A combination of deep and shallow models for German Dialect Identification.","authors":"Andrei M. Butnaru","doi":"10.18653/v1/W19-1413","DOIUrl":"https://doi.org/10.18653/v1/W19-1413","url":null,"abstract":"*This is a submission for the Third VarDial Evaluation Campaign* In this paper, we present a machine learning approach for the German Dialect Identification (GDI) Closed Shared Task of the DSL 2019 Challenge. The proposed approach combines deep and shallow models by applying a voting scheme to the outputs of a Character-level Convolutional Neural Network (Char-CNN), a Long Short-Term Memory (LSTM) network, and a model based on String Kernels. The first model is the Char-CNN, which merges multiple convolutions computed with kernels of different sizes. The second model is the LSTM network, which applies global max pooling over the sequences returned over time. Both models pass the activation maps to two fully-connected layers. The final model is based on String Kernels, computed on character p-grams extracted from speech transcripts. The model combines two blended kernel functions: one is the presence bits kernel, and the other is the intersection kernel. The empirical results obtained in the shared task show that the approach can achieve good results. The system proposed in this paper obtained fourth place with a macro-F1 score of 62.55%.","PeriodicalId":344344,"journal":{"name":"Proceedings of the Sixth Workshop on","volume":"32 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128680315","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 3