Proceedings of the Sixth Workshop on: latest papers

DTeam @ VarDial 2019: Ensemble based on skip-gram and triplet loss neural networks for Moldavian vs. Romanian cross-dialect topic identification
Proceedings of the Sixth Workshop on Pub Date : 1900-01-01 DOI: 10.18653/v1/W19-1422
D. Tudoreanu
Abstract: This paper presents the solution proposed by DTeam in the VarDial 2019 Evaluation Campaign for the Moldavian vs. Romanian cross-topic identification task. The proposed solution is a Support Vector Machines (SVM) ensemble composed of two character-level neural networks. The first network is a skip-gram classification model formed of an embedding layer, three convolutional layers and two fully-connected layers. The second network has a similar architecture, but is trained using the triplet loss function.
Citations: 15
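The triplet loss named in the DTeam abstract above can be illustrated with a minimal sketch. This is the standard margin-based formulation, not the author's implementation: the Euclidean distance, the `margin` default, and the function names are all assumptions made for illustration, since the abstract does not specify them.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Standard margin-based triplet loss (assumed form, not from the paper).

    The loss is zero once the anchor is closer to the positive example
    than to the negative example by at least `margin`.
    """
    d_pos = euclidean(anchor, positive)
    d_neg = euclidean(anchor, negative)
    return max(0.0, d_pos - d_neg + margin)


# Hypothetical 2-D embeddings: the negative is far enough away, so no loss.
loss_satisfied = triplet_loss([0.0, 0.0], [0.0, 1.0], [0.0, 3.0])
# Here the negative is closer than the positive, so the loss is positive.
loss_violated = triplet_loss([0.0, 0.0], [0.0, 2.0], [0.0, 1.0])
```

In training, the three inputs would be network embeddings of an anchor text, a text of the same class, and a text of a different class.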
Improving Cuneiform Language Identification with BERT
Proceedings of the Sixth Workshop on Pub Date : 1900-01-01 DOI: 10.18653/v1/W19-1402
Gabriel Bernier-Colborne, Cyril Goutte, Serge Léger
Abstract: We describe the systems developed by the National Research Council Canada for the Cuneiform Language Identification (CLI) shared task at the 2019 VarDial evaluation campaign. We compare a state-of-the-art baseline relying on character n-grams and a traditional statistical classifier, a voting ensemble of classifiers, and a deep learning approach using a Transformer network. We describe how these systems were trained, and analyze the impact of some preprocessing and model estimation decisions. The deep neural network achieved 77% accuracy on the test data, which turned out to be the best performance at the CLI evaluation, establishing a new state of the art for cuneiform language identification.
Citations: 20
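The character n-gram features that the baseline in the abstract above relies on can be sketched as follows. This is only an illustration of the general technique: the n-gram range, the function name, and the sample string are assumptions, and the abstract does not describe the authors' actual feature extraction.

```python
from collections import Counter


def char_ngrams(text, n_min=1, n_max=3):
    """Count all character n-grams of length n_min..n_max in `text`.

    The default range is an illustrative assumption, not taken from the paper.
    """
    counts = Counter()
    for n in range(n_min, n_max + 1):
        for i in range(len(text) - n + 1):
            counts[text[i:i + n]] += 1
    return counts


# Hypothetical input: bigram counts for a short string.
bigrams = char_ngrams("abab", n_min=2, n_max=2)
```

Counts like these would typically be turned into a sparse feature vector and passed to a statistical classifier.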
Comparing Pipelined and Integrated Approaches to Dialectal Arabic Neural Machine Translation
Proceedings of the Sixth Workshop on Pub Date : 1900-01-01 DOI: 10.18653/v1/W19-1424
Pamela Shapiro, Kevin Duh
Abstract: When translating diglossic languages such as Arabic, situations may arise where we would like to translate a text but do not know which dialect it is. A traditional approach to this problem is to design dialect identification systems and dialect-specific machine translation systems. However, under the recent paradigm of neural machine translation, shared multi-dialectal systems have become a natural alternative. Here we explore under which conditions it is beneficial to perform dialect identification for Arabic neural machine translation versus using a general system for all dialects.
Citations: 4
Neural and Linear Pipeline Approaches to Cross-lingual Morphological Analysis
Proceedings of the Sixth Workshop on Pub Date : 1900-01-01 DOI: 10.18653/v1/W19-1416
Çağrı Çöltekin, Jeremy Barnes
Abstract: This paper describes the Tübingen-Oslo team’s participation in the cross-lingual morphological analysis task in the VarDial 2019 evaluation campaign. We participated in the shared task with a standard neural network model. Our model achieved analysis F1-scores of 31.48 and 23.67 on the test languages Karachay-Balkar (Turkic) and Sardinian (Romance), respectively. The scores are comparable to the scores obtained by the other participants in both language families, and the analysis score on the Romance data set was also the best result obtained in the shared task. Besides describing the system used in our shared task participation, we describe another, simpler model based on linear classifiers, and present further analyses using both models. Our analyses, besides revealing some of the difficult cases, also confirm that the usefulness of a source language in this task is highly correlated with the similarity of source and target languages.
Citations: 3
Initial Experiments In Cross-Lingual Morphological Analysis Using Morpheme Segmentation
Proceedings of the Sixth Workshop on Pub Date : 1900-01-01 DOI: 10.18653/v1/W19-1415
V. Mikhailov, Lorenzo Tosi, Anastasia Khorosheva, O. Serikov
Abstract: The paper describes initial experiments in data-driven cross-lingual morphological analysis of open-category words using a combination of unsupervised morpheme segmentation, annotation projection and an LSTM encoder-decoder model with attention. Our algorithm provides lemmatisation and morphological analysis generation for previously unseen surface forms in a low-resource language, given only annotated data for related languages. Despite the inherently lossy annotation projection, we achieved the best lemmatisation F1-score in the VarDial 2019 Shared Task on Cross-Lingual Morphological Analysis for both Karachay-Balkar (Turkic languages, agglutinative morphology) and Sardinian (Romance languages, fusional morphology).
Citations: 1
Variation between Different Discourse Types: Literate vs. Oral
Proceedings of the Sixth Workshop on Pub Date : 1900-01-01 DOI: 10.18653/v1/W19-1407
Katrin Ortmann, Stefanie Dipper
Abstract: This paper deals with the automatic identification of literate and oral discourse in German texts. A range of linguistic features is selected and their role in distinguishing between literate- and oral-oriented registers is investigated, using a decision-tree classifier. It turns out that all of the investigated features are related in some way to oral conceptuality. Simple measures of complexity (average sentence and word length) are especially prominent indicators of oral and literate discourse. In addition, features of reference and deixis (realized by different types of pronouns) also prove to be very useful in determining the degree of orality of different registers.
Citations: 6