German Dialect Identification in Interview Transcriptions

Workshop on NLP for Similar Languages, Varieties and Dialects. Pub Date: 2017. DOI: 10.18653/v1/W17-1220

S. Malmasi, Marcos Zampieri


Citations: 28

Abstract


German Dialect Identification in Interview Transcriptions

This paper presents three systems submitted to the German Dialect Identification (GDI) task at the VarDial Evaluation Campaign 2017. The task consists of training models to identify the dialect of Swiss-German speech transcripts. The dialects included in the GDI dataset are Basel, Bern, Lucerne, and Zurich. The three systems we submitted are based on: a plurality ensemble, a mean probability ensemble, and a meta-classifier trained on character and word n-grams. The best results were obtained by the meta-classifier achieving 68.1% accuracy and 66.2% F1-score, ranking first among the 10 teams which participated in the GDI shared task.
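The difference between the two ensemble rules named in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the three base classifiers, their probability outputs, and the tie-breaking rule are all hypothetical, chosen only to show how plurality voting and mean probability fusion can disagree on the same input. The meta-classifier, which would be trained on the base classifiers' outputs, is not shown.

```python
from collections import Counter

# The four dialect labels listed in the abstract.
DIALECTS = ["Basel", "Bern", "Lucerne", "Zurich"]


def plurality_vote(distributions):
    """Each base classifier votes for its top label; the most-voted label wins."""
    votes = [max(DIALECTS, key=lambda d: dist[d]) for dist in distributions]
    counts = Counter(votes)
    best = max(counts.values())
    # Assumption: ties are broken by the order of DIALECTS (an arbitrary choice).
    return next(d for d in DIALECTS if counts.get(d, 0) == best)


def mean_probability(distributions):
    """Average the per-label probabilities and pick the highest mean."""
    n = len(distributions)
    means = {d: sum(dist[d] for dist in distributions) / n for d in DIALECTS}
    return max(DIALECTS, key=lambda d: means[d])


# Hypothetical outputs of three base classifiers for one transcript.
# Two are weakly in favour of Basel, one is strongly in favour of Bern.
toy_outputs = [
    {"Basel": 0.40, "Bern": 0.35, "Lucerne": 0.15, "Zurich": 0.10},
    {"Basel": 0.38, "Bern": 0.36, "Lucerne": 0.16, "Zurich": 0.10},
    {"Basel": 0.05, "Bern": 0.85, "Lucerne": 0.05, "Zurich": 0.05},
]

if __name__ == "__main__":
    print("plurality:", plurality_vote(toy_outputs))
    print("mean probability:", mean_probability(toy_outputs))
```

In this toy case the plurality rule follows the two weak Basel votes, while the mean probability rule is swayed by the single confident Bern prediction. The example only shows that the two rules weigh classifier confidence differently; it says nothing about which performs better on the GDI data.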

