Automatic Language Identification

EJLS European Journal of Language and Literature Studies Articles. Pub Date: 2017-01-21. DOI: 10.26417/EJLS.V7I1.P140-150

Nejla Qafmolla


Citations: 1

Abstract




Automatic Language Identification (LID) is the process of automatically identifying the language of a spoken utterance or written material. LID has received much attention because of its application to major research areas and long-standing goals of the computational sciences, namely Machine Translation (MT), Speech Recognition (SR) and Data Mining (DM). A considerable increase in the amount of, and access to, data provided not only by experts but also by users all over the Internet has led both to the development of different approaches to LID, aimed at producing more efficient systems, and to major challenges that remain at the center of the field. Although current approaches have achieved considerable success, several issues still call for further research. This paper does not describe the historical background of the field; rather, it provides an overview of the current state of LID systems and classifies the approaches developed to build them. LID systems have advanced and continue to evolve. Issues that need special attention and improvement include semantics, the identification of dialects and varieties of a language, the identification of spelling errors, data retrieval, multilingual documents, MT and speech-to-speech translation. The methods applied to date have performed well from a technical point of view, but not from a semantic one.

