Automatic Language Identification

Nejla Qafmolla
Journal article, EJLS European Journal of Language and Literature Studies Articles, published 2017-01-21
DOI: 10.26417/EJLS.V7I1.P140-150
Citations: 1

Abstract

Automatic Language Identification (LID) is the process of automatically identifying the language of a spoken utterance or of written material. LID has received much attention because of its applications in major research areas and long-standing goals of the computational sciences, namely Machine Translation (MT), Speech Recognition (SR) and Data Mining (DM). A considerable increase in the amount of data, and in access to it, contributed not only by experts but also by users all over the Internet, has led both to the development of different LID approaches aimed at more efficient systems and to major challenges that remain at the center of this field. Although current approaches have achieved considerable success, several issues remain open for future research. The aim of this paper is not to describe the historical background of the field, but rather to provide an overview of the current state of LID systems and to classify the approaches developed to build them. LID systems have advanced and continue to evolve. Issues that need special attention and improvement include semantics, the identification of dialects and varieties of a language, the identification of spelling errors, data retrieval, multilingual documents, MT and speech-to-speech translation. Methods applied to date have been good from a technical point of view, but not from a semantic one.
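To make the written-material side of the task concrete, here is a minimal sketch of one classic text-based technique: character n-gram frequency profiles compared by cosine similarity. It is not the method of this paper, which surveys and classifies approaches rather than proposing one. The toy training sentences, the trigram size and the function names (`char_ngrams`, `cosine`, `identify`) are illustrative assumptions.

```python
from collections import Counter
import math


def char_ngrams(text: str, n: int = 3) -> Counter:
    """Count character n-grams, padding with spaces so word edges are captured."""
    padded = f" {text.lower()} "
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse n-gram count vectors."""
    # Counter returns 0 for missing keys, so unseen n-grams contribute nothing.
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Hypothetical toy training samples; a real system would use large corpora.
TRAINING = {
    "en": "the quick brown fox jumps over the lazy dog and the cat is on the mat",
    "de": "der schnelle braune fuchs springt über den faulen hund und die katze ist hier",
    "fr": "le renard brun rapide saute par dessus le chien paresseux et le chat est là",
}

# One n-gram profile per language, built once from the training text.
PROFILES = {lang: char_ngrams(text) for lang, text in TRAINING.items()}


def identify(text: str) -> str:
    """Return the language whose profile is most similar to the input's profile."""
    query = char_ngrams(text)
    return max(PROFILES, key=lambda lang: cosine(query, PROFILES[lang]))


print(identify("the dog is on the mat"))
print(identify("der hund ist faul"))
```

With only a few short samples this separates clearly different languages, but it also hints at why dialects and closely related varieties are listed above as open problems: such varieties share most of their character n-grams, so profiles built this way become hard to tell apart.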