Word Sense Disambiguation Using Semantic Web for Tamil to English Statistical Machine Translation

Santosh Kumar T.S.
Journal: IRA-International Journal of Technology & Engineering
Published: 2016-11-26
DOI: 10.21013/JTE.V5.N2.P1 (https://doi.org/10.21013/JTE.V5.N2.P1)
Citations: 1

Abstract

Machine Translation (MT) has been an area of linguistic research for more than two decades, yet devising an automated system that delivers accurate translations of natural languages remains very challenging. Great strides have nevertheless been made, owing largely to the development of web technologies, and of late there has been renewed interest in this area of research. Technological advances over the preceding two decades have influenced MT considerably. Several MT approaches, including Statistical Machine Translation (SMT), have benefited greatly from these advances, chiefly by exploiting the availability of extensive corpora. Web 3.0 builds on semantic web technology, which represents any object or resource on the web both syntactically and semantically. This representation helps computing systems search content on the internet in a manner similar to lexical search, and it can make internet-based translation more effective and efficient.

In this paper we propose a technique to improve existing SMT methods by making use of semantic web technology. Our focus is on Tamil and on Tamil-to-English MT. The proposed method successfully integrates a semantic web technique into word sense disambiguation (WSD), which forms part of the MT system. The integration is accomplished by bringing the capabilities of RDFS and OWL into the WSD component of the MT model. The contribution of this work lies in showing that integrating a semantic web technique into the WSD system significantly improves the performance of a statistical MT system translating from Tamil to English. We assume the availability of large Tamil corpora and of domain-specific ontologies built with Tamil semantic web technology using Web 3.0. We are optimistic about the expansion and development of the Tamil semantic web and infer that, as it grows, word sense disambiguation in Tamil-to-English MT will improve greatly, along with other related benefits. The method could enhance translation quality by improving the WSD process when text is translated from Tamil to English. It can also be extended to Hindi and other Indian languages.
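
The abstract does not spell out how RDFS and OWL feed the WSD component, so the following is only a minimal sketch of one way ontology-backed disambiguation could work, not the author's method. It assumes a toy ontology held as plain (subject, predicate, object) tuples rather than a real RDF store, and every class name (`ex:River`, `ex:WaterBody`, and so on), the `SENSES` lexicon, and the `disambiguate` helper are hypothetical. The Tamil word ஆறு, which can mean either "river" or "six", is resolved by checking which sense shares more `rdfs:subClassOf` ancestors with the concepts of the surrounding words.

```python
RDFS_SUBCLASS = "rdfs:subClassOf"

# Hypothetical toy ontology: (subject, predicate, object) triples.
TRIPLES = [
    ("ex:River", RDFS_SUBCLASS, "ex:WaterBody"),
    ("ex:Lake", RDFS_SUBCLASS, "ex:WaterBody"),
    ("ex:Six", RDFS_SUBCLASS, "ex:Number"),
    ("ex:Five", RDFS_SUBCLASS, "ex:Number"),
]

# Hypothetical lexicon: Tamil word -> list of (ontology class, English gloss).
SENSES = {
    "ஆறு": [("ex:River", "river"), ("ex:Six", "six")],
    "ஏரி": [("ex:Lake", "lake")],
    "ஐந்து": [("ex:Five", "five")],
}


def ancestors(concept):
    """Return the concept plus every superclass reachable via rdfs:subClassOf."""
    seen = {concept}
    stack = [concept]
    while stack:
        current = stack.pop()
        for subj, pred, obj in TRIPLES:
            if subj == current and pred == RDFS_SUBCLASS and obj not in seen:
                seen.add(obj)
                stack.append(obj)
    return seen


def disambiguate(word, context):
    """Pick the English gloss whose class overlaps most with the context classes.

    Ties (including an empty context) fall back to the first listed sense.
    """
    context_classes = set()
    for other in context:
        for concept, _gloss in SENSES.get(other, []):
            context_classes |= ancestors(concept)
    best = max(
        SENSES[word],
        key=lambda sense: len(ancestors(sense[0]) & context_classes),
    )
    return best[1]


if __name__ == "__main__":
    print(disambiguate("ஆறு", ["ஏரி"]))    # context word means "lake"
    print(disambiguate("ஆறு", ["ஐந்து"]))  # context word means "five"
```

In a statistical MT pipeline, a sense chosen this way could be passed to the translation model as a preferred target word. A production system would more plausibly load the ontology from RDF/OWL files and use a reasoner instead of the hand-written traversal shown here.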