Research on Optimization of Machine Translation Model Based on Data Mining Algorithm

Hong Liu
Published in: 2022 International Symposium on Advances in Informatics, Electronics and Education (ISAIEE)
Publication date: 2022-12-01
DOI: 10.1109/ISAIEE57420.2022.00087
URL: https://doi.org/10.1109/ISAIEE57420.2022.00087

Abstract

As Internet applications multiply and network interactions become more frequent, the volume of resources on the Internet has grown explosively. In response, methods built on large-scale data, such as deep learning, have been proposed, and researchers have begun to revisit many classical tasks from a new perspective. In this work, the LDA (latent Dirichlet allocation) model is used to mine topic information from the texts in parallel corpora. Each topic is represented by a multinomial distribution over the vocabulary, in order to determine the proportion of each document topic in the document collection. Specific words are then obtained by probability sampling from the multinomial distribution over the vocabulary that corresponds to the topic. The monolingual corpus of the target language is processed with maximum likelihood estimation, with the parallel corpus taken as the training target. The monolingual corpus of the target language is estimated using importance sampling and the law of total probability, and an English machine translation model is established. The estimated expected value is obtained by beam search, so that English sentences can be translated. In a disambiguation test on 2000 groups of random phrases, the accuracy of word sense disambiguation was 79.9% and the accuracy of structural disambiguation was 85.7%, which were 8.6% and 3.9% higher than the original system, respectively.
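
The topic-then-word sampling step described in the abstract can be illustrated with a minimal sketch of the standard LDA generative process. This is not the paper's implementation: the vocabulary, the two topic-word distributions, the Dirichlet prior `ALPHA`, and the helper names `sample_dirichlet` and `generate_document` are all hypothetical and chosen only for illustration. The sketch draws per-document topic proportions, then for each word position samples a topic and then a word from that topic's multinomial distribution over the vocabulary.

```python
import random


def sample_dirichlet(alpha, rng):
    """Draw one Dirichlet sample by normalizing independent Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def generate_document(topic_word, alpha, vocab, length, rng):
    """Generate one toy document following the LDA generative story.

    topic_word: one multinomial distribution over vocab per topic
    alpha:      Dirichlet prior over topics (hypothetical values)
    Returns the sampled topic proportions and the sampled words.
    """
    # Topic proportions for this document.
    theta = sample_dirichlet(alpha, rng)
    topic_ids = list(range(len(topic_word)))
    words = []
    for _ in range(length):
        # Sample a topic according to the document's topic proportions.
        z = rng.choices(topic_ids, weights=theta, k=1)[0]
        # Sample a word from that topic's distribution over the vocabulary.
        w = rng.choices(vocab, weights=topic_word[z], k=1)[0]
        words.append(w)
    return theta, words


# Toy data, assumed for illustration only.
VOCAB = ["bank", "river", "loan", "water", "money", "shore"]
TOPIC_WORD = [
    [0.30, 0.00, 0.35, 0.00, 0.35, 0.00],  # a "finance"-like topic
    [0.20, 0.30, 0.00, 0.30, 0.00, 0.20],  # a "nature"-like topic
]
ALPHA = [0.5, 0.5]

if __name__ == "__main__":
    rng = random.Random(0)
    theta, words = generate_document(TOPIC_WORD, ALPHA, VOCAB, 8, rng)
    print("topic proportions:", [round(t, 3) for t in theta])
    print("sampled words:", words)
```

In practice the topic-word distributions are not fixed by hand as they are here; they would be inferred from the parallel corpus, and the sampled proportions `theta` correspond to the per-document topic proportions mentioned in the abstract.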