Research on Optimization of Machine Translation Model Based on Data Mining Algorithm
Hong Liu
2022 International Symposium on Advances in Informatics, Electronics and Education (ISAIEE), published 2022-12-01
DOI: 10.1109/ISAIEE57420.2022.00087 (https://doi.org/10.1109/ISAIEE57420.2022.00087)
Citations: 0
Abstract
As Internet applications multiply and network interactions become more frequent, the volume of resources on the Internet has grown explosively. In response, methods built on large-scale data, such as deep learning, have been proposed, and researchers have begun to revisit many classical tasks from a new perspective. The LDA model is used to mine topic information from the texts in parallel corpora. Each topic is represented by a multinomial distribution over the vocabulary, which makes it possible to estimate the proportion of each topic across the documents in the collection. Specific words are then obtained by probability sampling from the multinomial word distribution of the corresponding topic. The monolingual corpus of the target language is processed by maximum likelihood estimation, with the parallel corpus taken as the training target. The monolingual target-language corpus is estimated using importance sampling and the law of total probability, and an English machine translation model is established. The estimated expected value is obtained by beam search, so that English sentences can be translated. When disambiguating 2000 groups of random phrases, the accuracy of word sense disambiguation was 79.9% and the accuracy of structural disambiguation was 85.7%, which were 8.6% and 3.9% higher, respectively, than those of the original system.
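
The abstract names beam search as the decoding step but gives no details of the model it runs over. As a rough illustration only, the following Python sketch shows generic beam search over a toy next-token table. The table `TOY_MODEL`, the helper `next_token_probs`, the markers `<s>` and `</s>`, and all probabilities are hypothetical and are not taken from the paper.

```python
import math
from typing import Callable, Dict, List, Tuple

# Hypothetical toy model: the next-token distribution depends only on the
# previous token. All probabilities are invented for illustration.
TOY_MODEL: Dict[str, Dict[str, float]] = {
    "<s>": {"the": 0.6, "a": 0.4},
    "the": {"cat": 0.5, "dog": 0.4, "</s>": 0.1},
    "a": {"cat": 0.3, "dog": 0.5, "</s>": 0.2},
    "cat": {"sleeps": 0.7, "</s>": 0.3},
    "dog": {"sleeps": 0.2, "</s>": 0.8},
    "sleeps": {"</s>": 1.0},
}

Hypothesis = Tuple[float, List[str]]  # (log-probability, tokens so far)


def next_token_probs(prefix: List[str]) -> Dict[str, float]:
    """Return the toy distribution over the token that follows `prefix`."""
    return TOY_MODEL[prefix[-1]]


def beam_search(
    step_fn: Callable[[List[str]], Dict[str, float]],
    beam_width: int = 2,
    max_len: int = 5,
    bos: str = "<s>",
    eos: str = "</s>",
) -> List[Hypothesis]:
    """Keep the `beam_width` best partial sequences at every step."""
    beams: List[Hypothesis] = [(0.0, [bos])]
    finished: List[Hypothesis] = []
    for _ in range(max_len):
        # Expand every live hypothesis by every possible next token.
        candidates: List[Hypothesis] = []
        for score, tokens in beams:
            for token, prob in step_fn(tokens).items():
                if prob > 0.0:
                    candidates.append((score + math.log(prob), tokens + [token]))
        # Prune to the best `beam_width` candidates by log-probability.
        candidates.sort(key=lambda c: c[0], reverse=True)
        beams = []
        for score, tokens in candidates[:beam_width]:
            if tokens[-1] == eos:
                finished.append((score, tokens))  # complete sentence
            else:
                beams.append((score, tokens))  # still being extended
        if not beams:
            break
    finished.extend(beams)  # keep unfinished hypotheses if max_len was hit
    finished.sort(key=lambda c: c[0], reverse=True)
    return finished


if __name__ == "__main__":
    for log_prob, tokens in beam_search(next_token_probs):
        print(" ".join(tokens), round(math.exp(log_prob), 3))
```

Scores are accumulated as log-probabilities so that long sequences do not underflow. In this variant a finished hypothesis occupies a beam slot in the step where it ends, which is one common design choice among several.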