Long Sentence Segmentation Model based on Machine Translation
Hui Cui
2021 International Conference of Social Computing and Digital Economy (ICSCDE), published 2021-08-01
DOI: 10.1109/ICSCDE54196.2021.00054 (https://doi.org/10.1109/ICSCDE54196.2021.00054)
Citations: 0
Abstract
To address the inaccurate results and high recovery rate of traditional translation algorithms, this paper proposes a long sentence segmentation model based on machine translation. The method consists of a segmentation model and a reordering model. First, a regularization matching algorithm is applied to segment long sentences, and combining sentence components reduces the number of words in each sentence. Then, the segmentation model is trained on word alignment information generated by a traditional statistical machine translation model, and rules built from a large number of linguistic features are used to identify and correct segmentation errors. Finally, the method is evaluated on a special corpus. The experimental results show that, compared with the traditional translation algorithm, the proposed algorithm's accuracy rate is 5.72% higher and its average recovery rate is 7.19% lower, which effectively addresses the problems of stiff translation and poor readability.
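The abstract does not give the matching rules themselves, so the following is only a minimal sketch of the general idea of pattern-based long sentence segmentation, not the paper's algorithm. It assumes that clause punctuation marks candidate split points and that very short fragments are combined with the following fragment; the function name `segment_long_sentence`, the pattern, and the thresholds `max_words` and `min_words` are all hypothetical choices made for illustration.

```python
import re

# Hypothetical rule: split at whitespace that follows clause punctuation.
# The paper's actual matching rules are not given in the abstract.
SPLIT_PATTERN = re.compile(r"(?<=[,;:])\s+")


def segment_long_sentence(sentence, max_words=12, min_words=3):
    """Split a long sentence into shorter segments.

    Sentences with at most `max_words` words are returned unchanged.
    Fragments shorter than `min_words` are combined with the fragment
    that follows them, so no segment is a dangling connective.
    """
    sentence = sentence.strip()
    if len(sentence.split()) <= max_words:
        return [sentence]

    pieces = [p.strip() for p in SPLIT_PATTERN.split(sentence) if p.strip()]

    segments = []
    pending = ""  # short fragment waiting to be joined to the next piece
    for piece in pieces:
        candidate = f"{pending} {piece}".strip()
        if len(candidate.split()) < min_words:
            pending = candidate
        else:
            segments.append(candidate)
            pending = ""

    # A short trailing fragment is attached to the last segment.
    if pending:
        if segments:
            segments[-1] = f"{segments[-1]} {pending}"
        else:
            segments.append(pending)
    return segments


if __name__ == "__main__":
    example = (
        "First, the matcher splits the long sentence at clause boundaries; "
        "then, short fragments are merged back, "
        "and the segments are translated one by one."
    )
    for segment in segment_long_sentence(example):
        print(segment)
```

In a pipeline like the one the abstract describes, each segment would then be translated separately and the outputs passed to a reordering step; both of those stages are outside this sketch.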