Text Prediction Zero Probability Problem Handling with N-gram Model and Laplace Smoothing

2022 International Conference on Advancement in Electrical and Electronic Engineering (ICAEEE). Pub Date: 2022-02-24. DOI: 10.1109/icaeee54957.2022.9836419

Raonak Jahan Mimi, Md. Abdul Masud, Rifat Rahman, Nusrat Sultana Dina


Abstract


In Natural Language Processing, text prediction is the process of predicting the word with the highest probability using a language model built from a text corpus. The N-gram model is widely used and is considered one of the handiest and most computationally cost-effective models for text processing. Higher-order N-gram models, especially 5-gram models, give the best text prediction. Interestingly, these better results were obtained only on the training dataset; on the evaluation dataset, the highest-order N-gram model performed badly. This paper proposes an approach in which an N-gram model, especially the bi-gram model, fine-tuned with Laplace smoothing provides the best prediction results at the evaluation stage.
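The zero probability problem named in the title can be illustrated with a small sketch. The abstract does not describe the authors' corpus, tokenization, or implementation, so everything below is an assumption made for illustration: the toy corpus, the function names (train_bigram, mle_prob, laplace_prob, predict_next), and the use of plain add-one smoothing, P(w | prev) = (count(prev, w) + 1) / (count(prev) + V), where V is the vocabulary size.

```python
from collections import Counter


def train_bigram(tokens):
    """Count contexts and bigrams in a token list (hypothetical helper)."""
    context_counts = Counter(tokens[:-1])           # how often each word starts a bigram
    bigram_counts = Counter(zip(tokens, tokens[1:]))
    vocab = set(tokens)
    return context_counts, bigram_counts, vocab


def mle_prob(prev, word, context_counts, bigram_counts):
    """Unsmoothed estimate: 0.0 for any bigram never seen in training."""
    if context_counts[prev] == 0:
        return 0.0
    return bigram_counts[(prev, word)] / context_counts[prev]


def laplace_prob(prev, word, context_counts, bigram_counts, vocab):
    """Add-one estimate: (count(prev, word) + 1) / (count(prev) + |V|)."""
    return (bigram_counts[(prev, word)] + 1) / (context_counts[prev] + len(vocab))


def predict_next(prev, context_counts, bigram_counts, vocab):
    """Return the vocabulary word with the highest smoothed probability."""
    # sorted() makes tie-breaking deterministic
    return max(
        sorted(vocab),
        key=lambda w: laplace_prob(prev, w, context_counts, bigram_counts, vocab),
    )


# Toy corpus, assumed for illustration only.
tokens = "the cat sat on the mat the cat ate".split()
context_counts, bigram_counts, vocab = train_bigram(tokens)

# "the sat" never occurs in training, so the unsmoothed model assigns it zero.
unseen_mle = mle_prob("the", "sat", context_counts, bigram_counts)

# With add-one smoothing the same bigram gets a small non-zero probability.
unseen_laplace = laplace_prob("the", "sat", context_counts, bigram_counts, vocab)

# The smoothed distribution over the vocabulary still sums to 1.
total = sum(
    laplace_prob("the", w, context_counts, bigram_counts, vocab) for w in vocab
)

best = predict_next("the", context_counts, bigram_counts, vocab)
```

In this sketch the unseen bigram moves from probability 0 to 1/9 (the context "the" occurs 3 times and the vocabulary has 6 words), while the most frequent continuation of "the" remains "cat". Higher-order models such as 5-grams have far more unseen contexts in held-out text, which is consistent with the gap between training and evaluation results that the abstract reports.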
