{"title":"低资源语言中复杂实体识别的新型集合模型","authors":"Preeti Vats, Nonita Sharma, Deepak Kumar Sharma","doi":"10.4108/eetsis.4434","DOIUrl":null,"url":null,"abstract":"The fundamental method for pre-processing speech or text data that enables computers to comprehend human language is known as natural language processing. Numerous models have been developed to date to pre-process data in the English language; however, the Hindi language does not support these models. India's national tongue is Hindi. In order to help the locals, the authors of this study used supervised learning methods like Linear Regression, SVM, and Naive Bayes algorithm to investigate a dataset of complicated terms in the Hindi language. Additionally, a sophisticated Hindi word classification model is suggested employing several methods based on the forecasts as well as collective learning strategies like Random Forest, Adaboost, and Decision Tree. Depending on how well the user's language is understood, the suggested model will assist in simplifying Hindi text. Authors attempt to classify the uncharted dataset using deep learning algorithms like Bi-LSTM and GRU approaches in further processing.","PeriodicalId":502678,"journal":{"name":"ICST Transactions on Scalable Information Systems","volume":"51 4","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2023-11-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"A Novel Ensemble Model for Complex Entities Identification in Low Resource Language\",\"authors\":\"Preeti Vats, Nonita Sharma, Deepak Kumar Sharma\",\"doi\":\"10.4108/eetsis.4434\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"The fundamental method for pre-processing speech or text data that enables computers to comprehend human language is known as natural language processing. Numerous models have been developed to date to pre-process data in the English language; however, the Hindi language does not support these models. India's national tongue is Hindi. In order to help the locals, the authors of this study used supervised learning methods like Linear Regression, SVM, and Naive Bayes algorithm to investigate a dataset of complicated terms in the Hindi language. Additionally, a sophisticated Hindi word classification model is suggested employing several methods based on the forecasts as well as collective learning strategies like Random Forest, Adaboost, and Decision Tree. Depending on how well the user's language is understood, the suggested model will assist in simplifying Hindi text. Authors attempt to classify the uncharted dataset using deep learning algorithms like Bi-LSTM and GRU approaches in further processing.\",\"PeriodicalId\":502678,\"journal\":{\"name\":\"ICST Transactions on Scalable Information Systems\",\"volume\":\"51 4\",\"pages\":\"\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2023-11-21\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"ICST Transactions on Scalable Information Systems\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.4108/eetsis.4434\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"ICST Transactions on Scalable Information Systems","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.4108/eetsis.4434","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
A Novel Ensemble Model for Complex Entities Identification in Low Resource Language
The fundamental method for pre-processing speech or text data that enables computers to comprehend human language is known as natural language processing. Numerous models have been developed to date to pre-process data in the English language; however, the Hindi language does not support these models. India's national tongue is Hindi. In order to help the locals, the authors of this study used supervised learning methods like Linear Regression, SVM, and Naive Bayes algorithm to investigate a dataset of complicated terms in the Hindi language. Additionally, a sophisticated Hindi word classification model is suggested employing several methods based on the forecasts as well as collective learning strategies like Random Forest, Adaboost, and Decision Tree. Depending on how well the user's language is understood, the suggested model will assist in simplifying Hindi text. Authors attempt to classify the uncharted dataset using deep learning algorithms like Bi-LSTM and GRU approaches in further processing.