A hybrid approach for Bengali sentence validation

IF 10.7 | Zone 2, Computer Science | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE
Juel Sikder, Prosenjit Chakraborty, Utpol Kanti Das, Krity Dhar
Artificial Intelligence Review, volume 57, issue 11, published 2024-10-07 (Journal Article). DOI: 10.1007/s10462-024-10795-2
https://link.springer.com/article/10.1007/s10462-024-10795-2
Citations: 0

Abstract

Bengali is the official language of Bangladesh and is also widely used in the Indian state of West Bengal. With growing access to the internet and smart devices, the volume of digital text and documents in Bengali keeps increasing. This study proposes an automated Bengali Sentence Validation System to determine the correctness of sentences in this widely available Bengali content. As far as we know, no substantial work has been done on Bengali sentence validation using deep learning approaches. Developing an automated sentence validation system for a limited-resource language like Bengali is challenging because linguistic resources, sophisticated Natural Language Processing tools, and benchmark datasets are lacking. Additionally, Bengali sentences come in two morphological varieties (Sadhu-bhasha and Cholito-bhasha), which makes validation more challenging. The proposed system uses a CNN-BiLSTM hybrid classifier model. Because no standard dataset exists for Bengali sentence validation, we collected Bengali sentences from different sources in Bangladesh and developed a Bengali Sentence Validation (BSV) Dataset of around 5000 labelled sentences in two categories: correct and incorrect. Experimental results demonstrate that the proposed system outperformed other classifier models and existing approaches for Bengali sentence validation and can categorize a wide range of Bengali sentences by correctness. The system's F1 score for Bengali sentence validation is 98%.
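
For readers unfamiliar with the metric reported above, the following is a minimal sketch of how an F1 score is computed for a two-class task such as correct versus incorrect sentences. The abstract does not say which class is treated as positive or how the score is averaged, so the sketch assumes label 1 ("correct") is the positive class. The function name `f1_score` and the toy labels are illustrative only and are not taken from the paper or the BSV Dataset.

```python
from typing import Sequence


def f1_score(y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1) -> float:
    """Binary F1: harmonic mean of precision and recall for the positive class.

    Assumption: `positive` marks the "correct sentence" label. The source
    abstract does not specify this convention.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    pairs = list(zip(y_true, y_pred))
    # True positives: predicted positive and actually positive.
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    # False positives: predicted positive but actually negative.
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    # False negatives: predicted negative but actually positive.
    fn = sum(1 for t, p in pairs if t == positive and p != positive)

    if tp == 0:
        # Precision and recall are both zero (or undefined); report 0.0.
        return 0.0

    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy labels (hypothetical): 1 = correct sentence, 0 = incorrect sentence.
toy_true = [1, 1, 1, 0, 0, 1, 0, 1]
toy_pred = [1, 1, 0, 0, 1, 1, 0, 1]
toy_f1 = f1_score(toy_true, toy_pred)
print(f"F1 on toy labels: {toy_f1:.2f}")
```

On the toy labels there are four true positives, one false positive, and one false negative, so precision and recall are both 0.8 and the F1 score is 0.8. A reported F1 of 98% means precision and recall on the evaluation data are both close to 0.98 under whatever averaging convention the authors used.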

Source journal
Artificial Intelligence Review
Category: Engineering and Technology, Computer Science: Artificial Intelligence
CiteScore: 22.00
Self-citation rate: 3.30%
Articles published: 194
Review time: 5.3 months
Journal description: Artificial Intelligence Review, a fully open access journal, publishes cutting-edge research in artificial intelligence and cognitive science. It features critical evaluations of applications, techniques, and algorithms, providing a platform for both researchers and application developers. The journal includes refereed survey and tutorial articles, along with reviews and commentary on significant developments in the field.