BERT-based Text Simplification Approach to Reduce Linguistic Complexity of Bangla Language
Authors: Nahid Hossain, Adil Ahnaf
Venue: 2021 International Conference on Intelligent Technology, System and Service for Internet of Everything (ITSS-IoE)
Published: 2021-11-01
DOI: 10.1109/ITSS-IoE53029.2021.9615303
Text simplification reduces the linguistic complexity of a text, simplifying its grammar and structure so that it is easier to read and understand while preserving its information and underlying meaning. Although Bangla is spoken globally and has a rich literary history, no prior work on this topic has been done for the Bangla language. This work aims to increase the readership of Bangla literature and to save historical Bangla writings from being lost. We collected and used an extensive corpus of 152,230 sentences, along with a lexicon of 22,580 unique complex-simple word pairs that were mapped manually. This paper presents two text simplification models, one based on Long Short-Term Memory (LSTM) and one based on Bidirectional Encoder Representations from Transformers (BERT). The proposed BERT-based model achieves a satisfactory accuracy of 95.3%.
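To make the idea of a complex-simple word-pair lexicon concrete, the following minimal Python sketch applies such a lexicon by plain dictionary lookup. It is not the authors' method: the paper's models are learned (LSTM and BERT), whereas this is only a lookup baseline. The names `LEXICON` and `simplify` are hypothetical, and the English entries are illustrative placeholders, not entries from the paper's Bangla lexicon.

```python
# Toy lexicon of complex -> simple word pairs.
# ASSUMPTION: these English entries are illustrative placeholders only;
# they are not taken from the paper's manually mapped Bangla lexicon.
LEXICON = {
    "utilize": "use",
    "commence": "begin",
    "terminate": "end",
}


def simplify(sentence, lexicon):
    """Replace every whitespace-separated token that has a lexicon entry.

    Tokens without an entry are kept unchanged. This is a lookup baseline,
    not the LSTM or BERT model described in the abstract.
    """
    tokens = sentence.split()
    return " ".join(lexicon.get(token, token) for token in tokens)


simplified = simplify("we commence work and utilize tools", LEXICON)
print(simplified)
```

A lookup like this ignores context, so it cannot choose between several possible simple forms of the same word; handling that kind of ambiguity is one reason to use a contextual model such as BERT instead.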