Dzongkha Next Words Prediction Using Bidirectional LSTM
Karma Wangchuk, Tandin Wangchuk, Tenzin Namgyel
Bhutan Journal of Research and Development, published 2023-02-03
DOI: 10.17102/bjrd.rub.se2.038 (https://doi.org/10.17102/bjrd.rub.se2.038)
The Dzongkha Development Commission of Bhutan (DDC) is working to computerize Dzongkha, but the effort poses numerous challenges. Currently, support for Dzongkha in modern technology is limited to printing, typing, and storage. Typing a single Dzongkha word requires several keypresses, which makes typing Dzongkha tedious. This paper studies Dzongkha next-word prediction, with the aim of further reducing keystrokes and making Dzongkha typing much faster. The dataset, curated by DDC, spans different genres and consists of 10,000 sentences and 4,820 unique words. From it, 52,150 sequences were generated using N-gram methods, and the text was then vectorized using embedding techniques. Different RNN-based models were evaluated for next-word prediction. Two Bi-LSTM layers with 512 hidden-layer neurons gave the best accuracy, 73.89%, with a loss of 1.0722.
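The abstract does not spell out how the N-gram sequences were built. One common way to prepare data for next-word prediction is to expand every sentence into all of its prefixes of length two or more, left-pad them to a common length, and treat the last token of each prefix as the label. The sketch below illustrates that idea in plain Python. It is an assumption, not the authors' pipeline: the function names (`build_vocab`, `ngram_sequences`, `pad_and_split`) are hypothetical, and the tokens `w1` to `w4` are placeholders standing in for pre-segmented Dzongkha words.

```python
from typing import Dict, List, Tuple


def build_vocab(sentences: List[List[str]]) -> Dict[str, int]:
    """Map each distinct token to an integer id, reserving 0 for padding."""
    vocab: Dict[str, int] = {}
    for sentence in sentences:
        for token in sentence:
            if token not in vocab:
                vocab[token] = len(vocab) + 1
    return vocab


def ngram_sequences(sentences: List[List[str]],
                    vocab: Dict[str, int]) -> List[List[int]]:
    """Expand each sentence into every prefix containing at least two tokens."""
    sequences: List[List[int]] = []
    for sentence in sentences:
        ids = [vocab[token] for token in sentence]
        for end in range(2, len(ids) + 1):
            sequences.append(ids[:end])
    return sequences


def pad_and_split(sequences: List[List[int]]) -> Tuple[List[List[int]], List[int]]:
    """Left-pad with 0 to a common length, then split off the last id as the label."""
    max_len = max(len(seq) for seq in sequences)
    inputs: List[List[int]] = []
    targets: List[int] = []
    for seq in sequences:
        padded = [0] * (max_len - len(seq)) + seq
        inputs.append(padded[:-1])   # context tokens fed to the model
        targets.append(padded[-1])   # the next word to predict
    return inputs, targets


# Toy corpus with placeholder tokens (assumed to be already segmented).
sentences = [["w1", "w2", "w3"], ["w2", "w3", "w4", "w1"]]
vocab = build_vocab(sentences)
sequences = ngram_sequences(sentences, vocab)
inputs, targets = pad_and_split(sequences)

for context, label in zip(inputs, targets):
    print(context, "->", label)
```

In a setup like this, the padded integer contexts would typically be passed to an embedding layer followed by the recurrent layers, with the label id used as the class target over the vocabulary.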