Ajay Mittur, Aravindh R Shankar, Adithya Narasimhan
Published in: 2022 International Conference on Innovative Trends in Information Technology (ICITIIT), 2022-02-12
DOI: 10.1109/ICITIIT54346.2022.9744238 (https://doi.org/10.1109/ICITIIT54346.2022.9744238)
One-Shot Approach for Multilingual Classification of Indic Scripts
The use of multiple languages with different scripts is common in India, and there is a growing need to digitise documents that are handwritten or available only as images. This requires a system that classifies Indic scripts across languages and then recognises their characters into a digital standard such as Unicode. However, a learning system covering many languages with numerous character combinations can be computationally expensive, and training it is arduous when data is scarce. This paper explores a one-shot learning approach to optical character recognition across languages, in which a character must be classified accurately given only one example of each newly introduced class. Siamese neural networks are used for learning and for adapting a network to entirely new, unseen data. In the best case, the approach attains a classification accuracy ranging from 77.72 to 91.83 across nine Indian languages.
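The one-shot setting described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the network architecture, the distance measure, or the dataset, so everything below is an assumption made for illustration. The names `embed` and `one_shot_classify` are hypothetical, a random linear projection stands in for a trained Siamese encoder, an L1 distance stands in for the learned similarity, and the 8x8 "glyphs" are random placeholders rather than real character images. The sketch only shows the core idea: both inputs pass through the same encoder, and a query is assigned to the class whose single support example lies closest in embedding space.

```python
import numpy as np


def embed(image, weights):
    """Shared encoder: every image passes through the same weights.

    In a Siamese network these weights would be learned; here they are
    random stand-ins (an assumption, not the paper's model).
    """
    return np.tanh(image.reshape(-1) @ weights)


def one_shot_classify(query, support_set, weights):
    """Return the label whose single support example is closest to the query.

    support_set maps each class label to exactly one example image,
    which is what makes the setting "one-shot".
    """
    q = embed(query, weights)
    distances = {
        label: np.abs(q - embed(example, weights)).sum()  # L1 distance (assumed)
        for label, example in support_set.items()
    }
    return min(distances, key=distances.get)


rng = np.random.default_rng(0)

# Stand-in for trained encoder weights: 64 input pixels -> 16-dim embedding.
weights = rng.normal(size=(64, 16)) / 8.0

# One placeholder 8x8 image per previously unseen class (random, not real data).
support_set = {
    name: rng.random((8, 8)) for name in ("class_a", "class_b", "class_c")
}

# A query that is a slightly noisy copy of the class_b support example.
query = support_set["class_b"] + rng.normal(scale=0.01, size=(8, 8))
predicted = one_shot_classify(query, support_set, weights)
print(predicted)
```

Adding a new class in this sketch only requires adding one entry to `support_set`; no retraining step is involved, which is the property the one-shot approach relies on.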