Recognition of Handwritten Source Code Characters With Deep Neural Networks
Barış Kılıçlar, Metehan Makinaci
4th International Symposium on Innovative Approaches in Engineering and Natural Sciences Proceedings, 2019-07-26
DOI: 10.36287/setsci.4.6.111
In this paper we present an application of deep learning techniques to the recognition of handwritten source code characters. Although handwritten character recognition (HCR) has been studied extensively, very little work addresses offline recognition of handwritten source code, a problem that also requires recognizing characters specific to source code. We designed and implemented an application that performs preprocessing, histogram-based segmentation, and normalization on scanned exam papers containing code written in the C programming language. The resulting dataset contains 7093 source code character samples. We enriched this dataset with character samples from the CROHME database, converting them to offline samples. With the resulting 17748 samples in 95 classes, we trained and tested several convolutional neural network (CNN) models. The CNN is a deep learning architecture that has been shown to achieve state-of-the-art performance on handwritten character recognition tasks as well as on various other computer vision applications. Experimental evaluations gave performance rates between 95.43% and 97.49%. We conclude that CNN-based classifiers are powerful tools for the handwritten source code character recognition task.
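
The abstract does not describe the histogram-based segmentation step in detail. One common reading is projection-profile segmentation: sum the ink pixels along rows to find text lines, then along columns within each line to find characters. The sketch below illustrates that idea under this assumption; it is not the authors' implementation. The names `runs_above_zero` and `segment_characters` and the toy `page` image are hypothetical.

```python
from typing import List, Tuple

Image = List[List[int]]  # binary image: 1 = ink, 0 = background (assumed encoding)


def runs_above_zero(profile: List[int]) -> List[Tuple[int, int]]:
    """Return half-open (start, end) index ranges where the profile is non-zero."""
    runs = []
    start = None
    for i, value in enumerate(profile):
        if value > 0 and start is None:
            start = i                     # a run of ink begins
        elif value == 0 and start is not None:
            runs.append((start, i))       # a gap closes the current run
            start = None
    if start is not None:
        runs.append((start, len(profile)))
    return runs


def segment_characters(image: Image) -> List[Tuple[int, int, int, int]]:
    """Split a binary page into boxes (top, bottom, left, right), half-open."""
    boxes = []
    # Horizontal histogram: ink count per row separates text lines.
    row_profile = [sum(row) for row in image]
    for top, bottom in runs_above_zero(row_profile):
        line = image[top:bottom]
        # Vertical histogram inside one line separates characters.
        col_profile = [sum(col) for col in zip(*line)]
        for left, right in runs_above_zero(col_profile):
            boxes.append((top, bottom, left, right))
    return boxes


# Toy page: two characters on the first text line, one on the second.
page = [
    [0, 0, 0, 0, 0, 0],
    [1, 1, 0, 0, 1, 0],
    [1, 0, 0, 1, 1, 0],
    [0, 0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0, 0],
]
boxes = segment_characters(page)
print(boxes)
```

This simple version splits only at completely empty rows and columns, so touching or overlapping characters would need a threshold or a more elaborate method. Each resulting box would then be cropped and normalized to a fixed size before being passed to a classifier.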