Problems and Approaches for Oriental Document Analysis
J. H. Kim
IEEE International Conference on Document Analysis and Recognition, 1997-08-18
DOI: 10.1109/ICDAR.1997.10004 (https://doi.org/10.1109/ICDAR.1997.10004)
Abstract
Machine understanding of hand-filled documents in China, Japan and Korea requires not only general document analysis solutions but also the ability to handle the peculiarities of Oriental languages. As expected, handwritten Chinese character recognition is the major task. In addition, Japanese Kana, Korean Hangul, the Roman alphabet and numerals are recognition targets. The main difficulties of Oriental character recognition originate from the large character sets. A practical system should be able to handle at least 5,000 classes out of possibly more than 50,000. For Hangul, 11,172 classes can be formed in theory. The difficulty also depends closely on writing style. Oriental script is generally classified into regular, fluent and cursive styles. Needless to say, cursive style is the most severely deformed and therefore the most difficult to recognize. Regular style writing is often handled successfully by feature matching and statistical analysis, while fluent style is now under active investigation using stroke analysis and structural matching. Cursive style recognition is seldom found even in research papers. Since Chinese and Hangul characters are intrinsically hierarchical, hierarchical analysis has often been applied. A Hangul character, which corresponds to a syllable, is formed from 2 to 5 basic graphemes, drawn from 24 classes, arranged in two dimensions. Recognizing the component graphemes is, we believe, the viable approach to handling the large set of Hangul classes. Therefore, segmentation into graphemes, which is another difficult task, is the key to hierarchical recognition. For robust recognition of fluent to cursive styles, the following research directions are suggested.
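
The hierarchical structure of Hangul described above can be illustrated with a minimal Python sketch. It is not the author's recognition method: it works on encoded text rather than handwriting images, and it uses the Unicode syllable layout (19 initial consonants, 21 medial vowels, 28 final slots including "no final") rather than the 24 basic grapheme classes mentioned in the abstract. Under that layout, the number of composable syllables is 19 × 21 × 28 = 11,172. The function names `decompose`, `compose` and `to_jamo` are hypothetical and chosen only for this illustration.

```python
from typing import Tuple

# Constants from the Unicode Hangul syllable composition scheme.
S_BASE = 0xAC00   # first precomposed syllable, "가"
L_BASE = 0x1100   # first conjoining initial consonant
V_BASE = 0x1161   # first conjoining medial vowel
T_BASE = 0x11A7   # one below the first conjoining final consonant
L_COUNT, V_COUNT, T_COUNT = 19, 21, 28  # T_COUNT includes "no final"
S_COUNT = L_COUNT * V_COUNT * T_COUNT   # total composable syllables


def decompose(syllable: str) -> Tuple[int, int, int]:
    """Return (initial, medial, final) indices of one Hangul syllable.

    A final index of 0 means the syllable has no final consonant.
    """
    index = ord(syllable) - S_BASE
    if not 0 <= index < S_COUNT:
        raise ValueError(f"not a precomposed Hangul syllable: {syllable!r}")
    initial, rest = divmod(index, V_COUNT * T_COUNT)
    medial, final = divmod(rest, T_COUNT)
    return initial, medial, final


def compose(initial: int, medial: int, final: int = 0) -> str:
    """Inverse of decompose: build the syllable from component indices."""
    if not (0 <= initial < L_COUNT and 0 <= medial < V_COUNT
            and 0 <= final < T_COUNT):
        raise ValueError("component index out of range")
    return chr(S_BASE + (initial * V_COUNT + medial) * T_COUNT + final)


def to_jamo(syllable: str) -> str:
    """Spell a syllable as its conjoining jamo code points."""
    initial, medial, final = decompose(syllable)
    jamo = chr(L_BASE + initial) + chr(V_BASE + medial)
    if final:  # index 0 is the empty final, so nothing is appended
        jamo += chr(T_BASE + final)
    return jamo


if __name__ == "__main__":
    print(S_COUNT)             # size of the class space
    print(decompose("한"))      # component indices of one syllable
    print(len(to_jamo("한")))   # number of conjoining jamo
```

The point of the sketch is the size reduction that motivates grapheme-level recognition: a classifier over whole syllables faces thousands of classes, whereas the three component slots each have at most 28. In handwriting, the hard part that this text-level example skips entirely is segmenting the image into those components.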