Anastasia Rita Widiarti, Marsono, A. Harjoko, S. Hartati
Studies in Digital Heritage, published 2013-10-01. DOI: https://doi.org/10.1109/DigitalHeritage.2013.6743844
Combination of statistic and structural approach to scripts segmentation from line segmentation of Javanese manuscript image
Character segmentation of handwritten manuscripts is often a complicated task. Many factors make such segmentation difficult, including inconsistencies in the slope, slant, length and width of each character, as well as intersections between two characters from the same or different lines. This paper proposes a new approach that combines statistical and structural analyses to extract Javanese script characters from line-segmented images of Javanese manuscripts. For each new manuscript, all objects that make up its characters are identified using a connectivity operation that finds the components of the script. All parts of an interconnected object are given the same label. The next step is to calculate the average height and average width of the labeled objects, along with the standard deviation. These statistics serve as a guide to the normal size of a character: when a character has a height or width that exceeds the average value plus the standard deviation, it is concluded that the character in fact consists of two characters that touch each other. To normalize a skewed cluster of scripts, the script is straightened so that it becomes perpendicular. The experiment used 13 line images from different authors with different writing styles, and the result shows a segmentation accuracy of 88.19%. It can be concluded that the proposed segmentation approach is relatively successful when applied to handwritten Javanese characters.
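The labeling and size-threshold steps described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the 8-connectivity rule, the use of the population standard deviation, the toy binary image, and the function names `label_components` and `flag_touching` are all assumptions made for illustration.

```python
from collections import deque
from statistics import mean, pstdev


def label_components(image):
    """Find 8-connected foreground regions (pixel value 1).

    Returns one bounding box (top, left, bottom, right) per region,
    in the order the regions are first met in a row-major scan.
    """
    rows, cols = len(image), len(image[0])
    seen = [[False] * cols for _ in range(rows)]
    boxes = []
    for r in range(rows):
        for c in range(cols):
            if image[r][c] != 1 or seen[r][c]:
                continue
            # Breadth-first flood fill from this unvisited foreground pixel.
            queue = deque([(r, c)])
            seen[r][c] = True
            top, left, bottom, right = r, c, r, c
            while queue:
                y, x = queue.popleft()
                top, bottom = min(top, y), max(bottom, y)
                left, right = min(left, x), max(right, x)
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and image[ny][nx] == 1 and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            boxes.append((top, left, bottom, right))
    return boxes


def flag_touching(boxes):
    """Return indices of boxes whose height or width exceeds mean + std.

    Assumption: population standard deviation; the abstract does not say
    whether the sample or population form is used.
    """
    heights = [bottom - top + 1 for top, _, bottom, _ in boxes]
    widths = [right - left + 1 for _, left, _, right in boxes]
    h_limit = mean(heights) + pstdev(heights)
    w_limit = mean(widths) + pstdev(widths)
    return [i for i, (h, w) in enumerate(zip(heights, widths))
            if h > h_limit or w > w_limit]


# Toy text line (hypothetical data): three narrow glyphs and one wide blob.
line = [
    "11.11.11.11111",
    "11.11.11.11111",
    "11.11.11.11111",
]
image = [[1 if ch == "1" else 0 for ch in row] for row in line]
boxes = label_components(image)
suspects = flag_touching(boxes)
print(boxes)
print(suspects)
```

In this toy line the last component is five pixels wide while the others are two pixels wide, so it is the only one whose width exceeds the mean plus one standard deviation and would be treated as a candidate pair of touching characters. How such a component is then split is not specified in the abstract and is left out of the sketch.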