Talha Soysal, Hande Adiguzel, A. Oktem, Alican Haman, E. Can, Pinar Duygulu, M. Kalpakli
{"title":"Processing the manuscripts of Atatürk","authors":"Talha Soysal, Hande Adiguzel, A. Oktem, Alican Haman, E. Can, Pinar Duygulu, M. Kalpakli","doi":"10.1109/SIU.2010.5652708","DOIUrl":null,"url":null,"abstract":"In this paper, as a first step to an easy and convenient way to access the manuscripts of Atatürk with a word based search engine, the preprocessing of digitalized documents and their line and word segmentation is studied. The techniques that are applied on printed documents may not yield satisfactory results. Due to this fact, more developed techniques are decided to be applied consisting of a technique based on Hough transform [1] for line segmentation and a technique that is based on dealing with skewness of lines for word segmentation. The results, which are acquired through studies that are conducted on the documents provided by Afet Đnan and consisting of 30 pages [2], prove to be highly accurate and promising for future researches.","PeriodicalId":152297,"journal":{"name":"2010 IEEE 18th Signal Processing and Communications Applications Conference","volume":"1 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2010-04-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2010 IEEE 18th Signal Processing and Communications Applications Conference","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/SIU.2010.5652708","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0
Abstract
In this paper, as a first step to an easy and convenient way to access the manuscripts of Atatürk with a word based search engine, the preprocessing of digitalized documents and their line and word segmentation is studied. The techniques that are applied on printed documents may not yield satisfactory results. Due to this fact, more developed techniques are decided to be applied consisting of a technique based on Hough transform [1] for line segmentation and a technique that is based on dealing with skewness of lines for word segmentation. The results, which are acquired through studies that are conducted on the documents provided by Afet Đnan and consisting of 30 pages [2], prove to be highly accurate and promising for future researches.