Title: Separability versus Prototypicality in Handwritten Word Retrieval
Authors: J. V. Oosten, Lambert Schomaker
Venue: 2012 International Conference on Frontiers in Handwriting Recognition
Publication date: 2012-09-18
DOI: 10.1109/ICFHR.2012.269 (https://doi.org/10.1109/ICFHR.2012.269)
Citation count: 5
Abstract
User appreciation of a word-image retrieval system depends on the quality of the hit list returned for a query. Using support vector machines for ranking in large-scale handwritten document collections, we observed that many hit lists suffered from bad instances in the top ranks. An analysis of this problem revealed that two functions need to be optimised, one for separability and one for prototypicality. By ranking images in two stages, the number of distracting images is reduced, which makes the method well suited to massive-scale, continuously trainable retrieval engines. Instead of cumbersome SVM training, we present a nearest-centroid method and show that precision improvements of up to 35 percentage points can be achieved, yielding up to 100% precision in data sets with a large number of instances, while maintaining high recall.
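The abstract does not spell out the two ranking stages, so the following is only a minimal sketch of one plausible reading, not the authors' implementation. It assumes that word images are already represented as fixed-length feature vectors, that separability means a candidate's nearest class centroid is the query class, and that prototypicality means a small Euclidean distance to that centroid. The function names, the toy labels, and the feature values are all hypothetical.

```python
import math


def centroid(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def two_stage_rank(candidates, class_examples, query_label):
    """Return indices of candidates, ranked for query_label.

    Stage 1 (separability, an assumption): keep only candidates whose
    nearest class centroid is the query class.
    Stage 2 (prototypicality, an assumption): sort the survivors by
    distance to the query-class centroid, closest first.
    """
    centroids = {label: centroid(vs) for label, vs in class_examples.items()}
    target = centroids[query_label]

    survivors = []
    for index, features in enumerate(candidates):
        nearest = min(
            centroids,
            key=lambda label: math.dist(features, centroids[label]),
        )
        if nearest == query_label:
            survivors.append((math.dist(features, target), index))

    return [index for _, index in sorted(survivors)]


# Hypothetical toy data: two word classes in a 2-D feature space.
class_examples = {
    "the": [[0.0, 0.0], [0.2, 0.0], [0.0, 0.2]],
    "and": [[1.0, 1.0], [1.2, 1.0], [1.0, 1.2]],
}
candidates = [[0.9, 0.9], [0.3, 0.3], [0.05, 0.1], [0.1, 0.0]]

hit_list = two_stage_rank(candidates, class_examples, "the")
print(hit_list)
```

In this toy run, candidate 0 lies closer to the "and" centroid and is filtered out in the first stage; the remaining candidates are ordered by how close they lie to the "the" centroid. Adding a new labelled example only requires updating one class mean, which is one way to read the abstract's remark about continuously trainable retrieval engines.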