A. Sapkal, Chhavi, Shashank Sharma, Pradeep Kumar, Sachin Yadav
Keyword spotting in historical document collections without segmentation using the Siamese Network
2021 International Conference on Innovative Computing, Intelligent Communication and Smart Electrical Systems (ICSES), pp. 1-5, published 2021-09-24
DOI: 10.1109/ICSES52305.2021.9633920 (https://doi.org/10.1109/ICSES52305.2021.9633920)
Abstract
Keyword spotting is the task of determining whether a given query occurs in a document. This paper uses the query-by-example model to present an efficient, segmentation-free keyword spotting approach for historical document collections. An autoencoder network performs image de-noising and binarization. A patch-based system then creates patches from the binarized image, and these patches are passed to a Siamese network. The Siamese network employs two identical convolutional networks to determine the degree of similarity between two input word images. Once trained, the network can detect not only words from different writing styles and contexts, but also words that are not in the training set. The proposed method is evaluated on the Bengali Handwritten dataset.
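The patch-then-compare idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the trained convolutional branches are replaced here by a single shared random linear projection, purely to show how one set of weights embeds both the query and each page patch so that a distance can be computed between them. The function names (`extract_patches`, `embed`, `spot`), the stride, and the embedding size are all hypothetical, and the autoencoder binarization step is assumed to have been applied already.

```python
import numpy as np


def extract_patches(image, patch_h, patch_w, stride):
    """Slide a window over a 2-D (already binarized) page.

    Returns the stacked patches and the (row, col) of each top-left corner.
    """
    height, width = image.shape
    patches, coords = [], []
    for y in range(0, height - patch_h + 1, stride):
        for x in range(0, width - patch_w + 1, stride):
            patches.append(image[y:y + patch_h, x:x + patch_w])
            coords.append((y, x))
    return np.stack(patches), coords


def embed(patch, weights):
    """Stand-in for ONE Siamese branch (assumption: a linear layer + ReLU,
    not a CNN). The same `weights` are used for both inputs, which is the
    weight sharing that makes the two branches identical."""
    v = np.maximum(weights @ patch.reshape(-1), 0.0)
    norm = np.linalg.norm(v)
    return v / norm if norm > 0 else v


def spot(query, page, weights, stride=4):
    """Return the location and distance of the page patch closest to the query."""
    patches, coords = extract_patches(page, *query.shape, stride)
    q = embed(query, weights)
    dists = np.array([np.linalg.norm(q - embed(p, weights)) for p in patches])
    best = int(np.argmin(dists))
    return coords[best], float(dists[best])


# Toy usage: paste a random binary "word" into a blank page and search for it.
rng = np.random.default_rng(0)
query = (rng.random((8, 16)) > 0.5).astype(float)
page = np.zeros((32, 64))
page[8:16, 16:32] = query
weights = rng.standard_normal((16, 8 * 16))  # hypothetical 16-D embedding
location, distance = spot(query, page, weights)
```

In a trained system the projection would be replaced by the learned convolutional branch, and a threshold on the distance would decide whether a patch counts as a match; no word segmentation of the page is needed because candidate regions come from the sliding window.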