An Improved Approach of Dictionary Based Syntactic PR Using Trie
Authors: Samita Pradhan, A. Negi
Published in: 2014 International Conference on Electronic Systems, Signal Processing and Computing Technologies
Publication date: 2014-01-09
DOI: 10.1109/ICESC.2014.76 (https://doi.org/10.1109/ICESC.2014.76)
Citations: 1
Abstract
Dictionary-based syntactic pattern recognition of strings attempts to extract a set of strings X+ from a dictionary H by processing its noisy version, the string Y, without sequentially comparing Y with every string X in H. H is a dictionary containing a finite set of strings. The best estimate X+, taken over all X* in H, is defined as the set of strings from X* that have the least Levenshtein edit distance to the searched string Y. Existing techniques for approximate dictionary search compare the searched string with every string stored in the dictionary and return the least-distance strings as X+. A few techniques store the dictionary's word set in a trie and use heuristics to prune part of the search space while finding X+. The efficiency of search and retrieval depends on how successfully words are pruned from the computation while searching for an approximate match. In this paper, we store all the words of the dictionary in a trie data structure. We propose heuristics that apply to every node of the trie and help prune the current search path at that node. Our method of pruning paths while searching can save space in the computation compared with other methods while retaining a correct approximation. We tested our approach on different data sets with different noisy words, and our method returned the correct X+, the set of approximate words. The proposed approaches are compared with the existing approach: the first shows 19.03% and the second 29.35% efficiency compared to the existing approach.
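The abstract does not spell out the proposed heuristics, so the following is only a minimal sketch of the general idea it describes: store the dictionary in a trie and abandon a path as soon as it cannot improve on the best match found so far. It uses the widely known row-by-row Levenshtein computation over a trie, where the minimum entry of a node's row is a lower bound on the distance of every word below that node. This is an assumption chosen for illustration, not the authors' method, and the names `TrieNode`, `build_trie`, and `nearest_words` are hypothetical.

```python
from typing import Dict, List, Optional, Tuple


class TrieNode:
    """One trie node; `word` is set when a dictionary word ends here."""

    def __init__(self) -> None:
        self.children: Dict[str, "TrieNode"] = {}
        self.word: Optional[str] = None


def build_trie(words: List[str]) -> TrieNode:
    """Insert every dictionary word (the set H) into a trie."""
    root = TrieNode()
    for w in words:
        node = root
        for ch in w:
            node = node.children.setdefault(ch, TrieNode())
        node.word = w
    return root


def nearest_words(root: TrieNode, y: str) -> Tuple[float, List[str]]:
    """Return (least edit distance, sorted words at that distance from y).

    Illustrative pruning rule (an assumption, not the paper's heuristic):
    stop descending when the smallest entry of the current DP row already
    exceeds the best distance found so far.
    """
    best_dist: float = float("inf")
    best_words: List[str] = []

    # Handle the empty string if it happens to be a dictionary word.
    if root.word is not None:
        best_dist, best_words = len(y), [root.word]

    def visit(node: TrieNode, ch: str, prev_row: List[int]) -> None:
        nonlocal best_dist, best_words
        # Extend the Levenshtein table by one row for character `ch`.
        row = [prev_row[0] + 1]
        for j in range(1, len(y) + 1):
            cost = 0 if y[j - 1] == ch else 1
            row.append(min(row[j - 1] + 1,          # insertion
                           prev_row[j] + 1,         # deletion
                           prev_row[j - 1] + cost)) # substitution or match
        # row[-1] is the distance between y and the prefix ending here.
        if node.word is not None:
            if row[-1] < best_dist:
                best_dist, best_words = row[-1], [node.word]
            elif row[-1] == best_dist:
                best_words.append(node.word)
        # Prune: no word below this node can beat min(row).
        if min(row) <= best_dist:
            for c, child in node.children.items():
                visit(child, c, row)

    first_row = list(range(len(y) + 1))
    for c, child in root.children.items():
        visit(child, c, first_row)
    return best_dist, sorted(best_words)
```

For example, with a hypothetical dictionary `["cat", "cart", "care", "dog", "cot"]`, calling `nearest_words(build_trie(words), "caat")` returns the least distance together with every word that attains it, which plays the role of X+ for the noisy string Y. Ties are kept by pruning only when the lower bound is strictly greater than the current best.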