Prince Mahmud, Md. Sohel Rana, Kamrul Hasan Talukder
Title: An Efficient Hybrid Exact String Matching Algorithm to Minimize the Number of Attempts and Character Comparisons
Published in: 2018 21st International Conference of Computer and Information Technology (ICCIT), 2018-12-01
DOI: 10.1109/ICCITECHN.2018.8631908 (https://doi.org/10.1109/ICCITECHN.2018.8631908)
Citations: 3
Abstract
String matching is the classical problem of finding the occurrence(s) of a pattern string within another string or body of text. It arises in network intrusion detection, plagiarism detection, information security, pattern recognition, document matching, text mining, speech analysis, bioinformatics, and other fields. Two important cost factors in string matching, and the focus of this paper, are the “number of attempts” and the “number of character comparisons”. To reduce both, we propose a hybrid algorithm named MAC (Minimum number of Attempts and Character Comparisons). MAC integrates concepts from the Berry-Ravindran (BR) algorithm and the index-based shifting approach with our new search technique. We evaluate MAC on English text and on biological data (DNA and protein sequences). In these evaluations MAC performs better than the Maximum-Shift (MS) and Index Based Shifting (IBS) algorithms, and it performs well at exact string matching for both short and long patterns compared with some existing algorithms.
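To make the two metrics concrete, the following is a minimal sketch of the standard Berry-Ravindran (BR) algorithm, one of the building blocks the abstract names, instrumented to count attempts and character comparisons. It is not the MAC algorithm, whose details are not given in this abstract. The function names `br_shift_function` and `br_search`, the left-to-right comparison order, and the use of `None` as an end-of-text sentinel are assumptions made for illustration.

```python
def br_shift_function(pattern):
    """Build the Berry-Ravindran shift rule for `pattern`.

    The shift depends on the two text characters (a, b) that sit
    immediately to the right of the current window.
    """
    m = len(pattern)
    # For each adjacent pair in the pattern, remember the shift that would
    # align it with (a, b). Later (rightmost) pairs overwrite earlier ones,
    # which keeps the smallest, safest shift.
    pair_shift = {}
    for i in range(m - 1):
        pair_shift[(pattern[i], pattern[i + 1])] = m - i

    def shift(a, b):
        if a == pattern[-1]:       # last pattern char can align with a
            return 1
        if (a, b) in pair_shift:   # some pair of the pattern aligns with ab
            return pair_shift[(a, b)]
        if b == pattern[0]:        # first pattern char can align with b
            return m + 1
        return m + 2               # neither a nor b can be covered

    return shift


def br_search(text, pattern):
    """Return (match_positions, attempts, comparisons) for an exact search.

    An "attempt" is one window alignment; a "comparison" is one
    text-character versus pattern-character test.
    """
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return [], 0, 0
    shift = br_shift_function(pattern)
    matches, attempts, comparisons = [], 0, 0
    j = 0
    while j <= n - m:
        attempts += 1
        k = 0
        while k < m:               # compare the window left to right
            comparisons += 1
            if text[j + k] != pattern[k]:
                break
            k += 1
        if k == m:
            matches.append(j)
        # Characters past the end of the text are replaced by None
        # (an assumed sentinel that never equals a pattern character).
        a = text[j + m] if j + m < n else None
        b = text[j + m + 1] if j + m + 1 < n else None
        j += shift(a, b)
    return matches, attempts, comparisons
```

Calling `br_search("GCATCGCAGAGAGTATACAGTACG", "GCAGAGAG")` returns the match positions together with the two counters, so different shift strategies can be compared on the same input by the same measures the abstract uses.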