An Efficient Hybrid Exact String Matching Algorithm to Minimize the Number of Attempts and Character Comparisons

Prince Mahmud, Md. Sohel Rana, Kamrul Hasan Talukder
Published in: 2018 21st International Conference of Computer and Information Technology (ICCIT)
Publication date: 2018-12-01
DOI: 10.1109/ICCITECHN.2018.8631908 (https://doi.org/10.1109/ICCITECHN.2018.8631908)
Citations: 3

Abstract

String matching is the classical problem of finding the occurrence(s) of a pattern string within another string or body of text. It arises in network intrusion detection, plagiarism detection, information security, pattern recognition, document matching, text mining, speech analysis, bioinformatics, and other diverse fields. Two important cost factors in string matching, which are also the challenges addressed in this paper, are the “number of attempts” and the “number of character comparisons”. To address them, we propose a hybrid algorithm named MAC (Minimum number of Attempts and Character Comparisons). MAC integrates concepts from the Berry-Ravindran (BR) algorithm and the index-based shifting approach with our new search technique. We evaluated MAC on English text and on biological data (DNA and protein sequences). MAC performed better than the Maximum-Shift (MS) algorithm and the Index Based Shifting (IBS) algorithm. It also performs well for exact string matching with both short and long patterns when compared with some existing string matching algorithms.
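
To make the two cost metrics concrete, the sketch below counts them for a plain brute-force matcher. This is not the MAC algorithm, nor BR, MS, or IBS; it is only an assumed baseline, and the function name `naive_search` and the sample text are illustrative choices rather than anything taken from the paper. An "attempt" here means one alignment of the pattern against the text, and a "character comparison" means one test of a text character against a pattern character.

```python
def naive_search(text, pattern):
    """Brute-force exact matching that also reports the two cost metrics.

    Returns (match_positions, attempts, comparisons).
    Illustrative baseline only; not the MAC algorithm from the paper.
    """
    n, m = len(text), len(pattern)
    matches = []
    attempts = 0      # number of window alignments tried
    comparisons = 0   # number of individual character comparisons
    for start in range(n - m + 1):
        attempts += 1
        k = 0
        while k < m:
            comparisons += 1
            if text[start + k] != pattern[k]:
                break  # mismatch: abandon this attempt
            k += 1
        if k == m:
            matches.append(start)
    return matches, attempts, comparisons


if __name__ == "__main__":
    result = naive_search("GCATCGCAGAGAGTATACAGTACG", "GCAGAGAG")
    print(result)  # ([5], 17, 30)
```

The brute-force matcher always shifts the window by one position, so it makes n - m + 1 attempts. Algorithms that use shift tables aim to skip alignments that cannot match, which lowers both counts.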