Distance-based pattern matching of DNA sequences for evaluating primary mutation

B. Kindhi, M. A. Hendrawan, D. Purwitasari, T. A. Sardjono, M. Purnomo
Published in: 2017 2nd International conferences on Information Technology, Information Systems and Electrical Engineering (ICITISEE), 2017-11-01
DOI: 10.1109/ICITISEE.2017.8285518 (https://doi.org/10.1109/ICITISEE.2017.8285518)
Citations: 3

Abstract

String matching methods are often used to find patterns in DNA. However, basic string matching methods cannot recognize mutations in viruses and bacteria. The distance-based Hamming method tolerates character mismatches in a sequence, although its performance varies with the number of patterns compared. We modify the Hamming method to analyze nucleotide arrangement patterns in DNA with primary Hepatitis C Virus (HCV) infection. We chose HCV because Indonesia has shown the highest number of hepatitis cases in Southeast Asia. Our experiments use hepatitis DNA data from World Gen Bank and compare them against primary sequences from our partner institution. The problem we encountered is that the HCV primary sequences do not all have the same length, which makes the Hamming count scores unbalanced. We propose normalizing each primary before it is tested against an isolate. The normalization yields a constant, which is then added to the Hamming count, so that the results for each primary against each isolate are balanced. The test results show that the modified Hamming method is able to give the distance between isolate and primary, and the pattern matching results are similar to the condition of the real primary. We propose this modified Hamming distance for analyzing virus or bacteria mutation, especially for HCV primaries.
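
The abstract does not give the exact normalization formula, so the following Python sketch is only one possible reading of the idea, not the authors' implementation. It assumes that each primary is slid along the isolate, that the smallest Hamming count over all windows is kept, and that the per-primary constant is the difference between a chosen reference length and the primary's own length. The function names, the `reference_length` parameter, and the example sequences are all hypothetical.

```python
def hamming(a: str, b: str) -> int:
    """Count mismatching positions between two equal-length strings."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs equal-length strings")
    return sum(x != y for x, y in zip(a, b))


def best_window_distance(primary: str, isolate: str) -> int:
    """Smallest Hamming count of `primary` over every same-length window of `isolate`."""
    n = len(primary)
    if n == 0 or n > len(isolate):
        raise ValueError("primary must be non-empty and no longer than isolate")
    return min(
        hamming(primary, isolate[i:i + n])
        for i in range(len(isolate) - n + 1)
    )


def balanced_distance(primary: str, isolate: str, reference_length: int) -> int:
    """Hamming count plus a per-primary constant.

    ASSUMPTION: the constant is reference_length - len(primary); the
    source only says that normalization yields a constant that is
    added to the Hamming count.
    """
    constant = reference_length - len(primary)  # hypothetical normalization constant
    return best_window_distance(primary, isolate) + constant


if __name__ == "__main__":
    isolate = "TTACGAACGG"            # made-up isolate fragment
    primaries = ["ACGT", "ACGTAC"]    # made-up primaries of unequal length
    reference_length = max(len(p) for p in primaries)
    for p in primaries:
        print(p, balanced_distance(p, isolate, reference_length))
```

Under this assumption, a shorter primary is penalized by the positions it does not cover, so its score can be compared with the score of a longer primary. Any other constant that depends only on the primary's length would fit the abstract's description equally well.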