Microsatellite Finder algorithm with High Memory Efficiency for Even Super Long Sequences
Authors: Hossein Savari, Nazanin Hadiniya, Abdorreza Savadi, Mahmoud Naghibzadeh
Published in: 2020 10th International Conference on Computer and Knowledge Engineering (ICCKE), 2020-10-29
DOI: 10.1109/ICCKE50421.2020.9303640 (https://doi.org/10.1109/ICCKE50421.2020.9303640)
Citations: 1
Abstract
Identifying microsatellites, a type of tandem repeat in genomic sequences, is an important problem in bioinformatics. Changes in the number of repeats in a microsatellite can cause many diseases, including Huntington's disease and cancer. Identifying microsatellites in an organism's genome is therefore of the utmost importance for diagnosing diseases and advising on treatment. Many algorithms and tools have been developed to identify these sequences. Given the importance of the application, any improvement in accuracy, speed, or memory utilization can have a positive effect on the quality of human life. This study proposes an algorithm whose memory consumption is constant, that is, independent of the input size. Its main-memory consumption is therefore significantly lower than that of existing methods, while its processing speed remains very high. The algorithm is implemented as a tool named Memory Efficient Microsatellite Finder (MEMF), which is compared with state-of-the-art tools in terms of memory consumption and execution time. Its superiority is evident from both the design of the method and the comparison results.
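The abstract does not say how MEMF achieves input-independent memory use, so the following is not the authors' algorithm. It is only a minimal Python sketch of one generic way to detect perfect tandem repeats in a single pass while keeping state proportional to the longest motif considered rather than to the sequence length. The function name `find_repeats` and the parameters `max_motif` and `min_copies` are hypothetical, and the sketch assumes uppercase input and exact (mismatch-free) repeats.

```python
from typing import Iterable, Iterator, List, Tuple

# (start, end_exclusive, motif as read at the end of the repeat)
Repeat = Tuple[int, int, str]


def find_repeats(bases: Iterable[str], max_motif: int = 6,
                 min_copies: int = 3) -> Iterator[Repeat]:
    """Stream over bases and yield perfect tandem repeats.

    Illustrative sketch only (not MEMF). State is a ring buffer of the
    last max_motif bases plus one counter per motif length, so memory
    does not grow with the length of the input.
    """
    window: List[str] = [""] * max_motif   # last max_motif bases seen
    run: List[int] = [0] * (max_motif + 1)  # run[k]: matches at period k

    def close(k: int, end: int) -> Iterator[Repeat]:
        # A run of r matches at period k covers r + k bases.
        length = run[k] + k
        if run[k] and length >= k * min_copies:
            unit = "".join(window[j % max_motif]
                           for j in range(end - k, end))
            # Skip units that are themselves repeats of a shorter unit
            # (e.g. "AA"), since the shorter period reports them.
            if unit not in (unit + unit)[1:-1]:
                yield end - length, end, unit

    pos = 0
    for base in bases:
        for k in range(1, max_motif + 1):
            if pos >= k and window[(pos - k) % max_motif] == base:
                run[k] += 1
            else:
                yield from close(k, pos)
                run[k] = 0
        window[pos % max_motif] = base
        pos += 1
    # Report repeats that extend to the end of the input.
    for k in range(1, max_motif + 1):
        yield from close(k, pos)


print(list(find_repeats("GGACACACACTTTTTG")))
# prints [(2, 10, 'AC'), (10, 15, 'T')]
```

Because `bases` can be any iterable, a caller could pass a generator that reads a sequence file in small chunks, so the whole sequence never needs to be held in memory. Real tools also have to handle imperfect repeats, ambiguous bases, and overlapping reports, which this sketch ignores.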