Comparative analysis of microsatellite detecting software: a significant variation in results and influence of parameters

Q2 Medicine

In Silico Biology Pub Date : 2010-02-15 DOI:10.1145/1722024.1722068

Suresh B. Mudunuri, A. A. Rao, S. Pallamsetty, H. Nagarajaram


Citations: 8

Abstract



Microsatellites are a distinct class of repeat patterns found in the genome sequences of all known organisms, including bacteria and viruses. These repeats play an important role in genome evolution, are associated with various diseases, and have been used as molecular markers in DNA fingerprinting, population genetics and other fields. Various bioinformatics tools have been developed to extract microsatellites from DNA sequences. However, the tools do not all identify microsatellites with similar sensitivity, so studies of microsatellites can suffer from significant biases in results and interpretation depending on the tool used. To get a clear picture of the inherent limitations and biases in microsatellite extraction, especially under varying threshold values of program parameters, we carried out a comparative analysis of the performance of some widely used tools on test DNA sequences. We extracted imperfect microsatellites from three sequences (the E. coli bacterial genome, C. elegans Chromosome I and Drosophila Chromosome X) using the commonly used microsatellite extraction tools TRF, Sputnik, SciRoKoCo and IMEx with varying parameters, and analyzed the results. We observed significant variation in the number of microsatellites extracted by these tools, even when they were run with default or suggested parameters. Relaxing the parameter values increases the number of repeats detected, but the differences among the results persist. In TRF, Sputnik and SciRoKoCo, the number of mismatches increases with the tract length of the repeat, indicating that the level of imperfection is not uniform throughout the repeats. The four tools investigated in this study differ in their algorithms, in the parameters they use and hence in the number of microsatellites detected. The score-based programs identify more divergent penta- and hexanucleotide repeats than IMEx. We therefore suggest that it is prudent either to adjust parameters so as to detect as many microsatellites as possible, so that no genuine repeat tract is missed, or to use more than one tool to obtain a good consensus. We also made a detailed survey of the available features of all microsatellite extraction tools. Apart from differences in algorithm, efficiency and parameters, the tools also differ considerably in features and flexibility.
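The dependence of repeat counts on threshold parameters can be illustrated with a toy example. The sketch below is not the algorithm of TRF, Sputnik, SciRoKoCo or IMEx, all of which handle imperfect repeats. It is a simplified finder for perfect tandem repeats only, and the function name, the test sequence and both threshold tables are hypothetical values chosen for illustration rather than the defaults of any published tool.

```python
import re

def find_perfect_microsatellites(seq, min_repeats, max_motif_len=6):
    """Return sorted (start, motif, copies) tuples for perfect tandem repeats.

    min_repeats maps motif length -> minimum number of copies required.
    These thresholds are illustrative assumptions, not any tool's defaults.
    """
    seq = seq.upper()
    hits = []
    for k in range(1, max_motif_len + 1):
        n = min_repeats[k]
        # ([ACGT]{k}) captures a motif; \1{n-1,} demands at least n copies in total.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, n - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            # Skip motifs that are repeats of a shorter unit (e.g. "AA", "ATAT"),
            # so each tract is reported once under its shortest motif.
            if any(motif == motif[:d] * (k // d) for d in range(1, k) if k % d == 0):
                continue
            hits.append((m.start(), motif, len(m.group(0)) // k))
    return sorted(hits)

# Hypothetical test sequence: (AT)6, (CAG)4 and (A)8 separated by short spacers.
SEQ = "GG" + "AT" * 6 + "CC" + "CAG" * 4 + "TT" + "A" * 8 + "GC"

# Two hypothetical threshold sets: minimum copies per motif length 1..6.
STRICT = {1: 10, 2: 6, 3: 5, 4: 4, 5: 3, 6: 3}
RELAXED = {1: 6, 2: 4, 3: 3, 4: 3, 5: 2, 6: 2}

for label, thresholds in (("strict", STRICT), ("relaxed", RELAXED)):
    found = find_perfect_microsatellites(SEQ, thresholds)
    print(label, len(found), found)
```

On this toy sequence the strict thresholds report only the (AT)6 tract, while the relaxed thresholds also report (CAG)4 and (A)8. This mirrors, in a much simplified setting, the abstract's observation that relaxing parameters increases the number of repeats detected.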


Source journal

In Silico Biology Computer Science-Computational Theory and Mathematics

CiteScore

2.20

Self-citation rate

0.00%

Articles published

About the journal: The considerable "algorithmic complexity" of biological systems requires a huge amount of detailed information for their complete description. Although far from complete, the overwhelming quantity of small pieces of information gathered for all kinds of biological systems at the molecular and cellular level requires computational tools to be adequately stored and interpreted. Interpreting data means abstracting them as far as possible to provide a systematic, integrative view of biology. Most of the presently available scientific journals focus either on accumulating more data from elaborate experimental approaches or on presenting new algorithms for the interpretation of these data. Both approaches are meritorious.