Suresh B. Mudunuri, A. A. Rao, S. Pallamsetty, H. Nagarajaram
DOI: 10.1145/1722024.1722068 | Journal: In Silico Biology | Published: 2010-02-15 | Type: Journal Article | Citations: 8
Comparative analysis of microsatellite detecting software: a significant variation in results and influence of parameters
Microsatellites are a distinct class of repeat patterns found in the genome sequences of all known organisms, including bacteria and viruses. These repeats play an important role in genome evolution, are associated with various diseases, and have been used as molecular markers in DNA fingerprinting, population genetics and other fields. Various bioinformatics tools have been developed to extract microsatellites from DNA sequences. However, not all tools identify microsatellites with similar sensitivity, so studies on microsatellites can suffer from significant biases in results and interpretation depending on the tool used. To get a clear picture of the inherent limitations and biases in microsatellite extraction, especially under varying threshold values of program parameters, we carried out a comparative analysis of the performance of some widely used tools on test DNA sequences. We extracted imperfect microsatellites from three sequences (the E. coli bacterial genome, C. elegans Chromosome I and Drosophila Chromosome X) using the commonly used extraction tools TRF, Sputnik, SciRoKoCo and IMEx with varying parameters, and analyzed the results. We observed significant variation in the number of microsatellites extracted by these tools, even when they were run with default or suggested parameters. Relaxing the parameter values increases the number of repeats detected, but the differences among the results persist. In TRF, Sputnik and SciRoKoCo, the number of mismatches increases with the tract length of the repeat, indicating that the level of imperfection is not uniform across repeats. The four tools investigated in this study differ in their algorithms and in the parameters they use, and hence in the number of microsatellites they detect. The score-based programs identify more divergent penta- and hexanucleotide repeats than IMEx. We therefore suggest that it is prudent either to adjust parameters so as to detect as many microsatellites as possible and avoid missing genuine repeat tracts, or to use more than one tool to obtain a good consensus. We also made a detailed survey of the features available in all microsatellite extraction tools. Apart from differences in algorithm, efficiency and parameters, the tools also differ considerably in features and flexibility.
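To make the parameter effect described above concrete, here is a minimal Python sketch. It is not the algorithm of TRF, Sputnik, SciRoKoCo or IMEx: it finds only perfect tandem repeats with a regular expression, whereas the study extracted imperfect ones. The function name `find_perfect_microsatellites`, the `min_repeats` threshold and the toy sequence are all hypothetical, chosen only to show how relaxing a single threshold changes the number of tracts reported.

```python
import re


def find_perfect_microsatellites(seq, min_repeats, max_motif_len=6):
    """Return sorted (start, motif, repeat_count) tuples for perfect tandem repeats.

    Illustrative only: real tools also tolerate mismatches and indels.
    """
    if min_repeats < 2:
        raise ValueError("min_repeats must be at least 2")
    seq = seq.upper()
    hits = []
    for k in range(1, max_motif_len + 1):
        # A motif of length k followed by at least (min_repeats - 1) more copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, min_repeats - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            # Skip motifs that are themselves built from a shorter unit,
            # e.g. "ATAT" is reported as the dinucleotide "AT" instead.
            if any(motif == motif[:d] * (k // d) for d in range(1, k) if k % d == 0):
                continue
            hits.append((m.start(), motif, len(m.group(0)) // k))
    return sorted(hits)


# Hypothetical toy sequence: (AT)6 and (CAG)4 embedded in short flanks.
toy = "TG" + "AT" * 6 + "GC" + "CAG" * 4 + "TA"

for threshold in (7, 5, 3):
    found = find_perfect_microsatellites(toy, min_repeats=threshold)
    print(threshold, len(found), found)
```

Lowering `min_repeats` from 7 to 3 on this toy input takes the count from zero tracts to two, which mirrors, in a much simplified form, why tools run with different thresholds report different numbers of microsatellites.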
In Silico Biology
Computer Science - Computational Theory and Mathematics
CiteScore
2.20
Self-citation rate
0.00%
Articles published
1
About the journal:
The considerable "algorithmic complexity" of biological systems means that a huge amount of detailed information is required to describe them completely. Although far from complete, the overwhelming quantity of small pieces of information gathered for all kinds of biological systems at the molecular and cellular level requires computational tools for adequate storage and interpretation. Interpreting data means abstracting them as far as is permissible to provide a systematic, integrative view of biology. Most presently available scientific journals focus either on accumulating more data from elaborate experimental approaches or on presenting new algorithms for the interpretation of these data. Both approaches are meritorious.