H. Aktulga, Ioannis Kontoyiannis, L. A. Lyznik, Lukasz Szpankowski, A. Grama, W. Szpankowski
{"title":"生物序列的统计依赖性","authors":"H. Aktulga, Ioannis Kontoyiannis, L. A. Lyznik, Lukasz Szpankowski, A. Grama, W. Szpankowski","doi":"10.1109/ISIT.2007.4557183","DOIUrl":null,"url":null,"abstract":"We demonstrate the use of information-theoretic tools for the task of identifying segments of biomolecules (DNA or RNA) that are statistically correlated. We develop a precise and reliable methodology, based on the notion of mutual information, for finding and extracting statistical as well as structural ependencies. A simple threshold function is defined, and its use in quantifying the level of significance of dependencies between biological segments is explored. These tools are used in two specific applications. First, for the identification of correlations between different parts of the maize zmSRp32 gene. There, we find significant dependencies between the 5' untranslated region and its alternatively spliced exons. This observation may indicate the presence of as-yet unknown alternative splicing mechanisms or structural scaffolds. Second, using data from CODIS, we demonstrate that our approach is well suited for the problem of discovering short tandem repeats (STRs).","PeriodicalId":193467,"journal":{"name":"2007 IEEE International Symposium on Information Theory","volume":"87 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2007-06-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":"{\"title\":\"Statistical Dependence in Biological Sequences\",\"authors\":\"H. Aktulga, Ioannis Kontoyiannis, L. A. Lyznik, Lukasz Szpankowski, A. Grama, W. Szpankowski\",\"doi\":\"10.1109/ISIT.2007.4557183\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"We demonstrate the use of information-theoretic tools for the task of identifying segments of biomolecules (DNA or RNA) that are statistically correlated. We develop a precise and reliable methodology, based on the notion of mutual information, for finding and extracting statistical as well as structural ependencies. A simple threshold function is defined, and its use in quantifying the level of significance of dependencies between biological segments is explored. These tools are used in two specific applications. First, for the identification of correlations between different parts of the maize zmSRp32 gene. There, we find significant dependencies between the 5' untranslated region and its alternatively spliced exons. This observation may indicate the presence of as-yet unknown alternative splicing mechanisms or structural scaffolds. Second, using data from CODIS, we demonstrate that our approach is well suited for the problem of discovering short tandem repeats (STRs).\",\"PeriodicalId\":193467,\"journal\":{\"name\":\"2007 IEEE International Symposium on Information Theory\",\"volume\":\"87 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2007-06-24\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"1\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2007 IEEE International Symposium on Information Theory\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ISIT.2007.4557183\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2007 IEEE International Symposium on Information Theory","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ISIT.2007.4557183","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
We demonstrate the use of information-theoretic tools for the task of identifying segments of biomolecules (DNA or RNA) that are statistically correlated. We develop a precise and reliable methodology, based on the notion of mutual information, for finding and extracting statistical as well as structural ependencies. A simple threshold function is defined, and its use in quantifying the level of significance of dependencies between biological segments is explored. These tools are used in two specific applications. First, for the identification of correlations between different parts of the maize zmSRp32 gene. There, we find significant dependencies between the 5' untranslated region and its alternatively spliced exons. This observation may indicate the presence of as-yet unknown alternative splicing mechanisms or structural scaffolds. Second, using data from CODIS, we demonstrate that our approach is well suited for the problem of discovering short tandem repeats (STRs).