Discovery of Regular Domains in Large DNA Data Sets
F. Bertacchini, E. Bilotta, Pietro S. Pantano
Proceedings of the 8th ACM International Conference on Bioinformatics, Computational Biology, and Health Informatics, 2017-08-20
DOI: https://doi.org/10.1145/3107411.3110419
To analyze large DNA data sets, we hypothesized that the organization of repeated bases within DNA follows rules similar to those of Cellular Automata (CA). Such sequences can be defined as regular domains. We treated each DNA string as a finite one-dimensional cellular automaton, that is, a finite (countable) set of cells aligned along a straight line, and adopted a color code that maps the DNA bases (A, C, T, G) to numbers; we then analyzed the strings using the approach of computational mechanics. In this approach, a regular domain is a space-time region consisting of sequences in the same regular language (the particular rule of system evolution, which gives rise to a formal language), producing patterns that are computationally homogeneous and simple to describe. We found that regular domains exist. The results give the exact number of strings of each given length, the upper limit on their length, their precise locations in all the human chromosomes, and their complex numerical organization. Furthermore, the distribution of these domains is neither random, nor chaotic, nor probabilistic; instead, there are numeric attractors around which the numbers of these domains are distributed. This leads us to think that all these domains within the DNA are connected to each other and cannot be randomly distributed, but instead follow some combinatorial rules.
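The encoding and counting steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' method: the abstract does not state the actual color code or the formal criterion for a regular domain, so the mapping `BASE_TO_INT`, the helper names, and the simplification of a "domain" to a maximal run of one repeated base are all assumptions made for illustration.

```python
from collections import Counter
from itertools import groupby

# Hypothetical color code: the abstract does not give the actual mapping.
BASE_TO_INT = {"A": 0, "C": 1, "T": 2, "G": 3}


def encode(seq):
    """Map a DNA string to a list of integer cell states.

    Symbols outside A, C, T, G (for example N) become -1.
    """
    return [BASE_TO_INT.get(base, -1) for base in seq.upper()]


def find_runs(cells, min_length=2):
    """Return (start, length, state) for each maximal run of one state.

    Assumption: a run of a single repeated base stands in for a
    regular domain; runs of unknown symbols (-1) are skipped.
    """
    runs = []
    pos = 0
    for state, group in groupby(cells):
        length = sum(1 for _ in group)
        if state != -1 and length >= min_length:
            runs.append((pos, length, state))
        pos += length
    return runs


def length_histogram(runs):
    """Count how many runs occur at each length."""
    return Counter(length for _, length, _ in runs)


if __name__ == "__main__":
    cells = encode("ACCCGTTTTAGGA")
    runs = find_runs(cells)
    print(runs)                    # start position, length, state of each run
    print(length_histogram(runs))  # number of runs per length
```

Applied per chromosome, a histogram of this kind gives the number of runs at each length and the longest length observed, which is the sort of tally the abstract refers to; richer regular languages than single-base repeats would need a different detector.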