Discovery of Regular Domains in Large DNA Data Sets

Proceedings of the 8th ACM International Conference on Bioinformatics, Computational Biology, and Health Informatics. Pub Date: 2017-08-20. DOI: 10.1145/3107411.3110419

F. Bertacchini, E. Bilotta, Pietro S. Pantano


Citations: 1

Abstract



To analyze large DNA data sets, we hypothesized that the organization of repeated bases within DNA follows rules similar to those of Cellular Automata (CA). Sequences organized in this way can be defined as regular domains. We treated each DNA string as a finite one-dimensional cellular automaton, that is, a finite (countable) set of cells aligned on a straight line, adopted a color code that maps the DNA bases (A, C, T, G) to numbers, and analyzed the strings using the approach of computational mechanics. In this approach, a regular domain is a space-time region consisting of sequences in the same regular language (the particular rule of system evolution, which gives rise to a formal language), producing patterns that are computationally homogeneous and simple to describe. We found that regular domains exist. The results give the exact number of strings of each given length, the upper limit on their length, their precise locations in all human chromosomes, and their complex numerical organization. Furthermore, the distribution of these domains is neither random, nor chaotic, nor probabilistic; instead, there are numeric attractors around which the numbers of these domains are distributed. This leads us to think that all these domains within the DNA are connected to each other and are not distributed by chance, but follow some combinatorial rules.
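
The encoding and counting steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline. It assumes the simplest possible reading of a regular domain, namely a maximal run of one repeated base, and the integer mapping `BASE_CODE` and the helper names `encode`, `find_runs`, and `count_by_length` are hypothetical choices made for illustration, since the abstract does not specify the color code or the domain-detection procedure.

```python
from collections import Counter
from itertools import groupby

# Hypothetical color code: the abstract does not state the actual mapping.
BASE_CODE = {"A": 0, "C": 1, "G": 2, "T": 3}


def encode(dna):
    """Map a DNA string to a list of integer cell states.

    Raises KeyError on symbols outside A, C, G, T (for example 'N').
    """
    return [BASE_CODE[base] for base in dna.upper()]


def find_runs(cells, min_length=2):
    """Return (start, length, state) for each maximal run of one state.

    Assumption: a run of a single repeated base stands in for a
    'regular domain'; the paper's definition may be broader.
    """
    runs = []
    pos = 0
    for state, group in groupby(cells):
        length = sum(1 for _ in group)
        if length >= min_length:
            runs.append((pos, length, state))
        pos += length
    return runs


def count_by_length(runs):
    """Count how many runs occur at each length."""
    return Counter(length for _, length, _ in runs)


cells = encode("ACCCGTTTTAAGC")
runs = find_runs(cells)
print(runs)                   # [(1, 3, 1), (5, 4, 3), (9, 2, 0)]
print(count_by_length(runs))  # one run each of lengths 3, 4 and 2
```

Applied chromosome by chromosome, a tally of this kind would yield the number of runs at each length, the longest run observed, and the start position of every run, which are the quantities the abstract reports for regular domains.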

