Creating regular expressions as mRNA motifs with GP to predict human exon splitting

W. Langdon, Joanna Rowsell, A. Harrison
Proceedings of the 11th Annual conference on Genetic and evolutionary computation, 2009-07-08
DOI: 10.1145/1569901.1570162
Citations: 7

Abstract

RNAnet [3] http://bioinformatics.essex.ac.uk/users/wlangdon/rnanet/ lets the user calculate correlations of gene expression, both between genes and between components within genes. We survey all of Ensembl http://www.ensembl.org and find every Homo sapiens exon that has enough robust Affymetrix HG-U133 Plus 2 GeneChip probes. Correlating the mRNA probe measurements within each exon shows many exons whose components are consistently up-regulated and down-regulated together. However, we also identify Ensembl exons whose sub-regions are each self-consistent but whose transcript blocks are not well correlated with the other blocks in the same exon. We suggest that many current Ensembl exon definitions are incomplete.

Secondly, having identified exons with substructure, we use machine learning to try to identify patterns in the DNA sequence lying between the blocks of high correlation that might yield biological or technological explanations. A Backus-Naur form (BNF) context-free grammar constrains strongly typed genetic programming (STGP) to evolve biological motifs in the form of regular expressions (RE) (e.g. TCTTT) that distinguish exons with potential alternative mRNA expression from those without. We show that biological patterns can be data-mined from NCBI's GEO http://www.ncbi.nlm.nih.gov/geo/ database by a GP written in gawk that uses egrep. The automatically produced DNA motifs suggest that alternative polyadenylation is not responsible for this substructure. (Full version in TR-09-02 [7].) The blocky exons are available at http://bioinformatics.essex.ac.uk/users/wlangdon/tr-09-02.tar.gz
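The abstract describes grammar-constrained GP that evolves DNA regular expressions and scores them by whether they match the sequences of exons with and without substructure. The sketch below illustrates that idea under loudly stated assumptions: the grammar, the operators (one-point crossover on a list of grammar elements, point mutation, size-3 tournaments, elitism), the balanced-accuracy fitness and every function name are hypothetical. It is a linear, grammar-guided simplification rather than tree-based STGP, it is not the authors' gawk/egrep implementation, and Python's `re.search` stands in for egrep.

```python
import random
import re

# Hypothetical sketch, NOT the authors' gawk/egrep GP or their BNF grammar.
# Assumed grammar:
#   <motif>   ::= <element> | <element> <motif>
#   <element> ::= <atom> | <atom> "?"
#   <atom>    ::= "A" | "C" | "G" | "T" | "." | "[" <bases> "]"
BASES = "ACGT"


def random_element(rng: random.Random) -> str:
    """Derive one <element>: a base, a wildcard or a character class, optionally followed by '?'."""
    r = rng.random()
    if r < 0.6:
        atom = rng.choice(BASES)
    elif r < 0.7:
        atom = "."
    else:
        # Class of 2 or 3 distinct bases, e.g. "[CT]"
        atom = "[" + "".join(sorted(rng.sample(BASES, rng.randint(2, 3)))) + "]"
    # Only '?' is offered as a repeat, so evolved regexes cannot backtrack catastrophically
    return atom + "?" if rng.random() < 0.2 else atom


def random_motif(rng: random.Random, min_len: int = 3, max_len: int = 8) -> list:
    """Derive a <motif> as a list of elements; "".join() of the list is the regex."""
    return [random_element(rng) for _ in range(rng.randint(min_len, max_len))]


def fitness(motif: list, positives: list, negatives: list) -> float:
    """Balanced accuracy; a sequence is predicted positive if the regex matches anywhere in it."""
    pattern = re.compile("".join(motif))
    tpr = sum(1 for s in positives if pattern.search(s)) / len(positives)
    tnr = sum(1 for s in negatives if not pattern.search(s)) / len(negatives)
    return (tpr + tnr) / 2


def crossover(a: list, b: list, rng: random.Random, max_len: int = 8) -> list:
    """One-point crossover on element lists; every such child is still a valid <motif>."""
    i = rng.randint(1, len(a))
    j = rng.randrange(len(b))
    return (a[:i] + b[j:])[:max_len]


def mutate(motif: list, rng: random.Random) -> list:
    """Point mutation: replace one element with a freshly derived <element>."""
    child = list(motif)
    child[rng.randrange(len(child))] = random_element(rng)
    return child


def tournament(pop: list, scores: list, rng: random.Random, size: int = 3) -> list:
    """Return the fittest of `size` randomly chosen individuals."""
    return pop[max(rng.sample(range(len(pop)), size), key=lambda k: scores[k])]


def evolve(positives, negatives, pop_size=50, generations=25, seed=1):
    """Generational loop: tournament selection, crossover, 30% mutation, elitism."""
    rng = random.Random(seed)
    pop = [random_motif(rng) for _ in range(pop_size)]
    for _ in range(generations):
        scores = [fitness(m, positives, negatives) for m in pop]
        children = [pop[scores.index(max(scores))]]  # elitism: keep the current best
        while len(children) < pop_size:
            child = crossover(tournament(pop, scores, rng), tournament(pop, scores, rng), rng)
            if rng.random() < 0.3:
                child = mutate(child, rng)
            children.append(child)
        pop = children
    scores = [fitness(m, positives, negatives) for m in pop]
    best = scores.index(max(scores))
    return "".join(pop[best]), scores[best]


if __name__ == "__main__":
    data_rng = random.Random(0)

    def toy_seq(n):
        return "".join(data_rng.choice(BASES) for _ in range(n))

    # Toy data, NOT from the paper: positives carry TCTTT, negatives are filtered to lack it
    positives = [toy_seq(20) + "TCTTT" + toy_seq(20) for _ in range(15)]
    negatives = [s for s in (toy_seq(45) for _ in range(30)) if "TCTTT" not in s][:15]
    motif, score = evolve(positives, negatives)
    print(f"best motif {motif!r} balanced accuracy {score:.2f}")
```

Because any list of grammar elements is itself a valid motif, crossover and mutation never produce a syntactically invalid regex, which is the role the BNF grammar plays in the abstract. Balanced accuracy is one plausible choice of fitness when the two classes of exons differ in size; the paper's own fitness measure may differ.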