Identification of Active Oligonucleotide Sequences Using Artificial Neural Network

A. Luke, Sarah Fergione, Riley Wilson, B. Gunn, S. Svojanovsky
The Journal of Computational Science Education. Published 2018-12-01. DOI: https://doi.org/10.22369/ISSN.2153-4136/9/2/4
Abstract

In this project we designed an Artificial Neural Network (ANN) computational model to predict the activity of short oligonucleotide sequences (octamers) that play an important biological role as exonic splicing enhancer (ESE) motifs recognized by the human SR protein SC35. Since only active sequences were available from the literature for our initial data set, we generated an additional set of sequences complementary to the original set. We used a back-propagation neural network (BPNN) built with the MATLAB® Neural Network Toolbox™ on our designated research computer. In Stage I of the project we trained, validated, and tested the BPNN prototype. We started with 20 samples in the training set and 8 samples in the validation set. The trained and validated BPNN prototype was then used to test a unique set of 10 octamer sequences: 5 active samples and their 5 complementary sequences. The test produced 2 classification errors, one false positive and one false negative. We used the test data and moved into Stage II of the project. First, we analyzed the initial DNA numerical representation (DNR) and changed the scheme to achieve a greater difference between the subsets of active and complementary sequences. We then compared BPNN results across different numbers of nodes in the second hidden layer to optimize model accuracy. To estimate future model performance, we needed to test the classifier on newly collected data from another paper. This practical application tested 41 published, non-repeating SC35 ESE motif octamers together with 41 complementary sequences. The test showed high BPNN predictive accuracy for both categories (active and inactive). This study shows the potential for using a BPNN to screen SC35 ESE motif candidates.
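The data-preparation step described in the abstract (pairing each active octamer with a complementary sequence and converting both to numbers) might look roughly like the Python sketch below. It is only an illustration under stated assumptions: the abstract does not say whether the complementary sequences were also reversed, so a plain base-by-base complement is assumed, and the numeric codes in `DNR_CODES` are hypothetical, since the paper's actual DNR scheme is not given here. The example octamer is made up and is not taken from the paper, which used MATLAB rather than Python.

```python
# Watson-Crick base pairing. Assumption: base-by-base complement, no reversal.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}

# Hypothetical numeric codes; the paper's real DNR scheme is not in the abstract.
DNR_CODES = {"A": 1, "C": 2, "G": 3, "T": 4}


def complement(seq: str) -> str:
    """Return the base-by-base complement of a DNA sequence."""
    return "".join(COMPLEMENT[base] for base in seq.upper())


def encode(seq: str) -> list[int]:
    """Map each base to its (hypothetical) numeric code."""
    return [DNR_CODES[base] for base in seq.upper()]


def build_dataset(active_octamers: list[str]) -> list[tuple[list[int], int]]:
    """Pair each active octamer (label 1) with its complement (label 0)."""
    samples = []
    for seq in active_octamers:
        if len(seq) != 8:
            raise ValueError(f"expected an octamer, got {seq!r}")
        samples.append((encode(seq), 1))
        samples.append((encode(complement(seq)), 0))
    return samples


if __name__ == "__main__":
    # Made-up octamer for illustration only.
    for features, label in build_dataset(["GACCACTA"]):
        print(features, label)
```

Each resulting row is an 8-element feature vector with a binary label, which is the shape of input a small feed-forward classifier would consume. Changing the codes in `DNR_CODES` is where a different numerical representation would be tried.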