Rule-based Assembly for Short Read Data Set obtained with Multiple Assemblers and k-mer Sizes (情報論的学習理論と機械学習)

Aya Oshiro, H. Afuso, T. Okazaki
DOI: 10.2197/IPSJTBIO.10.9 (https://doi.org/10.2197/IPSJTBIO.10.9)
Journal: IEICE technical report. Speech, volume 17 1
Publication date: 2015-06-23
Publication type: Journal Article
Citations: 1

Abstract

Various de novo assembly methods based on the concept of the k-mer have been proposed. Despite the success of these methods, an alternative approach, referred to as the hybrid approach, has recently been proposed that combines different traditional methods to exploit each of their properties in an integrated manner. However, the results obtained from the traditional methods used in the hybrid approach depend not only on the specific algorithm or heuristics but also on the selection of a user-specified k-mer size. Consequently, the results obtained with the hybrid approach also depend on these factors. Here, we designed a new assembly approach, referred to as rule-based assembly. This approach follows a similar strategy to the hybrid approach, but employs specific rules learned from certain characteristics of draft contigs to remove erroneous contigs and then merges the remaining ones. To construct the most effective rules for this purpose, a learning method based on decision trees, i.e., a complex decision tree, is proposed. Comparative experiments were also conducted to validate the method. The results showed that the proposed method could outperform traditional methods in certain cases.
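The filter-then-merge strategy described in the abstract can be illustrated with a minimal Python sketch. Everything specific in it is an assumption made for illustration and is not taken from the paper: the `Contig` fields, the `coverage` feature, the hand-written thresholds in `keep_contig` (a stand-in for a rule that a decision tree might learn), and the naive containment-based `merge_contigs`. The paper's actual features, learned rules, and merging procedure are not given in the abstract.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Contig:
    sequence: str    # draft contig sequence
    assembler: str   # hypothetical label for the assembler that produced it
    k: int           # k-mer size used for that assembly run
    coverage: float  # hypothetical feature: mean read coverage


def kmers(seq: str, k: int) -> List[str]:
    """Return all overlapping substrings of length k (the k-mers) of seq."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def keep_contig(c: Contig, min_length: int = 8, min_coverage: float = 5.0) -> bool:
    """Hand-written stand-in for a learned rule (thresholds are assumptions)."""
    return len(c.sequence) >= min_length and c.coverage >= min_coverage


def merge_contigs(contigs: List[Contig]) -> List[Contig]:
    """Naive merge: drop any contig fully contained in a longer kept contig."""
    kept: List[Contig] = []
    for c in sorted(contigs, key=lambda x: len(x.sequence), reverse=True):
        if not any(c.sequence in other.sequence for other in kept):
            kept.append(c)
    return kept


def rule_based_assembly(drafts: List[Contig]) -> List[Contig]:
    """Filter draft contigs with the rule, then merge the survivors."""
    return merge_contigs([c for c in drafts if keep_contig(c)])


if __name__ == "__main__":
    # Draft contigs pooled from two hypothetical assemblers and two k-mer sizes.
    drafts = [
        Contig("ACGTACGTAA", "assembler_A", 21, 10.0),
        Contig("ACGTACGT", "assembler_B", 31, 12.0),   # contained in the first
        Contig("TTTTGGGGCC", "assembler_A", 31, 1.0),  # low coverage
        Contig("ACG", "assembler_B", 21, 30.0),        # too short
    ]
    for contig in rule_based_assembly(drafts):
        print(contig.assembler, contig.k, contig.sequence)
```

In this toy input, the rule rejects the low-coverage and too-short contigs, and the merge step drops the contig that is a substring of a longer one. A learned decision tree would replace the fixed thresholds with splits chosen from labeled draft contigs.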