Regularization of sequence data for machine learning

2011 IEEE International Conference on Bioinformatics and Biomedicine Workshops (BIBMW). Pub Date: 2011-11-12. DOI: 10.1109/BIBMW.2011.6112350

Bryan Bai, S. C. Kremer


Abstract

We examine the problem of classifying biological sequences, in particular the challenge of generalizing results to novel input data. We observe that the high dimensionality of sequence data representations results in an extremely sparsely populated input space. This motivates the need for regularization (a form of inductive bias) in order to achieve generalization. We discuss regularization in the context of regular neural networks, deep belief networks, and support vector machines, and provide experimental results for these architectures. Our results support the importance of using an effective regularization method and identify which methods work well on a real-world dataset.
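
The sparsity argument in the abstract can be illustrated with a small sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the four-letter DNA alphabet, the one-hot encoding, the sample count and sequence length, and the choice of an L2 (weight decay) penalty as the example regularizer. It shows how quickly the number of possible sequences outgrows any realistic dataset, and what adding a penalty term to a training loss looks like.

```python
# Illustrative sketch only: alphabet, encoding, and numbers are assumptions,
# not details reported in the paper.
ALPHABET = "ACGT"


def one_hot(seq):
    """Encode a sequence as a flat 0/1 vector, len(ALPHABET) entries per position."""
    vec = []
    for base in seq:
        vec.extend(1 if base == symbol else 0 for symbol in ALPHABET)
    return vec


def occupied_fraction(num_samples, seq_len):
    """Upper bound on the share of all possible sequences that a dataset can cover."""
    return min(1.0, num_samples / len(ALPHABET) ** seq_len)


def l2_penalized_loss(data_loss, weights, lam):
    """Add an L2 penalty (one common regularizer) to an unregularized loss value."""
    return data_loss + lam * sum(w * w for w in weights)


if __name__ == "__main__":
    x = one_hot("ACGT")
    print(len(x), x)                      # 4 input dimensions per position
    print(occupied_fraction(10_000, 50))  # hypothetical: 10,000 samples of length 50
    print(l2_penalized_loss(1.0, [1.0, 2.0], lam=0.5))
```

With these hypothetical numbers, the dataset covers a vanishingly small share of the possible inputs, which is why a learner needs some inductive bias to generalize beyond the points it has seen.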
