2D similarity kernels for biological sequence classification

Evolutionary computation, machine learning and data mining in bioinformatics. EvoBIO (Conference) Pub Date : 2012-08-12 DOI:10.1145/2350176.2350179

P. Kuksa

Citations: 6

Abstract

String kernel-based machine learning methods have yielded great success in practical tasks of structured/sequential data analysis. They often exhibit state-of-the-art performance on tasks such as document topic elucidation, biological sequence classification, or protein superfamily and fold prediction. However, typical string kernel methods rely on analysis of discrete 1D string data (e.g., DNA or amino acid sequences). This work introduces new 2D kernel methods for sequence data in the form of sequences of feature vectors (as in biological sequence profiles, or sequences of individual amino acid physico-chemical descriptors). On three protein sequence classification tasks, the proposed 2D kernels show significant improvements of 15-20% compared to state-of-the-art sequence classification methods.
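
To make the 1D versus 2D distinction concrete, here is a minimal sketch in Python. The first function is the standard spectrum (k-mer counting) kernel on discrete strings. The second, `window_rbf_kernel`, is a hypothetical illustration of how a kernel could operate on sequences of feature vectors; it is an assumption made for this example and is not the kernel proposed in the paper. The names `windows`, `window_rbf_kernel`, and the parameters `k` and `gamma` are likewise illustrative.

```python
import math
from collections import Counter


def spectrum_kernel(s, t, k=3):
    """1D baseline: inner product of k-mer count vectors of two strings."""
    cs = Counter(s[i:i + k] for i in range(len(s) - k + 1))
    ct = Counter(t[i:i + k] for i in range(len(t) - k + 1))
    # Only k-mers present in both strings contribute to the sum.
    return sum(n * ct[kmer] for kmer, n in cs.items())


def windows(X, k):
    """Flatten every run of k consecutive feature vectors into one tuple."""
    return [
        tuple(v for row in X[i:i + k] for v in row)
        for i in range(len(X) - k + 1)
    ]


def window_rbf_kernel(X, Y, k=3, gamma=1.0):
    """Illustrative kernel on sequences of feature vectors (an assumption,
    not the paper's method): sum a Gaussian similarity over all pairs of
    length-k windows. X and Y are lists of equal-dimension vectors."""
    total = 0.0
    for wx in windows(X, k):
        for wy in windows(Y, k):
            d2 = sum((a - b) ** 2 for a, b in zip(wx, wy))
            total += math.exp(-gamma * d2)
    return total


if __name__ == "__main__":
    print(spectrum_kernel("ABCABC", "ABCD", k=3))
    X = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]]
    Y = [[0.9, 0.1], [0.1, 0.9], [1.0, 0.0]]
    print(window_rbf_kernel(X, Y, k=3, gamma=1.0))
```

In the 1D case two windows either match exactly or contribute nothing; in the sketched 2D case each position carries a real-valued vector (for example a profile column), so window similarity becomes graded rather than all-or-nothing.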

