Protein sub-cellular localization based on noise-intensity-weighted linear discriminant analysis and an improved k-nearest-neighbor classifier

Zhenfeng Lei, Shunfang Wang, Dongshu Xu
Published in: 2016 9th International Congress on Image and Signal Processing, BioMedical Engineering and Informatics (CISP-BMEI), 2016-10-01
DOI: 10.1109/CISP-BMEI.2016.7853022
Citations: 5

Abstract

Dimensionality reduction and classification are the key steps in protein sub-cellular localization. With the rapid development of biological science and technology, large amounts of high-dimensional biological data have been generated, accompanied by a certain amount of noise. How to represent high-dimensional data in a low-dimensional space while achieving better classification has become one of the significant tasks for researchers working on protein sub-cellular localization. Neither the traditional dimensionality reduction algorithm, linear discriminant analysis (LDA), nor the popular k-nearest-neighbor (KNN) classifier meets the needs of current applications well if used without improvement. LDA seeks a projection direction along which the projected samples of different classes lie as far apart as possible. However, noise enlarges the within-class distance and makes the classes hard to separate even with LDA. In addition, KNN does not adequately account for the inequality among samples. Therefore, this paper first uses noise intensity as a weight in LDA, and then improves the KNN algorithm with a within-class KNN method that considers the inequality of samples from different classes. Experimental results, verified with the jackknife test, show that the proposed method combining the two improvements is feasible and effective for classification.
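
The abstract gives no formulas, so the following Python sketch only illustrates the general shape of the pipeline it describes: a weighted LDA projection, a class-wise KNN decision, and jackknife (leave-one-out) evaluation. Everything specific here is an assumption rather than the authors' method: the function names (noise_weights, weighted_lda, within_class_knn_predict, jackknife_accuracy) are hypothetical, noise intensity is approximated by a sample's distance from its class mean, the weight 1 / (1 + relative distance) is one arbitrary choice, and "within-class KNN" is read as comparing the mean distance to the k nearest neighbors inside each class. The data are synthetic.

```python
import numpy as np

def noise_weights(X, y):
    """ASSUMED noise proxy: samples far from their class mean get a lower weight."""
    w = np.empty(len(X))
    for c in np.unique(y):
        idx = y == c
        d = np.linalg.norm(X[idx] - X[idx].mean(axis=0), axis=1)
        w[idx] = 1.0 / (1.0 + d / (d.mean() + 1e-12))
    return w

def weighted_lda(X, y, n_components, reg=1e-6):
    """LDA whose within-class scatter is weighted per sample (illustrative only)."""
    w = noise_weights(X, y)
    mean_all = X.mean(axis=0)
    dim = X.shape[1]
    Sw = np.zeros((dim, dim))
    Sb = np.zeros((dim, dim))
    for c in np.unique(y):
        idx = y == c
        Xc, wc = X[idx], w[idx]
        mc = np.average(Xc, axis=0, weights=wc)   # weighted class mean
        diff = Xc - mc
        Sw += (diff * wc[:, None]).T @ diff       # weighted within-class scatter
        dm = (mc - mean_all)[:, None]
        Sb += idx.sum() * (dm @ dm.T)             # between-class scatter
    # Solve Sw^{-1} Sb v = lambda v; a small ridge keeps Sw invertible.
    evals, evecs = np.linalg.eig(np.linalg.solve(Sw + reg * np.eye(dim), Sb))
    order = np.argsort(-evals.real)[:n_components]
    return evecs[:, order].real

def within_class_knn_predict(Z_train, y_train, z, k):
    """ASSUMED reading of within-class KNN: pick the class whose k nearest
    members are, on average, closest to the query point z."""
    best_label, best_score = None, np.inf
    for c in np.unique(y_train):
        d = np.sort(np.linalg.norm(Z_train[y_train == c] - z, axis=1))
        score = d[:min(k, len(d))].mean()
        if score < best_score:
            best_label, best_score = c, score
    return best_label

def jackknife_accuracy(X, y, n_components=2, k=3):
    """Leave-one-out test: refit the projection without sample i, then classify it."""
    hits = 0
    for i in range(len(X)):
        keep = np.arange(len(X)) != i
        W = weighted_lda(X[keep], y[keep], n_components)
        pred = within_class_knn_predict(X[keep] @ W, y[keep], X[i] @ W, k)
        hits += int(pred == y[i])
    return hits / len(X)

# Synthetic stand-in for protein feature vectors: 3 classes, 20 samples each, 5 features.
rng = np.random.default_rng(0)
centers = np.array([[0, 0, 0, 0, 0], [5, 0, 0, 0, 0], [0, 5, 0, 0, 0]], dtype=float)
X = np.vstack([c + rng.normal(size=(20, 5)) for c in centers])
y = np.repeat(np.arange(3), 20)

acc = jackknife_accuracy(X, y, n_components=2, k=3)
print(f"jackknife accuracy on synthetic data: {acc:.3f}")
```

Refitting the projection inside every jackknife round keeps the held-out sample from influencing the learned directions, which is what makes the estimate a fair one. The accuracy printed here reflects only the synthetic data and says nothing about the results reported in the paper.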