GrpClassifierEC: a novel classification approach based on the ensemble clustering space.

IF 1.5 | Region 4 (Biology) | JCR Q4: BIOCHEMICAL RESEARCH METHODS
Algorithms for Molecular Biology. Pub Date: 2020-02-13. eCollection Date: 2020-01-01. DOI: 10.1186/s13015-020-0162-7
Loai Abdallah, Malik Yousef
Citations: 1

Abstract

Background: Advances in molecular biology have produced large and complex data sets, so a clustering approach that can capture the actual structure and hidden patterns of the data is required. Moreover, the geometric space may not reflect the actual similarity between objects. In this research we therefore use a clustering-based space that converts the geometric space of the molecular data to a categorical space based on clustering results. We then use this space to develop a new classification algorithm.

Results: In this study, we propose a new classification method named GrpClassifierEC that replaces the given data space with a categorical space based on ensemble clustering (EC). The EC space is defined by tracking the cluster membership of the points over multiple runs of clustering algorithms. Different points that fall in the same clusters across the runs are represented as a single point, and our algorithm assigns all of these points to a single class. The similarity between two objects is defined through the number of times the objects did not belong to the same cluster, so a smaller count indicates greater similarity. To evaluate the method, we compare its results with the k-nearest neighbors, decision tree, and random forest classification algorithms on several benchmark datasets. The results show that GrpClassifierEC outperforms the other algorithms.
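
The EC space described in the Results can be illustrated with a minimal Python sketch. It is not the authors' KNIME workflow: the function names, the toy labels, and the majority-vote rule for assigning one class per group are all assumptions made for illustration. The abstract only states that points sharing the same clusters are treated as a single point with a single class.

```python
from collections import Counter, defaultdict

def build_ec_space(runs):
    """Turn per-run cluster labels into one categorical vector per point.

    runs[r][i] is the cluster label of point i in clustering run r.
    """
    n_points = len(runs[0])
    if any(len(run) != n_points for run in runs):
        raise ValueError("every run must label the same points")
    return [tuple(run[i] for run in runs) for i in range(n_points)]

def ec_distance(u, v):
    """Number of runs in which two points fall in different clusters."""
    return sum(a != b for a, b in zip(u, v))

def group_identical(ec_vectors):
    """Map each distinct EC vector to the indices of the points sharing it."""
    groups = defaultdict(list)
    for idx, vec in enumerate(ec_vectors):
        groups[vec].append(idx)
    return dict(groups)

def label_groups(groups, y):
    """Assign one class per group (assumption: majority vote of its members)."""
    return {vec: Counter(y[i] for i in idxs).most_common(1)[0][0]
            for vec, idxs in groups.items()}

# Hypothetical cluster labels from three clustering runs over five points.
runs = [
    [0, 0, 1, 1, 1],  # run 1
    [0, 0, 1, 2, 2],  # run 2
    [2, 2, 0, 1, 1],  # run 3
]
y = ["a", "a", "b", "b", "b"]  # hypothetical class labels of the five points

ec = build_ec_space(runs)
groups = group_identical(ec)
group_class = label_groups(groups, y)
```

Here points 0 and 1 share every cluster and collapse into one group, as do points 3 and 4, so five points reduce to three representatives in the EC space.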

Conclusions: Our algorithm can be integrated with many other algorithms. In this research, we use only the k-means clustering algorithm with different k values. For future research, we propose several directions: (1) examining the effect of the clustering algorithm used to build the ensemble clustering space; (2) identifying poor clustering results based on the training data; and (3) reducing the volume of the data by combining similar points based on the EC.
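
As a rough sketch of how an ensemble might be built from k-means runs with different k values, the following uses a plain-Python Lloyd's iteration as a stand-in for whatever k-means implementation the authors use. The function names, the seed, the iteration cap, the toy points, and the chosen k values are all assumptions, not details from the paper.

```python
import random

def sq_dist(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans_labels(points, k, seed=0, iters=50):
    """Plain Lloyd's k-means; returns one cluster label in range(k) per point."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initial centroids drawn from the data
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c]))
                      for p in points]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = tuple(sum(col) / len(members)
                                     for col in zip(*members))
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
    return labels

def ec_space_from_kmeans(points, k_values, seed=0):
    """One k-means run per k; each point becomes its tuple of cluster labels."""
    runs = [kmeans_labels(points, k, seed=seed) for k in k_values]
    return [tuple(run[i] for run in runs) for i in range(len(points))]

# Hypothetical 2-D data: two loose blobs of three points each.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1),
          (5.0, 5.0), (5.1, 5.0), (5.0, 5.1)]
k_values = [2, 3, 4]
ec_vectors = ec_space_from_kmeans(points, k_values)
```

Each entry of `ec_vectors` has one categorical coordinate per k value. Points whose vectors are identical could then be merged into a single representative, which is one way to read future direction (3).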

Availability and implementation: The KNIME workflow, implementing GrpClassifierEC, is available at https://malikyousef.com.

Source journal
Algorithms for Molecular Biology (Biology: Biochemical Research Methods)
CiteScore: 2.40
Self-citation rate: 10.00%
Articles published: 16
Review time: >12 weeks
Journal description: Algorithms for Molecular Biology publishes articles on novel algorithms for biological sequence and structure analysis, phylogeny reconstruction, and combinatorial algorithms and machine learning. Areas of interest include but are not limited to: algorithms for RNA and protein structure analysis, gene prediction and genome analysis, comparative sequence analysis and alignment, phylogeny, gene expression, machine learning, and combinatorial algorithms. Where appropriate, manuscripts should describe applications to real-world data. However, pure algorithm papers are also welcome if future applications to biological data are to be expected, or if they address complexity or approximation issues of novel computational problems in molecular biology. Articles about novel software tools will be considered for publication if they contain some algorithmically interesting aspects.