A New-Fangled FES-k-Means Clustering Algorithm for Disease Discovery and Visual Analytics.

EURASIP Journal on Bioinformatics & Systems Biology. Pub Date: 2010-01-01. Epub Date: 2010-06-27. DOI: 10.1155/2010/746021

Tonny J Oyana


Citations: 11

Abstract

This study further evaluates the performance of a new algorithm, the Fast, Efficient, and Scalable k-means algorithm (FES-k-means), which was designed to increase the overall efficiency of the original k-means clustering technique. The FES-k-means algorithm uses a hybrid approach that combines three components: the k-d tree data structure, which enhances the nearest neighbor query; the original k-means algorithm; and an adaptation rate proposed by Mashor. The algorithm was tested on two real datasets and one synthetic dataset. It was applied twice to each of the three datasets: once to data trained by the MIL-SOM method and once to the untrained data, in order to evaluate its competence. This two-step approach of training the data prior to clustering provides a solid foundation for knowledge discovery and data mining that clustering methods alone do not offer. The method has two benefits: it produces clusters similar to those of the original k-means method at a much faster rate, as shown by runtime comparison data, and it provides efficient analysis of large geospatial data, with implications for disease mechanism discovery. From a disease mechanism discovery perspective, it is hypothesized that the linear-like pattern of elevated blood lead levels discovered in the city of Chicago may be spatially linked to the city's water service lines.
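The hybrid described in the abstract can be illustrated with a minimal Python sketch. It is not the author's implementation. The abstract does not give the form of Mashor's adaptation rate, so the fixed `rate` parameter below is a stand-in assumption, and the function names (`build_kdtree`, `nearest`, `fes_kmeans_sketch`) and the `init` parameter are hypothetical. The sketch builds a k-d tree over the current centroids, uses it to answer each point's nearest-centroid query, and then moves every centroid a fraction of the way toward the mean of its assigned points.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def build_kdtree(items, depth=0):
    """Build a k-d tree from (index, point) pairs as nested tuples."""
    if not items:
        return None
    axis = depth % len(items[0][1])
    items = sorted(items, key=lambda it: it[1][axis])
    mid = len(items) // 2
    return (items[mid], axis,
            build_kdtree(items[:mid], depth + 1),
            build_kdtree(items[mid + 1:], depth + 1))


def nearest(node, target, best=None):
    """Return (squared distance, index) of the stored point nearest to target."""
    if node is None:
        return best
    (idx, point), axis, left, right = node
    d = sq_dist(point, target)
    if best is None or d < best[0]:
        best = (d, idx)
    diff = target[axis] - point[axis]
    near, far = (left, right) if diff < 0 else (right, left)
    best = nearest(near, target, best)
    # Search the far side only if the splitting plane is closer than the best hit.
    if diff * diff < best[0]:
        best = nearest(far, target, best)
    return best


def assign(data, centroids):
    """Label each point with its nearest centroid via a k-d tree over centroids."""
    tree = build_kdtree(list(enumerate(centroids)))
    return [nearest(tree, p)[1] for p in data]


def fes_kmeans_sketch(data, k, iters=20, rate=0.5, seed=0, init=None):
    """k-means with k-d tree assignment and a damped centroid update.

    `rate` is an assumed constant step size, not Mashor's actual rule.
    """
    rng = random.Random(seed)
    start = init if init is not None else rng.sample(data, k)
    centroids = [list(p) for p in start]
    for _ in range(iters):
        labels = assign(data, centroids)
        for j in range(k):
            members = [p for p, lab in zip(data, labels) if lab == j]
            if not members:
                continue  # leave an empty cluster's centroid where it is
            mean = [sum(c) / len(members) for c in zip(*members)]
            centroids[j] = [c + rate * (m - c)
                            for c, m in zip(centroids[j], mean)]
    return centroids, assign(data, centroids)


if __name__ == "__main__":
    pts = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0),
           (10.0, 10.0), (10.0, 11.0), (11.0, 10.0), (11.0, 11.0)]
    cents, labs = fes_kmeans_sketch(pts, 2, init=[pts[0], pts[-1]])
    print(cents, labs)
```

A k-d tree over only k centroids pays off mainly when k is large. The tree could instead be built over the data points, and the abstract does not say which variant the paper uses.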


