A new clustering method using wavelet-based probability density functions for identifying patterns in time-series data

2016 IEEE EMBS International Student Conference (ISC) | Pub Date: 2016-05-29 | DOI: 10.1109/EMBSISC.2016.7508616

Mojtaba Kordestani, A. Alkhateeb, Iman Rezaeian, L. Rueda, M. Saif


Citations: 8

Abstract

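Clustering is a prominent method for identifying similar patterns in large data sets, which makes it useful in bioinformatics studies. Classical methods such as k-means and maximum likelihood model the data with a mixture of Gaussian probability density functions (PDFs) and find clusters by maximizing the PDF. However, correlation among different groups of data and noise in the data make it difficult to detect the correct number of clusters. Furthermore, the Gaussian assumption for the PDF does not necessarily hold in real applications. This paper presents a new clustering method based on wavelet-based probability density functions. First, a mixture of PDFs is estimated with wavelets for each feature. Next, a multilevel thresholding method is applied to the mixture of PDFs of each feature to obtain the clusters. Finally, forward feature selection with memory is used to cluster the dataset based on combinations of the features. The profile alignment and agglomerative clustering (PAAC) index is used to evaluate the number of clusters and features. Transcript expression across the stages of prostate cancer serves as a case study for identifying patterns. The experimental results show that the proposed method can detect patterns of similar transcripts throughout disease progression, and the results are promising in comparison with other methods.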
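The first two steps of the abstract (a per-feature wavelet density estimate, then thresholding that density to obtain clusters) can be illustrated with a minimal sketch. The abstract does not name the wavelet family or the thresholding criterion, so everything below is an assumption: the sketch uses Haar scaling functions, the simplest linear wavelet estimator (equivalent to a dyadic histogram), and a plain low-density valley rule as a stand-in for the paper's multilevel thresholding. The function names `haar_density`, `valley_thresholds`, and `assign_clusters`, and the parameters `level` and `frac`, are hypothetical.

```python
from bisect import bisect_right


def haar_density(samples, level, lo, hi):
    """Linear Haar wavelet density estimate on [lo, hi) at resolution `level`.

    With data rescaled to [0, 1), the scaling coefficient of cell k is
    c_k = (1/n) * sum_i 2^(level/2) * 1[k <= 2^level * u_i < k + 1],
    and the estimate on cell k is 2^(level/2) * c_k, rescaled to original units.
    """
    n_cells = 2 ** level
    counts = [0] * n_cells
    for x in samples:
        u = (x - lo) / (hi - lo)
        k = max(0, min(int(u * n_cells), n_cells - 1))  # clamp to valid cells
        counts[k] += 1
    scale = 2 ** (level / 2)
    coeffs = [scale * c / len(samples) for c in counts]
    return [scale * c / (hi - lo) for c in coeffs]


def valley_thresholds(density, lo, hi, frac=0.2):
    """Put one threshold in the middle of each interior low-density run.

    A cell is "low" when its density is below frac * max(density).
    This is a simplified stand-in, not the paper's thresholding method.
    """
    n_cells = len(density)
    width = (hi - lo) / n_cells
    cutoff = frac * max(density)
    thresholds = []
    k = 0
    while k < n_cells:
        if density[k] >= cutoff:
            k += 1
            continue
        start = k
        while k < n_cells and density[k] < cutoff:
            k += 1
        if start > 0 and k < n_cells:  # run has populated cells on both sides
            thresholds.append(lo + width * (start + k) / 2)
    return thresholds


def assign_clusters(samples, thresholds):
    """Label each sample by the interval between thresholds it falls into."""
    return [bisect_right(thresholds, x) for x in samples]


if __name__ == "__main__":
    # Toy one-dimensional feature with two visibly separated groups.
    feature = [0.1, 0.15, 0.2, 0.22, 0.3, 0.8, 0.85, 0.9, 0.95, 0.88]
    pdf = haar_density(feature, level=3, lo=0.0, hi=1.0)
    cuts = valley_thresholds(pdf, lo=0.0, hi=1.0)
    print("thresholds:", cuts)
    print("labels:", assign_clusters(feature, cuts))
```

In this sketch the number of clusters for a feature is one more than the number of thresholds found, so it depends on `level` and `frac`. The abstract instead describes selecting the number of clusters and features with the PAAC index, which is not reproduced here.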




Source journal

2016 IEEE EMBS International Student Conference (ISC)

Self-citation rate

0.00%

Number of publications