{"title":"Clustering Algorithms for Incomplete Datasets","authors":"Loai Abdallah, I. Shimshoni","doi":"10.5772/INTECHOPEN.78272","DOIUrl":"https://doi.org/10.5772/INTECHOPEN.78272","url":null,"abstract":"Many real-world dataset suffers from the problem of missing values. Several methods were developed to deal with this problem. Many of them filled the missing values within fixed value based on statistical computation. In this research, we developed a new ver- sions of the k-means and the mean shift clustering algorithms that deal with datasets with missing values without filling their values. We developed a new distance function that is able to compute distances over incomplete datasets. The distance was computed based only on the mean and variance of the data for each attribute. As a result, the runtime complexity of our computation was O 1 ð Þ . We experimented on six standard numerical datasets from different fields. On these datasets, we simulated missing values and com- pared the performance of the developed algorithms using our distance and the suggested mean computations to other three basic methods. Our experiments show that the devel- oped algorithms using our distance function outperform the existing k-means and mean shift using other methods for dealing with missing values.","PeriodicalId":236959,"journal":{"name":"Recent Applications in Data Clustering","volume":"29 5","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132228623","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Point Cloud Clustering Using Panoramic Layered Range Image","authors":"M. Nakagawa, Kounosuke Kataoka, Shouta Ouma","doi":"10.5772/INTECHOPEN.76407","DOIUrl":"https://doi.org/10.5772/INTECHOPEN.76407","url":null,"abstract":"Point-cloud clustering is an essential technique for modeling massive point clouds acquired with a laser scanner. There are three clustering approaches in point-cloud clustering, namely model-based clustering, edge-based clustering, and region-based clus- tering. In geoinformatics, edge-based and region-based clustering are often applied for the modeling of buildings and roads. These approaches use low-resolution point-cloud data that consist of tens of points or several hundred points per m 2 , such as aerial laser scanning data and vehicle-borne mobile mapping system data. These approaches also focus on geometrical knowledge and restrictions. We focused on region-based point-cloud clustering to improve 3D visualization and modeling using massive point clouds. We proposed a point-cloud clustering methodology and point-cloud filtering on a mul tilayered panoramic range image. A point-based rendering approach was applied for the range image generation using a massive point cloud. Moreover, we conducted three experiments to verify our methodology.","PeriodicalId":236959,"journal":{"name":"Recent Applications in Data Clustering","volume":"49 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121778916","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Incorporating Local Data and KL Membership Divergence into Hard C-Means Clustering for Fuzzy and Noise-Robust Data Segmentation","authors":"R. Gharieb","doi":"10.5772/INTECHOPEN.74514","DOIUrl":"https://doi.org/10.5772/INTECHOPEN.74514","url":null,"abstract":"Hard C-means (HCM) and fuzzy C-means (FCM) algorithms are among the most popular ones for data clustering including image data. The HCM algorithm offers each data entity with a cluster membership of 0 or 1. This implies that the entity will be assigned to only one cluster. On the contrary, the FCM algorithm provides an entity with a membership value between 0 and 1, which means that the entity may belong to all clusters but with different membership values. The main disadvantage of both HCM and FCM algorithms is that they cluster an entity based on only its self-features and do not incorporate the influence of the entity ’ s neighborhoods, which makes clustering prone to additive noise. In this chapter, Kullback-Leibler (KL) membership divergence is incorporated into the HCM for image data clustering. This HCM-KL-based clustering algorithm provides twofold advantage. The first one is that it offers a fuzzification approach to the HCM cluster- ing algorithm. The second one is that by incorporating a local spatial membership function into the HCM objective function, additive noise can be tolerated. Also spatial data is incorporated for more noise-robust clustering. pixels. Results of segmentation of synthetic, simulated medical and real-world images have shown that the proposed local membership KL divergence-based FCM (LMKLFCM) and the local data and membership KL divergence-based entropy FCM (LDMKLFCM) algorithms outperform several widely used FCM related algorithms. Moreover, the average runtimes of all algorithms have been measured via simulation. 
In all runs, all algorithms start from the same randomly generated initial conditions, as mentioned in the simulation section, and stopped at the same fixed point. The LDMKLFCM, LMKLFCM, standard FCM, MEFCM, and SFCM algorithms have provided average runtime of 1.5, 1.75, 1, 0.9 and 1 sec respectively. The simulation results have been done using Matlab R2013b under windows on a processor of Intel (R) core (TM) i3, CPU M370 2.4 GHZ, 4 GB RAM.","PeriodicalId":236959,"journal":{"name":"Recent Applications in Data Clustering","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130058738","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Class of Parametric Tree-Based Clustering Methods","authors":"F. Glover, Yang Wang","doi":"10.5772/INTECHOPEN.76406","DOIUrl":"https://doi.org/10.5772/INTECHOPEN.76406","url":null,"abstract":"We introduce a class of tree-based clustering methods based on a single parameter W and show how to generate the full collection of cluster sets C(W), without duplication, by varying W according to conditions identified during the algorithm’s execution. The number of clusters within C(W) for a given W is determined automatically, using a graph representation in which cluster elements are represented by nodes and their pairwise con- nections are represented by edges. We identify features of the clusters produced which lead to special procedures to accelerate the computation. Finally, we introduce a related node-based variant of the algorithm based on a parameter Y which can be used to generate clusters with complementary features, and a method that combines both variants based on a parameter Z and a weight that determines the contribution of each variant.","PeriodicalId":236959,"journal":{"name":"Recent Applications in Data Clustering","volume":"23 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133779284","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Centroid-Based Lexical Clustering","authors":"Khaled Abdalgader","doi":"10.5772/INTECHOPEN.75433","DOIUrl":"https://doi.org/10.5772/INTECHOPEN.75433","url":null,"abstract":"Conventional lexical-clustering algorithms treat text fragments as a mixed collection of words, with a semantic similarity between them calculated based on the term of how many the particular word occurs within the compared fragments. Whereas this technique is appropriate for clustering large-sized textual collections, it operates poorly when clustering small-sized texts such as sentences. This is due to compared sentences that may be linguistically similar despite having no words in common. This chapter presents a new version of the original k-means method for sentence-level text clustering that is relay on the idea of use of the related synonyms in order to construct the rich semantic vectors. These vectors represent a sentence using linguistic information resulting from a lexical database founded to determine the actual sense to a word, based on the context in which it occurs. Therefore, while traditional k-means method application is relay on calculating the distance between patterns, the new proposed version operates by calculating the semantic similarity between sentences. This allows it to capture a higher degree of semantic or linguistic information existing within the clustered sentences. 
Experimental results illustrate that the proposed version of clustering algorithm performs favorably against other well-known clustering algorithms on several standard datasets.","PeriodicalId":236959,"journal":{"name":"Recent Applications in Data Clustering","volume":"37 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122253229","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"emporal Clustering for Behavior Variation and Anomaly Detection from Data Acquired Through IoT in Smart Cities","authors":"V. Urosevic, Ana Kovačević, Firas Kaddachi, MilanVukicevic","doi":"10.5772/INTECHOPEN.75203","DOIUrl":"https://doi.org/10.5772/INTECHOPEN.75203","url":null,"abstract":"In this chapter, we propose a methodology for behavior variation and anomaly detection from acquired sensory data, based on temporal clustering models. Data are collected from five prominent European smart cities, and Singapore, that aim to become fully “elderly-friendly,” with the development and deployment of ubiquitous systems for assessment and prediction of early risks of elderly Mild Cognitive Impairments (MCI) and frailty, and for supporting generation and delivery of optimal personalized preventive interventions that mitigate those risks, utilizing smart city datasets and IoT infrastructure. Low level data collected from IoT devices are preprocessed as sequences of activities, with temporal and causal variations in sequences classified as normal or anomalous behavior. The goals of proposed methodology are to (1) recognize significant behavioral variation patterns and (2) support early identification of pattern changes. Temporal clustering models are applied in detection and prediction of the following variation types: intra-activity (single activity, single citizen) and inter-activity (multi- ple-activities, single citizen). 
Identified behavioral variations and anomalies are further mapped to MCI/frailty onset behavior and risk factors, following the developed geriatric expert model.","PeriodicalId":236959,"journal":{"name":"Recent Applications in Data Clustering","volume":"95 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132178529","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Robust Spectral Clustering via Sparse Representation","authors":"Xiaodong Feng","doi":"10.5772/INTECHOPEN.76586","DOIUrl":"https://doi.org/10.5772/INTECHOPEN.76586","url":null,"abstract":"Clustering high-dimensional data has been a challenging problem in data mining and machining learning. Spectral clustering via sparse representation has been proposed for clustering high-dimensional data. A critical step in spectral clustering is to effectively construct a weight matrix by assessing the proximity between each pair of objects. While sparse representation proves its effectiveness for compressing high-dimensional signals, existing spectral clustering algorithms based on sparse representation use those sparse coefficients directly. We believe that the similarity measure exploiting more global information from the coefficient vectors will provide more truthful similarity among data objects. The intuition is that the sparse coefficient vectors corresponding to two similar objects are similar and those of two dissimilar objects are also dissimilar. In particular, we propose two approaches of weight matrix construction according to the similarity of the sparse coefficient vectors. 
Experimental results on several real-world high-dimensional data sets demonstrate that spectral clustering based on the proposed similarity matrices outperforms existing spectral clustering algorithms via sparse representation.","PeriodicalId":236959,"journal":{"name":"Recent Applications in Data Clustering","volume":"31 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126280240","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Partitional Clustering","authors":"Uğurhan Kutbay","doi":"10.5772/intechopen.75836","DOIUrl":"https://doi.org/10.5772/intechopen.75836","url":null,"abstract":"","PeriodicalId":236959,"journal":{"name":"Recent Applications in Data Clustering","volume":"26 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132553235","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Performance Assessment of Unsupervised Clustering Algorithms Combined MDL Index","authors":"Hadeel K. Aljobouri, Hussain A. Jaber, Ilyas Çankaya","doi":"10.5772/INTECHOPEN.74506","DOIUrl":"https://doi.org/10.5772/INTECHOPEN.74506","url":null,"abstract":"Best clustering analysis should be resisting the presence of outliers and be less sensi- tive to initialization as well as the input sequence ordering. This chapter compares the performance among three of the unsupervised clustering algorithms: neural gas (NG), growing neural gas (GNG), and robust growing neural gas (RGNG). A complete expla-nation of NG and GNG algorithms is presented in the next comparison with RGNG. Another comparison due to the minimum description length (MDL) criterion between RGNG used MDL value as the clustering validity index versus GNG and NG combined with MDL. Statistical estimations are applied to explain the meaning of the output results when these algorithms are fed to the synthetic 2D dataset. The techniques introduced in this chapter are designed and implemented in a simple software package using a MATLAB-based graphical user interface (GUI) tool, which allows users to interact with the clustering techniques and output data easily.","PeriodicalId":236959,"journal":{"name":"Recent Applications in Data Clustering","volume":"59 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115619409","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}