Author: Nicolas Labroche
Venue: 2010 Annual Meeting of the North American Fuzzy Information Processing Society
Published: 2010-07-12
DOI: https://doi.org/10.1109/NAFIPS.2010.5548263
New incremental fuzzy c medoids clustering algorithms
This paper proposes two new incremental fuzzy c-medoids clustering algorithms for very large datasets. These algorithms are tailored to continuous data streams, where not all of the data is available at once or the data cannot fit in main memory. Some fuzzy algorithms already manage large datasets in a similar way, but they are generally limited to spatial datasets in order to avoid the cost of medoid computation. Our methods keep the advantages of the fuzzy approaches and add the capability to handle large relational datasets by treating the continuous input stream as a sequence of data chunks that are processed one after another. Two distinct models are proposed to aggregate the information discovered from each data chunk and produce the final partition of the dataset. Our new algorithms are compared to state-of-the-art fuzzy clustering algorithms on artificial and real datasets. Experiments show that our new approaches perform comparably to, if not better than, existing algorithms while adding the capability to handle relational data, which better matches the needs of real-world applications.
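The abstract does not specify the two aggregation models, so the following Python sketch is only an illustration of the general idea of chunk-wise fuzzy c-medoids, not the paper's algorithms. It assumes one plausible aggregation scheme: the medoids found in a chunk are carried into the next chunk as weighted items, with each weight equal to the fuzzy mass assigned to that medoid. The function names (`farthest_first`, `memberships`, `fuzzy_c_medoids`, `process_stream`), the farthest-first initialisation, and the fuzzifier default `m = 2` are all assumptions made for this example. Only pairwise dissimilarities `dist(x, y)` are used, which is what makes the approach applicable to relational data.

```python
from typing import Callable, Iterable, List, Optional, Sequence, Tuple, TypeVar

T = TypeVar("T")
Dist = Callable[[T, T], float]


def farthest_first(items: Sequence[T], c: int, dist: Dist) -> List[T]:
    """Hypothetical initialisation: greedily pick c mutually distant items."""
    chosen = [items[0]]
    while len(chosen) < c:
        chosen.append(max(items, key=lambda x: min(dist(x, v) for v in chosen)))
    return chosen


def memberships(items: Sequence[T], medoids: Sequence[T], dist: Dist,
                m: float = 2.0) -> List[List[float]]:
    """Fuzzy membership of every item to every medoid (requires m > 1)."""
    u = []
    for x in items:
        d = [dist(x, v) for v in medoids]
        if any(dk == 0.0 for dk in d):
            # Item coincides with a medoid: share membership among exact matches.
            hits = [1.0 if dk == 0.0 else 0.0 for dk in d]
            total = sum(hits)
            u.append([h / total for h in hits])
        else:
            inv = [dk ** (-1.0 / (m - 1.0)) for dk in d]
            total = sum(inv)
            u.append([w / total for w in inv])
    return u


def fuzzy_c_medoids(items: Sequence[T], weights: Sequence[float], c: int,
                    dist: Dist, m: float = 2.0,
                    init: Optional[Sequence[T]] = None,
                    max_iter: int = 50) -> Tuple[List[T], List[List[float]]]:
    """Weighted fuzzy c-medoids on one chunk; medoids are always actual items."""
    medoids = list(init) if init is not None else farthest_first(items, c, dist)
    for _ in range(max_iter):
        u = memberships(items, medoids, dist, m)
        updated = []
        for k in range(c):
            def cost(cand: T, k: int = k) -> float:
                # Weighted fuzzy dissimilarity of cluster k to a candidate medoid.
                return sum(w * (u[j][k] ** m) * dist(cand, x)
                           for j, (x, w) in enumerate(zip(items, weights)))
            updated.append(min(items, key=cost))
        if updated == medoids:
            break
        medoids = updated
    return medoids, memberships(items, medoids, dist, m)


def process_stream(chunks: Iterable[Sequence[T]], c: int, dist: Dist,
                   m: float = 2.0) -> Tuple[List[T], List[float]]:
    """Process chunks sequentially, carrying weighted medoids forward.

    Assumption: the first chunk contains at least c items.
    """
    carried: List[T] = []
    carried_w: List[float] = []
    for chunk in chunks:
        items = carried + list(chunk)
        weights = carried_w + [1.0] * len(chunk)
        init = carried if carried else None  # warm start from previous medoids
        medoids, u = fuzzy_c_medoids(items, weights, c, dist, m, init=init)
        # Each medoid inherits the fuzzy mass of everything assigned to it.
        carried_w = [sum(w * u[j][k] for j, w in enumerate(weights))
                     for k in range(c)]
        carried = medoids
    return carried, carried_w


if __name__ == "__main__":
    stream = [[0.0, 0.1, 10.0, 10.2, 0.2], [9.9, 0.05, 10.1, -0.1]]
    print(process_stream(stream, c=2, dist=lambda a, b: abs(a - b)))
```

Only the current chunk plus `c` carried medoids are held in memory at any time, so the memory footprint depends on the chunk size rather than on the length of the stream. Because memberships sum to one per item, the carried weights always sum to the number of items processed so far.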