Implementation of Space Optimized Bisecting K-Means (BKM) Based on Hadoop
Y. Yin, Chengguang Wei, Guigang Zhang, C. Li
2012 Ninth Web Information Systems and Applications Conference, published 2012-11-16
DOI: 10.1109/WISA.2012.47
Citations: 1
Abstract
This article is motivated by research on the phenomenon of co-authorship in scientific fields. Studying massive amounts of relational data of this kind is of major theoretical and practical significance for retrieving professional academic information and for understanding development trends across various academic fields. Such research requires clustering the co-authors that appear in the data. However, existing clustering software and algorithms struggle to analyze data at this scale, so finding a way to handle the problem is important. To address it, and after reviewing the current state of related research, this article presents an optimized Bisecting K-Means (BKM) clustering algorithm based on Hadoop and describes in detail how the algorithm is optimized and the key points of its implementation. An experimental estimate of the algorithm's complexity points to the remaining problems and to directions for future study.
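The abstract names Bisecting K-Means without describing it. For orientation, here is a minimal single-machine sketch of the generic textbook procedure, not the authors' space-optimized Hadoop version. It starts with one cluster, repeatedly picks the cluster with the largest sum of squared errors (SSE), splits it with ordinary 2-means over several random trials, and keeps the split with the lowest SSE. The function names, the SSE-based selection rule, and the `trials`, `iters`, and `seed` parameters are illustrative assumptions.

```python
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def _sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def _centroid(points: List[Point]) -> Point:
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def _sse(points: List[Point]) -> float:
    """Sum of squared distances from each point to its cluster centroid."""
    c = _centroid(points)
    return sum(_sq_dist(p, c) for p in points)


def _two_means(points: List[Point], rng: random.Random,
               iters: int = 20) -> Tuple[List[Point], List[Point]]:
    """Split points in two with plain 2-means (Lloyd's algorithm).

    May return an empty half in degenerate cases; the caller discards those.
    """
    c0, c1 = rng.sample(points, 2)  # two random seed points from the cluster
    left: List[Point] = []
    right: List[Point] = []
    for _ in range(iters):
        left, right = [], []
        for p in points:
            (left if _sq_dist(p, c0) <= _sq_dist(p, c1) else right).append(p)
        if not left or not right:
            break
        new_c0, new_c1 = _centroid(left), _centroid(right)
        if new_c0 == c0 and new_c1 == c1:
            break  # converged: centroids stopped moving
        c0, c1 = new_c0, new_c1
    return left, right


def bisecting_kmeans(points: List[Point], k: int, trials: int = 5,
                     seed: int = 0) -> List[List[Point]]:
    """Hypothetical helper: generic bisecting k-means on one machine."""
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    rng = random.Random(seed)
    clusters: List[List[Point]] = [list(points)]
    while len(clusters) < k:
        # Bisect the splittable cluster (>= 2 points) with the largest SSE.
        idx = max((i for i in range(len(clusters)) if len(clusters[i]) >= 2),
                  key=lambda i: _sse(clusters[i]))
        target = clusters[idx]
        best = None
        for _ in range(trials):
            left, right = _two_means(target, rng)
            if not left or not right:
                continue  # degenerate split, try another random start
            total = _sse(left) + _sse(right)
            if best is None or total < best[0]:
                best = (total, left, right)
        if best is None:
            raise ValueError("could not bisect cluster (identical points?)")
        # Replace the chosen cluster with its two halves.
        clusters[idx:idx + 1] = [best[1], best[2]]
    return clusters


if __name__ == "__main__":
    data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    print(bisecting_kmeans(data, k=2))
```

Only one cluster is split per round, so each 2-means run touches a subset of the data; in a distributed setting, the per-point assignment and centroid sums inside each 2-means iteration are the parts that would typically be parallelized, though how the paper organizes this on Hadoop is not described in the abstract.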