Enhancing pattern recognition in social networking dataset by using bisecting KMean

2017 International Conference on Intelligent Computing and Control (I2C2). Pub Date: 2017-06-23. DOI: 10.1109/I2C2.2017.8321776

Shilpa V. Gajbhiye, Gaurav B. Malode


Citations: 0

Abstract

Databases today can grow beyond terabytes in size, and hidden within these masses of data lies information of strategic importance. When there are so many trees, how can one draw conclusions about the forest? The most recent answer is data mining, which is already being used to increase revenues. Data mining is a process that uses a variety of data analysis tools to discover patterns and relationships in data that may be used to make valid predictions. This research uses a social networking dataset for pattern recognition, because social networks are one of the emerging application areas in data mining. We used the Facebook 100 dataset and applied the Bisecting KMeans algorithm to it in order to obtain better clustering output. Bisecting KMeans first bisects the data into two parts, selects the part with the greater number of elements, and then applies clustering to that part again. This continues until there are N clusters. We apply this procedure to our dataset to obtain the desired results, compare the Bisecting KMeans algorithm with other data mining algorithms, and finally identify different patterns in the social networking dataset.
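The bisecting procedure described in the abstract can be sketched as follows. This is a minimal illustration and not the authors' implementation: the function names (`two_means`, `bisecting_kmeans`), the toy 2-D points, the iteration cap, and the fallback for degenerate splits are all assumptions made for the example, and the Facebook 100 dataset is not used. Following the abstract, the sketch always splits the cluster with the most elements; other variants of bisecting k-means instead split the cluster with the largest within-cluster error.

```python
import random


def _sq_dist(a, b):
    """Squared Euclidean distance between two equal-length points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def _mean(points):
    """Component-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(col) / n for col in zip(*points))


def two_means(points, rng, iters=20):
    """Split `points` into two groups with plain 2-means (Lloyd iterations)."""
    centers = rng.sample(points, 2)  # two starting centers (hypothetical choice)
    groups = ([], [])
    for _ in range(iters):
        groups = ([], [])
        for p in points:
            nearest = 0 if _sq_dist(p, centers[0]) <= _sq_dist(p, centers[1]) else 1
            groups[nearest].append(p)
        if not groups[0] or not groups[1]:
            break  # degenerate split; the caller handles it
        new_centers = [_mean(groups[0]), _mean(groups[1])]
        if new_centers == centers:
            break  # assignments have stopped changing
        centers = new_centers
    return groups


def bisecting_kmeans(points, n_clusters, seed=0):
    """Repeatedly bisect the largest cluster until `n_clusters` clusters exist."""
    if not 1 <= n_clusters <= len(points):
        raise ValueError("n_clusters must be between 1 and len(points)")
    rng = random.Random(seed)
    clusters = [list(points)]
    while len(clusters) < n_clusters:
        # Select the cluster with the greatest number of elements.
        i = max(range(len(clusters)), key=lambda j: len(clusters[j]))
        largest = clusters.pop(i)
        left, right = two_means(largest, rng)
        if not left or not right:
            # Assumed fallback (e.g. for duplicate points): halve the list.
            half = len(largest) // 2
            left, right = largest[:half], largest[half:]
        clusters.extend([left, right])
    return clusters


if __name__ == "__main__":
    # Toy data standing in for real feature vectors.
    toy = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1),
           (5.0, 5.0), (5.1, 5.2), (5.2, 5.1),
           (10.0, 0.0), (10.1, 0.2)]
    for cluster in bisecting_kmeans(toy, n_clusters=3):
        print(len(cluster), cluster)
```

Because each step only runs 2-means on a single cluster, the result is a hierarchy of splits rather than one global partition, which is the property that makes the method comparable against ordinary k-means and other clustering algorithms.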

