A study and characterization of chemical properties of soil surface data using K-means algorithm

D. A. Kumar, N. Kannathasan
Published in: 2013 International Conference on Pattern Recognition, Informatics and Mobile Engineering
Publication date: 2013-04-15
DOI: 10.1109/ICPRIME.2013.6496484 (https://doi.org/10.1109/ICPRIME.2013.6496484)
Citations: 7

Abstract

Soil is a vital natural resource: a country's life-support system and the socio-economic development of its people depend on its proper use. Clustering of agricultural soil datasets is a relatively new research field. This paper studies the characterization of chemical properties of soil surface data from the Bhanapur micro-watershed of Koppal District, Karnataka, using the K-means algorithm. The work computes the average silhouette width, which provides an evaluation of clustering validity and might be used to select an appropriate number of clusters in the soil dataset. Clustering the soil dataset into two clusters using K-means with Euclidean distance gives an average silhouette value of 0.7736. The work also demonstrates high intra-class similarity (cohesion within clusters) and low inter-class similarity (distinctiveness between clusters) in the soil dataset. K-means reassigns points among clusters to decrease the sum of point-to-centroid distances and then recomputes the cluster centroids for the new cluster assignments. The total sum of distances for K-means clustering with Euclidean distance is 1.14402, based on the number of reassigned soil data points. By default, K-means begins the clustering process from a randomly selected set of initial centroid locations, and it repeats the process starting from different randomly selected centroids. The sum of distances within each cluster for the best solution is 1.1440 with Euclidean distance. The three-cluster K-means solution gives an average silhouette value of 0.6052 with Euclidean distance and 0.6219 with Cosine distance. The value reported for the hierarchical clustering dendrogram is 0.7935 with the Euclidean distance measure and 0.6678 with the Cosine distance measure. The results from the hierarchical clustering dendrogram with Cosine distance are qualitatively similar to the results from K-means using three clusters. K-means cluster analysis with Euclidean and Cosine distance measures is compared with the hierarchical clustering dendrogram. Based on this study and characterization of the chemical properties of soil surface data using K-means, Cosine might be a good choice of distance measure.
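
The procedure the abstract describes (random initial centroids, reassignment of points, centroid recomputation, repeated restarts, and the average silhouette width as a guide to the number of clusters) can be sketched as follows. This is a minimal illustration and not the authors' implementation. The data are synthetic two-dimensional points, not the Bhanapur soil measurements, and the function names, the number of restarts, the seeds, and the use of plain Euclidean distance in the silhouette are all assumptions made for the example.

```python
import math
import random


def euclidean(a, b):
    """Euclidean distance between two equal-length tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def kmeans(points, k, rng, max_iter=100):
    """One K-means run from randomly chosen initial centroids.

    Returns (labels, centroids, total), where total is the sum of
    squared point-to-centroid distances for the final assignment.
    """
    centroids = rng.sample(points, k)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: euclidean(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # no point changed cluster, so the run has converged
        labels = new_labels
        # Update step: recompute each centroid as the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    total = sum(euclidean(p, centroids[lab]) ** 2 for p, lab in zip(points, labels))
    return labels, centroids, total


def best_of_restarts(points, k, restarts=10, seed=0):
    """Repeat K-means from different random centroids; keep the lowest total."""
    rng = random.Random(seed)
    runs = [kmeans(points, k, rng) for _ in range(restarts)]
    return min(runs, key=lambda run: run[2])


def average_silhouette(points, labels, dist=euclidean):
    """Mean of (b - a) / max(a, b) over all points.

    a is the mean distance to the other members of the point's own cluster,
    b is the smallest mean distance to the members of any other cluster.
    Assumes at least two non-empty clusters.
    """
    clusters = sorted(set(labels))
    scores = []
    for i, p in enumerate(points):
        own = [dist(p, q) for j, q in enumerate(points)
               if labels[j] == labels[i] and j != i]
        if not own:
            scores.append(0.0)  # singleton cluster: silhouette taken as 0
            continue
        a = sum(own) / len(own)
        b = min(
            sum(dist(p, q) for j, q in enumerate(points) if labels[j] == c)
            / labels.count(c)
            for c in clusters if c != labels[i]
        )
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)


# Synthetic stand-in data (NOT the soil dataset): two well-separated groups.
data_rng = random.Random(42)
data = [(data_rng.gauss(0, 0.3), data_rng.gauss(0, 0.3)) for _ in range(20)]
data += [(data_rng.gauss(5, 0.3), data_rng.gauss(5, 0.3)) for _ in range(20)]

labels2, _, total2 = best_of_restarts(data, k=2)
labels3, _, total3 = best_of_restarts(data, k=3)
s2 = average_silhouette(data, labels2)
s3 = average_silhouette(data, labels3)
print(f"k=2: total={total2:.4f}, average silhouette={s2:.4f}")
print(f"k=3: total={total3:.4f}, average silhouette={s3:.4f}")
```

On data of this shape, the number of clusters with the higher average silhouette would be the one to prefer, which mirrors how the abstract compares its two-cluster and three-cluster solutions. A Cosine distance function could be passed as `dist` to compare distance measures in the same way.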