Determining the Optimum Number of Clusters in Hierarchical Clustering Using Pseudo-F

Steven Jansen Sinaga, Neva Satyahadewi, Hendra Perdana
Journal: Euler : Jurnal Ilmiah Matematika, Sains dan Teknologi
Published: 2023-12-17
DOI: https://doi.org/10.37905/euler.v11i2.23113

Abstract

Poverty is the condition in which a person cannot meet basic necessities as defined by minimum living standards. Statistics Indonesia reported an increase in the poverty rate of North Sumatra Province in 2021, from 8.75% to 9.01%. However, this increase is specific to North Sumatra Province, which has the third-largest number of districts/cities in Indonesia. This study maps the districts/cities of North Sumatra Province based on 10 poverty-related variables: life expectancy, health complaints, poverty line, Gross Regional Domestic Product (GRDP), population growth rate, Expected Years of Schooling (EYS), Human Development Index (HDI), labor force participation rate, open unemployment rate, and district/city minimum wage. Hierarchical clustering was applied with the single, complete, and average linkage methods, and the best method was chosen based on the pseudo-F statistic. The selected solution is the complete linkage method with 4 clusters, each with distinct characteristics. Cluster 1 contains the regions with the lowest poverty rates, including Medan City and Pematang Siantar City. Cluster 2 consists of regions with low poverty rates, while Cluster 3 consists of regions with high poverty rates. Cluster 4 contains regions with very high poverty rates, including South Nias District and Pakpak Bharat District. The clusters reveal significant gaps in poverty rates among the regions of North Sumatra Province.
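
The abstract does not state the pseudo-F formula the authors used. A common form (the Calinski-Harabasz index) is pseudo-F = (SSB / (k - 1)) / (SSW / (n - k)), where SSB and SSW are the between-cluster and within-cluster sums of squares, k is the number of clusters, and n is the number of observations. The sketch below assumes that form and uses it to compare the three linkage methods over several values of k. The function names `agglomerate` and `pseudo_f` are hypothetical, and the data are made-up 2-D points, not the North Sumatra variables.

```python
from itertools import combinations
from math import dist


def agglomerate(points, k, linkage="complete"):
    """Naive agglomerative clustering: merge the two closest clusters until k remain."""
    clusters = [[i] for i in range(len(points))]

    def cluster_distance(a, b):
        d = [dist(points[i], points[j]) for i in a for j in b]
        if linkage == "single":    # closest pair of members
            return min(d)
        if linkage == "complete":  # farthest pair of members
            return max(d)
        if linkage == "average":   # mean over all member pairs
            return sum(d) / len(d)
        raise ValueError(f"unknown linkage: {linkage}")

    while len(clusters) > k:
        a, b = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: cluster_distance(clusters[p[0]], clusters[p[1]]),
        )
        clusters[a] += clusters[b]  # a < b, so deleting b keeps index a valid
        del clusters[b]
    return clusters


def pseudo_f(points, clusters):
    """Pseudo-F (Calinski-Harabasz form): (SSB / (k - 1)) / (SSW / (n - k))."""
    n, k, dim = len(points), len(clusters), len(points[0])

    def centroid(idx):
        return [sum(points[i][d] for i in idx) / len(idx) for d in range(dim)]

    grand = centroid(range(n))
    ssb = sum(len(c) * dist(centroid(c), grand) ** 2 for c in clusters)
    ssw = sum(dist(points[i], centroid(c)) ** 2 for c in clusters for i in c)
    return (ssb / (k - 1)) / (ssw / (n - k))


# Made-up toy data: three well-separated groups of three points each.
points = [(0, 0), (0, 1), (1, 0),
          (10, 10), (10, 11), (11, 10),
          (20, 0), (20, 1), (21, 0)]

# Score every (linkage, k) combination and keep the one with the largest pseudo-F.
scores = {
    (linkage, k): pseudo_f(points, agglomerate(points, k, linkage))
    for linkage in ("single", "complete", "average")
    for k in range(2, 5)
}
best_linkage, best_k = max(scores, key=scores.get)
print(best_linkage, best_k, round(scores[(best_linkage, best_k)], 2))
```

A larger pseudo-F indicates clusters that are compact relative to their separation, so the (linkage, k) pair with the largest value is taken as the preferred solution. On real data the variables would usually be standardized first, since the 10 variables in the study are measured on very different scales; whether the authors did so is not stated in the abstract.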