Classification of Paddy Growth Phase with Machine Learning Algorithms to Handle Imbalanced Multi-Class Big Data

Proceedings of The International Conference on Data Science and Official Statistics. Pub Date: 2022-01-04. DOI: 10.34123/icdsos.v2021i1.45

Hady Suryono, H. Kuswanto, Nur Iriawan

Abstract

The global Sustainable Development Goals (SDGs) adopted by countries around the world have significant implications for Indonesia's national development planning over the period 2015 to 2030. The agricultural sector is one of the most important sectors worldwide and contributes substantially to achieving these goals. Measuring the level of food security requires accurate paddy production data, which can be obtained by monitoring the paddy growth phase and classifying it accurately and precisely. Paddy growth is divided into 6 phase classes whose numbers of members are usually unequal (imbalanced data). This study reports the classification of paddy growth phases with imbalanced data in Bojonegoro Regency, East Java, in 2019, using machine learning algorithms on the Google Earth Engine (GEE) platform. Three classifiers were used: Classification and Regression Tree (CART), Support Vector Machine (SVM), and Random Forest. An oversampling technique was used to deal with the imbalanced data. The 2019 Area Sampling Frame survey conducted by BPS (Statistics Indonesia) provided the labels for training the classification models. The results showed that Random Forest trained on the oversampled dataset achieved an overall accuracy (OA) of 82.30% and a kappa statistic of 0.76, outperforming the SVM and CART algorithms.
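
The abstract reports results as overall accuracy (OA) and the kappa statistic after oversampling imbalanced training data. The following is a minimal plain-Python sketch of both ideas, under stated assumptions. It uses simple random oversampling, which duplicates randomly chosen minority-class samples until every class matches the largest class, because the abstract does not say which oversampling variant was used. It is not the authors' Google Earth Engine pipeline. All function names are hypothetical, and the toy inputs at the bottom are made-up integer class ids, not the six growth-phase classes or the survey data.

```python
import random
from collections import Counter, defaultdict


def random_oversample(samples, labels, seed=0):
    """Assumed technique: plain random oversampling. Duplicates randomly
    chosen samples of each smaller class until every class has as many
    members as the largest class."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for x, label in zip(samples, labels):
        by_class[label].append(x)
    target = max(len(xs) for xs in by_class.values())
    out_x, out_y = [], []
    for label, xs in by_class.items():
        extra = [rng.choice(xs) for _ in range(target - len(xs))]
        for x in xs + extra:
            out_x.append(x)
            out_y.append(label)
    return out_x, out_y


def confusion_matrix(y_true, y_pred, classes):
    """Rows are reference (ground-truth) classes, columns are predictions."""
    index = {c: i for i, c in enumerate(classes)}
    m = [[0] * len(classes) for _ in classes]
    for t, p in zip(y_true, y_pred):
        m[index[t]][index[p]] += 1
    return m


def overall_accuracy(m):
    """Fraction of samples on the diagonal (correctly classified)."""
    total = sum(sum(row) for row in m)
    return sum(m[i][i] for i in range(len(m))) / total


def cohens_kappa(m):
    """Agreement beyond chance: (p_o - p_e) / (1 - p_e), where p_e is the
    agreement expected from the row and column totals alone."""
    total = sum(sum(row) for row in m)
    p_o = overall_accuracy(m)
    row_sums = [sum(row) for row in m]
    col_sums = [sum(m[i][j] for i in range(len(m))) for j in range(len(m))]
    p_e = sum(r * c for r, c in zip(row_sums, col_sums)) / (total * total)
    return (p_o - p_e) / (1 - p_e)


# Toy usage with made-up class ids 0, 1, 2 (not real survey labels).
X_bal, y_bal = random_oversample(["a1", "a2", "a3", "a4", "b1", "c1"],
                                 [0, 0, 0, 0, 1, 2])
print(Counter(y_bal))
cm = confusion_matrix([0, 0, 1, 1, 2, 2], [0, 0, 1, 2, 2, 2], classes=[0, 1, 2])
print(overall_accuracy(cm), cohens_kappa(cm))
```

In a setup like this, oversampling is generally applied only to the training split. Evaluating OA and kappa on duplicated samples would overstate accuracy.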
