How effective oversampling techniques are in classifying potentially hazardous asteroids
Authors: Md. Sadman, Mir Sakhawat Hossain
Journal: The European Physical Journal Plus, volume 140, issue 9 (published 2025-09-11)
DOI: 10.1140/epjp/s13360-025-06783-2
URL: https://link.springer.com/article/10.1140/epjp/s13360-025-06783-2
Journal impact factor: 2.9; JCR: Q2 (Physics, Multidisciplinary)
Citation count: 0
Abstract
Potentially hazardous asteroids require strict surveillance to ensure the safety of our planet. However, the rapidly growing volume of astronomical data makes it difficult for humans to study these asteroids manually. Hence, machine learning techniques are used to classify them. However, machine learning models are not robust when the classes are imbalanced. Various undersampling and oversampling techniques are used to address this problem. In our study, we refrained from using any undersampling technique, as we did not want to lose any valuable information. Instead, we employed various oversampling techniques, including Random Oversampling, SMOTE (Synthetic Minority Over-sampling Technique), ADASYN (Adaptive Synthetic Sampling), BorderlineSMOTE, KMeansSMOTE, and SVMSMOTE. For each oversampling technique, we trained the Random Forest, XGBoost, LightGBM, HistGradientBoosting, and AdaBoost classifiers. Our research presents a detailed study of these oversampling techniques to determine which one is most suitable for our dataset.
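To make the difference between the two simplest techniques named in the abstract concrete, here is a minimal sketch of Random Oversampling and the core SMOTE interpolation step, using only the Python standard library. It is not the authors' pipeline: the function names `random_oversample` and `smote`, the toy feature values, and the choice of `k` are all assumptions made for illustration, and a real study would more likely use an established library such as imbalanced-learn.

```python
import math
import random


def random_oversample(minority, n_new, rng):
    """Random Oversampling: duplicate randomly chosen minority rows."""
    return [list(rng.choice(minority)) for _ in range(n_new)]


def smote(minority, n_new, k=5, rng=None):
    """SMOTE core step: create n_new synthetic rows, each interpolated
    between a minority row and one of its k nearest minority neighbours."""
    rng = rng or random.Random(0)
    k = min(k, len(minority) - 1)
    if k < 1:
        raise ValueError("SMOTE needs at least two minority samples")
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Brute-force nearest neighbours, searched within the minority class only.
        others = [row for j, row in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda row: math.dist(base, row))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Toy data (hypothetical): 8 "non-hazardous" rows versus 3 "hazardous" rows.
majority = [[float(i), float(i % 3)] for i in range(8)]
minority = [[10.0, 10.0], [11.0, 10.5], [10.5, 11.5]]

rng = random.Random(42)
n_needed = len(majority) - len(minority)  # rows needed to balance the classes

duplicated = random_oversample(minority, n_needed, rng)  # exact copies
synthetic = smote(minority, n_needed, k=2, rng=rng)      # new interpolated rows
balanced_minority = minority + synthetic
```

Random Oversampling only repeats existing rows, whereas SMOTE places new rows on line segments between neighbouring minority rows. The other variants listed in the abstract (ADASYN, BorderlineSMOTE, KMeansSMOTE, SVMSMOTE) differ mainly in how they choose which minority rows to interpolate from. In an evaluation, oversampling is normally applied to the training split only, so that synthetic rows do not leak into the test set.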
About the journal:
The aims of this peer-reviewed online journal are to distribute and archive all relevant material required to document, assess, validate and reconstruct in detail the body of knowledge in the physical and related sciences.
The scope of EPJ Plus encompasses a broad landscape of fields and disciplines in the physical and related sciences - such as covered by the topical EPJ journals and with the explicit addition of geophysics, astrophysics, general relativity and cosmology, mathematical and quantum physics, classical and fluid mechanics, accelerator and medical physics, as well as physics techniques applied to any other topics, including energy, environment and cultural heritage.