Comparative analysis of machine learning techniques for feature selection and classification of Fast Radio Bursts
Ailton J.B. Júnior, Jéferson A.S. Fortunato, Leonardo J. Silvestre, Thonimar V. Alencar, Wiliam S. Hipólito-Ricaldi
Journal of High Energy Astrophysics, volume 49, Article 100449. Published 2025-08-18. DOI: 10.1016/j.jheap.2025.100449
https://www.sciencedirect.com/science/article/pii/S2214404825001302
Impact factor: 10.5. JCR: Q1 (Astronomy & Astrophysics). CAS region: 4 (Physics and Astrophysics). Open access: no.
Citations: 0
Abstract
Fast Radio Bursts (FRBs) are millisecond-duration radio transients of extragalactic origin, exhibiting a wide range of physical and observational properties. Distinguishing between repeating and non-repeating FRBs remains a key challenge in understanding their nature. In this work, we apply unsupervised machine learning techniques to classify FRBs based on both primary observables from the CHIME catalog and physically motivated derived features. We evaluate three hybrid pipelines combining dimensionality reduction with clustering: Principal Component Analysis (PCA) + k-means, t-distributed Stochastic Neighbor Embedding (t-SNE) + Hierarchical Density-Based Spatial Clustering of Applications with Noise (HDBSCAN), and t-SNE + Spectral Clustering. To identify optimal hyperparameters, we implement a comprehensive grid search using a custom scoring function that prioritizes recall while penalizing excessive cluster fragmentation and noise. Feature relevance is assessed using principal component loadings, mutual information with the known repeater label, and permutation-based F₂ score sensitivity. Our results demonstrate that the derived features, including redshift, luminosity, and spectral properties, such as the spectral index and the spectral running, significantly enhance the classification performance. Finally, we identify a set of FRBs currently labeled as non-repeaters that consistently cluster with known repeaters across all methods, highlighting promising candidates for future follow-up observations and reinforcing the utility of unsupervised approaches in FRB population studies.
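For orientation, the first of the three pipelines (PCA + k-means) and a recall-weighted F-beta score with beta = 2 can be sketched as below. This is only an illustration under stated assumptions, not the authors' implementation: the data are synthetic blobs standing in for FRB features, the helper name cluster_f2 is hypothetical, and treating the cluster with the highest repeater fraction as "repeater-like" is a choice made for this sketch. The paper's custom scoring function, grid search, and t-SNE pipelines are not reproduced here.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA
from sklearn.metrics import fbeta_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def cluster_f2(X, is_repeater, n_components=2, n_clusters=2, seed=0):
    """Cluster with PCA + k-means, then score against the known labels.

    Hypothetical helper: the cluster with the highest fraction of known
    repeaters is treated as the "repeater-like" cluster (an assumption).
    """
    pipeline = make_pipeline(
        StandardScaler(),                      # put features on a common scale
        PCA(n_components=n_components),        # dimensionality reduction
        KMeans(n_clusters=n_clusters, n_init=10, random_state=seed),
    )
    cluster_ids = pipeline.fit_predict(X)

    # Fraction of known repeaters in each cluster (0.0 for an empty cluster).
    fractions = []
    for k in range(n_clusters):
        members = cluster_ids == k
        fractions.append(is_repeater[members].mean() if members.any() else 0.0)
    repeater_cluster = int(np.argmax(fractions))

    predicted = (cluster_ids == repeater_cluster).astype(int)
    # beta=2 weights recall more heavily than precision.
    f2 = fbeta_score(is_repeater, predicted, beta=2)
    return predicted, f2


# Synthetic, imbalanced stand-in data: 180 "non-repeaters", 20 "repeaters".
X, is_repeater = make_blobs(
    n_samples=[180, 20],
    centers=[[0.0, 0.0, 0.0, 0.0], [6.0, 6.0, 6.0, 6.0]],
    random_state=0,
)

predicted, f2 = cluster_f2(X, is_repeater)

# Sources labeled as non-repeaters that fall in the repeater-like cluster.
candidates = np.flatnonzero((predicted == 1) & (is_repeater == 0))
print(f"F2 = {f2:.3f}, candidate indices: {candidates.tolist()}")
```

In this toy setup, any entries in candidates play the role of the non-repeaters that cluster with known repeaters; with real catalog features, the choice of the number of components and clusters would come from a hyperparameter search such as the one the abstract describes.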
About the journal:
The journal welcomes manuscripts on theoretical models, simulations, and observations of highly energetic astrophysical objects both in our Galaxy and beyond. Among those, black holes at all scales, neutron stars, pulsars and their nebulae, binaries, novae and supernovae, their remnants, active galaxies, and clusters are just a few examples. The journal will consider research across the whole electromagnetic spectrum, as well as research using various messengers, such as gravitational waves or neutrinos. Effects of high-energy phenomena on cosmology and star formation, results from dedicated surveys expanding the knowledge of extreme environments, and astrophysical implications of dark matter are also welcome topics.