Efficient identification of broad absorption line quasars using dimensionality reduction and machine learning

IF 2.2 | Tier 4, Physics and Astrophysics | JCR Q2, ASTRONOMY & ASTROPHYSICS
Wei-Bo Kao, Yanxia Zhang, Xue-Bing Wu
Publications of the Astronomical Society of Japan, published 2024-05-14. DOI: 10.1093/pasj/psae037
Citations: 0

Abstract

Broad absorption line quasars (BALQSOs) are a significant class of quasars, distinguished by blueshifted broad absorption lines in their spectra. These objects are valuable probes of the structure and evolution of quasars and shed light on the influence that supermassive black holes exert on galaxy formation. Large-scale spectroscopic surveys such as LAMOST (the Large Sky Area Multi-Object Fiber Spectroscopic Telescope), SDSS (the Sloan Digital Sky Survey), and DESI (the Dark Energy Spectroscopic Instrument) have greatly expanded the number of quasar spectra available. In this study, we present an approach that streamlines the identification of BALQSOs by combining dimensionality reduction with machine-learning algorithms. Our dataset is drawn from SDSS Data Release 16 (DR16) and pairs quasar spectra with classification labels from the DR16Q quasar catalog. We apply several dimensionality-reduction techniques, including principal component analysis (PCA), t-distributed stochastic neighbor embedding (t-SNE), locally linear embedding (LLE), and isometric mapping (ISOMAP), to compress the original spectral data. The resulting low-dimensional representations serve as inputs to a suite of machine-learning classifiers, including XGBoost and Random Forest. Our experiments show that PCA is the most effective dimensionality-reduction method, striking the best balance between reducing dimensionality and preserving the essential spectral information. In particular, PCA combined with the XGBoost classifier performs best for BALQSO classification, reaching an accuracy of 97.60% under 10-fold cross-validation and 96.92% on the external test sample.
This study not only introduces a novel machine-learning-based approach to quasar classification but also offers insights transferable to many other spectral classification problems in astronomy.
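The workflow the abstract describes (compress each spectrum to a few components, classify the compressed features, then score with 10-fold cross-validation and a separate test sample) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: it assumes NumPy and scikit-learn, uses synthetic toy spectra instead of DR16 data, and substitutes scikit-learn's `RandomForestClassifier` for XGBoost. The helper `make_toy_spectra`, the wavelength grid, the trough parameters and `n_components=20` are all hypothetical choices.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score, train_test_split
from sklearn.pipeline import make_pipeline


def make_toy_spectra(n_per_class=200, n_pix=500, seed=0):
    """Return synthetic toy spectra X (n, n_pix) and labels y (1 = BAL-like).

    Hypothetical helper: every spectrum is a flat continuum plus a Gaussian
    emission line near 1549 A (C IV-like) and Gaussian noise; the BAL-like
    class also gets a broad absorption trough on the blue side of the line.
    These are NOT SDSS DR16 spectra; all parameters are illustrative.
    """
    rng = np.random.default_rng(seed)
    wave = np.linspace(1400.0, 1650.0, n_pix)  # hypothetical rest-frame grid (A)
    emission = 1.5 * np.exp(-0.5 * ((wave - 1549.0) / 10.0) ** 2)
    X, y = [], []
    for label in (0, 1):
        for _ in range(n_per_class):
            flux = 1.0 + emission + rng.normal(0.0, 0.1, n_pix)
            if label == 1:
                center = rng.uniform(1480.0, 1520.0)  # blueshifted trough centre
                depth = rng.uniform(0.3, 0.7)
                flux -= depth * np.exp(-0.5 * ((wave - center) / 12.0) ** 2)
            X.append(flux)
            y.append(label)
    return np.asarray(X), np.asarray(y)


X, y = make_toy_spectra()

# Hold out 20% of the spectra as a separate test sample.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# PCA sits inside the pipeline, so in every CV fold it is fitted on the
# training part only and the validation spectra are merely projected.
model = make_pipeline(
    PCA(n_components=20, random_state=0),
    RandomForestClassifier(n_estimators=100, random_state=0),
)

cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
cv_scores = cross_val_score(model, X_train, y_train, cv=cv, scoring="accuracy")

# Refit on the full training set, then score once on the held-out spectra.
model.fit(X_train, y_train)
test_acc = model.score(X_test, y_test)

print(f"10-fold CV accuracy: {cv_scores.mean():.3f} +/- {cv_scores.std():.3f}")
print(f"held-out test accuracy: {test_acc:.3f}")
```

If the separate `xgboost` package is installed, its `XGBClassifier` follows the scikit-learn estimator interface and can replace the random forest in the pipeline. `sklearn.manifold.Isomap` and `LocallyLinearEmbedding` also provide `transform` and can stand in for PCA, whereas `sklearn.manifold.TSNE` only offers `fit_transform`, so it cannot project unseen spectra and would not fit into this pipeline unchanged.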
Source journal: Publications of the Astronomical Society of Japan (Earth Sciences & Astronomy: Astronomy and Astrophysics)
CiteScore: 4.10
Self-citation rate: 13.00%
Articles published: 98
Review time: 4-8 weeks
Journal description: Publications of the Astronomical Society of Japan (PASJ) publishes the results of original research in all aspects of astronomy, astrophysics, and fields closely related to them.