Irregular longitudinal data analysis with statistical and machine learning methods for hazardous asteroids

Impact factor 1.9; Zone 4 (Physics and Astrophysics); JCR Q2 (ASTRONOMY & ASTROPHYSICS)

Astronomy and Computing; Pub Date: 2024-03-06; DOI: 10.1016/j.ascom.2024.100818

I. Tanriverdi, O. Ilk, M.A. Gürkan

Volume 47, Article 100818

Citations: 0

Abstract

Asteroids have been observed for as long as the available observational equipment has made it feasible. Recorded data, going back to the 18th century, have allowed these celestial objects to be classified by hazard status. Unfortunately, previous studies used methods that ignore subject dependency in Near-Earth Asteroid (NEA) data. This study aims to perform hazard classification of asteroids by applying various statistical and machine learning methods to NEA data in a way that overcomes these shortcomings. We analyze data from 751 asteroids observed at irregular time intervals, obtained through NASA. We compare algorithms suitable for longitudinal data structures, such as Generalized Linear Mixed Models (GLMM), the marginal model, GLMM-Tree, Historical Random Forest, GPBoost, and Spline. To the best of our knowledge, and based on a comprehensive review of the existing literature, our study is the first to use these methods for an in-depth analysis of NEA data. According to the findings, the accuracies of the models range from 0.89 to 0.99. The GPBoost model performs best, while the marginal model performs worst.
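The "subject dependency" mentioned in the abstract means that repeated observations of the same asteroid are correlated. The following is a minimal sketch, not the authors' pipeline, of one practical consequence: when evaluating accuracy on longitudinal data, all observations of a subject should land on the same side of a train/test split. The record layout (`asteroid_id`, `t`, `hazardous`), the toy values, and the function names are assumptions made for illustration only.

```python
import random
from collections import defaultdict


def grouped_split(records, test_fraction=0.25, seed=0):
    """Split longitudinal records so each subject falls entirely in one set.

    `records` is a list of dicts with a hypothetical "asteroid_id" key.
    """
    by_subject = defaultdict(list)
    for rec in records:
        by_subject[rec["asteroid_id"]].append(rec)

    # Shuffle subjects, not individual rows, to respect within-subject dependency.
    subjects = sorted(by_subject)
    random.Random(seed).shuffle(subjects)

    n_test = max(1, round(len(subjects) * test_fraction))
    test_ids = set(subjects[:n_test])

    train = [r for s in subjects if s not in test_ids for r in by_subject[s]]
    test = [r for s in subjects if s in test_ids for r in by_subject[s]]
    return train, test


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    if not y_true:
        raise ValueError("y_true must not be empty")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical toy data: four asteroids observed at irregular times.
records = [
    {"asteroid_id": "A", "t": 0.0, "hazardous": 1},
    {"asteroid_id": "A", "t": 1.7, "hazardous": 1},
    {"asteroid_id": "A", "t": 4.2, "hazardous": 1},
    {"asteroid_id": "B", "t": 0.5, "hazardous": 0},
    {"asteroid_id": "B", "t": 9.1, "hazardous": 0},
    {"asteroid_id": "C", "t": 2.3, "hazardous": 0},
    {"asteroid_id": "D", "t": 0.1, "hazardous": 1},
    {"asteroid_id": "D", "t": 0.4, "hazardous": 1},
]

train, test = grouped_split(records)
train_ids = {r["asteroid_id"] for r in train}
test_ids = {r["asteroid_id"] for r in test}
```

Splitting row by row instead would let observations of one asteroid appear in both sets, which tends to inflate the reported accuracy of methods that treat rows as independent.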

Source journal

Astronomy and Computing. Categories: ASTRONOMY & ASTROPHYSICS; COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS

CiteScore: 4.10

Self-citation rate: 8.00%

About the journal: Astronomy and Computing is a peer-reviewed journal that focuses on the broad area between astronomy, computer science and information technology. The journal aims to publish the work of scientists and (software) engineers in all aspects of astronomical computing, including the collection, analysis, reduction, visualisation, preservation and dissemination of data, and the development of astronomical software and simulations. The journal covers applications of academic computer science techniques to astronomy, as well as novel applications of information technologies within astronomy.