Pulsar Candidate Selection Using Gaussian Hellinger Extremely Fast Decision Tree

Venoli Gamage, Mohamed Ayoob, Krishnakripa Jayakumar
Published in: 2022 2nd International Conference on Image Processing and Robotics (ICIPRob)
Publication date: 2022-03-12
DOI: 10.1109/ICIPRob54042.2022.9798721 (https://doi.org/10.1109/ICIPRob54042.2022.9798721)
Citations: 1

Abstract

Radio-wave data gathered by pulsar-search telescopes must be classified while they are being streamed, because traditional machine learning algorithms face practical constraints on streaming datasets: they would need considerable compute power, memory, and time to give usable results (recent surveys collect data at a rate of 0.5 to 1 terabyte per second). Stream classification algorithms are developed specifically to address these limitations and can classify data streams without large memory or training-time costs. They are designed around characteristics of data streams such as concept drift and limited memory. The Extremely Fast Decision Tree is one such stream classification algorithm; it learns incrementally as new data arrive. However, pulsar-detection data streams are highly imbalanced (they contain fewer examples of pulsars than of non-pulsar objects). Learning incrementally from such a stream would degrade the model's precision in detecting pulsars. In this research, we introduce an improved version of the Extremely Fast Decision Tree that can learn from imbalanced data streams. Our approach is fast, accurate, and avoids the pitfalls of class imbalance and concept drift.
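The abstract does not spell out how the "Gaussian Hellinger" criterion in the title is computed. As a rough illustration only, and assuming it refers to the Hellinger distance between per-class Gaussian estimates of a feature (a measure that depends on the shape of each class distribution, not on class counts), the idea can be sketched as below. The names `RunningGaussian` and `gaussian_hellinger` are hypothetical and are not taken from the paper.

```python
import math


class RunningGaussian:
    """Incremental mean/variance estimate (Welford's method).

    Hypothetical helper: one instance per class and per feature,
    updated one sample at a time so no past samples are stored.
    """

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self._m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self._m2 += delta * (x - self.mean)

    @property
    def variance(self):
        # Sample variance; 0.0 until at least two samples have been seen.
        return self._m2 / (self.n - 1) if self.n > 1 else 0.0


def gaussian_hellinger(a, b, eps=1e-12):
    """Hellinger distance between two univariate Gaussian estimates.

    Closed form: H^2 = 1 - sqrt(2*s1*s2 / (s1^2 + s2^2))
                         * exp(-(m1 - m2)^2 / (4 * (s1^2 + s2^2))).
    The result lies in [0, 1] and does not depend on a.n or b.n,
    which is why it is a plausible fit for imbalanced streams.
    """
    var_a = max(a.variance, eps)  # eps guards against zero variance
    var_b = max(b.variance, eps)
    std_a, std_b = math.sqrt(var_a), math.sqrt(var_b)
    coeff = math.sqrt(2.0 * std_a * std_b / (var_a + var_b))
    overlap = coeff * math.exp(-((a.mean - b.mean) ** 2) / (4.0 * (var_a + var_b)))
    return math.sqrt(max(0.0, 1.0 - overlap))


# Usage: a rare "pulsar" class and a common "non-pulsar" class on one feature.
pulsar, non_pulsar = RunningGaussian(), RunningGaussian()
for x in [100.0, 101.0, 102.0]:
    pulsar.update(x)
for x in [1.0, 2.0, 3.0] * 50:
    non_pulsar.update(x)
print(gaussian_hellinger(pulsar, non_pulsar))
```

A feature on which the two class distributions barely overlap scores close to 1, even though one class has far fewer samples than the other. How such a score would be used inside the tree's split decisions is not described in the abstract and is left open here.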