Computation of persistent homology on streaming data using topological data summaries

IF 1.8 | Zone 4, Computer Science | JCR Q3 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Computational Intelligence Pub Date : 2023-07-30 DOI:10.1111/coin.12597

Anindya Moitra, Nicholas O. Malott, Philip A. Wilsey

Computational Intelligence, volume 39, issue 5, pages 860-899. Journal Article. https://onlinelibrary.wiley.com/doi/10.1111/coin.12597

Citations: 0

Abstract

Persistent homology is a computationally intensive yet extremely powerful tool for Topological Data Analysis. Applying the tool to a potentially infinite sequence of data objects is a challenging task. For this reason, persistent homology and data stream mining have long been two important but disjoint areas of data science. The first computational model recently introduced to bridge the gap between the two areas is useful for detecting steady or gradual changes in data streams, such as certain genomic modifications during the evolution of species. However, that model is not suitable for applications that encounter abrupt changes of extremely short duration. This paper presents another model for computing persistent homology on streaming data that addresses the shortcoming of the previous work. The model is validated on the important real-world application of network anomaly detection. It is shown that, in addition to detecting the occurrence of anomalies or attacks in computer networks, the proposed model is able to visually identify several types of traffic. Moreover, the model can accurately detect abrupt changes of extremely short as well as longer duration in the network traffic. These capabilities are not achievable by the previous model or by traditional data mining techniques.
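
To make the idea of computing persistent homology over a stream more concrete, the following is a minimal sketch and not the model proposed in the paper. It assumes a fixed-size sliding window and computes only 0-dimensional persistence of a Vietoris-Rips filtration, where the finite death times equal the edge lengths of a minimum spanning tree. The function names `h0_deaths` and `windowed_h0` and the window size are hypothetical choices made for illustration.

```python
from collections import deque
from itertools import combinations
import math


def h0_deaths(points):
    """Finite death times of the 0-dimensional classes of a Vietoris-Rips filtration.

    Every point is born at scale 0. A connected component dies when an edge
    merges it into another component, so the deaths are exactly the edge
    lengths of a minimum spanning tree (Kruskal's algorithm with union-find).
    """
    parent = list(range(len(points)))

    def find(i):
        # Find the component representative, shortening paths as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # All pairwise edges, sorted by Euclidean length.
    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in combinations(range(len(points)), 2)
    )
    deaths = []
    for length, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:  # this edge merges two components: one class dies
            parent[ri] = rj
            deaths.append(length)
    return deaths


def windowed_h0(stream, window_size):
    """Yield the H0 death times for each full sliding window over the stream."""
    window = deque(maxlen=window_size)  # oldest point is evicted automatically
    for point in stream:
        window.append(point)
        if len(window) == window_size:
            yield h0_deaths(list(window))


if __name__ == "__main__":
    # The last point is far from the others, mimicking an abrupt change.
    stream = [(0, 0), (1, 0), (2, 0), (10, 0)]
    for deaths in windowed_h0(stream, window_size=3):
        print(max(deaths))
```

In this toy stream the largest death time jumps as soon as the outlying point enters the window, which is the kind of signal a streaming detector could threshold. Recomputing every window from scratch costs quadratic time in the window size, so a practical system would need a more compact summary of the data.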

Source journal

Computational Intelligence (Engineering and Technology - Computer Science: Artificial Intelligence)

CiteScore: 6.90

Self-citation rate: 3.60%

Review time: >12 weeks

About the journal: This leading international journal promotes and stimulates research in the field of artificial intelligence (AI). Covering a wide range of issues - from the tools and languages of AI to its philosophical implications - Computational Intelligence provides a vigorous forum for the publication of both experimental and theoretical research, as well as surveys and impact studies. The journal is designed to meet the needs of a wide range of AI workers in academic and industrial research.