Cognitive Based Detection of Anomalous Sequences Using Bayesian Surprise

IF 2.3, Zone 4 (Computer Science), Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Expert Systems, Pub Date: 2025-07-28, DOI: 10.1111/exsy.70106

Ken McGarry, David Nelson

Expert Systems, vol. 42, no. 9. Full text: https://onlinelibrary.wiley.com/doi/10.1111/exsy.70106

Citations: 0

Abstract

In this work we implement Bayesian surprise as a method to sift through sequences of discrete patterns and identify unusual or interesting patterns that deviate from known sequences. Surprise is a biological trait inherent in humans and animals and is essential for many creative acts and efforts of discovery. Numerous technical domains involve sequences of discrete elements, such as e-commerce transactions, genome data searching, online financial transactions of many types, criminal cyber-attacks and life-course data from sociology. Beyond the complexity and computational burden of this type of problem, there is the issue of rarity: many anomalies are infrequent and may defy categorisation, so they are not suited to classification solutions. We test our methods on four datasets of discrete sequences (Hospital Sepsis patients, Chess Moves, the Wisconsin Card Sorting Task and BioFamilies). Probabilistic Suffix Trees are trained on this data; they retain each discrete symbol's position in a given sequence. The trained models are then exposed to “new” data, where any deviation from learned patterns, either in position within the sequence or in frequency of occurrence, denotes a pattern that is unusual compared with the original training data. To assist in identifying new patterns and to avoid mistaking old patterns for novel ones, we use Bayesian surprise to detect discrepancies between what we expect and the actual results. We can assign a degree of surprise or unexpectedness to any new pattern and indicate why certain patterns are deemed novel or surprising and others are not.
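To make the idea in the abstract concrete, the following is a minimal sketch and not the authors' implementation. It assumes a fixed-order context model as a simplified stand-in for a Probabilistic Suffix Tree, and it assumes a uniform Dirichlet prior over next-symbol probabilities in each context. Bayesian surprise is taken here as the KL divergence from prior to posterior after one observed symbol; with integer Dirichlet parameters this reduces to a logarithm minus a harmonic sum. The names `ContextSurpriseModel` and `harmonic_diff` are hypothetical.

```python
from collections import defaultdict
from math import log


def harmonic_diff(lo, hi):
    """Return H_hi - H_lo, i.e. sum of 1/j for j in lo+1..hi (integers)."""
    return sum(1.0 / j for j in range(lo + 1, hi + 1))


class ContextSurpriseModel:
    """Illustrative fixed-order context model (not a full suffix tree)."""

    def __init__(self, alphabet, order=2):
        self.alphabet = list(alphabet)
        self.order = order
        # counts[context][symbol] are integer Dirichlet parameters,
        # starting from the assumed uniform prior Dirichlet(1, ..., 1).
        self.counts = defaultdict(self._prior)

    def _prior(self):
        return {s: 1 for s in self.alphabet}

    def _contexts(self, seq):
        # Pair each symbol with the (up to) `order` symbols preceding it.
        for i, symbol in enumerate(seq):
            yield tuple(seq[max(0, i - self.order):i]), symbol

    def fit(self, sequences):
        for seq in sequences:
            for context, symbol in self._contexts(seq):
                self.counts[context][symbol] += 1
        return self

    def surprise(self, seq):
        """Per-symbol surprise in nats; the model itself is not updated.

        Symbols must belong to the alphabet given at construction.
        """
        scores = []
        for context, symbol in self._contexts(seq):
            alpha = self.counts.get(context, self._prior())
            a_k = alpha[symbol]
            a_0 = sum(alpha.values())
            # KL( Dir(alpha + e_k) || Dir(alpha) ) for integer alpha:
            # ln(a_0 / a_k) - (H_{a_0} - H_{a_k})
            scores.append(log(a_0 / a_k) - harmonic_diff(a_k, a_0))
        return scores


# Train on a strictly alternating pattern, then score two new sequences.
model = ContextSurpriseModel("ab", order=1).fit(["ababab"] * 20)
normal = model.surprise("abab")
odd = model.surprise("abba")
print("abab:", [round(s, 4) for s in normal])
print("abba:", [round(s, 4) for s in odd])
```

In this toy run the second `b` in `abba` follows a context in which `b` was never seen during training, so its score stands well above the others, while every symbol of `abab` scores close to zero. A real Probabilistic Suffix Tree would use variable-length contexts with pruning rather than a single fixed order.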

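Source journal

Expert Systems (Engineering & Technology, Computer Science: Theory & Methods)

CiteScore: 7.40

Self-citation rate: 6.10%

Articles published: 266

Review time: 24 months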

About the journal: Expert Systems: The Journal of Knowledge Engineering publishes papers dealing with all aspects of knowledge engineering, including individual methods and techniques in knowledge acquisition and representation, and their application in the construction of systems – including expert systems – based thereon. Detailed scientific evaluation is an essential part of any paper. As well as traditional application areas, such as Software and Requirements Engineering, Human-Computer Interaction, and Artificial Intelligence, we are aiming at the new and growing markets for these technologies, such as Business, Economy, Market Research, and Medical and Health Care. The shift towards this new focus will be marked by a series of special issues covering hot and emergent topics.