The Semantic Adjacency Criterion in Time Intervals Mining

Big Data and Cognitive Computing (IF 4.4; JCR Q2, Computer Science, Artificial Intelligence)

Pub Date: 2023-11-09. DOI: 10.3390/bdcc7040173

Alexander Shknevsky, Yuval Shahar, Robert Moskovitch


Citations: 3

Abstract

We propose the Semantic Adjacency Criterion (SAC), a new pruning constraint for mining frequent temporal patterns that are to be used as classification and prediction features. SAC exploits each medical domain’s knowledge to filter out temporal patterns that contain potentially semantically contradictory components. We defined three SAC versions and tested them in three medical domains (oncology, hepatitis, and diabetes) within a frequent-temporal-pattern discovery framework. Previously, we had shown that using SAC enhances the repeatability of discovering the same temporal patterns, in similar proportions, in different patient groups within the same clinical domain. Here, we focus on SAC’s computational implications for pattern discovery and on its effect on classification and prediction when the discovered patterns are used as features by four machine-learning methods: Random Forests, Naïve Bayes, SVM, and Logistic Regression. Across all medical domains and classification methods, using SAC significantly reduced the number of discovered temporal patterns, by up to 97%, and the runtime of the discovery process, by up to 98%. Nevertheless, the highly reduced set of only semantically transparent patterns, when used as features, yielded classification and prediction models that performed at least as well as the models built from the complete temporal-pattern set.
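The general idea of a knowledge-based pruning constraint can be illustrated with a minimal sketch. This is not the paper's definition of SAC or any of its three versions, which the abstract does not spell out. The pattern representation (a tuple of `(concept, state)` components), the `CONTRADICTORY` table, and the function names are all hypothetical and chosen only for illustration.

```python
from itertools import combinations

# Hypothetical domain knowledge: pairs of symbolic states that we assume
# are semantically contradictory when they appear in the same pattern.
CONTRADICTORY = {
    frozenset({("glucose", "high"), ("glucose", "low")}),
}

def is_semantically_consistent(pattern, contradictory=CONTRADICTORY):
    """Return True if no two components of the pattern form a contradictory pair."""
    return not any(
        frozenset({a, b}) in contradictory
        for a, b in combinations(pattern, 2)
    )

def prune(patterns, contradictory=CONTRADICTORY):
    """Keep only the candidate patterns that pass the consistency check."""
    return [p for p in patterns if is_semantically_consistent(p, contradictory)]

# Illustrative candidates (made up, not taken from the paper's data).
candidates = [
    (("glucose", "high"), ("insulin", "given")),
    (("glucose", "high"), ("glucose", "low")),
]
kept = prune(candidates)
```

In this sketch the filter runs over finished candidates. A constraint applied during pattern enumeration would also avoid extending rejected patterns, which is one plausible way pruning can shrink both the pattern set and the discovery runtime.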
