The Semantic Adjacency Criterion in Time Intervals Mining

IF 3.7 Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE
Alexander Shknevsky, Yuval Shahar, Robert Moskovitch
{"title":"The Semantic Adjacency Criterion in Time Intervals Mining","authors":"Alexander Shknevsky, Yuval Shahar, Robert Moskovitch","doi":"10.3390/bdcc7040173","DOIUrl":null,"url":null,"abstract":"We propose a new pruning constraint when mining frequent temporal patterns to be used as classification and prediction features, the Semantic Adjacency Criterion [SAC], which filters out temporal patterns that contain potentially semantically contradictory components, exploiting each medical domain’s knowledge. We have defined three SAC versions and tested them within three medical domains (oncology, hepatitis, diabetes) and a frequent-temporal-pattern discovery framework. Previously, we had shown that using SAC enhances the repeatability of discovering the same temporal patterns in similar proportions in different patient groups within the same clinical domain. Here, we focused on SAC’s computational implications for pattern discovery, and for classification and prediction, using the discovered patterns as features, by four different machine-learning methods: Random Forests, Naïve Bayes, SVM, and Logistic Regression. Using SAC resulted in a significant reduction, across all medical domains and classification methods, of up to 97% in the number of discovered temporal patterns, and in the runtime of the discovery process, of up to 98%. Nevertheless, the highly reduced set of only semantically transparent patterns, when used as features, resulted in classification and prediction models whose performance was at least as good as the models resulting from using the complete temporal-pattern set.","PeriodicalId":36397,"journal":{"name":"Big Data and Cognitive Computing","volume":" 94","pages":"0"},"PeriodicalIF":3.7000,"publicationDate":"2023-11-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"3","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Big Data and Cognitive Computing","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.3390/bdcc7040173","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
引用次数: 3

Abstract

We propose a new pruning constraint when mining frequent temporal patterns to be used as classification and prediction features, the Semantic Adjacency Criterion [SAC], which filters out temporal patterns that contain potentially semantically contradictory components, exploiting each medical domain’s knowledge. We have defined three SAC versions and tested them within three medical domains (oncology, hepatitis, diabetes) and a frequent-temporal-pattern discovery framework. Previously, we had shown that using SAC enhances the repeatability of discovering the same temporal patterns in similar proportions in different patient groups within the same clinical domain. Here, we focused on SAC’s computational implications for pattern discovery, and for classification and prediction, using the discovered patterns as features, by four different machine-learning methods: Random Forests, Naïve Bayes, SVM, and Logistic Regression. Using SAC resulted in a significant reduction, across all medical domains and classification methods, of up to 97% in the number of discovered temporal patterns, and in the runtime of the discovery process, of up to 98%. Nevertheless, the highly reduced set of only semantically transparent patterns, when used as features, resulted in classification and prediction models whose performance was at least as good as the models resulting from using the complete temporal-pattern set.
时间间隔挖掘中的语义邻接准则
我们提出了一种新的修剪约束,当挖掘频繁的时间模式用于分类和预测特征时,语义邻接准则[SAC],它过滤掉包含潜在语义矛盾成分的时间模式,利用每个医学领域的知识。我们定义了三个SAC版本,并在三个医学领域(肿瘤学、肝炎、糖尿病)和一个频繁时间模式发现框架中对它们进行了测试。之前,我们已经证明,使用SAC可以提高在同一临床领域内不同患者组中以相似比例发现相同时间模式的可重复性。在这里,我们关注SAC在模式发现、分类和预测方面的计算意义,使用发现的模式作为特征,通过四种不同的机器学习方法:随机森林、Naïve贝叶斯、支持向量机和逻辑回归。在所有医学领域和分类方法中,使用SAC可以显著减少发现的时间模式的数量,最多可减少97%,在发现过程的运行时中,最多可减少98%。然而,当使用高度简化的语义透明模式集作为特征时,产生的分类和预测模型的性能至少与使用完整时间模式集产生的模型一样好。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
Big Data and Cognitive Computing
Big Data and Cognitive Computing Business, Management and Accounting-Management Information Systems
CiteScore
7.10
自引率
8.10%
发文量
128
审稿时长
11 weeks
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信