EsmamDS: A more diverse exceptional survival model mining approach

IF 8.1 | Zone 1 (Computer Science) | COMPUTER SCIENCE, INFORMATION SYSTEMS

Information Sciences Pub Date : 2024-10-16 DOI:10.1016/j.ins.2024.121549

Renato Vimieiro, Juliana Barcellos Mattos, Paulo S.G. de Mattos Neto

Information Sciences, volume 690, Article 121549. Full text: https://www.sciencedirect.com/science/article/pii/S0020025524014634

Citations: 0

Abstract

In this work, we present an Ant Colony Optimization heuristic for finding subgroups with exceptional behavior in time-to-event data. Time-to-event, or survival, data analysis has its basis in statistics, where the main goal is to predict if and when an event will happen. In other words, the main goal of survival analysis has long been to build global models able to predict the time until an event occurs. Nevertheless, predictive models are very often used to compare stratified data in order to evaluate whether a variable is associated with the outcome. For instance, patients might be stratified according to a treatment variable (placebo or not) to compare models (survival curves) and decide on the effectiveness of the treatment. Although this approach is effective when the variable of interest is already known, it offers no alternative when specialists do not know how to stratify the data, that is, when they do not know which variables could be related to the outcome. Our approach targets exactly this case. Our method seeks combinations of variables that are associated with, i.e., describe, subgroups of individuals with unexpected or exceptional survival curves. In this sense, we complement the literature with a descriptive approach that is able to find and characterize those groups for specialists. Our method is based on the exceptional model mining framework and improves on a preliminary version presented at a conference. The main enhancement is a redesigned heuristic that retrieves interesting and diverse subgroups while minimizing three aspects of redundancy: coverage, description, and model. The second extension concerns how the quality function is applied: users can now control whether the quality measure compares subgroups against the population or against the individuals that do not satisfy the descriptive rule. Third, we conduct further experiments comparing the performance of our approach with state-of-the-art algorithms on real-world benchmark data sets. Finally, we present a case study showing a possible application of our method in the bioinformatics/health domain.
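The choice of baseline described in the abstract (subgroup versus the whole population, or subgroup versus the individuals not covered by the rule) can be illustrated with a small sketch. This is not the EsmamDS implementation. The function names, the toy data, and the use of the largest vertical gap between Kaplan-Meier curves as a quality score are all assumptions made for illustration; the abstract does not state which quality measure the method uses.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Curve = List[Tuple[float, float]]


def kaplan_meier(times: Sequence[float], events: Sequence[int]) -> Curve:
    """Kaplan-Meier estimate as (time, survival) pairs at each event time.

    events[i] is 1 if the event was observed, 0 if the record is censored.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve: Curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Group all records that share the same time stamp.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths > 0:
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve


def survival_at(curve: Curve, t: float) -> float:
    """Evaluate the step function defined by a Kaplan-Meier curve at time t."""
    value = 1.0
    for time, s in curve:
        if time > t:
            break
        value = s
    return value


def curve_gap(a: Curve, b: Curve) -> float:
    """Largest absolute gap between two curves (hypothetical quality score)."""
    grid = sorted({t for t, _ in a} | {t for t, _ in b})
    return max((abs(survival_at(a, t) - survival_at(b, t)) for t in grid),
               default=0.0)


def subgroup_quality(rows: Sequence[Dict[str, str]],
                     times: Sequence[float],
                     events: Sequence[int],
                     rule: Callable[[Dict[str, str]], bool],
                     baseline: str = "population") -> float:
    """Score a descriptive rule against the chosen baseline."""
    inside = [i for i, row in enumerate(rows) if rule(row)]
    if baseline == "population":
        reference = list(range(len(rows)))
    elif baseline == "complement":
        covered = set(inside)
        reference = [i for i in range(len(rows)) if i not in covered]
    else:
        raise ValueError("baseline must be 'population' or 'complement'")

    def km(indices: List[int]) -> Curve:
        return kaplan_meier([times[i] for i in indices],
                            [events[i] for i in indices])

    return curve_gap(km(inside), km(reference))


# Toy data (invented): four placebo patients fail early, four treated later.
rows = [{"treatment": "placebo"}] * 4 + [{"treatment": "drug"}] * 4
times = [1, 2, 3, 4, 5, 6, 7, 8]
events = [1] * 8
is_placebo = lambda row: row["treatment"] == "placebo"

vs_population = subgroup_quality(rows, times, events, is_placebo, "population")
vs_complement = subgroup_quality(rows, times, events, is_placebo, "complement")
```

On this toy data the same rule scores higher against the complement than against the population, because the population curve already contains the subgroup and is therefore pulled toward it. This suggests why exposing the baseline as a user-controlled option can change which subgroups look exceptional.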

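Source journal

Information Sciences (Engineering & Technology, Computer Science: Information Systems)

CiteScore: 14.00

Self-citation rate: 17.30%

Articles published: 1322

Review time: 10.4 months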

Journal description: Informatics and Computer Science Intelligent Systems Applications is an esteemed international journal that focuses on publishing original and creative research findings in the field of information sciences. We also feature a limited number of timely tutorial and surveying contributions. Our journal aims to cater to a diverse audience, including researchers, developers, managers, strategic planners, graduate students, and anyone interested in staying up-to-date with cutting-edge research in information science, knowledge engineering, and intelligent systems. While readers are expected to share a common interest in information science, they come from varying backgrounds such as engineering, mathematics, statistics, physics, computer science, cell biology, molecular biology, management science, cognitive science, neurobiology, behavioral sciences, and biochemistry.