Renato Vimieiro, Juliana Barcellos Mattos, Paulo S.G. de Mattos Neto
Title: EsmamDS: A more diverse exceptional survival model mining approach
Journal: Information Sciences, volume 690, Article 121549 (impact factor 8.1)
Journal category: Computer Science, Information Systems
Published: 2024-10-16
DOI: 10.1016/j.ins.2024.121549
URL: https://www.sciencedirect.com/science/article/pii/S0020025524014634
Citations: 0
Abstract
In this work we present an Ant Colony Optimization heuristic to find subgroups with exceptional behavior in time-to-event data. Time-to-event, or survival, data analysis has its basis in statistics, where the main goal is to predict if and when an event will happen. In other words, the main goal of survival analysis has long been to build global models able to predict the time until an event occurs. Nevertheless, predictive models are very often used to compare stratified data in order to evaluate whether a variable is associated with the outcome. For instance, patients might be stratified according to a treatment variable (placebo or not) to compare models (survival curves) and decide on the effectiveness of the treatment. Although this approach is effective when the variable of interest is already known, it offers no alternative when specialists do not know how to stratify the data, that is, when they do not know which variables could be related to the outcome. Our approach targets exactly this case. Our method seeks combinations of variables that describe subgroups of individuals with unexpected or exceptional survival curves. In this sense, we complement the literature with a descriptive approach that is able to find and characterize those groups for specialists. Our method is based on the framework of exceptional model mining and improves on a preliminary version presented at a conference. First, the main enhancement is a redesigned heuristic that retrieves interesting and diverse subgroups while minimizing three aspects of redundancy: coverage, description, and model. Second, we extend how the quality function is applied: users can now control whether the quality measure compares a subgroup against the whole population or against the individuals that do not satisfy the descriptive rule. Third, we conduct further experiments comparing the performance of our approach with state-of-the-art algorithms on real-world benchmark data sets. Finally, we present a case study showing a possible application of our method in the bioinformatics/health domain.
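To make the two comparison modes concrete, the following is a minimal sketch and not the authors' implementation. It assumes that each individual is a dictionary with hypothetical `time` and `event` keys, that survival curves are Kaplan-Meier estimates, and that exceptionality is scored by the largest vertical gap between two curves. That score is an illustrative stand-in, since the abstract does not specify the actual quality measure. All function names and the `baseline` parameter are assumptions made for this example.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Curve = List[Tuple[float, float]]  # (event time, estimated survival probability)


def kaplan_meier(times: Sequence[float], events: Sequence[int]) -> Curve:
    """Kaplan-Meier estimate; events[i] is 1 if observed, 0 if censored."""
    pairs = sorted(zip(times, events))
    at_risk = len(pairs)
    survival = 1.0
    curve: Curve = []
    i = 0
    while i < len(pairs):
        t = pairs[i][0]
        observed = 0
        leaving = 0
        # Group all individuals that share the same time stamp.
        while i < len(pairs) and pairs[i][0] == t:
            observed += pairs[i][1]
            leaving += 1
            i += 1
        if observed > 0:
            survival *= 1.0 - observed / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve


def survival_at(curve: Curve, t: float) -> float:
    """Evaluate the step function defined by a Kaplan-Meier curve at time t."""
    value = 1.0
    for time, s in curve:
        if time > t:
            break
        value = s
    return value


def curve_gap(a: Curve, b: Curve) -> float:
    """Largest absolute difference between two curves (illustrative score)."""
    grid = sorted({t for t, _ in a} | {t for t, _ in b})
    return max((abs(survival_at(a, t) - survival_at(b, t)) for t in grid),
               default=0.0)


def quality(records: List[Dict], rule: Callable[[Dict], bool],
            baseline: str = "population") -> float:
    """Score a subgroup against the population or against its complement."""
    inside = [r for r in records if rule(r)]
    if baseline == "population":
        reference = records
    elif baseline == "complement":
        reference = [r for r in records if not rule(r)]
    else:
        raise ValueError("baseline must be 'population' or 'complement'")
    if not inside or not reference:
        return 0.0

    def fit(rs: List[Dict]) -> Curve:
        return kaplan_meier([r["time"] for r in rs], [r["event"] for r in rs])

    return curve_gap(fit(inside), fit(reference))
```

A call such as `quality(records, lambda r: r["treated"], baseline="complement")` would score the hypothetical rule `treated = True` against the individuals who do not satisfy it. With `baseline="population"`, the subgroup's own members also contribute to the reference curve, so the measured gap tends to be smaller for large subgroups.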
About the journal:
Information Sciences (Informatics and Computer Science, Intelligent Systems, Applications) is an international journal that publishes original and creative research findings in the field of information sciences. We also feature a limited number of timely tutorial and survey contributions.
Our journal aims to cater to a diverse audience, including researchers, developers, managers, strategic planners, graduate students, and anyone interested in staying up-to-date with cutting-edge research in information science, knowledge engineering, and intelligent systems. While readers are expected to share a common interest in information science, they come from varying backgrounds such as engineering, mathematics, statistics, physics, computer science, cell biology, molecular biology, management science, cognitive science, neurobiology, behavioral sciences, and biochemistry.