Identifying flight delay patterns using diverse subgroup discovery
Hugo Manuel Proença, R. Klijn, Thomas Bäck, M. Leeuwen
2018 IEEE Symposium Series on Computational Intelligence (SSCI), published 2018-11-01
DOI: 10.1109/SSCI.2018.8628933 (https://doi.org/10.1109/SSCI.2018.8628933)
Citations: 9
Abstract
Flight delays affect around one quarter of flights and have been a major concern for airlines for decades. Consequently, a growing body of research has addressed the topic in recent years. Notably, the fields of machine learning and data mining have proposed various solutions for predicting flight delays, typically some hours before departure. However, the most important airline decisions that could benefit from such predictions, i.e., those on scheduled block time and crew schedules, are made two to six months prior to departure. Such late delay predictions are therefore useless for these scheduling tasks.

As accurately predicting delays for individual flights a long time in advance is practically infeasible, we instead propose to search for circumstances associated with large delays. For this we propose to use diverse Subgroup Discovery (SD), a data mining technique that discovers subsets of the data that 1) deviate from the overall data with regard to some target variable, and 2) can be described by a simple conjunctive query on the other variables. We apply diverse SD to historical flight data and mine subgroups of flights that, on average, have a large delay. We show that this approach yields subgroups that experts can easily understand, even though they can capture non-trivial relations between multiple variables. We also show that diverse SD gives less redundant results than standard top-k SD, and demonstrate that even in situations where inferring an accurate predictive model is infeasible, local deviations can be effectively captured and described by local patterns, potentially providing valuable insights for, e.g., airline scheduling problems.
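To make the two defining properties of a subgroup concrete, the following is a minimal Python sketch of the general idea, not the algorithm used in the paper. Everything in it is an assumption made for illustration: the toy records and attribute names are invented, the quality measure (coverage times the shift in mean delay) is one common choice for numeric targets, and the diversity step is a simple greedy filter on Jaccard overlap between subgroup covers, standing in for whatever diverse SD procedure the authors applied.

```python
from itertools import combinations
from statistics import mean

# Toy records invented purely for illustration (NOT data from the paper).
FLIGHTS = [
    {"season": "summer", "period": "evening", "hub": "yes", "delay": 42},
    {"season": "summer", "period": "evening", "hub": "yes", "delay": 38},
    {"season": "summer", "period": "morning", "hub": "yes", "delay": 12},
    {"season": "summer", "period": "evening", "hub": "no", "delay": 30},
    {"season": "winter", "period": "evening", "hub": "yes", "delay": 15},
    {"season": "winter", "period": "morning", "hub": "no", "delay": 5},
    {"season": "winter", "period": "morning", "hub": "yes", "delay": 8},
    {"season": "winter", "period": "evening", "hub": "no", "delay": 10},
]
ATTRIBUTES = ("season", "period", "hub")  # hypothetical descriptive variables
TARGET = "delay"                          # hypothetical numeric target


def cover(description, rows):
    """Indices of the rows that satisfy every (attribute, value) condition."""
    return frozenset(
        i for i, row in enumerate(rows)
        if all(row[attr] == value for attr, value in description)
    )


def quality(covered, rows):
    """Assumed measure: (n / N) * (subgroup mean - overall mean)."""
    if not covered:
        return 0.0
    overall = mean(row[TARGET] for row in rows)
    subgroup = mean(rows[i][TARGET] for i in covered)
    return len(covered) / len(rows) * (subgroup - overall)


def candidates(rows, max_depth=2):
    """All conjunctions of up to max_depth conditions on distinct attributes."""
    conditions = sorted({(a, row[a]) for row in rows for a in ATTRIBUTES})
    for depth in range(1, max_depth + 1):
        for combo in combinations(conditions, depth):
            if len({attr for attr, _ in combo}) == depth:
                yield combo


def ranked_subgroups(rows, max_depth=2):
    """Standard ranking: every candidate, sorted by descending quality."""
    scored = []
    for description in candidates(rows, max_depth):
        covered = cover(description, rows)
        scored.append((quality(covered, rows), description, covered))
    return sorted(scored, key=lambda item: (-item[0], item[1]))


def diverse_top_k(ranked, k, max_overlap=0.5):
    """Greedy filter: skip subgroups whose cover overlaps a selected one too much."""
    selected = []
    for q, description, covered in ranked:
        if q <= 0 or len(selected) == k:
            break
        if all(len(covered & c) / len(covered | c) <= max_overlap
               for _, _, c in selected):
            selected.append((q, description, covered))
    return selected


if __name__ == "__main__":
    ranked = ranked_subgroups(FLIGHTS)
    print("standard top-2:", [d for _, d, _ in ranked[:2]])
    print("diverse top-2: ", [d for _, d, _ in diverse_top_k(ranked, k=2)])
```

On this toy data the plain ranking puts a subgroup and its own generalisation in the first two places, which cover almost the same flights, whereas the overlap filter replaces the second with a description covering a different set of flights. Exhaustive enumeration is only workable here because the example is tiny; a realistic search would need a heuristic such as beam search.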