Identifying flight delay patterns using diverse subgroup discovery

2018 IEEE Symposium Series on Computational Intelligence (SSCI). Pub Date: 2018-11-01. DOI: 10.1109/SSCI.2018.8628933

Hugo Manuel Proença, R. Klijn, Thomas Bäck, M. Leeuwen


Citations: 9

Abstract

Flight delays are a common nuisance that affects roughly a quarter of all flights and has been a major concern for airlines for decades. As a result, a growing body of research has been devoted to the topic in recent years. Notably, the fields of machine learning and data mining have proposed various methods for predicting flight delays, typically a few hours before departure. However, the airline decisions that could benefit most from such predictions, i.e., those on scheduled block time and crew schedules, are made two to six months prior to departure. Consequently, predictions that only become available shortly before departure are useless for these scheduling tasks.

As accurately predicting delays for individual flights a long time in advance is practically infeasible, we instead propose to search for circumstances associated with large delays. For this we propose to use diverse Subgroup Discovery (SD), a data mining technique that discovers subsets of the data that 1) deviate from the overall data with regard to some target variable, and 2) can be described by a simple conjunctive query on the other variables. We apply diverse SD to historic flight data and mine subgroups of flights that, on average, have a large delay. We show that this approach yields subgroups that experts can easily understand, even though they can capture non-trivial relations between multiple variables. We also show that diverse SD gives less redundant results than standard top-k SD, and demonstrate that even where inferring an accurate predictive model is infeasible, local deviations can be effectively captured and described by local patterns, potentially providing valuable insights for, e.g., airline scheduling problems.
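To make the two defining properties of a subgroup concrete, the following is a minimal sketch, not the authors' implementation. It assumes a hypothetical toy table (`FLIGHTS`, with invented columns `origin`, `season`, `slot`, `delay`), enumerates conjunctive descriptions of up to two conditions, and scores each by coverage times the shift in mean delay. The abstract does not state which quality measure or diversity mechanism the paper uses, so both the scoring function and the Jaccard-overlap filter in `diverse_top_k` are stand-in assumptions chosen for brevity.

```python
from itertools import combinations

# Hypothetical toy data: column names and values are invented for illustration.
FLIGHTS = [
    {"origin": "A", "season": "winter", "slot": "evening", "delay": 60},
    {"origin": "A", "season": "winter", "slot": "evening", "delay": 50},
    {"origin": "A", "season": "summer", "slot": "morning", "delay": 5},
    {"origin": "A", "season": "winter", "slot": "morning", "delay": 10},
    {"origin": "B", "season": "winter", "slot": "evening", "delay": 40},
    {"origin": "B", "season": "summer", "slot": "evening", "delay": 10},
    {"origin": "B", "season": "summer", "slot": "morning", "delay": 0},
    {"origin": "B", "season": "summer", "slot": "morning", "delay": 5},
]
ATTRS = ["origin", "season", "slot"]


def candidates(rows, attrs, max_depth=2):
    """Yield conjunctions of attribute=value conditions, one per attribute."""
    conditions = sorted({(a, r[a]) for r in rows for a in attrs})
    for depth in range(1, max_depth + 1):
        for combo in combinations(conditions, depth):
            if len({a for a, _ in combo}) == depth:
                yield combo


def rank(rows, attrs, target):
    """Score each candidate by coverage * (subgroup mean - overall mean)."""
    overall = sum(r[target] for r in rows) / len(rows)
    scored = []
    for desc in candidates(rows, attrs):
        cover = frozenset(i for i, r in enumerate(rows)
                          if all(r[a] == v for a, v in desc))
        if not cover:
            continue
        sub_mean = sum(rows[i][target] for i in cover) / len(cover)
        quality = len(cover) / len(rows) * (sub_mean - overall)
        scored.append((quality, desc, cover))
    # Best quality first; ties are broken by description for determinism.
    return sorted(scored, key=lambda t: (-t[0], t[1]))


def jaccard(a, b):
    """Overlap between two sets of covered row indices."""
    return len(a & b) / len(a | b)


def top_k(rows, attrs, target, k):
    """Standard top-k: the k best-scoring subgroups, redundant or not."""
    return rank(rows, attrs, target)[:k]


def diverse_top_k(rows, attrs, target, k, max_overlap=0.6):
    """Greedy stand-in for diversity: skip subgroups whose cover overlaps
    too much with one already chosen (an assumption, not the paper's method)."""
    chosen = []
    for quality, desc, cover in rank(rows, attrs, target):
        if quality <= 0 or len(chosen) == k:
            break
        if all(jaccard(cover, c) <= max_overlap for _, _, c in chosen):
            chosen.append((quality, desc, cover))
    return chosen


if __name__ == "__main__":
    results = [("top-k", top_k(FLIGHTS, ATTRS, "delay", 3)),
               ("diverse", diverse_top_k(FLIGHTS, ATTRS, "delay", 3))]
    for name, result in results:
        print(name)
        for quality, desc, cover in result:
            text = " AND ".join(f"{a}={v}" for a, v in desc)
            print(f"  {text}  quality={quality:.3f}  size={len(cover)}")
```

On this toy table the plain top-k list is dominated by near-identical variants of the same winter-evening flights, whereas the overlap filter forces later picks to cover different flights. That is the kind of redundancy reduction the abstract attributes to diverse SD, shown here only in miniature.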

