Data Mining with Python
Tadej Roškarič, S. Bobek
6th FEB International Scientific Conference 2022
DOI: https://doi.org/10.18690/um.epf.5.2022.32
Abstract
As the amount of data in the world grows exponentially, we need all the tools and knowledge we can get to analyse this data and extract valuable information. Doing so allows stakeholders to make data-driven decisions and thereby adds value in any organisation. The data mining process can be applied in virtually all kinds of organisations, from the public to the private sector. Employees use data in their professional lives and therefore need to be familiar with the knowledge discovery process. The focus of this article is Python as a tool for data mining. The authors conclude that Python is a strong option for this task: it is open-source and free, it is well documented, and it is backed by a large community that develops the packages needed for analytics workloads. Its capabilities are demonstrated at the end of the paper in a case study on airline passenger satisfaction. The main approach is exploratory data analysis through visualisations, with the goal of finding hidden patterns in the data. A decision tree machine learning model was also developed to identify the features that contribute to a higher satisfaction level.
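To illustrate how a decision tree can point to the features that matter for satisfaction, the following is a minimal sketch, not the authors' implementation. The abstract names neither the library nor the dataset columns, so the sketch uses only the Python standard library, and the records and feature names (`online_boarding_ok`, `wifi_ok`, `long_delay`) are made up for illustration. It ranks binary features by Gini impurity reduction, a common criterion for choosing a decision tree's first split.

```python
from collections import Counter

# Toy, made-up records: (online_boarding_ok, wifi_ok, long_delay, satisfied).
# Column names and values are hypothetical, not the paper's dataset.
ROWS = [
    (1, 1, 0, 1),
    (1, 0, 0, 1),
    (1, 1, 1, 1),
    (1, 0, 1, 0),
    (0, 1, 0, 0),
    (0, 0, 0, 0),
    (0, 1, 1, 0),
    (0, 0, 1, 0),
]
FEATURES = ["online_boarding_ok", "wifi_ok", "long_delay"]


def gini(labels):
    """Gini impurity of a list of class labels (0.0 means pure)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def gini_gain(rows, col):
    """Impurity reduction from splitting the rows on binary feature `col`."""
    labels = [r[-1] for r in rows]
    left = [r[-1] for r in rows if r[col] == 0]
    right = [r[-1] for r in rows if r[col] == 1]
    n = len(rows)
    weighted = len(left) / n * gini(left) + len(right) / n * gini(right)
    return gini(labels) - weighted


def rank_features(rows, names):
    """Return (name, gain) pairs, best candidate root split first."""
    gains = [(name, gini_gain(rows, i)) for i, name in enumerate(names)]
    return sorted(gains, key=lambda g: g[1], reverse=True)


if __name__ == "__main__":
    for name, gain in rank_features(ROWS, FEATURES):
        print(f"{name}: {gain:.3f}")
```

On this toy table, `online_boarding_ok` ranks first because splitting on it separates satisfied from dissatisfied passengers most cleanly. A full decision tree repeats the same selection recursively within each branch.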