Analysis and detection of Titanic survivors using generalized linear models and decision tree algorithm
Authors: Burcu Durmuş, Ö. I. Güneri
Journal: International Journal of Applied Mathematics Electronics and Computers
Published: 2020-10-09
DOI: 10.18100/ijamec.785297 (https://doi.org/10.18100/ijamec.785297)
This article investigates the factors affecting survival in the sinking of the Titanic, a disaster that remains legendary today, and aims to identify the method that best predicts survival. For this purpose, logit and probit models (generalized linear models) and the random tree algorithm (a decision tree method) were used. The study was carried out in two stages. In the first stage, generalized linear models were fitted and the variables that did not contribute significantly to the model were identified. Classification accuracy was 79.89% for the logit model and 79.04% for the probit model. In the second stage, classification was performed with the random tree algorithm, which reached an accuracy of 77.21%. The classification was then repeated after removing the variables that, according to the generalized linear models, did not contribute significantly to the model. Accuracy increased by 4.36 percentage points to 81.57%. The decision tree analysis on the reduced set of variables therefore gave better results than the analysis on the original variables. These results may be useful for researchers working on classification analysis, and they can also inform tasks such as data preprocessing and data cleaning.
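The difference between the logit and probit models named in the abstract lies only in the link function that maps a linear predictor to a survival probability. The following is a minimal sketch of that idea, not the authors' implementation: the toy rows, labels, coefficients, and function names are all hypothetical, and the coefficients are fixed by hand rather than estimated from data.

```python
import math


def logit_prob(eta):
    """Logit model: P(y=1) is the logistic CDF of the linear predictor."""
    return 1.0 / (1.0 + math.exp(-eta))


def probit_prob(eta):
    """Probit model: P(y=1) is the standard normal CDF of the linear predictor."""
    return 0.5 * (1.0 + math.erf(eta / math.sqrt(2.0)))


def linear_predictor(coefs, intercept, x):
    """Compute intercept + sum of coefficient * feature."""
    return intercept + sum(c * v for c, v in zip(coefs, x))


def accuracy(prob_fn, coefs, intercept, rows, labels, threshold=0.5):
    """Share of rows whose thresholded probability matches the label."""
    correct = 0
    for x, y in zip(rows, labels):
        p = prob_fn(linear_predictor(coefs, intercept, x))
        correct += int((p >= threshold) == bool(y))
    return correct / len(labels)


# Hypothetical toy data: (is_female, is_first_class) -> survived.
ROWS = [(1, 1), (1, 0), (0, 1), (0, 0), (1, 1), (0, 0), (0, 1), (1, 0)]
LABELS = [1, 1, 0, 0, 1, 0, 1, 0]

# Hypothetical coefficients chosen by hand, NOT fitted to any data.
COEFS = (2.0, 1.0)
INTERCEPT = -1.5

logit_acc = accuracy(logit_prob, COEFS, INTERCEPT, ROWS, LABELS)
probit_acc = accuracy(probit_prob, COEFS, INTERCEPT, ROWS, LABELS)
print(f"logit accuracy:  {logit_acc:.2%}")
print(f"probit accuracy: {probit_acc:.2%}")
```

With identical coefficients and a 0.5 threshold, both links classify every row the same way, because each crosses 0.5 exactly where the linear predictor is zero. In a real analysis the two models are fitted separately and obtain different coefficients, which is how small accuracy gaps such as the 79.89% versus 79.04% reported above can arise.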