Predicting the Survivors of the Titanic Kaggle, Machine Learning From Disaster
Nadine Farag, Ghada Hassan
International Conference on Software and Information Engineering, 2018-05-02
DOI: 10.1145/3220267.3220282 (https://doi.org/10.1145/3220267.3220282)
Citations: 5
Abstract
April 14th, 1912 was a tragic day for the Titanic, the most powerful ship built up to that time. Of the 2203 passengers, 1503 perished in the sinking, yet the factors that determined who survived remain an open question. To support study of the Titanic's passengers, Kaggle, a popular data science website, assembled the available information about each passenger into a dataset and released it for a competition titled "Titanic: Machine Learning from Disaster." This research applies machine learning techniques to the Titanic data to analyze it for classification and to predict passenger survival using two data-mining algorithms: Decision Trees and Naïve Bayes. The predictive performance and efficiency of these algorithms depend heavily on the data analysis and on the model. The paper presents an implementation that combines feature selection with machine learning to select and distinguish passenger characteristics (age, class, cabin, and port of embarkation) and from them build a model for accurate prediction. The dataset is described, and the implementation details and prediction results are presented and compared with other results. The Decision Tree algorithm correctly predicted survival for 90.01% of passengers, while Gaussian Naïve Bayes achieved 92.52% prediction accuracy.
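The abstract names Gaussian Naïve Bayes as one of the two classifiers but gives no implementation details. The following is a minimal sketch of the general technique, not the authors' implementation. It assumes two numeric features taken from the characteristics the abstract lists (passenger class and age); the function names `fit_gaussian_nb` and `predict` are hypothetical, and the toy rows are invented for illustration rather than drawn from the Kaggle dataset.

```python
import math
from collections import defaultdict


def fit_gaussian_nb(rows, labels):
    """Estimate a prior plus per-feature mean and variance for each class."""
    by_class = defaultdict(list)
    for row, label in zip(rows, labels):
        by_class[label].append(row)

    model = {}
    for label, members in by_class.items():
        n = len(members)
        columns = list(zip(*members))
        means = [sum(col) / n for col in columns]
        # A small floor on the variance avoids division by zero
        # when a feature is constant within a class.
        variances = [
            max(sum((x - m) ** 2 for x in col) / n, 1e-9)
            for col, m in zip(columns, means)
        ]
        model[label] = (n / len(rows), means, variances)
    return model


def predict(model, row):
    """Return the class with the highest log-posterior for one row."""
    def log_posterior(label):
        prior, means, variances = model[label]
        score = math.log(prior)
        for x, m, v in zip(row, means, variances):
            # Log of the Gaussian density for this feature.
            score += -0.5 * math.log(2 * math.pi * v) - (x - m) ** 2 / (2 * v)
        return score

    return max(model, key=log_posterior)


# Hypothetical toy rows: (passenger class, age in years).
# Invented for illustration only; NOT taken from the Kaggle data.
train_rows = [
    (1, 29.0), (1, 35.0), (2, 24.0), (1, 40.0),
    (3, 22.0), (3, 30.0), (2, 45.0), (3, 19.0),
]
train_labels = [1, 1, 1, 1, 0, 0, 0, 0]  # 1 = survived, 0 = did not survive

model = fit_gaussian_nb(train_rows, train_labels)
first_class_prediction = predict(model, (1, 31.0))
third_class_prediction = predict(model, (3, 25.0))
```

On the real competition data, categorical fields such as cabin and port of embarkation would first need a numeric encoding, and accuracy would be measured on held-out passengers; neither step is shown here, and the toy output says nothing about the accuracy figures reported in the abstract.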