Logistic Regression for Gastric Cancer Classification using epidemiological risk factors in Cases and Controls
B. Senthil Kumar, Harvey Vanlalpeka, J. Zohmingthanga, N. S. Kumar, L. Hmingliana, Lalrempuia Sailo
Silpakorn University Science and Technology Journal, published 2021-07-01
DOI: https://doi.org/10.22232/stj.2021.09.02.19
Abstract
The main purpose of this study is to design a machine learning classifier that accurately distinguishes gastric cancer patients (cases) from healthy individuals (controls) using epidemiological and environmental factors. Missing values in the dataset are replaced by median imputation. The basic idea of the work is to minimize the cost function by applying gradient descent to find its global minimum. The proposed logistic regression model uses 29 features as input and achieves an accuracy of 98.51%. This accuracy is obtained with a learning rate of 0.000915 and 150,000 iterations, the values chosen for training the model.
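The pipeline the abstract describes (median imputation, then logistic regression fitted by gradient descent on a cost function) can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not name the cost function, so the usual cross-entropy cost is assumed; the function names (`impute_median`, `fit`, and so on) are hypothetical; and the toy data, learning rate, and iteration count are made up so the example runs quickly. The paper's own settings are 29 features, a learning rate of 0.000915, and 150,000 iterations.

```python
import math
from statistics import median


def impute_median(rows):
    """Replace None entries in each column with that column's median."""
    n_cols = len(rows[0])
    medians = [
        median(r[j] for r in rows if r[j] is not None) for j in range(n_cols)
    ]
    return [
        [medians[j] if r[j] is None else r[j] for j in range(n_cols)]
        for r in rows
    ]


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(x, w, b):
    """Predicted probability that sample x is a case (label 1)."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def cost(X, y, w, b):
    """Mean cross-entropy cost (assumed; the abstract does not specify it)."""
    eps = 1e-12
    total = 0.0
    for x, t in zip(X, y):
        p = min(max(predict_proba(x, w, b), eps), 1 - eps)
        total -= t * math.log(p) + (1 - t) * math.log(1 - p)
    return total / len(X)


def fit(X, y, learning_rate, n_iters):
    """Batch gradient descent on the cross-entropy cost, starting from zeros."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(n_iters):
        grad_w, grad_b = [0.0] * d, 0.0
        for x, t in zip(X, y):
            err = predict_proba(x, w, b) - t
            for j in range(d):
                grad_w[j] += err * x[j]
            grad_b += err
        w = [wj - learning_rate * gj / n for wj, gj in zip(w, grad_w)]
        b -= learning_rate * grad_b / n
    return w, b


# Toy data, made up for illustration: two features, None marks a missing value.
raw = [[1.0, 2.0], [2.0, None], [1.5, 1.0], [6.0, 7.0], [7.0, None], [6.5, 8.0]]
labels = [0, 0, 0, 1, 1, 1]  # 0 = control, 1 = case

X = impute_median(raw)
w, b = fit(X, labels, learning_rate=0.05, n_iters=2000)
predictions = [1 if predict_proba(x, w, b) >= 0.5 else 0 for x in X]
accuracy = sum(p == t for p, t in zip(predictions, labels)) / len(labels)
```

Because the cross-entropy cost of logistic regression is convex, any minimum gradient descent reaches is a global one, which is consistent with the abstract's reference to a global minimum. In practice, a small learning rate such as the paper's 0.000915 trades speed for stability, which is one plausible reason the reported run needs 150,000 iterations.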