{"title":"Effect of dimension reduction with PCA and machine learning algorithms on diabetes diagnosis performance","authors":"Yavuz Bahadir Koca, Elif Aktepe","doi":"10.31127/tuje.1413087","DOIUrl":"https://doi.org/10.31127/tuje.1413087","url":null,"abstract":"Diabetes, a long-term metabolic disorder, causes persistently high blood sugar and presents a significant global health challenge. Early diagnosis is of vital importance in mitigating the effects of diabetes. This study aims to investigate diabetes diagnosis and risk prediction using a comprehensive diabetes dataset created in 2023. The dataset contains clinical and anthropometric data of patients. Data simplification was successfully applied to clean unnecessary information and reduce data dimensionality. Additionally, methods like Principal Component Analysis were applied to decrease the number of variables in the dataset. These analyses rendered the dataset more manageable and improved its performance. In this study, a dataset encompassing health data of a total of 100,000 individuals was utilized. This dataset consists of 8 input features and 1 output feature. The primary objective is to determine the algorithm that exhibits the best performance for diabetes diagnosis. There was no missing data during the data preprocessing stage, and the necessary transformations were carried out successfully. Nine different machine learning algorithms were applied to the dataset in this study. Each algorithm employed various modelling approaches to evaluate its performance in diagnosing diabetes. The results demonstrate that machine learning models are successful in predicting the presence of diabetes and the risk of developing it in healthy individuals. Particularly, the random forest model provided superior results across all performance metrics. This study provides significant findings that can shed light on future research in diabetes diagnosis and risk prediction. 
Dimensionality reduction techniques have proven to be valuable in data analysis and have highlighted the potential to facilitate diabetes diagnosis, thereby enhancing the quality of life for patients.","PeriodicalId":518565,"journal":{"name":"Turkish Journal of Engineering","volume":" 3","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-07-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141674291","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Detection of cotton leaf disease with machine learning model","authors":"Unain Hyder, Mir Rahib Hussain","doi":"10.31127/tuje.1406755","DOIUrl":"https://doi.org/10.31127/tuje.1406755","url":null,"abstract":"This study aims to use a machine learning (ML) model to accurately classify four datasets of cotton crop leaves as either infected or healthy. Bacterial blight, Curly virus, Fusarium wilt, and healthy leaves were used as the datasets for the study. ML is a useful tool for detecting cotton leaf diseases and can reduce the rate of disease. Without machine learning techniques, detecting the diseases is very difficult and time-consuming. To address this problem, a machine learning model is proposed, and a confusion matrix was used to test its accuracy. Previous researchers have used ML models to diagnose the diseases, but the drawback of their work was that the results given by the different ML models were not accurate. The target of the study was to identify diseases affecting the cotton plant in the early stages using traditional techniques. However, utilizing various image processing techniques and machine learning algorithms, including a convolutional neural network, proved to be helpful in diagnosing the diseases. This technological approach can simplify the detection of damaged leaves and reduce the effort farmers spend detecting those diseases. Cotton is a natural fiber produced on a large scale, and it is grown on 2.5% of overall agronomic land. Detecting cotton leaf diseases is crucial to maintaining the crop's productivity and providing reliable earnings to farmers. A confusion matrix is an N x N matrix used for evaluating the performance of a classification model, where N is the number of target classes. The matrix compares the actual target values with those predicted by the machine learning model. 
This technique uses four parameters to assess the accuracy of the results reported in this study.","PeriodicalId":518565,"journal":{"name":"Turkish Journal of Engineering","volume":" 8","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-04-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140686711","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A lasso regression-based forecasting model for daily gasoline consumption: Türkiye Case","authors":"Ertugrul Ayyıldız, Mirac Murat","doi":"10.31127/tuje.1354501","DOIUrl":"https://doi.org/10.31127/tuje.1354501","url":null,"abstract":"Gasoline is one of the most sought-after resources in a world where energy is indispensable to human life and the need for it is continuously increasing. A shortage of gasoline may negatively affect the economies of countries. Therefore, analyses and estimates of gasoline consumption are critical. Better forecast performance on gasoline consumption can serve policymakers, managers, researchers, and other gasoline sector stakeholders. In parallel with the world economy, gasoline is among the most consumed energy sources in Turkey. Therefore, this study aims to forecast daily gasoline consumption in Turkey. For this purpose, a lasso regression-based forecasting methodology is proposed. The forecasting approach for daily gasoline consumption consists of 3 main stages: i) cleaning the data, ii) extracting and selecting features, and iii) forecasting the future of the daily gasoline consumption time series via the proposed models. In addition, Ridge Regression is used for comparison with the performance of the proposed model.","PeriodicalId":518565,"journal":{"name":"Turkish Journal of Engineering","volume":"68 4","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-01-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140532042","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}