缺失值估算法在肝炎数据集上交叉验证的不同倍数的影响

T. Astuti, H. A. Nugroho, T. B. Adji
{"title":"缺失值估算法在肝炎数据集上交叉验证的不同倍数的影响","authors":"T. Astuti, H. A. Nugroho, T. B. Adji","doi":"10.1109/QIR.2015.7374894","DOIUrl":null,"url":null,"abstract":"Hepatitis is a liver disease caused by hepatitis viruses. Nowadays, hepatitis is a global health problem, including in Indonesia. Chronic hepatitis can lead to cirrhosis and liver cancer, therefore early diagnosis is needed. Several research works on development of computer aided systems have been conducted to improve the diagnosis process of hepatitis disease. California Irvine (UCI) machine-learning repository provides hepatitis disease dataset which can be publicly accessed; however, the dataset contains many missing values. The existing of missing values in the dataset may affect the quality of the results analysis. Therefore, it needs to be conducted for handling the missing values. This paper analyses the performance of applying varied number of fold for cross validation of missing values imputation methods. The imputation method is combined with the feature selection method and machine-learning algorithm on the hepatitis dataset. The results that varied fold in k-fold cross validation which applied in the imputation method does not reveal significant advantages.","PeriodicalId":127270,"journal":{"name":"2015 International Conference on Quality in Research (QiR)","volume":"22 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2015-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"7","resultStr":"{\"title\":\"The impact of different fold for cross validation of missing values imputation method on hepatitis dataset\",\"authors\":\"T. Astuti, H. A. Nugroho, T. B. Adji\",\"doi\":\"10.1109/QIR.2015.7374894\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Hepatitis is a liver disease caused by hepatitis viruses. Nowadays, hepatitis is a global health problem, including in Indonesia. Chronic hepatitis can lead to cirrhosis and liver cancer, therefore early diagnosis is needed. Several research works on development of computer aided systems have been conducted to improve the diagnosis process of hepatitis disease. California Irvine (UCI) machine-learning repository provides hepatitis disease dataset which can be publicly accessed; however, the dataset contains many missing values. The existing of missing values in the dataset may affect the quality of the results analysis. Therefore, it needs to be conducted for handling the missing values. This paper analyses the performance of applying varied number of fold for cross validation of missing values imputation methods. The imputation method is combined with the feature selection method and machine-learning algorithm on the hepatitis dataset. The results that varied fold in k-fold cross validation which applied in the imputation method does not reveal significant advantages.\",\"PeriodicalId\":127270,\"journal\":{\"name\":\"2015 International Conference on Quality in Research (QiR)\",\"volume\":\"22 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2015-08-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"7\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2015 International Conference on Quality in Research (QiR)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/QIR.2015.7374894\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2015 International Conference on Quality in Research (QiR)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/QIR.2015.7374894","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 7

摘要

肝炎是一种由肝炎病毒引起的肝脏疾病。如今,肝炎是一个全球性的健康问题,包括在印度尼西亚。慢性肝炎可导致肝硬化和肝癌,因此需要早期诊断。为了改善肝炎疾病的诊断过程,已经进行了一些计算机辅助系统开发的研究工作。加州欧文(UCI)机器学习存储库提供可公开访问的肝炎疾病数据集;然而,数据集包含许多缺失值。数据集中缺失值的存在可能会影响结果分析的质量。因此,需要对缺失值进行处理。本文分析了应用不同倍数对缺失值估算方法进行交叉验证的性能。该方法将特征选择方法和机器学习算法结合在肝炎数据集上。在k-fold交叉验证中应用于估算方法的结果没有显示出显著的优势。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
The impact of different fold for cross validation of missing values imputation method on hepatitis dataset
Hepatitis is a liver disease caused by hepatitis viruses. Nowadays, hepatitis is a global health problem, including in Indonesia. Chronic hepatitis can lead to cirrhosis and liver cancer, therefore early diagnosis is needed. Several research works on development of computer aided systems have been conducted to improve the diagnosis process of hepatitis disease. California Irvine (UCI) machine-learning repository provides hepatitis disease dataset which can be publicly accessed; however, the dataset contains many missing values. The existing of missing values in the dataset may affect the quality of the results analysis. Therefore, it needs to be conducted for handling the missing values. This paper analyses the performance of applying varied number of fold for cross validation of missing values imputation methods. The imputation method is combined with the feature selection method and machine-learning algorithm on the hepatitis dataset. The results that varied fold in k-fold cross validation which applied in the imputation method does not reveal significant advantages.
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信