Dataset dependency of low-density lipoprotein-cholesterol estimation by machine learning.
Authors: Ishida Hidekazu, Hiroki Nagasawa, Yasuko Yamamoto, Hiroki Doi, Midori Saito, Yuya Ishihara, Takashi Fujita, Mariko Ishida, Yohei Kato, Ryosuke Kikuchi, Hidetoshi Matsunami, Masao Takemura, Hiroyasu Ito, Kuniaki Saito
Journal: Annals of Clinical Biochemistry, pp. 396-405 (Impact Factor 2.1; JCR Q3, Medical Laboratory Technology)
Published: 2023-11-01 (Epub 2023-06-05)
DOI: 10.1177/00045632231180408 (https://doi.org/10.1177/00045632231180408)
Citation count: 0
Abstract
Objectives: We evaluated the applicability of a machine learning-based low-density lipoprotein-cholesterol (LDL-C) estimation method and the influence of the characteristics of the training datasets.
Methods: Three training datasets were used: health check-up participants at the Resource Center for Health Science (N = 2664), clinical patients at Gifu University Hospital (N = 7409), and clinical patients at Fujita Health University Hospital (N = 14,842). Nine different machine learning models were constructed through hyperparameter tuning and 10-fold cross-validation. A separate dataset of 3711 clinical patients at Fujita Health University Hospital served as the test set for validating the models and comparing them with the Friedewald formula and the Martin method.
Results: The models trained on the health check-up dataset produced coefficients of determination that were equal to or inferior to those of the Martin method. In contrast, the coefficients of determination of several models trained on clinical patients exceeded those of the Martin method. The means of the differences and the convergences to the direct method were higher for the models trained on the clinical patients' datasets than for those trained on the health check-up participants' dataset. The models trained on the latter dataset tended to overestimate the LDL-cholesterol category under the 2019 ESC/EAS Guideline classification.
Conclusion: Although machine learning models provide a valuable method for LDL-C estimation, they should be trained on datasets with matched characteristics. The versatility of machine learning methods is another important consideration.
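As a rough illustration of the comparison described in the abstract, the sketch below computes the Friedewald estimate (LDL-C = TC - HDL-C - TG/5, all in mg/dL) and scores it against directly measured LDL-C using a coefficient of determination and a mean difference. The function names and the sample values are hypothetical and are not taken from the study. The R² definition used here (1 - SS_res/SS_tot) is one common choice and may differ from the one the authors used. The Martin method is omitted because it relies on a lookup table of adjustable TG:VLDL-C factors.

```python
from statistics import mean


def friedewald_ldl(tc, hdl, tg):
    """Friedewald estimate of LDL-C in mg/dL: TC - HDL-C - TG/5.

    The formula is generally not applied when TG is 400 mg/dL or higher.
    """
    return tc - hdl - tg / 5.0


def r_squared(measured, estimated):
    """Coefficient of determination, 1 - SS_res / SS_tot (assumed definition)."""
    m = mean(measured)
    ss_res = sum((y - e) ** 2 for y, e in zip(measured, estimated))
    ss_tot = sum((y - m) ** 2 for y in measured)
    return 1.0 - ss_res / ss_tot


def mean_difference(measured, estimated):
    """Mean of (estimate - direct measurement); positive means overestimation."""
    return mean(e - y for y, e in zip(measured, estimated))


# Hypothetical illustrative values in mg/dL, NOT data from the study:
# (total cholesterol, HDL-C, triglycerides, directly measured LDL-C)
samples = [
    (200, 50, 150, 118),
    (180, 60, 100, 102),
    (240, 40, 200, 163),
    (160, 55, 75, 88),
]

direct = [s[3] for s in samples]
estimates = [friedewald_ldl(tc, hdl, tg) for tc, hdl, tg, _ in samples]

r2 = r_squared(direct, estimates)
bias = mean_difference(direct, estimates)
print(f"R^2 = {r2:.4f}, mean difference = {bias:.2f} mg/dL")
```

The same two scoring functions could be applied to the predictions of any trained model, which is how a machine learning estimate and a formula-based estimate can be compared on one held-out test set.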
About the journal:
Annals of Clinical Biochemistry is the fully peer reviewed international journal of the Association for Clinical Biochemistry and Laboratory Medicine.
Annals of Clinical Biochemistry accepts papers that contribute to knowledge in all fields of laboratory medicine, especially those pertaining to the understanding, diagnosis and treatment of human disease. It publishes papers on clinical biochemistry, clinical audit, metabolic medicine, immunology, genetics, biotechnology, haematology, microbiology, computing and management where they have both biochemical and clinical relevance. Papers describing evaluation or implementation of commercial reagent kits or the performance of new analysers require substantial original information. Unless of exceptional interest and novelty, studies dealing with the redox status in various diseases are not generally considered within the journal's scope. Studies documenting the association of single nucleotide polymorphisms (SNPs) with particular phenotypes will not normally be considered, given the greater strength of genome wide association studies (GWAS). Research undertaken in non-human animals will not be considered for publication in the Annals.
Annals of Clinical Biochemistry is also the official journal of NVKC (de Nederlandse Vereniging voor Klinische Chemie) and JSCC (Japan Society of Clinical Chemistry).