Chih-Hao Shen, Ruei-Hao Huang, Yaw-Kuen Li, Ta-Wei Chu, Dee Pei
Frontiers in Molecular Biosciences, volume 12, article 1631265. Published 2025-08-06. DOI: 10.3389/fmolb.2025.1631265 (https://doi.org/10.3389/fmolb.2025.1631265). PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12365380/pdf/
Using machine learning methods to investigate the role of volatile organic compounds in non-alcoholic fatty liver disease.
Aims: Approximately 25%-30% of the global population is affected by non-alcoholic fatty liver disease (NAFLD). This study aimed to explore whether NAFLD could be effectively detected using 341 volatile organic compounds (VOCs) via 10 machine learning (Mach-L) algorithms in a cohort of 1,501 individuals.
Methods: Participants were selected from the Taiwan MJ cohort, which includes comprehensive demographic, biochemical, lifestyle, and VOC data. NAFLD was diagnosed by experienced gastroenterologists. Exhaled breath samples (late expiratory fraction) were collected in a 1.0-L aluminum bag and analyzed with selected-ion flow-tube mass spectrometry. Ten Mach-L techniques were used to evaluate two predictive models: Model 1 (demographic, lifestyle, and biochemical data) and Model 2 (Model 1 + VOCs). Both were assessed using the area under the receiver operating characteristic curve (AUC).
Results: Subjects with NAFLD had significantly higher values for age, BMI, blood pressure, and other biomedical markers, except for eGFR and HDL-C. Key predictors of NAFLD included BMI, triglycerides (TG), uric acid (UA), fasting plasma glucose (FPG), γ-GT, gender, LDL-C, and sleep duration. The addition of VOCs to Model 1 improved the AUC from 0.722 ± 0.149 to 0.770 ± 0.264 (p < 0.001). Ten VOCs were identified as the most influential, in order of importance: 2-propanol, acetone, butyl 2-methylbutanoate, diethylethanolamine, urethane, β-caryophyllene, furfural, tridecane, 4-methyloctanoic acid, and (S)-2-methyl-1-butanol.
Conclusion: Incorporating VOCs into traditional demographic, biochemical, and lifestyle data significantly enhanced the model's predictive performance. This suggests that VOCs may be associated with the underlying pathophysiology of NAFLD.
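The comparison of Model 1 and Model 2 above rests on the AUC, which can be read as the probability that a randomly chosen NAFLD case receives a higher predicted score than a randomly chosen non-case. The following is a minimal sketch of that calculation, not the authors' pipeline: the function name `roc_auc`, the labels, and both score lists are hypothetical toy values invented for illustration and are not taken from the study.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive outranks a random negative.

    A tie between a positive and a negative counts as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data (assumption, not study data): 1 = NAFLD, 0 = no NAFLD.
labels = [1, 1, 1, 0, 0, 0]
model1_scores = [0.9, 0.4, 0.6, 0.5, 0.3, 0.2]  # stand-in for Model 1 outputs
model2_scores = [0.9, 0.6, 0.7, 0.5, 0.3, 0.2]  # stand-in for Model 2 outputs

auc_model1 = roc_auc(labels, model1_scores)
auc_model2 = roc_auc(labels, model2_scores)
print(f"Model 1 AUC: {auc_model1:.3f}")
print(f"Model 2 AUC: {auc_model2:.3f}")
```

The pairwise loop is quadratic in the number of subjects, which is acceptable for a sketch. For a cohort of this size, a rank-based implementation or an established library routine would be the more practical choice.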
About the journal:
Much of contemporary investigation in the life sciences is devoted to the molecular-scale understanding of the relationships between genes and the environment — in particular, dynamic alterations in the levels, modifications, and interactions of cellular effectors, including proteins. Frontiers in Molecular Biosciences offers an international publication platform for basic as well as applied research; we encourage contributions spanning both established and emerging areas of biology. To this end, the journal draws from empirical disciplines such as structural biology, enzymology, biochemistry, and biophysics, capitalizing as well on the technological advancements that have enabled metabolomics and proteomics measurements in massively parallel throughput, and the development of robust and innovative computational biology strategies. We also recognize influences from medicine and technology, welcoming studies in molecular genetics, molecular diagnostics and therapeutics, and nanotechnology.
Our ultimate objective is the comprehensive illustration of the molecular mechanisms regulating proteins, nucleic acids, carbohydrates, lipids, and small metabolites in organisms across all branches of life.
In addition to interesting new findings, techniques, and applications, Frontiers in Molecular Biosciences will consider new testable hypotheses to inspire different perspectives and stimulate scientific dialogue. The integration of in silico, in vitro, and in vivo approaches will benefit endeavors across all domains of the life sciences.