Multiple feature selection based on an optimization strategy for causal analysis of health data.

IF 3.4, Zone 3 (Medicine), JCR Q1 MEDICAL INFORMATICS

Health Information Science and Systems. Pub Date: 2024-11-12. eCollection Date: 2024-12-01. DOI: 10.1007/s13755-024-00312-8

Ruichen Cong, Ou Deng, Shoji Nishimura, Atsushi Ogihara, Qun Jin

Volume 12 (1), page 52. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11554952/pdf/

Citations: 0

Abstract


Purpose: Recent advancements in information technology and wearable devices have revolutionized healthcare through health data analysis. Identifying significant relationships in complex health data enhances healthcare and public health strategies. In health analytics, causal graphs are important for investigating the relationships among health features. However, they face challenges owing to the large number of features, complexity, and computational demands. Feature selection methods are useful for addressing these challenges. In this paper, we present a framework for multiple feature selection based on an optimization strategy for causal analysis of health data.

Methods: We select multiple health features based on an optimization strategy. First, we define a Weighted Total Score (WTS) index to assess the feature importance after the combination of different feature selection methods. To explore an optimal set of weights for each method, we design a multiple feature selection algorithm integrated with the greedy algorithm. The features are then ranked according to their WTS, enabling selection of the most important ones. After that, causal graphs are constructed based on the selected features, and the statistical significance of the paths is assessed. Furthermore, evaluation experiments are conducted on an experiment dataset collected for this study and an open dataset for diabetes.

Results: The results demonstrate that our approach outperforms baseline models by reducing the number of features while improving model performance. Moreover, the statistical significance of the relationships between features uncovered through causal graphs is validated for both datasets.

Conclusion: By using the proposed framework for multiple feature selection based on an optimization strategy for causal analysis, the number of features is reduced and the causal relationships are uncovered and validated.
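The Methods paragraph names a Weighted Total Score (WTS) and a greedy search over per-method weights but gives neither the formula nor the search details. The Python sketch below is therefore only one plausible reading, not the authors' algorithm. It assumes the WTS of a feature is a weighted sum of min-max-normalized importance scores from each feature selection method, with non-negative weights that sum to 1, and that the greedy step moves a fixed amount of weight between two methods whenever that improves a caller-supplied evaluation function. All names (`min_max`, `weighted_total_score`, `top_k`, `greedy_weights`, `toy_evaluate`) and the toy scores are hypothetical.

```python
from itertools import permutations


def min_max(values):
    """Rescale one method's scores to [0, 1] so methods are comparable."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def weighted_total_score(scores, weights):
    """Assumed WTS: weighted sum of the per-method scores for each feature."""
    n_features = len(next(iter(scores.values())))
    return [
        sum(weights[m] * scores[m][j] for m in scores)
        for j in range(n_features)
    ]


def top_k(scores, weights, k):
    """Indices of the k features with the highest WTS (ties: lower index first)."""
    wts = weighted_total_score(scores, weights)
    return sorted(range(len(wts)), key=lambda j: (-wts[j], j))[:k]


def greedy_weights(scores, evaluate, k, step=0.1):
    """Greedy hill climbing over method weights, starting from equal weights.

    Each round tries moving `step` of weight from one method to another and
    keeps the single best strictly improving move; it stops when none exists.
    """
    methods = list(scores)
    weights = {m: 1.0 / len(methods) for m in methods}
    best = evaluate(top_k(scores, weights, k))
    improved = True
    while improved:
        improved = False
        best_move = None
        for src, dst in permutations(methods, 2):
            if weights[src] < step - 1e-12:
                continue  # not enough weight left to move
            trial = dict(weights)
            trial[src] -= step
            trial[dst] += step
            value = evaluate(top_k(scores, trial, k))
            if value > best + 1e-12:
                best, best_move, improved = value, trial, True
        if improved:
            weights = best_move
    return weights, best


# Toy data: made-up importance scores from three hypothetical methods.
raw = {
    "filter": [0.9, 0.1, 0.5, 0.3],
    "wrapper": [0.2, 0.8, 0.6, 0.1],
    "embedded": [0.4, 0.3, 0.9, 0.2],
}
scores = {m: min_max(v) for m, v in raw.items()}


def toy_evaluate(selected):
    """Stand-in for model performance: counts overlap with features {1, 2}."""
    return len(set(selected) & {1, 2})


weights, best = greedy_weights(scores, toy_evaluate, k=2)
selected = top_k(scores, weights, k=2)
```

In practice `toy_evaluate` would be replaced by a real performance measure, such as a cross-validated score of a model trained on the selected features. The selected features would then feed the causal graph construction the abstract describes.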


Source journal

Health Information Science and Systems (MEDICAL INFORMATICS)

CiteScore

11.30

Self-citation rate

5.00%

Articles published

About the journal: Health Information Science and Systems is a multidisciplinary journal that integrates artificial intelligence, computer science, and information technology with health science and services. It embraces information science research on the modeling, design, development, integration, and management of health information systems, smart health, artificial intelligence in medicine, computer-aided diagnosis, and medical expert systems. The scope includes: (i) smart health, artificial intelligence in medicine, computer-aided diagnosis, medical image processing, and medical expert systems; (ii) medical big data and medical, health, and biomedical information resources such as patient medical records, devices and equipment, and software and tools to capture, store, retrieve, process, analyze, and optimize the use of information in the health domain; (iii) data management, data mining, and knowledge discovery, all of which play a key role in decision making, management of public health, examination of standards, and privacy and security issues; (iv) development of new architectures and applications for health information systems.