A machine learning model for automated contact tracing during disease outbreaks

Healthcare analytics (New York, N.Y.), Volume 7, Article 100389 Pub Date: 2025-03-08 DOI: 10.1016/j.health.2025.100389

Zeyad Aklah, Amean Al-Safi, Marwa H. Abdali, Khalid Al-jabery


Citations: 0

Abstract


This study develops and evaluates a conceptual model for assessing the Risk of Infection (ROI) in automated digital contact tracing during pandemics. The proposed model takes five input parameters: distance, overlap time, contamination interval, incubation time, and contact facility size. These parameters capture various aspects of disease transmission dynamics. The model employs logistic functions to quantify the influence of each parameter on the overall ROI. The model is evaluated in two ways: a partial evaluation that observes the impact of parameter pairs on ROI, and a full evaluation in which K-means and a Hidden Markov Model are trained on a dataset of 24,000 simulated scenarios to identify central clusters for the high-, medium-, and low-risk categories. The model is then tested on another 16,000 simulated scenarios to assess its overall performance. The Hidden Markov Model categorizes 63.8% of the testing dataset as low risk, 20.7% as medium risk, and 15.5% as high risk, whereas K-means classifies 44.3% as low risk, 30.7% as medium risk, and 25% as high risk. The evaluation metrics favor the Hidden Markov Model, which achieves a log-likelihood of 50,688, an Akaike Information Criterion (AIC) of -101,365.6430, and a Bayesian Information Criterion (BIC) of -101,319.5609. In both evaluations, the results validate the model’s ability to automate digital contact tracing based on the input parameters. Future studies could explore classification accuracy using real contact tracing datasets. The proposed approach enhances the efficiency of public health authorities by directing their efforts toward individuals with the highest risk of infection, rather than applying the same level of intervention indiscriminately to everyone.
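The abstract does not give the functional form of the model, so the following Python sketch is only an illustration of the general idea: each of the five parameters is passed through a logistic function, the factors are combined into a single ROI score, and a one-dimensional K-means with three clusters assigns low, medium, and high risk. Everything specific here is an assumption and not taken from the paper: the parameter names and units, the midpoints and steepness values, the direction in which each parameter affects risk, the use of a product to combine the factors, the uniform scenario simulator, clustering on the scalar ROI, and the reduced sample sizes (2,400 and 1,600 instead of 24,000 and 16,000). The Hidden Markov Model variant is not shown.

```python
import math
import random

def logistic(x, midpoint, steepness):
    """Logistic curve in (0, 1); a negative steepness makes it decrease with x."""
    return 1.0 / (1.0 + math.exp(-steepness * (x - midpoint)))

# Hypothetical (midpoint, steepness) pairs, NOT values from the paper.
# Negative steepness means the assumed risk falls as the value grows.
PARAMS = {
    "distance_m": (2.0, -2.0),
    "overlap_min": (15.0, 0.3),
    "contamination_interval_h": (3.0, -1.0),
    "incubation_days": (5.0, -0.5),
    "facility_size_m2": (50.0, -0.05),
}

def risk_of_infection(scenario):
    """Combine the per-parameter logistic factors by product (an assumed rule)."""
    roi = 1.0
    for name, (midpoint, steepness) in PARAMS.items():
        roi *= logistic(scenario[name], midpoint, steepness)
    return roi

def simulate(n, seed):
    """Draw n scenarios uniformly from assumed, illustrative ranges."""
    rng = random.Random(seed)
    return [{
        "distance_m": rng.uniform(0.0, 10.0),
        "overlap_min": rng.uniform(0.0, 120.0),
        "contamination_interval_h": rng.uniform(0.0, 24.0),
        "incubation_days": rng.uniform(0.0, 14.0),
        "facility_size_m2": rng.uniform(5.0, 200.0),
    } for _ in range(n)]

def kmeans_1d(values, k=3, iters=100):
    """Lloyd's algorithm on scalars, initialised at evenly spaced quantiles."""
    ordered = sorted(values)
    centers = [ordered[(2 * i + 1) * len(ordered) // (2 * k)] for i in range(k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for v in ordered:
            nearest = min(range(k), key=lambda i: abs(v - centers[i]))
            groups[nearest].append(v)
        # Keep the old center if a cluster ends up empty.
        updated = [sum(g) / len(g) if g else centers[i]
                   for i, g in enumerate(groups)]
        if updated == centers:
            break
        centers = updated
    return sorted(centers)

def classify(roi, centers):
    """Label a score by its nearest center; centers are sorted low to high."""
    labels = ("low", "medium", "high")
    return labels[min(range(len(centers)), key=lambda i: abs(roi - centers[i]))]

# Fit the cluster centers on one simulated set, then label a separate one.
train_scores = [risk_of_infection(s) for s in simulate(2400, seed=1)]
centers = kmeans_1d(train_scores)
test_labels = [classify(risk_of_infection(s), centers)
               for s in simulate(1600, seed=2)]
for label in ("low", "medium", "high"):
    share = 100.0 * test_labels.count(label) / len(test_labels)
    print(f"{label}: {share:.1f}%")
```

The printed shares depend entirely on the assumed parameters and simulator, so they are not comparable to the percentages reported in the abstract.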


Source journal

Healthcare analytics (New York, N.Y.) Applied Mathematics, Modelling and Simulation, Nursing and Health Professions (General)

CiteScore

4.40

Self-citation rate

0.00%

Number of publications

Review time

79 days