CogProg: Utilizing Large Language Models to Forecast In-the-moment Health Assessment.

Impact factor: 8.0

ACM Transactions on Computing for Healthcare. Pub Date: 2025-04-01. Epub Date: 2025-04-24. DOI: 10.1145/3709153

Gina Sprint, Maureen Schmitter-Edgecombe, Raven Weaver, Lisa Wiese, Diane J Cook


Citations: 0

Abstract


CogProg: Utilizing Large Language Models to Forecast In-the-moment Health Assessment.

Forecasting future health status is beneficial for understanding health patterns and providing anticipatory support for cognitive and physical health difficulties. In recent years, generative large language models (LLMs) have shown promise as forecasters. Though not traditionally considered strong candidates for numeric tasks, LLMs demonstrate emerging abilities to address various forecasting problems. They can also incorporate unstructured information and explain their reasoning process. In this paper, we explore whether LLMs can effectively forecast future self-reported health state. To do this, we used in-the-moment assessments of mental sharpness, fatigue, and stress from multiple studies, drawing on daily responses (N=106 participants) and responses accompanied by text descriptions of activities (N=32 participants). With these data, we constructed prompt/response pairs to predict a participant's next answer. We fine-tuned several LLMs and applied chain-of-thought prompting, evaluating forecasting accuracy and prediction explainability. Notably, we found that LLMs achieved the lowest mean absolute error (MAE) overall (0.851), while gradient boosting achieved the lowest overall root mean squared error (RMSE) (1.356). When additional text context was provided, LLM forecasts achieved the lowest MAE for predicting mental sharpness (0.862), fatigue (1.000), and stress (0.414). These multimodal LLMs further outperformed the numeric baselines in terms of RMSE when predicting stress (0.947), although numeric algorithms achieved the best RMSE results for mental sharpness (1.246) and fatigue (1.587). This study offers valuable insights for future applications of LLMs in health-based forecasting. The findings suggest that LLMs, when supplemented with additional text information, can be effective tools for improving health forecasting accuracy.
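The abstract reports that one family of models wins on MAE while another wins on RMSE, and that forecasts are framed as prompt/response pairs. The Python sketch below illustrates both ideas under stated assumptions: the ratings, the window size, the prompt wording, and the function names (mae, rmse, make_pairs) are hypothetical and are not taken from the paper. It shows only how a forecaster that is usually exact but occasionally far off can have a lower MAE yet a higher RMSE than one that is always slightly off.

```python
import math


def mae(y_true, y_pred):
    """Mean absolute error: the average size of the miss."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean squared error: penalizes large misses more heavily."""
    squared = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    return math.sqrt(squared / len(y_true))


def make_pairs(responses, window=3):
    """Build (prompt, response) pairs from a rating sequence.

    Each prompt lists the previous `window` ratings; the response is
    the next rating. The wording is a hypothetical illustration, not
    the prompt format used in the study.
    """
    pairs = []
    for i in range(window, len(responses)):
        history = ", ".join(str(r) for r in responses[i - window:i])
        prompt = f"Previous fatigue ratings: {history}. Predict the next rating."
        pairs.append((prompt, str(responses[i])))
    return pairs


# Hypothetical self-reported ratings (illustrative values only).
y_true = [3, 3, 4, 2, 3]
pred_a = [3, 3, 4, 2, 6]  # exact four times, one large miss
pred_b = [4, 4, 3, 3, 4]  # off by one every time

print("A: MAE", mae(y_true, pred_a), "RMSE", rmse(y_true, pred_a))
print("B: MAE", mae(y_true, pred_b), "RMSE", rmse(y_true, pred_b))

for prompt, response in make_pairs(y_true):
    print(prompt, "->", response)
```

In this toy case forecaster A has the lower MAE and forecaster B the lower RMSE, which is the same kind of split the abstract describes between LLMs and gradient boosting; it does not reproduce the study's numbers.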


Source journal

ACM transactions on computing for healthcare

CiteScore

10.30

Self-citation rate

0.00%

Articles published