Developing a Tool for Identifying Clinical Risk From Free-Text Clinical Records: Natural Language Processing Study.

Impact factor: 2.0

JMIR AI. Publication date: 2025-09-22. DOI: 10.2196/64898

Natasha Biscoe, Daniel Leightley, Dominic Murphy

JMIR AI, vol. 4, e64898. Full-text PDF (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12501529/pdf/

Citations: 0

Abstract



Background: Electronic patient records are a valuable yet underused data source; they have been explored in research using natural language processing, but not yet within a third-sector organization.

Objective: This study aimed to apply natural language processing to develop a risk identification tool capable of discerning high and low suicide risk among veterans, using electronic patient records from a United Kingdom-based veteran mental health charity.

Methods: A total of 20,342 notes were extracted for this purpose. To develop the risk tool, 70% of the records formed the training dataset, while the remaining 30% were allocated for testing and evaluation. The classification framework was devised and trained to categorize risk as a binary outcome: 1 indicating high risk and 0 indicating low risk.
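The Methods describe a 70/30 split of the notes and a binary classifier (1 = high risk, 0 = low risk). The Results name logistic regression as the best performer. The sketch below shows that kind of pipeline and rests on several assumptions. It uses bag-of-words features and plain stochastic-gradient logistic regression written with the Python standard library. The tokenizer, features, learning rate, epoch count, decision threshold, and all function names (`split_records`, `train_logreg`, `predict`) are hypothetical, because the abstract does not specify how the study built its model.

```python
import math
import random
import re
from collections import Counter


def tokenize(text):
    # Hypothetical tokenizer: lowercase alphabetic tokens only.
    return re.findall(r"[a-z]+", text.lower())


def featurize(text):
    # Bag-of-words counts (an assumption; the abstract does not name the features).
    return Counter(tokenize(text))


def split_records(records, train_frac=0.7, seed=0):
    # Shuffle a copy, then keep the first 70% for training and the rest for testing.
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_frac)
    return shuffled[:cut], shuffled[cut:]


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def score(model, feats):
    # Linear score: bias plus weighted token counts.
    weights, bias = model
    return bias + sum(weights.get(w, 0.0) * c for w, c in feats.items())


def train_logreg(examples, epochs=200, lr=0.1):
    # examples: (note_text, label) pairs, label 1 = high risk, 0 = low risk.
    # Plain stochastic gradient descent on the log loss, no regularization.
    weights, bias = {}, 0.0
    data = [(featurize(text), label) for text, label in examples]
    for _ in range(epochs):
        for feats, label in data:
            err = sigmoid(score((weights, bias), feats)) - label
            bias -= lr * err
            for w, c in feats.items():
                weights[w] = weights.get(w, 0.0) - lr * err * c
    return weights, bias


def predict(model, text, threshold=0.5):
    # Returns 1 (high risk) if the predicted probability reaches the threshold.
    return 1 if sigmoid(score(model, featurize(text))) >= threshold else 0
```

For example, calling `split_records` on 20 toy `(text, label)` pairs returns 14 training pairs and 6 test pairs. `train_logreg` is then fitted on the training pairs, and `predict` is applied to each held-out note.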

Results: The performance of each classifier model was assessed by comparing its predictions with clinical risk assessments. A logistic regression classifier performed best and was used to develop the final model. Comparison with clinical risk assessments yielded the positive predictive value (mean 0.74, SD 0.059; 95% CI 0.70-0.77), negative predictive value (mean 0.73, SD 0.024; 95% CI 0.72-0.75), sensitivity (mean 0.75, SD 0.017; 95% CI 0.74-0.76), F₁-score (mean 0.74, SD 0.033; 95% CI 0.72-0.76), and accuracy, measured using the Youden index (mean 0.73, SD 0.035; 95% CI 0.71-0.76).
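The Results report PPV, NPV, sensitivity, F₁-score, and the Youden index, each summarized as a mean, SD, and 95% CI. The sketch below shows how such figures can be computed from binary predictions (1 = high risk, 0 = low risk). It uses the standard definitions, including Youden's J = sensitivity + specificity - 1. Two elements are our assumptions, since the abstract does not say how the intervals were derived: that the summaries come from repeated evaluation runs, and the normal-approximation interval mean ± 1.96·SD/√n. The function names are hypothetical.

```python
import math
import statistics


def confusion(y_true, y_pred):
    # Counts (tp, tn, fp, fn) with 1 = high risk as the positive class.
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn


def safe_div(num, den):
    # Convention chosen here: an undefined ratio (zero denominator) is reported as 0.0.
    return num / den if den else 0.0


def binary_metrics(y_true, y_pred):
    tp, tn, fp, fn = confusion(y_true, y_pred)
    ppv = safe_div(tp, tp + fp)
    npv = safe_div(tn, tn + fn)
    sensitivity = safe_div(tp, tp + fn)
    specificity = safe_div(tn, tn + fp)
    return {
        "ppv": ppv,
        "npv": npv,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f1": safe_div(2 * ppv * sensitivity, ppv + sensitivity),
        "accuracy": safe_div(tp + tn, tp + tn + fp + fn),
        "youden_j": sensitivity + specificity - 1.0,
    }


def summarize(values, z=1.96):
    # Mean, sample SD, and normal-approximation 95% CI over repeated runs.
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    half = z * sd / math.sqrt(len(values))
    return mean, sd, (mean - half, mean + half)
```

Apply `binary_metrics` to each run's test predictions, then pass one metric's per-run values to `summarize`. This gives mean/SD/CI triples in the same form as those reported above. The sketch returns accuracy and Youden's J as separate entries because, under their standard definitions, they are different quantities.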

Conclusions: The risk identification tool successfully determined the correct risk category of veterans from a large sample of clinical notes. Future studies should investigate whether this tool can detect more nuanced differences in risk and generalize across data sources.


Source journal

JMIR AI

Self-citation rate

0.00%

Number of articles published