Machine learning in psychiatric health records: A gold standard approach to trauma annotation.

Impact factor 6.2 · Zone 1 (Medicine) · JCR Q1 (Psychiatry)

Translational Psychiatry Pub Date : 2025-08-01 DOI:10.1038/s41398-025-03487-0

Eben Holderness, Bruce Atwood, Marc Verhagen, Ann K Shinn, Philip Cawkwell, Hudson Cerruti, James Pustejovsky, Mei-Hua Hall

Citation: Translational Psychiatry 15(1), 260. Full-text PDF (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12317041/pdf/

Citations: 0

Abstract


Machine learning in psychiatric health records: A gold standard approach to trauma annotation.


Psychiatric electronic health records present unique challenges for machine learning due to their unstructured, complex, and variable nature. This study aimed to create a publicly available gold standard dataset by identifying a cohort of patients with psychotic disorders and posttraumatic stress disorder (PTSD), developing clinically informed guidelines for annotating traumatic events in their health records, and demonstrating the dataset's suitability for training machine learning models to detect indicators of symptoms, substance use, and trauma in new records. We compiled a representative corpus of 200 narrative-heavy health records (470,489 tokens) from a centralized database and developed a detailed annotation scheme with a team of clinical experts and computational linguists. Clinicians annotated the corpus for trauma-related events and relevant clinical information with high inter-annotator agreement (0.715 for entity/span tags and 0.874 for attributes). Additionally, machine learning models were developed to demonstrate the practical viability of the gold standard corpus for machine learning applications, achieving micro F1 scores of 0.76 for spans and 0.82 for attributes, indicative of their predictive reliability. This study established the first gold standard dataset for the complex task of labelling traumatic features in psychiatric health records. The high inter-annotator agreement and model performance illustrate its utility in advancing the application of machine learning in psychiatric healthcare in order to better understand disease heterogeneity and treatment implications.
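For readers unfamiliar with the span-level micro F1 reported in the abstract, the following is a minimal sketch of how such a score can be computed. It is not the authors' evaluation code: the exact-match criterion, the `Span` tuple layout, the `micro_f1` helper, and the label names (`TraumaEvent`, `Symptom`, `SubstanceUse`) are all assumptions made for illustration, since the abstract does not describe the matching rules or the tag set.

```python
from typing import Iterable, Tuple

# Hypothetical span representation: (record_id, start_offset, end_offset, label).
Span = Tuple[str, int, int, str]


def micro_f1(gold: Iterable[Span], predicted: Iterable[Span]) -> float:
    """Micro-averaged F1: pool true/false positives and false negatives
    over all labels. A prediction counts as correct only if record,
    offsets, and label all match exactly (an assumed criterion)."""
    gold_set, pred_set = set(gold), set(predicted)
    tp = len(gold_set & pred_set)   # spans found with the right label
    fp = len(pred_set - gold_set)   # predicted spans not in the gold set
    fn = len(gold_set - pred_set)   # gold spans the model missed
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy data, invented purely for illustration.
gold = [
    ("rec1", 10, 24, "TraumaEvent"),
    ("rec1", 40, 52, "Symptom"),
    ("rec2", 5, 18, "SubstanceUse"),
    ("rec2", 30, 44, "TraumaEvent"),
]
predicted = [
    ("rec1", 10, 24, "TraumaEvent"),
    ("rec1", 40, 52, "SubstanceUse"),  # right offsets, wrong label
    ("rec2", 5, 18, "SubstanceUse"),
]

score = micro_f1(gold, predicted)
print(round(score, 3))
```

Because counts are pooled before precision and recall are computed, frequent labels weigh more heavily than rare ones; a macro average would instead compute F1 per label and take the mean.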


Source journal

Translational Psychiatry (Psychiatry)

CiteScore: 11.50

Self-citation rate: 2.90%

Articles published: 484

Review time: 23 weeks

About the journal: Psychiatry has suffered tremendously from the limited translational pipeline. Nobel laureate Julius Axelrod's discovery in 1961 of monoamine reuptake by pre-synaptic neurons still forms the basis of contemporary antidepressant treatment. There is a grievous gap between the explosion of knowledge in neuroscience and conceptually novel treatments for our patients. Translational Psychiatry bridges this gap by fostering and highlighting the pathway from discovery to clinical applications, healthcare and global health. We view translation broadly as the full spectrum of work that marks the pathway from discovery to global health, inclusive. The steps of translation that are within the scope of Translational Psychiatry include (i) fundamental discovery, (ii) bench to bedside, (iii) bedside to clinical applications (clinical trials), (iv) translation to policy and health care guidelines, (v) assessment of health policy and usage, and (vi) global health. All areas of medical research, including, but not restricted to, molecular biology, genetics, pharmacology, imaging and epidemiology, are welcome as they contribute to enhancing the field of translational psychiatry.