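Automated Identification of Aspirin-Exacerbated Respiratory Disease Using Natural Language Processing and Machine Learning: Algorithm Development and Evaluation Study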

IF 0.3 | Zone 3, Sociology | JCR 0, ASIAN STUDIES
Thanai Pongdee, Nicholas B Larson, Rohit Divekar, Suzette J Bielinski, Hongfang Liu, Sungrim Moon
DOI: 10.2196/44191 (https://doi.org/10.2196/44191)
Journal (as listed): BULLETIN OF THE SCHOOL OF ORIENTAL AND AFRICAN STUDIES-UNIVERSITY OF LONDON, volume 19 1, pages e44191
Publication date: 2023-06-12
Publication type: Journal Article
Full text (PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11296676/pdf/
Platform: Semanticscholar
Citations: 0

Abstract

Automated Identification of Aspirin-Exacerbated Respiratory Disease Using Natural Language Processing and Machine Learning: Algorithm Development and Evaluation Study.

Background: Aspirin-exacerbated respiratory disease (AERD) is an acquired inflammatory condition characterized by the presence of asthma, chronic rhinosinusitis with nasal polyposis, and respiratory hypersensitivity reactions on ingestion of aspirin or other nonsteroidal anti-inflammatory drugs (NSAIDs). Despite AERD having a classic constellation of symptoms, the diagnosis is often overlooked, with an average of greater than 10 years between the onset of symptoms and diagnosis of AERD. Without a diagnosis, individuals will lack opportunities to receive effective treatments, such as aspirin desensitization or biologic medications.

Objective: Our aim was to develop a combined algorithm that integrates both natural language processing (NLP) and machine learning (ML) techniques to identify patients with AERD from an electronic health record (EHR).

Methods: A rule-based decision tree algorithm incorporating NLP-based features was developed using clinical documents from the EHR at Mayo Clinic. Using NLP techniques, 7 features were extracted from clinical notes: AERD, asthma, NSAID allergy, nasal polyps, chronic sinusitis, elevated urine leukotriene E4 level, and documented no-NSAID allergy. MedTagger was used to extract these 7 features from the unstructured clinical text, given a set of keywords and patterns based on the chart review of 2 allergy and immunology experts for AERD. Each extracted feature was quantified as the frequency of its occurrence in the clinical documents of each subject. We optimized the decision tree classifier's hyperparameters and cutoff thresholds on the training set to determine the feature combination that best discriminates AERD. We then evaluated the resulting model on the test set.
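
The per-subject feature counting and cutoff selection described in the Methods can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the study's implementation: plain regular expressions stand in for MedTagger, the keyword patterns are hypothetical, negation is not handled, and a single-feature cutoff chosen by Youden's J stands in for the full decision tree tuning.

```python
import re
from collections import Counter

# Hypothetical keyword patterns; the study's actual MedTagger rules are not given here.
PATTERNS = {
    "aerd": re.compile(r"\b(?:AERD|aspirin[- ]exacerbated respiratory disease)\b", re.I),
    "asthma": re.compile(r"\basthma\b", re.I),
    "nasal_polyps": re.compile(r"\bnasal polyp(?:s|osis)?\b", re.I),
}


def count_features(notes):
    """Return per-feature mention counts across all notes of one subject."""
    counts = Counter({name: 0 for name in PATTERNS})
    for note in notes:
        for name, pattern in PATTERNS.items():
            counts[name] += len(pattern.findall(note))
    return dict(counts)


def best_cutoff(values, labels):
    """Pick the cutoff c (predict AERD when value >= c) that maximizes
    Youden's J = sensitivity + specificity - 1 on the training data.
    Assumes labels contain at least one 1 and one 0."""
    positives = sum(labels)
    negatives = len(labels) - positives
    best_c, best_j = None, -1.0
    for c in sorted(set(values)):
        tp = sum(1 for v, y in zip(values, labels) if v >= c and y == 1)
        fp = sum(1 for v, y in zip(values, labels) if v >= c and y == 0)
        j = tp / positives - fp / negatives
        if j > best_j:
            best_c, best_j = c, j
    return best_c


# Toy example: two notes for one subject, then a cutoff learned from made-up counts.
subject_notes = ["Patient has asthma and nasal polyps. Suspect AERD.", "Asthma stable."]
features = count_features(subject_notes)
cutoff = best_cutoff([0, 0, 1, 3, 4, 5], [0, 0, 0, 1, 1, 1])
```

A real decision tree would learn several such cutoffs over multiple features; the single split here only illustrates how a frequency threshold can be tuned on training data before being applied to a held-out test set.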

Results: On the test set, the AERD algorithm, which combines NLP and ML techniques, achieved an area under the receiver operating characteristic curve of 0.86 (95% CI 0.78-0.94), a sensitivity of 80.00% (95% CI 70.82%-87.33%), and a specificity of 88.00% (95% CI 79.98%-93.64%).
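
For readers unfamiliar with these metrics, the sketch below shows one way sensitivity, specificity, and the area under the ROC curve could be computed from test-set labels. The abstract does not state how the confidence intervals were obtained, so the Wilson score interval used here is an assumption and is not expected to reproduce the reported bounds. All data in the example are made up.

```python
import math


def sensitivity_specificity(y_true, y_pred):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (assumed method)."""
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half, center + half


def auc(y_true, scores):
    """AUC as the probability that a random positive outranks a random
    negative, counting ties as one half (Mann-Whitney formulation)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Made-up labels and predictions, for illustration only.
sens, spec = sensitivity_specificity([1, 1, 1, 1, 0, 0, 0, 0], [1, 1, 1, 0, 0, 0, 0, 1])
toy_auc = auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1])
low, high = wilson_interval(80, 100)
```

The pairwise AUC is quadratic in the number of samples, which is acceptable for a small test set; a rank-based computation would be preferable for larger cohorts.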

Conclusions: We developed a promising AERD algorithm that needs further refinement to improve AERD diagnosis. Continued development of NLP and ML technologies has the potential to reduce diagnostic delays for AERD and improve the health of our patients.

Source journal
CiteScore: 0.80
Self-citation rate: 25.00%
Articles published: 69
Journal description: The Bulletin of the School of Oriental and African Studies is the leading interdisciplinary journal on Asia, Africa and the Near and Middle East. It carries unparalleled coverage of the languages, cultures and civilisations of these regions from ancient times to the present. Publishing articles, review articles, notes and communications of the highest academic standard, it also features an extensive and influential reviews section and an annual index. Published for the School of Oriental and African Studies.