Using natural language processing to automatically classify written self-reported narratives by patients with migraine or cluster headache.

The Journal of Headache and Pain. Pub Date: 2022-09-30. DOI: 10.1186/s10194-022-01490-0

Nicolas Vandenbussche, Cynthia Van Hee, Véronique Hoste, Koen Paemeleire


Citations: 5

Abstract

Background: Headache medicine is largely based on detailed history taking, in which physicians analyse patients' descriptions of their headaches. Natural language processing (NLP) structures and processes linguistic data into quantifiable units. In this study, we apply NLP techniques to self-reported narratives by patients with headache disorders to investigate the potential of automatically analysing and classifying human-generated text and of extracting information in clinical contexts.

Methods: A prospective cross-sectional clinical trial collected self-reported narratives on headache disorders from participants with either migraine or cluster headache. NLP was applied for the analysis of lexical, semantic and thematic properties of the texts. Machine learning (ML) algorithms were applied to classify the descriptions of headache attacks from individual participants into their correct group (migraine versus cluster headache).
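The lexical analysis described in the Methods can be illustrated with a minimal keyword-scoring sketch. It ranks words by a smoothed log-ratio of their relative frequency in one group versus the other. This is an assumption chosen for illustration: the abstract does not state which keyness statistic the authors used, and the function names and the toy Dutch sentences below are invented, not study data.

```python
import math
from collections import Counter


def keyword_scores(target_docs, reference_docs):
    """Score each word by the add-one-smoothed log-ratio of its relative
    frequency in target_docs versus reference_docs (illustrative measure,
    not necessarily the one used in the study)."""
    target = Counter(w for d in target_docs for w in d.lower().split())
    reference = Counter(w for d in reference_docs for w in d.lower().split())
    vocab = set(target) | set(reference)
    # Add-one smoothing: every vocabulary word gets one extra count per group.
    n_target = sum(target.values()) + len(vocab)
    n_reference = sum(reference.values()) + len(vocab)
    return {
        w: math.log(((target[w] + 1) / n_target) / ((reference[w] + 1) / n_reference))
        for w in vocab
    }


def top_keywords(target_docs, reference_docs, k=3):
    """Return the k highest-scoring words; ties are broken alphabetically."""
    scores = keyword_scores(target_docs, reference_docs)
    return sorted(scores, key=lambda w: (-scores[w], w))[:k]


# Invented toy narratives (hypothetical, NOT data from the study).
cluster = ["pijn achter oog", "pijn komt terug oog"]
migraine = ["hoofdpijn door stress", "hoofdpijn en misselijkheid"]

print(top_keywords(cluster, migraine, k=2))
print(top_keywords(migraine, cluster, k=1))
```

A positive score marks a word as relatively more frequent in the target group. A real analysis would also need proper tokenisation, lemmatisation, and a significance test.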

Results: One hundred and twenty-one patients (81 participants with migraine and 40 participants with cluster headache) provided a self-reported narrative on their headache disorder. Lexical analysis of this text corpus yielded several specific key words per diagnostic group (cluster headache: Dutch (nl): "oog" | English (en): "eye", nl: "pijn" | en: "pain" and nl: "terug" | en: "back/to come back"; migraine: nl: "hoofdpijn" | en: "headache", nl: "stress" | en: "stress" and nl: "misselijkheid" | en: "nausea"). Thematic and sentiment analysis revealed largely negative sentiment in texts by patients with migraine and by patients with cluster headache. Logistic regression and support vector machine algorithms with different feature groups performed best for the classification of attack descriptions (with F1-scores for detecting cluster headache varying between 0.82 and 0.86) compared to naïve Bayes classifiers.
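The F1-scores reported for detecting cluster headache are the harmonic mean of precision and recall, with cluster headache as the positive class. The sketch below shows that calculation. The label lists are invented for illustration and do not reproduce the study's predictions, and the helper name `f1_for_class` is hypothetical.

```python
def f1_for_class(y_true, y_pred, positive):
    """Compute the F1-score for one class treated as the positive label."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Invented labels (hypothetical): 4 cluster headache and 6 migraine cases.
y_true = ["cluster"] * 4 + ["migraine"] * 6
# One cluster case is missed and one migraine case is mislabelled as cluster.
y_pred = ["cluster", "cluster", "cluster", "migraine",
          "cluster", "migraine", "migraine", "migraine", "migraine", "migraine"]

score = f1_for_class(y_true, y_pred, positive="cluster")
print(round(score, 2))
```

Reporting F1 for the smaller class suits an imbalanced sample such as this one (40 cluster headache versus 81 migraine participants), where plain accuracy could look high even if few cluster headache cases were detected.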

Conclusions: Differences in lexical choices between patients with migraine and patients with cluster headache can be detected with NLP and are congruent with domain expert knowledge of the disorders. Our research shows that ML algorithms have the potential to classify patients' self-reported narratives of migraine or cluster headache with good performance. NLP can discern relevant linguistic aspects in narratives from patients with different headache disorders and is relevant for clinical information extraction. The potential benefits of larger datasets and neural NLP methods for classification performance can be investigated in the future.

Trial registration: This study was registered with clinicaltrials.gov with ID NCT05377437.

