Using natural language processing to facilitate the harmonisation of mental health questionnaires: a validation study using real-world data.

IF 3.4 | Zone 2 (Medicine) | JCR Q2 (Psychiatry)
Eoin McElroy, Thomas Wood, Raymond Bond, Maurice Mulvenna, Mark Shevlin, George B Ploubidis, Mauricio Scopel Hoffmann, Bettina Moltrecht
BMC Psychiatry. Journal Article. Published 2024-07-24.
DOI: 10.1186/s12888-024-05954-2 (https://doi.org/10.1186/s12888-024-05954-2)
Full-text PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11267737/pdf/

Abstract

Background: Pooling data from different sources will advance mental health research by providing larger sample sizes and allowing cross-study comparisons; however, the heterogeneity in how variables are measured across studies poses a challenge to this process.

Methods: This study explored the potential of using natural language processing (NLP) to harmonise different mental health questionnaires by matching individual questions based on their semantic content. Using the Sentence-BERT model, we calculated the semantic similarity (cosine index) between 741 pairs of questions from five questionnaires. Drawing on data from a representative UK sample of adults (N = 2,058), we calculated a Spearman rank correlation for each of the same pairs of items, and then estimated the correlation between the cosine values and Spearman coefficients. We also used network analysis to explore the model's ability to uncover structures within the data and metadata.
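The comparison described in the Methods can be sketched in a few lines of standard-library Python. This is a minimal illustration, not the authors' code. The item names, the three-dimensional "embeddings" and the response vectors are hypothetical toy values standing in for real Sentence-BERT embeddings and survey data. Using a Pearson coefficient for the final step is also an assumption, because the abstract does not say which coefficient was used. As a side note, 741 equals the number of unordered pairs among 39 items, although the abstract does not state the item count.

```python
from itertools import combinations
from math import sqrt


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def average_ranks(values):
    """Rank values from 1 to n, giving tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value is tied with this one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical toy embeddings (a real sentence embedding has hundreds of dimensions).
embeddings = {
    "low_mood": [0.9, 0.1, 0.2],
    "sadness": [0.8, 0.2, 0.3],
    "poor_sleep": [0.1, 0.9, 0.3],
}

# Hypothetical ordinal responses from six respondents to the same three items.
responses = {
    "low_mood": [0, 1, 2, 3, 1, 2],
    "sadness": [0, 1, 3, 3, 1, 2],
    "poor_sleep": [2, 0, 1, 3, 0, 1],
}

# Every unordered pair of items gets one semantic score and one empirical score.
pairs = list(combinations(embeddings, 2))
cosines = [cosine_similarity(embeddings[a], embeddings[b]) for a, b in pairs]
rhos = [spearman(responses[a], responses[b]) for a, b in pairs]

# Agreement between the two indices across all item pairs.
agreement = pearson(cosines, rhos)
print(f"{len(pairs)} pairs, agreement = {agreement:.2f}")
```

Computing both indices over the same list of pairs keeps them aligned, so the final correlation compares like with like.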

Results: We found a moderate overall correlation (r = .48, p < .001) between the two indices. In a holdout sample, the cosine scores predicted the real-world correlations with a small degree of error (MAE = 0.05, MedAE = 0.04, RMSE = 0.064), suggesting the utility of NLP in identifying similar items for cross-study data pooling. Our NLP model could detect more complex patterns in our data; however, it required manual rules to decide which edges to include in the network.
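The three holdout error measures can be illustrated as follows. This is a sketch under stated assumptions. The abstract does not say how cosine scores were mapped to predicted correlations, so the least-squares line fitted here is an assumed stand-in, and all numbers are hypothetical toy values rather than the study's data. RMSE can never be smaller than MAE for the same errors, which is consistent with the reported values (0.064 versus 0.05).

```python
from math import sqrt
from statistics import median


def fit_line(x, y):
    """Ordinary least-squares slope and intercept for y as a function of x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum(
        (a - mx) ** 2 for a in x
    )
    return slope, my - slope * mx


def mae(observed, predicted):
    """Mean absolute error."""
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)


def medae(observed, predicted):
    """Median absolute error, less sensitive to a few large misses."""
    return median(abs(o - p) for o, p in zip(observed, predicted))


def rmse(observed, predicted):
    """Root mean squared error, which penalises large misses more heavily."""
    return sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed))


# Hypothetical training pairs: cosine score and observed Spearman coefficient.
train_cos = [0.10, 0.30, 0.50, 0.70]
train_rho = [0.20, 0.30, 0.40, 0.50]
slope, intercept = fit_line(train_cos, train_rho)

# Hypothetical holdout pairs, never seen when fitting the line.
holdout_cos = [0.20, 0.40, 0.60]
holdout_rho = [0.22, 0.38, 0.47]
predicted = [intercept + slope * c for c in holdout_cos]

print(f"MAE   = {mae(holdout_rho, predicted):.3f}")
print(f"MedAE = {medae(holdout_rho, predicted):.3f}")
print(f"RMSE  = {rmse(holdout_rho, predicted):.3f}")
```

Reporting all three together is informative: a gap between MedAE and RMSE signals that a few item pairs are predicted much worse than the typical pair.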

Conclusions: This research shows that it is possible to quantify the semantic similarity between pairs of questionnaire items from their metadata, and these similarity indices correlate with how participants would answer the same two items. This highlights the potential of NLP to facilitate cross-study data pooling in mental health research. Nevertheless, researchers are cautioned to verify the psychometric equivalence of matched items.

Source journal
BMC Psychiatry (Medicine, Psychiatry)
CiteScore: 5.90
Self-citation rate: 4.50%
Articles published: 716
Review time: 3-6 weeks
About the journal: BMC Psychiatry is an open access, peer-reviewed journal that considers articles on all aspects of the prevention, diagnosis and management of psychiatric disorders, as well as related molecular genetics, pathophysiology, and epidemiology.