BoneScore: A natural language processing algorithm to extract bone mineral density data from DXA scans
Samah Fodeh, Rixin Wang, Terrence E Murphy, Farah Kidwai-Khan, Linda S Leo-Summers, Baylah Tessier-Sherman, Evelyn Hsieh, Julie A Womack
Health Informatics Journal (IF 2.2, JCR Q2, Health Care Sciences & Services), journal article, published 2024-10-01
DOI: 10.1177/14604582241295930 (https://doi.org/10.1177/14604582241295930)
Abstract
Objective: To develop and test a natural language processing (NLP) algorithm that accurately detects the presence of femoral neck T-score information in reports of dual-energy X-ray absorptiometry (DXA) scans. Methods: A rule-based NLP algorithm, called 'BoneScore', iteratively built a collection of regular expressions from testing data consisting of 889 snippets of text pulled from DXA reports. Clinical experts manually checked the output to determine the proportion of manually verified annotations containing T-score information that the algorithm detected. Window widths of 30 and 50 words on each side of the key term 'femoral' were tested until adequate accuracy was achieved. A separate clinical validation regressed the extracted T-score values on five risk factors with established associations. Results: BoneScore built a set of 20 regular expressions that, together with a width of 50 words on each side of the key term, yielded an accuracy of 98% in the testing data. The extracted T-scores, when modeled with multivariable linear regression, consistently exhibited associations supported by the literature. Conclusion: BoneScore uses regular expressions, applied with a width of 50 words on each side of the key term, to accurately extract annotations of bone mineral density T-score values. The extracted T-scores exhibit clinical face validity.
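The windowing idea described in the Methods can be illustrated with a minimal Python sketch. It is not the BoneScore implementation: the abstract does not list the 20 regular expressions, so the single pattern `T_SCORE`, the helper names `context_windows` and `extract_t_scores`, and the sample sentences are all hypothetical and chosen only for illustration. Only the key term 'femoral' and the per-side word width come from the abstract.

```python
import re

# Hypothetical pattern (NOT one of the 20 expressions BoneScore built):
# the phrase "T-score", a short non-digit gap, then a signed decimal.
T_SCORE = re.compile(
    r"\bt[\s-]?score\D{0,20}?([+-]?\d+(?:\.\d+)?)",
    re.IGNORECASE,
)


def context_windows(text, key="femoral", width=50):
    """Yield up to `width` words on each side of every occurrence of `key`."""
    words = text.split()
    for i, word in enumerate(words):
        if key in word.lower():
            start = max(0, i - width)
            yield " ".join(words[start:i + width + 1])


def extract_t_scores(text, width=50):
    """Return T-score values found inside windows around the key term.

    A value is reported once for each window that contains it, so a
    report mentioning the key term twice can yield repeated values.
    """
    scores = []
    for window in context_windows(text, width=width):
        for match in T_SCORE.finditer(window):
            scores.append(float(match.group(1)))
    return scores


if __name__ == "__main__":
    # Invented sample sentence, not taken from any DXA report.
    report = "femoral neck was scanned today and the reported T-score is -2.3"
    print(extract_t_scores(report, width=2))   # window too narrow to reach the value
    print(extract_t_scores(report, width=50))  # wider window includes the value
```

In this toy sentence the T-score sits eight words after the key term, so a narrow window misses it while a wide one captures it. That mirrors, in a simplified way, why the choice of window width matters.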
About the journal:
Health Informatics Journal is an international peer-reviewed journal. All papers submitted to Health Informatics Journal are subject to peer review by members of a carefully appointed editorial board. The journal operates a conventional single-blind reviewing policy in which the reviewer’s name is always concealed from the submitting author.