Building and Evaluating an Orthodontic Natural Language Processing Model for Automated Clinical Note Information Extraction.

Impact factor 2.4 · CAS Zone 3 (Medicine) · JCR Q2 (Dentistry, Oral Surgery & Medicine)
Jay S Patel, Divakar Karanth
Orthodontics & Craniofacial Research. Journal article, published 2025-06-14. DOI: 10.1111/ocr.12944 (https://doi.org/10.1111/ocr.12944)
Citations: 0

Abstract

Introduction: Malocclusion presents functional and aesthetic challenges, necessitating accurate diagnosis and treatment. However, variability in orthodontic treatment planning persists due to subjective assessments, limiting consistency and objectivity. Electronic dental records (EDRs) contain vast patient data that could address these challenges, but much of the rich clinical information is documented as free text, complicating analysis. This study aims to develop an Orthodontic Natural Language Processing (ONLP) model to extract structured orthodontics-related information from unstructured EDRs and identify critical features influencing malocclusion using machine learning (ML).

Methods: Data from 7693 orthodontic patients were analysed to train, test and validate the ONLP and ML models. A gold-standard dataset was created through manual review. The ONLP model used supervised (Named Entity Recognition, NER) and unsupervised (K-means clustering) approaches to structure information from free text. Machine learning models, including Logistic Regression, Gaussian Naive Bayes, Random Forest and XGBoost, were subsequently applied to identify feature importance for malocclusion classification.
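
The unsupervised step can be illustrated with a minimal sketch of K-means clustering over bag-of-words vectors. This is not the authors' implementation: the abstract does not state how notes were vectorised, how many clusters were used, or which library was chosen. The note phrases, the function names (bag_of_words, kmeans) and the explicit choice of starting centroids are all assumptions made for illustration.

```python
from collections import Counter


def bag_of_words(notes):
    """Turn each note into a word-count vector over a shared, sorted vocabulary."""
    vocab = sorted({word for note in notes for word in note.lower().split()})
    vectors = []
    for note in notes:
        counts = Counter(note.lower().split())
        vectors.append([float(counts[word]) for word in vocab])
    return vocab, vectors


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors, init_indices, max_iter=20):
    """Plain K-means: assign each vector to its nearest centroid, then
    recompute each centroid as the mean of its members, until stable."""
    centroids = [list(vectors[i]) for i in init_indices]
    labels = [-1] * len(vectors)
    for _ in range(max_iter):
        new_labels = [
            min(range(len(centroids)),
                key=lambda k: squared_distance(vec, centroids[k]))
            for vec in vectors
        ]
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
        for k in range(len(centroids)):
            members = [v for v, lab in zip(vectors, labels) if lab == k]
            if members:  # keep the old centroid if a cluster is empty
                centroids[k] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Hypothetical note fragments, invented for this example only.
NOTES = [
    "maxillary crowding overjet",
    "crowding overjet increased",
    "anterior crossbite midline",
    "crossbite midline deviation",
]
VOCAB, VECTORS = bag_of_words(NOTES)
# Starting centroids are fixed (notes 0 and 2) so the run is deterministic.
LABELS = kmeans(VECTORS, init_indices=(0, 2))
```

With these four fragments the two crowding/overjet notes fall into one cluster and the two crossbite/midline notes into the other. Real clinical notes would need tokenisation, normalisation of abbreviations and a principled choice of the number of clusters, none of which the abstract specifies.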

Results: The ONLP model achieved 89% sensitivity, 92% specificity and 91% accuracy in extracting orthodontics-related information. The supervised model demonstrated 84% accuracy, 82% F1-score and 84% recall, excelling in identifying Classes I and III malocclusions but showing reduced sensitivity for Class II. Machine learning analysis highlighted key features for malocclusion classification: maxillary crowding, overjet and arch perimeter discrepancy for Class I; maxillary spacing and anterior crossbite for Class II; and dental midline deviation and occlusal wear for Class III.
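
For readers less familiar with the reported metrics, the sketch below shows how sensitivity (recall), specificity, accuracy and F1-score follow from confusion-matrix counts. The counts are invented for illustration and are not the study's data; the class and function names are likewise hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    tp: int  # true positives: items present and extracted
    fp: int  # false positives: items extracted but not present
    tn: int  # true negatives: items absent and not extracted
    fn: int  # false negatives: items present but missed


def sensitivity(c: ConfusionCounts) -> float:
    """Share of truly present items that were found (also called recall)."""
    return c.tp / (c.tp + c.fn)


def specificity(c: ConfusionCounts) -> float:
    """Share of truly absent items that were correctly left out."""
    return c.tn / (c.tn + c.fp)


def accuracy(c: ConfusionCounts) -> float:
    """Share of all decisions that were correct."""
    return (c.tp + c.tn) / (c.tp + c.fp + c.tn + c.fn)


def f1_score(c: ConfusionCounts) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * c.tp / (2 * c.tp + c.fp + c.fn)


# Illustrative counts only (assumed, not taken from the paper).
EXAMPLE = ConfusionCounts(tp=89, fp=8, tn=92, fn=11)
METRICS = {
    "sensitivity": sensitivity(EXAMPLE),
    "specificity": specificity(EXAMPLE),
    "accuracy": accuracy(EXAMPLE),
    "f1": f1_score(EXAMPLE),
}
```

Because accuracy pools both classes, it can sit between sensitivity and specificity, which is one reason the abstract reports all three. For a multi-class task such as Class I/II/III malocclusion, these quantities are typically computed per class and then averaged; the abstract does not say which averaging scheme was used.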

Conclusion: This study demonstrates a novel approach to automating orthodontic data extraction using the ONLP model, enabling advanced big data analytics and enhancing data-driven orthodontic research and care.

Source journal
Orthodontics & Craniofacial Research
Category: Medicine - Dentistry and Oral Surgery
CiteScore: 5.30
Self-citation rate: 3.20%
Articles published: 65
Review time: >12 weeks
Journal description: Orthodontics & Craniofacial Research - Genes, Growth and Development is published to serve its readers as an international forum for the presentation and critical discussion of issues pertinent to the advancement of the specialty of orthodontics and the evidence-based knowledge of craniofacial growth and development. This forum is based on scientifically supported information, but also includes minority and conflicting opinions. The objective of the journal is to facilitate effective communication between the research community and practicing clinicians. Original papers of high scientific quality that report the findings of clinical trials, clinical epidemiology, and novel therapeutic or diagnostic approaches are appropriate submissions. Similarly, we welcome papers in genetics, developmental biology, syndromology, surgery, speech and hearing, and other biomedical disciplines related to clinical orthodontics and normal and abnormal craniofacial growth and development. In addition to original and basic research, the journal publishes concise reviews, case reports of substantial value, invited essays, letters, and announcements. The journal is published quarterly. The review of submitted papers will be coordinated by the editor and members of the editorial board. It is policy to review manuscripts within 3 to 4 weeks of receipt and to publish within 3 to 6 months of acceptance.