Building and Evaluating an Orthodontic Natural Language Processing Model for Automated Clinical Note Information Extraction
Authors: Jay S Patel, Divakar Karanth
Journal: Orthodontics & Craniofacial Research (impact factor 2.4; JCR Q2, Dentistry, Oral Surgery & Medicine)
Publication date: 2025-06-14
Publication type: Journal Article
DOI: 10.1111/ocr.12944 (https://doi.org/10.1111/ocr.12944)
Citation count: 0
Abstract
Introduction: Malocclusion presents functional and aesthetic challenges, necessitating accurate diagnosis and treatment. However, variability in orthodontic treatment planning persists due to subjective assessments, limiting consistency and objectivity. Electronic dental records (EDRs) contain vast patient data that could address these challenges, but much of the rich clinical information is documented as free text, complicating analysis. This study aims to develop an Orthodontic Natural Language Processing (ONLP) model to extract structured orthodontics-related information from unstructured EDRs and identify critical features influencing malocclusion using machine learning (ML).
Methods: Data from 7693 orthodontic patients were analysed to train, test and validate the ONLP and ML models. A gold-standard dataset was created through manual review. The ONLP model used a supervised approach (Named Entity Recognition, NER) and an unsupervised approach (K-means clustering) to structure information from free text. Machine learning models, including Logistic Regression, Gaussian Naive Bayes, Random Forest and XGBoost, were subsequently applied to identify feature importance for malocclusion classification.
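As a rough illustration of the unsupervised step named above, the following sketch clusters short note phrases with K-means over bag-of-words vectors. It is not the authors' pipeline: the example phrases, the bag-of-words features, the function names and the choice to seed the centroids with the first k phrases are all assumptions made for this illustration.

```python
from collections import Counter


def bag_of_words(texts):
    """Turn each text into a word-count vector over a shared, sorted vocabulary."""
    vocab = sorted({tok for t in texts for tok in t.lower().split()})
    vectors = []
    for t in texts:
        counts = Counter(t.lower().split())
        vectors.append([float(counts[w]) for w in vocab])
    return vocab, vectors


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors, k, max_iter=20):
    """Plain Lloyd's algorithm.

    Assumption for reproducibility: centroids start at the first k vectors
    instead of a random initialisation.
    """
    centroids = [list(v) for v in vectors[:k]]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each vector goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(v, centroids[c]))
            for v in vectors
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so the algorithm has converged
        labels = new_labels
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Hypothetical note fragments, invented for this sketch (not study data).
notes = [
    "moderate maxillary crowding",
    "anterior crossbite present",
    "mild maxillary crowding",
    "anterior crossbite noted",
]

_, vectors = bag_of_words(notes)
labels = kmeans(vectors, k=2)
print(labels)  # [0, 1, 0, 1]
```

With these toy phrases the two crowding fragments fall into one cluster and the two crossbite fragments into the other. Real clinical notes would need tokenisation, normalisation and a more robust initialisation than this sketch uses.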
Results: The ONLP model achieved 89% sensitivity, 92% specificity and 91% accuracy in extracting orthodontics-related information. The supervised model demonstrated 84% accuracy, 82% F1-score and 84% recall, excelling in identifying Classes I and III malocclusions but showing reduced sensitivity for Class II. Machine learning analysis highlighted key features for malocclusion classification: maxillary crowding, overjet and arch perimeter discrepancy for Class I; maxillary spacing and anterior crossbite for Class II; and dental midline deviation and occlusal wear for Class III.
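For readers less familiar with the reported metrics, the sketch below shows how sensitivity (recall), specificity, accuracy, precision and F1-score are conventionally derived from confusion-matrix counts. The counts are illustrative placeholders and are not taken from the study; the function name is likewise hypothetical.

```python
def safe_div(numerator, denominator):
    """Return numerator / denominator, or 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def classification_metrics(tp, fn, tn, fp):
    """Compute standard binary classification metrics from confusion counts."""
    sensitivity = safe_div(tp, tp + fn)          # also called recall
    specificity = safe_div(tn, tn + fp)
    accuracy = safe_div(tp + tn, tp + fn + tn + fp)
    precision = safe_div(tp, tp + fp)
    f1 = safe_div(2 * precision * sensitivity, precision + sensitivity)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "precision": precision,
        "f1": f1,
    }


# Illustrative counts only (NOT the study's data).
metrics = classification_metrics(tp=80, fn=20, tn=90, fp=10)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

For a multi-class task such as Class I, II and III malocclusion, these quantities are typically computed per class (one class against the rest) and then averaged; the abstract does not state which averaging scheme was used.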
Conclusion: This study demonstrates a novel approach to automating orthodontic data extraction using the ONLP model, enabling advanced big data analytics and enhancing data-driven orthodontic research and care.
About the journal:
Orthodontics & Craniofacial Research - Genes, Growth and Development is published to serve its readers as an international forum for the presentation and critical discussion of issues pertinent to the advancement of the specialty of orthodontics and the evidence-based knowledge of craniofacial growth and development. This forum is based on scientifically supported information, but also includes minority and conflicting opinions.
The objective of the journal is to facilitate effective communication between the research community and practicing clinicians. Original papers of high scientific quality that report the findings of clinical trials, clinical epidemiology, and novel therapeutic or diagnostic approaches are appropriate submissions. Similarly, we welcome papers in genetics, developmental biology, syndromology, surgery, speech and hearing, and other biomedical disciplines related to clinical orthodontics and normal and abnormal craniofacial growth and development. In addition to original and basic research, the journal publishes concise reviews, case reports of substantial value, invited essays, letters, and announcements.
The journal is published quarterly. The review of submitted papers will be coordinated by the editor and members of the editorial board. It is policy to review manuscripts within 3 to 4 weeks of receipt and to publish within 3 to 6 months of acceptance.