Development and validation of case-finding algorithms for identifying patients with systemic lupus erythematosus in an administrative claim database from tertiary care centers in Japan.
Authors: Ken-Ei Sada, Yoshia Miyawaki, Ryo Yanai, Takashi Kida, Akira Ohnishi, Ryusuke Yoshimi, Kunihiro Ichinose, Yasuhiro Shimojima
Journal: Modern Rheumatology. Published: 2025-10-10. DOI: 10.1093/mr/roaf091 (https://doi.org/10.1093/mr/roaf091)
Objective: To develop and validate algorithms for identifying patients with systemic lupus erythematosus (SLE) in Japanese administrative claims databases from tertiary care centers using statistical and machine learning methods.
Methods: This retrospective cross-sectional study included 13,538 patients from six hospitals. One-year claims data were linked to chart-confirmed SLE diagnoses. Patients were randomly assigned to training (n = 8,811) and test (n = 3,775) sets; an external validation set (n = 952) was drawn from another hospital. Feature selection used Least Absolute Shrinkage and Selection Operator (LASSO), Boruta, and Recursive Feature Elimination. Logistic regression, random forest, and decision tree models were trained with synthetic oversampling to address class imbalance. Model performance was evaluated using the Area Under the Receiver Operating Characteristic Curve (AUROC) and other standard performance metrics.
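To make the AUROC metric named in the Methods concrete, the following is a minimal sketch, not the authors' code. It computes AUROC as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half. The function name `auroc` and the toy labels and scores are assumptions made up for illustration; they are not study data.

```python
def auroc(labels, scores):
    """Return the AUROC of `scores` against binary `labels` (1 = case, 0 = non-case).

    Uses the pairwise (Mann-Whitney) definition: the fraction of
    positive/negative pairs in which the positive has the higher score,
    counting ties as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy example (hypothetical values, not from the study).
toy_labels = [1, 1, 0, 0, 0]
toy_scores = [0.9, 0.4, 0.5, 0.2, 0.1]
toy_auc = auroc(toy_labels, toy_scores)
print(round(toy_auc, 3))
```

The pairwise form is quadratic in the number of records, which is fine for illustration; a rank-based implementation would be preferable on a dataset of the size described here.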
Results: The random forest model achieved the best performance (AUROC: 0.995; sensitivity: 0.971; specificity: 0.969). A simplified rule based on diagnosis code and anti-double-stranded DNA antibody testing showed high accuracy in both test and validation sets. Adding urine sediment examination modestly improved sensitivity but reduced specificity.
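The simplified rule and the sensitivity/specificity trade-off described in the Results can be sketched as follows. This is an illustration under stated assumptions, not the published algorithm: the abstract does not say how the criteria are combined, so the sketch assumes the base rule is "SLE diagnosis code AND anti-dsDNA test billed", and that urine sediment examination is added as an alternative to the anti-dsDNA test. All field names and records are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ClaimsRecord:
    # All fields are hypothetical flags derived from one year of claims.
    has_sle_code: bool          # an SLE diagnosis code appears in the claims
    has_anti_dsdna_test: bool   # an anti-dsDNA antibody test was billed
    has_urine_sediment: bool    # a urine sediment examination was billed
    chart_confirmed_sle: bool   # reference standard from chart review


def rule_code_and_dsdna(r):
    # Assumed combination: both criteria must be present.
    return r.has_sle_code and r.has_anti_dsdna_test


def rule_with_urine_sediment(r):
    # Assumed broadening: urine sediment accepted as an alternative test.
    return r.has_sle_code and (r.has_anti_dsdna_test or r.has_urine_sediment)


def sensitivity_specificity(records, rule):
    """Return (sensitivity, specificity) of `rule` against chart review."""
    tp = sum(1 for r in records if rule(r) and r.chart_confirmed_sle)
    fn = sum(1 for r in records if not rule(r) and r.chart_confirmed_sle)
    tn = sum(1 for r in records if not rule(r) and not r.chart_confirmed_sle)
    fp = sum(1 for r in records if rule(r) and not r.chart_confirmed_sle)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one case and one non-case")
    return tp / (tp + fn), tn / (tn + fp)


# Toy records (made up for illustration, not study data).
toy_records = [
    ClaimsRecord(True, True, True, True),
    ClaimsRecord(True, False, True, True),
    ClaimsRecord(True, False, True, False),
    ClaimsRecord(False, True, False, False),
    ClaimsRecord(True, True, False, True),
    ClaimsRecord(False, False, True, False),
]

base = sensitivity_specificity(toy_records, rule_code_and_dsdna)
broad = sensitivity_specificity(toy_records, rule_with_urine_sediment)
print(base, broad)
```

Under these assumptions, broadening a rule with an extra alternative criterion can only flag more records, so sensitivity cannot fall and specificity cannot rise, which matches the direction of the effect reported above.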
Conclusion: A claims-based algorithm incorporating diagnosis codes and standard laboratory tests accurately identified patients with SLE, facilitating reliable use of administrative data in real-world research.
About the journal:
Modern Rheumatology publishes original papers in English on research pertinent to rheumatology and associated areas such as pathology, physiology, clinical immunology, microbiology, biochemistry, experimental animal models, pharmacology, and orthopedic surgery.
Occasional reviews of topics which may be of wide interest to the readership will be accepted. In addition, concise papers of special scientific importance that represent definitive and original studies will be considered.
Modern Rheumatology is currently indexed in Science Citation Index Expanded (SciSearch), Journal Citation Reports/Science Edition, PubMed/Medline, SCOPUS, EMBASE, Chemical Abstracts Service (CAS), Google Scholar, EBSCO, CSA, Academic OneFile, Current Abstracts, Elsevier Biobase, Gale, Health Reference Center Academic, OCLC, SCImago, and Summon by Serial Solutions.