Treatment journey clustering with a novel kernel k-means machine learning algorithm: a retrospective analysis of insurance claims in bipolar I disorder.

IF 4.5 Q1 Computer Science

Brain Informatics Pub Date : 2025-05-22 DOI:10.1186/s40708-025-00258-x

Matthew Littman, Huy-Binh Nguyen, Joanna Campbell, Katelyn R Keyloun

Brain Informatics, volume 12, issue 1, article 12. Publication type: Journal Article. Full-text PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12098244/pdf/

Citations: 0

Abstract

In real-world psychiatric practice, patients may experience complex treatment journeys, including various diagnoses and lines of therapy. Insurance claims databases could potentially provide insight into outcomes of psychiatric treatment processes, but the diversity of event sequences restricts analyses with currently available methods. Here, we developed a novel kernel k-means clustering algorithm for event sequences that can accommodate highly diverse event types and sequence lengths. The approach, Divisive Optimized Clustering using Kernel K-means for Event Sequences (DOCKKES), also leverages a novel performance metric, the transition score, which measures sequence coherence in individual clusters. The performance of DOCKKES was evaluated in the context of bipolar I disorder, which is characterized by heterogeneous treatment journeys. We conducted a retrospective, observational analysis of a large sample (n = 31,578) of patients with bipolar I disorder from the MarketScan® Commercial Database. Using insurance claims, bipolar episode diagnoses and mental health-related lines of therapy were identified as events of interest for patient clustering. The dataset included 202,122 events; 75% of the cohort experienced unique treatment journeys. Based on an optimal run, DOCKKES identified 16 treatment journey clusters, which were evenly split for initial manic/mixed or depressive episodes (8 clusters each) and varied in sequence length and early lines of therapy. Variability across clusters was also observed for demographics, comorbidities, and mental health-related healthcare resource utilization and cost. This proof-of-concept study demonstrated the use of DOCKKES for integrating information from large datasets, enabling comparisons between patient clusters and evaluation of real-world treatment journeys in the context of evidence-based guidelines.
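To make the idea of kernel k-means on event sequences concrete, here is a minimal sketch in Python. It is not the DOCKKES implementation: the abstract does not specify the kernel, the divisive optimization strategy, or the definition of the transition score, so none of those are reproduced. The sketch assumes a simple stand-in kernel (cosine similarity over counts of events and adjacent-event transitions), a farthest-first initialization, and made-up event labels such as `MANIC` and `LOT_A`. The names `featurize`, `gram_matrix`, and `kernel_kmeans` are hypothetical.

```python
from collections import Counter

import numpy as np


def featurize(seq):
    """Count single events and adjacent-event transitions (assumed features)."""
    counts = Counter(seq)
    counts.update(zip(seq, seq[1:]))
    return counts


def gram_matrix(seqs):
    """Cosine-normalized kernel matrix; assumes every sequence is non-empty."""
    feats = [featurize(s) for s in seqs]
    n = len(feats)
    K = np.zeros((n, n))
    for i in range(n):
        for j in range(i, n):
            dot = sum(v * feats[j][key] for key, v in feats[i].items())
            K[i, j] = K[j, i] = dot
    norms = np.sqrt(np.diag(K))
    return K / np.outer(norms, norms)


def kernel_kmeans(K, k, n_iter=100):
    """Standard kernel k-means on a precomputed Gram matrix K."""
    n = K.shape[0]
    diag = np.diag(K)

    # Farthest-first initialization (an assumption, chosen for determinism):
    # start from sequence 0, then repeatedly add the most distant sequence.
    centers = [0]
    d2 = diag + diag[0] - 2 * K[:, 0]
    for _ in range(1, k):
        nxt = int(d2.argmax())
        centers.append(nxt)
        d2 = np.minimum(d2, diag + diag[nxt] - 2 * K[:, nxt])
    labels = np.argmin(
        diag[:, None] + diag[centers][None, :] - 2 * K[:, centers], axis=1
    )

    for _ in range(n_iter):
        dist = np.full((n, k), np.inf)
        for c in range(k):
            members = labels == c
            size = members.sum()
            if size == 0:
                continue  # empty cluster: leave its distances at infinity
            # Squared feature-space distance to the cluster mean:
            # K_ii - (2/|c|) sum_j K_ij + (1/|c|^2) sum_{j,l} K_jl
            cross = K[:, members].sum(axis=1) / size
            within = K[np.ix_(members, members)].sum() / size**2
            dist[:, c] = diag - 2 * cross + within
        new_labels = dist.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
    return labels


# Toy treatment journeys with invented event labels and unequal lengths.
journeys = [
    ["MANIC", "LOT_A", "LOT_B"],
    ["MANIC", "LOT_A"],
    ["DEPRESSIVE", "LOT_C", "LOT_D"],
    ["DEPRESSIVE", "LOT_C"],
]
K = gram_matrix(journeys)
labels = kernel_kmeans(K, k=2)
```

Because the algorithm only ever touches the Gram matrix `K`, sequences of different lengths and with different event vocabularies can be clustered without padding or a fixed-width encoding, which is the property the abstract highlights. Swapping in another sequence kernel only requires changing `gram_matrix`.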




Source journal: Brain Informatics (Computer Science: Computer Science Applications)

CiteScore: 9.50

Self-citation rate: 0.00%

Review time: 13 weeks

Journal description: Brain Informatics is an international, peer-reviewed, interdisciplinary open-access journal published under the SpringerOpen brand. It provides a platform for researchers and practitioners to disseminate original research on computational and informatics technologies related to the brain. The journal addresses the computational, cognitive, physiological, biological, physical, ecological and social perspectives of brain informatics. It also welcomes emerging information technologies and advanced neuroimaging technologies, such as big data analytics and interactive knowledge discovery related to various large-scale brain studies and their applications. The journal publishes high-quality original research papers, brief reports and critical reviews in all theoretical, technological, clinical and interdisciplinary studies that make up the field of brain informatics and its applications in brain-machine intelligence, brain-inspired intelligent systems, mental health and brain disorders, etc. The scope of papers includes the following five tracks:

Track 1: Cognitive and Computational Foundations of Brain Science
Track 2: Human Information Processing Systems
Track 3: Brain Big Data Analytics, Curation and Management
Track 4: Informatics Paradigms for Brain and Mental Health Research
Track 5: Brain-Machine Intelligence and Brain-Inspired Computing