NHIRD and TriNetX in Rheumatology: Opportunities and Challenges
Chao Chieh Cheng, Po-Cheng Shih, Su Boon Yong, Edward Chia-Cheng Lai
International Journal of Rheumatic Diseases, volume 28, issue 4. Journal article, published 2025-04-10. DOI: 10.1111/1756-185X.70203
URL: https://onlinelibrary.wiley.com/doi/10.1111/1756-185X.70203
Citation count: 0
Abstract
Rheumatic diseases are prevalent worldwide, have a considerable impact on patients' quality of life, and tend to require long-term management. These conditions are complicated by common comorbidities, including cardiovascular disease, endocrine disorders, and mood disorders, which further increase both disease burden and treatment complexity. While biological therapies have revolutionized the management of various rheumatic diseases, high costs, potential adverse effects, and the heterogeneity of patient populations remain significant barriers to achieving optimal, personalized care.
Historically, randomized controlled trials (RCTs) have been the gold standard for evaluating efficacy and safety. However, their strict inclusion and exclusion criteria, limited sample sizes, and relatively short follow-up periods may fail to capture the full spectrum of disease heterogeneity and long-term real-world outcomes, both positive and negative. The use of real-world evidence (RWE) in research, as demonstrated in the original article published in the International Journal of Rheumatic Diseases in early 2019 (Su-Boon Yong et al., 2019) [1], has brought new hope for addressing these problems. RWE derived from large-scale data sources, such as Taiwan's National Health Insurance Research Database (NHIRD) and recently built global networks like TriNetX, offers the potential to overcome many of these limitations.
This editorial explores the growing influence of RWE on rheumatologic research, discussing both its potential benefits and its limitations. We also highlight future directions for leveraging big data, particularly from Taiwan's NHIRD, TriNetX, and other global repositories, to optimize treatment strategies, refine risk prediction models, and guide real-world clinical decision-making in rheumatology.
The marked heterogeneity of rheumatic diseases is itself a challenge for research: conditions such as systemic lupus erythematosus (SLE), rheumatoid arthritis (RA), and idiopathic inflammatory myositis (IIM) each exhibit considerable variability in clinical manifestations, progression, and therapeutic responses. Diagnostic accuracy is often compromised by nonspecific presentations, the limited reliability of current biomarkers, and incomplete imaging data sets, all of which elevate the risk of misdiagnosis. These limitations are especially magnified in rare rheumatic diseases, where small patient populations restrict the feasibility of clinical trials and inflate the costs of multinational studies.
Even as biological and targeted synthetic medications continue to advance, prolonged follow-up studies are frequently hampered by high financial demands and significant patient attrition. Insufficient resources, particularly in the realm of rare diseases and innovative drug development, further restrict progress [2].
To overcome these challenges, global collaboration and the adoption of novel methodologies are essential. Large-scale real-world data sources, such as Taiwan's NHIRD or multi-institutional platforms like TriNetX, can enhance disease surveillance, treatment evaluation, and patient stratification beyond what traditional RCTs can capture.
With the rapid emergence of biological therapies, randomized controlled trials (RCTs) remain a cornerstone for evaluating efficacy and safety in rheumatology. However, the strict inclusion and exclusion criteria in RCTs often limit their external validity, as real-world patient populations typically present with greater heterogeneity and multiple comorbidities [3]. In contrast, real-world evidence (RWE), derived from patient registries, national health insurance databases (e.g., Taiwan's NHIRD), and electronic health records (EHRs), enables the assessment of treatment outcomes in diverse, large-scale, and long-term clinical settings, more closely mirroring routine practice [4].
Randomized controlled trials (RCTs) provide high-level causal evidence primarily because randomization effectively mitigates confounding variables within a carefully defined cohort. At the same time, real-world evidence (RWE) offers a broader perspective by capturing larger and more diverse patient populations, including those with rare or complex conditions. This expansive scope facilitates long-term monitoring of treatment effectiveness and adverse events that may be missed in the relatively short duration of many RCTs. Furthermore, RWE studies often have lower costs, allowing rapid responses to pressing clinical or policy-related questions [4].
Despite their robust internal validity, RCTs commonly rely on smaller sample sizes, have shorter follow-up periods, and incur higher operational costs, all of which reduce their ability to detect long-term outcomes or rare adverse events. Strict protocols may also exclude patients with multiple comorbidities or atypical disease presentations, thereby limiting the generalizability of RCT findings [3]. By contrast, RWE can suffer from variable data quality across different sources, raising the risks of misclassification, incomplete information, and selection bias. Consequently, rigorous patient matching and sophisticated statistical techniques (e.g., propensity score analysis) are crucial for ensuring that RWE produces reliable and accurate results [4].
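As an illustration of the propensity score analysis mentioned above, the following is a minimal sketch in Python, not the method of any particular study. It assumes two hypothetical baseline covariates (a standardized age and a comorbidity flag), simulates treatment assignment that depends on them, estimates each patient's propensity score with a small hand-written logistic regression, and then performs greedy 1:1 nearest-neighbour matching within a caliper. All names, the simulated data, and the caliper of 0.05 are assumptions chosen for illustration.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(X, t, lr=0.5, epochs=300):
    """Fit P(treated | covariates) by batch gradient descent; w[0] is the intercept."""
    n, d = len(X), len(X[0])
    w = [0.0] * (d + 1)
    for _ in range(epochs):
        grad = [0.0] * (d + 1)
        for xi, ti in zip(X, t):
            err = sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi))) - ti
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj - lr * gj / n for wj, gj in zip(w, grad)]
    return w

def match_pairs(scores, t, caliper=0.05):
    """Greedy 1:1 nearest-neighbour matching on the score, without replacement."""
    controls = {i for i, ti in enumerate(t) if ti == 0}
    pairs = []
    for i in [k for k, tk in enumerate(t) if tk == 1]:
        if not controls:
            break
        j = min(controls, key=lambda c: abs(scores[c] - scores[i]))
        if abs(scores[j] - scores[i]) <= caliper:  # discard poor matches
            pairs.append((i, j))
            controls.remove(j)
    return pairs

def mean(values):
    return sum(values) / len(values)

# Simulated cohort (hypothetical): treatment is more likely in older,
# comorbid patients, so the raw groups are not comparable.
random.seed(0)
X, t = [], []
for _ in range(400):
    age = random.gauss(0.0, 1.0)                  # standardized age
    comorbid = float(random.random() < 0.3)       # 1.0 if comorbidity present
    p_treat = sigmoid(-0.5 + 0.8 * age + 1.0 * comorbid)
    X.append([age, comorbid])
    t.append(1 if random.random() < p_treat else 0)

w = fit_logistic(X, t)
scores = [sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi))) for xi in X]
pairs = match_pairs(scores, t)

# Difference in mean age between treated and controls, before and after matching.
before = mean([x[0] for x, ti in zip(X, t) if ti == 1]) - \
         mean([x[0] for x, ti in zip(X, t) if ti == 0])
after = mean([X[i][0] for i, _ in pairs]) - mean([X[j][0] for _, j in pairs])
print(f"matched pairs: {len(pairs)}")
print(f"mean age difference before: {before:.3f}, after: {after:.3f}")
```

In practice the covariate balance after matching would be checked for every covariate (for example with standardized mean differences), and outcome analysis would be restricted to the matched pairs. Matching on the score can only balance measured covariates.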
The National Health Insurance Research Database (NHIRD), derived from Taiwan's National Health Insurance system, encompasses claims data for over 99.99% of the population, spanning outpatient and inpatient visits since 2000 [5]. Unlike many EHR-based systems that capture data primarily from specific hospital networks or healthcare systems, NHIRD offers a population-based data set, enhancing its representativeness. Since its inception, NHIRD has amassed extensive information on outpatient and inpatient services, prescription medications, and procedural claims, making it a comprehensive resource for large-scale epidemiological and health services research. By linking to other robust data sources—such as cancer registries and death records—the NHIRD enables longitudinal analyses and facilitates the study of both common and rare diseases in real-world clinical settings [5].
Several countries have developed comparable population-level data sets, such as the Clinical Practice Research Datalink (CPRD) in the United Kingdom, various national healthcare registries in Scandinavian countries (e.g., Denmark, Sweden), and the National Health Information Database (NHID) in South Korea. What sets the NHIRD apart is its nearly universal enrollment, minimal loss to follow-up, and capacity to support a vast array of topics ranging from pharmacoepidemiology to cost-effectiveness analyses. Additionally, the NHIRD can be linked with other databases, enriching research dimensions and improving data accuracy.
However, limitations remain. The database's reliance on administrative coding means data accuracy can be affected by coding practices, including potential issues like “upcoding” or coding errors [5]. Important potential confounders, such as disease severity and lifestyle behaviors (e.g., smoking, alcohol use), are often not recorded, which makes it difficult to adjust for confounding bias. NHIRD also excludes self-paid services, like cosmetic procedures, limiting its scope in certain areas [6]. Privacy and regulatory restrictions further complicate data access, as researchers must conduct onsite analysis at designated centers, with applications subject to expert review. Despite these challenges, NHIRD remains a unique and invaluable population-based data set, serving as a cornerstone for advancing rheumatology and broader medical research through its comprehensive and representative data.
TriNetX is a global network that connects EHR data from over 130 healthcare organizations with biopharmaceutical companies to facilitate clinical trials while ensuring patient privacy through the use of aggregated data for feasibility assessments and recruitment.
Its key advantages include robust privacy safeguards, opportunities for collaboration between hospitals and pharmaceutical companies, and extensive global coverage, making it well suited for multi-institutional studies. Easy access and a user-friendly interface are further advantages. Following recent platform updates, the database now comprises several collaborative networks that serve different purposes to match researchers' needs. However, limitations include the lack of standardized trial identifiers, which complicates the evaluation of system impact, and a focus on data queries with limited clarity regarding their influence on trial progression [7].
The National Health Insurance Research Database (NHIRD) and TriNetX exhibit distinct features and applications, as summarized in Table 1. NHIRD is population-based, providing comprehensive claims data, including diagnoses and prescriptions. It is primarily utilized for epidemiological research and health policy analysis, with robust data linkage to other government data sets. However, access to NHIRD is restricted to Taiwanese researchers or collaborators, and limited validation of certain diagnostic data highlights the need for improved accuracy [5, 6].
In contrast, TriNetX operates as a global federated network, connecting more than 130 healthcare organizations and offering aggregated patient data, such as counts of diagnoses and procedures, to ensure privacy. It was initially designed primarily to support sponsor-initiated clinical trials, particularly feasibility assessments and patient recruitment, and it does not offer data linkage capabilities. TriNetX has fewer access restrictions, which vary by participating institution, and no major data validation concerns have been reported. While both platforms serve valuable but distinct purposes, they differ in data coverage, level of detail, and access policies [7].
The NHIRD and TriNetX offer complementary strengths for rheumatologic research, each designed to address different investigative goals. NHIRD, with its nearly universal population coverage, excels in epidemiological and health services research, including analyses of disease prevalence, healthcare utilization, and policy impacts. In contrast, TriNetX leverages detailed EHR data, making it particularly suited for examining treatment effectiveness, disease progression, and patient stratification within diverse clinical settings.
For instance, NHIRD can capture nationwide trends in the incidence and management of rheumatoid arthritis, while TriNetX enables granular assessments of treatment adherence and clinical outcomes in well-defined patient subgroups. By integrating these two data sources, researchers can combine NHIRD's population-level breadth with TriNetX's clinical depth, employing cross-validation to corroborate key findings or using hybrid modeling approaches to explore disparities and outcomes across different cohorts. However, combining studies across different claims or EHR databases remains limited by differences in data validity and in the definitions of raw data. Additionally, advanced machine learning or artificial intelligence methods can enhance both data integration and analysis, revealing novel patterns in rheumatologic care, guiding precision medicine, and informing evidence-based clinical and policy decisions.
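One simple way to check whether a finding from one database is corroborated by the other is to compare the two effect estimates on the log scale. The sketch below is an illustration under stated assumptions, not a procedure described in this editorial: it assumes each database yields an independent hazard ratio with a 95% confidence interval for the same exposure and outcome, recovers the standard errors from the intervals, tests whether the two estimates differ, and computes a fixed-effect inverse-variance pooled estimate. The function names and all numbers are hypothetical.

```python
import math

def log_se_from_ci(lower, upper, z=1.96):
    """Standard error of log(HR), recovered from a 95% confidence interval."""
    return (math.log(upper) - math.log(lower)) / (2 * z)

def compare_estimates(hr_a, ci_a, hr_b, ci_b):
    """Z-test for the difference between two independent log hazard ratios."""
    se_a, se_b = log_se_from_ci(*ci_a), log_se_from_ci(*ci_b)
    diff = math.log(hr_a) - math.log(hr_b)
    z = diff / math.sqrt(se_a ** 2 + se_b ** 2)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided p-value
    return z, p

def pooled_hr(hr_a, ci_a, hr_b, ci_b):
    """Fixed-effect inverse-variance pooled hazard ratio."""
    w_a = 1.0 / log_se_from_ci(*ci_a) ** 2
    w_b = 1.0 / log_se_from_ci(*ci_b) ** 2
    log_pooled = (w_a * math.log(hr_a) + w_b * math.log(hr_b)) / (w_a + w_b)
    return math.exp(log_pooled)

# Hypothetical estimates for the same exposure-outcome pair; these numbers
# are invented for illustration and come from no actual study.
claims_hr, claims_ci = 1.40, (1.10, 1.78)   # e.g. a claims-based cohort
ehr_hr, ehr_ci = 1.25, (1.05, 1.49)         # e.g. an EHR-based cohort

z, p = compare_estimates(claims_hr, claims_ci, ehr_hr, ehr_ci)
pooled = pooled_hr(claims_hr, claims_ci, ehr_hr, ehr_ci)
print(f"z = {z:.2f}, two-sided p = {p:.3f}, pooled HR = {pooled:.2f}")
```

A large p-value here only indicates that the two estimates are statistically compatible. It does not show that the cohorts, exposure definitions, or outcome definitions are equivalent, which is the limitation noted above.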
Advancing rheumatologic research demands innovative strategies to navigate the complexities of these diseases. The NHIRD excels in population-level analyses, whereas TriNetX offers granular clinical insights, making their integration a powerful tool for comprehensive investigations. Combining these resources enables researchers to gain a deeper understanding of disease mechanisms, treatment outcomes, and healthcare systems. Combining an EHR database with a population-based database helps mitigate problems of external validity and of misclassification, information, and selection bias by providing a fuller analytic perspective. This integrative approach holds the potential to transform rheumatologic research, fostering the development of personalized, data-driven care strategies.
C.C.C. and P.-C.S.: writing – original draft. S.B.Y., E.C.-C.L.: review and editing. S.B.Y., P.-C.S.: writing – review and editing.
The authors declare no conflicts of interest.
About the journal:
The International Journal of Rheumatic Diseases (formerly APLAR Journal of Rheumatology) is the official journal of the Asia Pacific League of Associations for Rheumatology. The Journal accepts original articles on clinical or experimental research pertinent to the rheumatic diseases, work on connective tissue diseases and other immune and allergic disorders. The acceptance criteria for all papers are the quality and originality of the research and its significance to our readership. Except where otherwise stated, manuscripts are peer reviewed by two anonymous reviewers and the Editor.