{"title":"Can routinely collected primary healthcare data be used to assess Aboriginal children's health and wellbeing longitudinally? A retrospective analysis of electronic medical records from an Aboriginal community-controlled health service in Central Australia.","authors":"Catherine Lloyd-Johnsen, John Boffa, Vahab Baghbanian, Rachel Walpole, Shuaijun Guo, Sandra Eades, Anita D'Aprano, Sharon Goldfeld","doi":"10.23889/ijpds.v10i1.2704","DOIUrl":"10.23889/ijpds.v10i1.2704","url":null,"abstract":"<p><strong>Introduction: </strong>Electronic medical records (EMR) are an essential tool in modern healthcare, providing a centralised source of patient information. Longitudinal analysis of EMRs can identify opportunities for targeted interventions to improve health outcomes for children. However, the research value of EMRs is contingent on data quality and completeness.</p><p><strong>Methods: </strong>This retrospective cohort study used deidentified EMRs from all Aboriginal children born in 2015 who attended an Aboriginal-controlled health service in Central Australia over a 5-year period. The purpose of this study was to demonstrate the utility of EMRs in longitudinal research via presentation of three case study example analyses, and to evaluate the quality of the extracted dataset.</p><p><strong>Results: </strong>EMRs of 319 Aboriginal children (48.9% girls, 51.1% boys) were included in the analysis. These children visited the service an average of 19.9 times (min 2 - max 102). Attendance rates for routine well-child check-ups were highest at 0 to 8 weeks and 4 years of age (37.3% and 40.1% respectively). Among 12-month-olds with recorded haemoglobin levels, 43% were anaemic. Weight-for-age medians were comparable to World Health Organization (WHO) growth standards until 12 months of age; thereafter, Aboriginal girls tended to weigh more over time. 
Data completeness varied: key variables (date of birth, sex and Aboriginal status) were 100% complete, while others like anthropometrics (up to 62.1%), birth weight (54.2%), gestational age (50.2%), and haemoglobin results (up to 34.1%) were less complete. Average accuracy (99.2%) and consistency of available data (100%) were high. However, crucial data on risk factors, maternal health, and family functioning were either not collected by the service, not provided to the service from external sources, or stored in inaccessible free-text fields.</p><p><strong>Conclusions: </strong>Missing data were the greatest limiting factor for reporting on the health and development of these children. To reap the benefit of utilising EMRs for longitudinal research, the service should continue encouraging families to attend their child's routine health assessments in the first years of life. Setting key data variables as mandatory at each visit may also help increase data completeness over time.</p>","PeriodicalId":36483,"journal":{"name":"International Journal of Population Data Science","volume":"10 1","pages":"2704"},"PeriodicalIF":1.6,"publicationDate":"2025-07-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12212024/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144555221","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
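The completeness figures in the abstract above (for example, 54.2% for birth weight) are the share of records with a non-missing value for a variable. A minimal sketch of that calculation follows; the field names and the toy records are hypothetical and are not taken from the study's extract.

```python
def completeness(records, fields):
    """Return the percentage of records with a non-missing value for each field."""
    total = len(records)
    result = {}
    for field in fields:
        # Treat an absent key, None, or an empty string as missing.
        present = sum(1 for r in records if r.get(field) not in (None, ""))
        result[field] = round(100.0 * present / total, 1) if total else 0.0
    return result


# Toy records with hypothetical field names (not the study's data).
records = [
    {"dob": "2015-03-01", "sex": "F", "birth_weight_g": 3200},
    {"dob": "2015-07-12", "sex": "M", "birth_weight_g": None},
    {"dob": "2015-11-30", "sex": "F"},
    {"dob": "2015-01-05", "sex": "M", "birth_weight_g": 2950},
]

print(completeness(records, ["dob", "sex", "birth_weight_g"]))
```

How "missing" is defined (absent key, null, empty string) is itself an assumption here; a real EMR extract may also need sentinel values such as 0 or 999 treated as missing.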
{"title":"Development and validation of a mortality risk prediction index score for adults living with HIV and multiple chronic comorbidities.","authors":"Viviane D Lima, Bronhilda T Takeh, Neil Faught, Hasan Nathani, Jielin Zhu, Scott Emerson, Katerina Dolguikh, Jason Trigg, Kate A Salters, Rolando Barrios, Julio S G Montaner","doi":"10.23889/ijpds.v10i2.2926","DOIUrl":"10.23889/ijpds.v10i2.2926","url":null,"abstract":"<p><strong>Introduction: </strong>Aging while living with HIV poses new challenges in clinical management, mainly due to the onset of multiple chronic comorbidities. Population-specific risk prediction indices considering comorbidities and other risk factors are essential to comprehensively characterise disease burden among PLWH. We developed and validated a mortality risk prediction index (MRP<i>i</i>) to predict the risk of one-year all-cause mortality among people living with HIV (PLWH).</p><p><strong>Methods: </strong>Participants were ≥18 years and had initiated antiretroviral therapy (ART) between 01/2001 and 12/2018, in British Columbia, Canada. The index date was randomly selected between one-year post-ART initiation and the end of the follow-up. Participants were followed for at least one year from the index date until 12/2019, the last contact date, or the date of death (all-cause), whichever came first. The MRP<i>i</i> included 18 physical/mental comorbidities, demographic and clinical variables, and ranged from 0 (no risk) to 100 (highest risk).</p><p><strong>Results: </strong>The final model demonstrated the highest discrimination (c-statistic 0.8355, 95% CI: 0.8187-0.8523 in the training dataset and 0.7965, 95% CI: 0.7664-0.8266 in the test dataset). The comorbidities with the highest weights in the MRP<i>i</i> were substance use disorders, metastatic solid tumors and non-AIDS-defining cancers. 
For example, for an MRP<i>i</i> of 30, the predicted one-year all-cause mortality was 0.2%, while an MRP<i>i</i> of 50 had a predicted mortality of 2.3%.</p><p><strong>Conclusions: </strong>The MRP<i>i</i> provides a promising tool to assess the risk of short-term mortality among PLWH in the modern ART era that can inform clinical practice and health policy decisions.</p>","PeriodicalId":36483,"journal":{"name":"International Journal of Population Data Science","volume":"10 2","pages":"2926"},"PeriodicalIF":1.6,"publicationDate":"2025-06-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12212411/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144545179","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Linking digital footprint data into longitudinal population studies.","authors":"Romana Burgess, Andy Boyd, Oliver Sp Davis, Louise Ac Millard, Mark Mumme, Sarah Robertson, Andy Skinner, Zhuoni Xiao, Anya Skatova","doi":"10.23889/ijpds.v10i1.2946","DOIUrl":"10.23889/ijpds.v10i1.2946","url":null,"abstract":"<p><strong>Background: </strong>Linking digital footprint data into longitudinal population studies (LPS) presents an opportunity to enrich our understanding of how digitally captured behaviours relate to health traits and disease. However, this linkage introduces significant methodological challenges that require systematic exploration.</p><p><strong>Objectives: </strong>To develop a robust framework for successful digital footprint linkage into LPS, informed by discussions from a workshop from the Digital Footprints Conference 2024.</p><p><strong>Methods: </strong>We propose a structured, four-stage framework to facilitate successful linkage of digital footprint data into LPS: (1) understand participant expectations and acceptability; (2) collect and link the data; (3) evaluate properties of the data; and (4) ensure secure and ethical access for research. This framework addresses the key methodological challenges identified at each stage, discussed through the lens of two LPS case studies: the Avon Longitudinal Study of Parents and Children and Generation Scotland.</p><p><strong>Results: </strong>Key methodological challenges identified include privacy and confidentiality concerns, reliance on third-party platforms, and data quality issues such as missing data and measurement error. We also emphasize the role of trusted research environments and synthetic datasets in enabling secure, privacy-sensitive data sharing for research.</p><p><strong>Conclusions: </strong>While the linkage of digital footprint data to LPS remains in its early stages, our framework provides a methodological foundation for overcoming current challenges. 
Through iterative refinement of these methods there is significant potential to advance population-level insights into health and wellbeing.</p>","PeriodicalId":36483,"journal":{"name":"International Journal of Population Data Science","volume":"10 1","pages":"2946"},"PeriodicalIF":1.6,"publicationDate":"2025-06-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12132027/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144217151","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Individual, household structure, and socioeconomic predictors of COVID-19 testing and vaccination outcomes: a whole population linked data analysis.","authors":"Nicole Satherley, Andrew Sporle","doi":"10.23889/ijpds.v10i1.2930","DOIUrl":"10.23889/ijpds.v10i1.2930","url":null,"abstract":"<p><strong>Introduction: </strong>The COVID-19 pandemic produced social inequities in health outcomes between and within nations. Reported inequitable COVID-19 outcomes for ethnic minorities and indigenous peoples are likely due in part to the poorer socioeconomic circumstances experienced by these populations. Understanding these associations within national populations is vital for future pandemic management.</p><p><strong>Objective: </strong>This study explores the social inequity of COVID-19 outcomes within New Zealand over the first 3 years of the pandemic. We aimed to identify policy-amenable socioeconomic factors associated with COVID-19 outcomes while adjusting for relevant individual factors and household structure. We also aimed to examine whether ethnic group differences are smaller when accounting for these socioeconomic factors and household structure.</p><p><strong>Methods: </strong>Administrative individual-level data for the New Zealand population was analysed to assess COVID-19 health outcomes during 2020 - 2023. The association between individual (e.g. age, ethnicity, disability status), household structure (e.g. household composition) and socioeconomic (e.g. crowding, housing quality, deprivation) factors and four COVID-19 health outcomes - infection, hospitalisation, mortality, and vaccination status - was assessed.</p><p><strong>Results: </strong>Indigenous peoples and ethnic minorities experienced worse outcomes across most COVID-19 outcomes. Adjusting for household structure and socioeconomic factors reduced but did not eliminate these inequities between ethnic groups. 
Housing issues including high housing mobility, poor quality housing, and household crowding were associated with worse outcomes, as were disability status, no primary health care enrolment, lower household income and older age. The size of these effects also differed for different health outcomes.</p><p><strong>Conclusions: </strong>Ethnic inequity was persistent and likely partly explained by policy-modifiable social factors, despite the relatively minor population health impacts of COVID-19 in New Zealand. We also demonstrate how a range of socioeconomic determinants predict COVID-19 outcomes in different ways.</p>","PeriodicalId":36483,"journal":{"name":"International Journal of Population Data Science","volume":"10 1","pages":"2930"},"PeriodicalIF":1.6,"publicationDate":"2025-05-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12108692/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144162799","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"<i>Data Note</i>: Challenges when combining housing data from multiple sources to identify overcrowded households.","authors":"Laura Scott, Yan Weigang, Marcella Ucci, Jessica Sheringham","doi":"10.23889/ijpds.v8i2.2927","DOIUrl":"10.23889/ijpds.v8i2.2927","url":null,"abstract":"<p><strong>Background: </strong>This project in one urban local authority in London (England) sought to assess the feasibility of generating locally-derived indices of overcrowding using data available to local councils on the population and their homes. We merged data at household level using the Unique Property Reference Number from publicly available Energy Performance Certificates and commercial property platforms, with data available to councils on the population and their housing characteristics, drawn from multiple sources including council tax bands and council housing databases. Multiple imputation was used to address missing data. Using the dataset, it was possible to generate two indices of overcrowding for households with dependent children, based on the bedroom standard and the space standard, which could be compared with nationally derived estimates.</p><p><strong>Data challenges: </strong>We encountered three challenges with data. 1. Individuals in the population were excluded through linkage with household-level data. 2. Definitions of overcrowding are ambiguous and variably applied. 3. Many local areas face high proportions of missing household data, particularly numbers of bedrooms. We discuss how we addressed such problems and illustrate with a local example how they could affect estimates of overcrowding prevalence.</p><p><strong>Lessons learned: </strong>Further clarity is needed in how bedrooms are defined to compare overcrowding prevalence generated locally and nationally. Access to national records on bedroom numbers would facilitate local areas to identify overcrowding in their own populations. 
Despite these challenges, we demonstrate it is feasible to generate overcrowding indices that can be useful for researchers and local policy makers seeking to develop or evaluate strategies to address household overcrowding.</p>","PeriodicalId":36483,"journal":{"name":"International Journal of Population Data Science","volume":"8 2","pages":"2927"},"PeriodicalIF":1.6,"publicationDate":"2025-05-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12093136/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144119973","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Open science and phenotyping in UK administrative health, education and social care data: the ECHILD phenotype code list repository.","authors":"Matthew A Jay, Kate Lewis, Difei Shi, Rebecca Langella, Tony Stone, Sorcha Ní Chobhthaigh, Ania Zylbersztejn, Ruth Blackburn, Katie Harron","doi":"10.23889/ijpds.v10i2.2943","DOIUrl":"https://doi.org/10.23889/ijpds.v10i2.2943","url":null,"abstract":"<p><p>Administrative health data, such as the Hospital Episode Statistics (HES), can be used to identify groups of people with a particular target condition, a process known as phenotyping. Clinical phenotypes are useful as exposures, covariates and outcomes in research studies using administrative data, including health data linked to other sources such as the Education and Child Health Insights from Linked Data (ECHILD) project. ECHILD brings together HES and other national health datasets with the National Pupil Database and children's social care data for all of England as a data asset that can be accessed by researchers at UK institutions. Because using linked administrative data is complex, the ECHILD team has created additional resources to improve the accessibility of ECHILD. One such initiative is the ECHILD Phenotype Code List Repository. The Repository is a fully open and searchable website containing phenotype code lists that can be used in ECHILD and beyond. As well as a primer on phenotyping, it includes summaries of each code list and R and Stata implementation scripts. The Repository was designed according to a set of principles to ensure that finding and using code lists is easy and standardised. 
The ECHILD Phenotype Code List Repository is a step forward in the findability and use of phenotype code lists in ECHILD and its constituent datasets.</p>","PeriodicalId":36483,"journal":{"name":"International Journal of Population Data Science","volume":"10 2","pages":"2943"},"PeriodicalIF":1.6,"publicationDate":"2025-05-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12076273/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144079932","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Spatio-temporal forecasting of COVID-19 cases in the Netherlands for source and contact tracing.","authors":"Max C Keuken, Jizzo R Bosdriesz, Anders Boyd, Elisabeth M den Boogert, Ivo K Joore, Nicole H T M Dukers-Muijrers, Gini van Rijckevorsel, Hannelore M Götz, Irene E Goverse, Mariska W F Petrignani, Stijn F H Raven, Susan van den Hof, Kirsten V C Wevers-de Boer, Maarten F Schim van der Loeff, Amy Matser","doi":"10.23889/ijpds.v10i1.2703","DOIUrl":"https://doi.org/10.23889/ijpds.v10i1.2703","url":null,"abstract":"<p><p>Source and contact tracing (SCT) is a core public health measure that is used to contain the spread of infectious diseases. It aims to identify a source of infection, and to advise those who have been exposed to this source. Due to the rapid increases in incidence of COVID-19 in the Netherlands, the capacity to conduct a full SCT quickly became insufficient. Therefore, the public health services (PHS) might benefit from a restricted strategy targeted to geographical regions where (predicted) case-to-case transmission is high. In this study, we set out to develop a prediction model for the number of COVID-19 cases per postal code within the Netherlands using geographic and demographic features. The study population consists of individuals residing in one of the participating nine Dutch PHS regions who tested positive for SARS-CoV-2 between 1 June 2020 and 27 February 2021. Using a machine learning random forest regression model, we predicted the top 100 postal codes with the highest number of cases with an accuracy of 49% for the current week, 42% for next week, and 44% for two weeks from present. In addition, the age groups of 20-39 and 40-64 years had a higher prediction accuracy than groups outside these age ranges. The developed model provides a starting point for targeted preventive SCT efforts that incorporate geospatial and demographic characteristics of a neighbourhood. 
It should nonetheless be noted that during the early stages of the outbreak, the number of available datapoints needed to inform such models is likely insufficient. Given the accuracy and data requirements of the developed model, it is unlikely that this class of models can play a pivotal role in informing policy during the early phases of a future epidemic.</p>","PeriodicalId":36483,"journal":{"name":"International Journal of Population Data Science","volume":"10 1","pages":"2703"},"PeriodicalIF":1.6,"publicationDate":"2025-05-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12058245/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144040266","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
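The abstract above reports an accuracy for predicting the top 100 postal codes but does not define the metric. One plausible reading, offered here only as an assumption, is the share of the k areas with the most observed cases that also appear among the k areas with the most predicted cases. The function name, postal codes and counts below are hypothetical.

```python
def top_k_overlap(predicted, observed, k):
    """Share of the k highest-observed areas that are also among the k highest-predicted."""
    def top(counts):
        # Rank by count (descending), then by code so ties break deterministically.
        ranked = sorted(counts, key=lambda code: (-counts[code], code))
        return set(ranked[:k])

    return len(top(predicted) & top(observed)) / k


# Toy weekly case counts per postal code (hypothetical values).
predicted = {"1011": 30, "1012": 25, "1013": 5, "1014": 20}
observed = {"1011": 28, "1012": 4, "1013": 22, "1014": 19}

print(top_k_overlap(predicted, observed, k=2))
```

Under this reading, a reported 49% for k = 100 would mean roughly 49 of the 100 highest-incidence postal codes were correctly flagged; the paper's own definition should be checked before relying on that interpretation.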
{"title":"Discovering linked data collections through a new national metadata platform.","authors":"Kate M Miller, Felicity S Flack, Merran B Smith, Vicki Bennett, Carina Ecremen Marshall","doi":"10.23889/ijpds.v10i1.2461","DOIUrl":"https://doi.org/10.23889/ijpds.v10i1.2461","url":null,"abstract":"<p><strong>Background: </strong>Metadata plays a crucial role in the health research infrastructure ecosystem. Despite the abundance of metadata for data collections in Australia, the vast and diverse data custodian landscape poses challenges for linked data researchers to find relevant information for multiple data collections, often making it an arduous and time-intensive task.</p><p><strong>Methods: </strong>The project comprised three phases: an initial scoping exercise to understand the current state of metadata and related best practice; a national consultation involving researchers, data linkage staff and data custodians to develop a high-fidelity prototype of a metadata platform; and a final build and implementation phase. The platform underwent several prototyping and testing cycles to refine the digital experience.</p><p><strong>Results: </strong>Expert interviews confirmed that there is a wealth of metadata available, but it is difficult for researchers to access and evaluate. Consultations with researchers identified opportunities to standardise metadata across collections and provide a centralised platform to enhance the discoverability of data collections for research using linked data. High value platform features included searching, browsing and filtering capabilities, data item list metadata, standardised formats, sample data, and frequently asked questions. 
The final design and functionality reflected user consultations and data custodian input on feasibility.</p><p><strong>Conclusion: </strong>The Population Health Research Network developed a metadata platform to enable researchers to evaluate the suitability of Australian data collections for linked data projects more effectively. The platform has standardised the way in which metadata is presented for data collections nationally. Improved metadata quality, readability and accessibility will save time and enhance the quality of applications for linked data.</p>","PeriodicalId":36483,"journal":{"name":"International Journal of Population Data Science","volume":"10 1","pages":"2461"},"PeriodicalIF":1.6,"publicationDate":"2025-04-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12042732/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144020126","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Transparency in the existence, use, and output of a mental health data resource: a descriptive paper from the National Institute for Health and Care Research (NIHR) Maudsley Biomedical Research Centre (BRC) Clinical Record Interactive Search (CRIS) Platform.","authors":"Amelia Jewell, Matthew Broadbent, Claire Delaney-Pope, Megan Pritchard, Hannah Woods, Robert Stewart","doi":"10.23889/ijpds.v10i2.2945","DOIUrl":"https://doi.org/10.23889/ijpds.v10i2.2945","url":null,"abstract":"<p><strong>Background: </strong>Transparency in the use of routinely collected mental health data for research is essential in maintaining public support and trust, as well as for supporting the sharing of information and data resources amongst the academic community. The National Institute for Health and Care Research (NIHR) Maudsley Biomedical Research Centre (BRC) Clinical Record Interactive Search (CRIS) enables a case register of deidentified mental health records from the South London and Maudsley NHS Foundation Trust (SLaM). CRIS supports mental health research across the lifespan from children and adolescents to older adults.</p><p><strong>Aim: </strong>This paper aims to describe the activities which contribute to ensuring that transparency is maintained throughout the journey of data in CRIS: from data collection, through application in research, to dissemination of findings.</p><p><strong>Approach: </strong>A communications plan is in place to support Patient and Public Involvement (PPI) and transparency initiatives for all CRIS stakeholders, including patients and carers, academic users, and the general public. 
Activities can be divided into three categories of transparency: existence, use, and output.</p><p><strong>Discussion: </strong>There are challenges to maintaining transparency, including ensuring that activities are varied enough to reach all stakeholders, including harder to reach groups, and presenting information in a way that is appropriate for the relevant audience. However, greater transparency has led to more opportunities for researchers to engage with patients and the CRIS model is widely accepted by patients.</p><p><strong>Conclusion: </strong>This paper set out to describe CRIS communications and transparency activities. We believe the material covered will be of interest to other providers of routinely collected data for research.</p>","PeriodicalId":36483,"journal":{"name":"International Journal of Population Data Science","volume":"10 1","pages":"2945"},"PeriodicalIF":1.6,"publicationDate":"2025-04-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12076277/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144079829","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Data resource profile: a guide for constructing school-to-work sequence analysis trajectories using the longitudinal education outcomes (LEO) data.","authors":"Shivani Sickotra","doi":"10.23889/ijpds.v8i6.2953","DOIUrl":"10.23889/ijpds.v8i6.2953","url":null,"abstract":"<p><strong>Introduction: </strong>Sequence analysis is a powerful methodology for examining longitudinal school-to-work trajectories. Despite its growing use, there is limited guidance on preparing suitable datasets. This resource details the creation of a dataset specifically designed for sequence analysis, capturing yearly education and employment activity states for 556,182 individuals from England's 2010/11 school-leaver cohort.</p><p><strong>Methods: </strong>The dataset was constructed using the Department for Education's Longitudinal Education Outcomes (LEO) data. SQL was used to extract relevant variables, and data linkage and preprocessing were performed using R. Data processing was tailored to sequence analysis, including reducing the number of activity states and applying a hierarchy to integrate education and employment data.</p><p><strong>Results: </strong>The resulting dataset spans activities from the first non-compulsory state in 2011/12 until 2018/19, tracking trajectories from ages 16/17 to 23/24. The dataset was designed with the ability to subset school-leavers by their initial Combined Authority residence to aid in regional analysis of school-to-work trajectories. Individual-level socio-demographic characteristics that can be linked to the longitudinal activity histories were also built, alongside longitudinal geographic locations and employment earnings data. Additionally, the limitations of the developed data are discussed.</p><p><strong>Conclusion: </strong>This resource provides crucial guidance for researchers and practitioners who may require experience preparing input datasets for sequence analysis, addressing the current gap in available resources. 
By offering step-by-step instructions and shared code, it empowers users to recreate or adapt the dataset for their specific research needs. Its ability to subset by region further supports localised and comparative studies of school-to-work trajectories, making it a valuable tool for advancing existing research. The LEO data can be accessed by application through the Office for National Statistics Secure Research Service.</p>","PeriodicalId":36483,"journal":{"name":"International Journal of Population Data Science","volume":"8 6","pages":"2953"},"PeriodicalIF":1.6,"publicationDate":"2025-03-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11935648/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143711548","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
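The abstract above mentions applying a hierarchy to integrate education and employment data into one activity state per year. A minimal sketch of that idea follows, written in Python rather than the SQL and R the resource itself uses. The state names and their priority order are assumptions for illustration; the resource's actual hierarchy may differ.

```python
# Hypothetical priority order, highest first (an assumption, not the LEO resource's hierarchy).
HIERARCHY = ["higher_education", "further_education", "employment", "benefits", "unknown"]
RANK = {state: position for position, state in enumerate(HIERARCHY)}


def yearly_state(activities):
    """Collapse all activities recorded in one year into the single highest-priority state."""
    if not activities:
        return "unknown"
    # States not in the hierarchy rank last.
    return min(activities, key=lambda state: RANK.get(state, len(HIERARCHY)))


def build_sequence(activities_by_year, years):
    """Return one state per academic year, in order, ready for sequence analysis."""
    return [yearly_state(activities_by_year.get(year, [])) for year in years]


# Toy activity history for one individual (hypothetical values).
history = {
    "2011/12": ["employment", "further_education"],
    "2012/13": ["employment"],
    "2014/15": ["higher_education", "employment"],
}
years = ["2011/12", "2012/13", "2013/14", "2014/15"]

print(build_sequence(history, years))
```

A fixed priority list keeps the rule transparent: when someone studies and works in the same year, the earlier entry in the list wins, and years with no record fall back to an explicit "unknown" state rather than being dropped.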