Kerstin Denecke, Octavio Rivera Romero, Carolyn Petersen, Marge Benham-Hutchins, Miguel Cabrer, Shauna Davies, Rebecca Grainger, Rada Hussein, Guillermo Lopez-Campos, Fernando Martin-Sanchez, Mollie McKillop, Mark Merolli, Talya Miron-Shatz, Jesús Daniel Trigo, Graham Wright, Rolf Wynn, Carol Hullin Lucay Cossio, Elia Gabarron
{"title":"Defining and Scoping Participatory Health Informatics: An eDelphi Study.","authors":"Kerstin Denecke, Octavio Rivera Romero, Carolyn Petersen, Marge Benham-Hutchins, Miguel Cabrer, Shauna Davies, Rebecca Grainger, Rada Hussein, Guillermo Lopez-Campos, Fernando Martin-Sanchez, Mollie McKillop, Mark Merolli, Talya Miron-Shatz, Jesús Daniel Trigo, Graham Wright, Rolf Wynn, Carol Hullin Lucay Cossio, Elia Gabarron","doi":"10.1055/a-2035-3008","DOIUrl":"https://doi.org/10.1055/a-2035-3008","url":null,"abstract":"<p><strong>Background: </strong>Health care has evolved to support the involvement of individuals in decision making by, for example, using mobile apps and wearables that may help empower people to actively participate in their treatment and health monitoring. While the term \"participatory health informatics\" (PHI) has emerged in literature to describe these activities, along with the use of social media for health purposes, the scope of the research field of PHI is not yet well defined.</p><p><strong>Objective: </strong>This article proposes a preliminary definition of PHI and defines the scope of the field.</p><p><strong>Methods: </strong>We used an adapted Delphi study design to gain consensus from participants on a definition developed from a previous review of literature. From the literature we derived a set of attributes describing PHI as comprising 18 characteristics, 14 aims, and 4 relations. We invited researchers, health professionals, and health informaticians to score these characteristics and aims of PHI and their relations to other fields over three survey rounds. In the first round participants were able to offer additional attributes for voting.</p><p><strong>Results: </strong>The first round had 44 participants, with 28 participants participating in all three rounds. These 28 participants were gender-balanced and comprised participants from industry, academia, and health sectors from all continents. 
Consensus was reached on 16 characteristics, 9 aims, and 6 related fields.</p><p><strong>Discussion: </strong>The consensus reached on attributes of PHI describes PHI as a multidisciplinary field that uses information technology and delivers tools with a focus on individual-centered care. It studies various effects of the use of such tools and technology. Its aims address individuals in the role of patients, but also the health of society as a whole. There are relationships to the fields of health informatics, digital health, medical informatics, and consumer health informatics.</p><p><strong>Conclusion: </strong>We have proposed a preliminary definition, aims, and relationships of PHI based on literature and expert consensus. These can begin to be used to support development of research priorities and outcomes measurements.</p>","PeriodicalId":49822,"journal":{"name":"Methods of Information in Medicine","volume":"62 3-04","pages":"90-99"},"PeriodicalIF":1.7,"publicationDate":"2023-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ftp.ncbi.nlm.nih.gov/pub/pmc/oa_pdf/67/87/10-1055-a-2035-3008.PMC10462430.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"10139697","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"Medicine","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An Alternative Application of Natural Language Processing to Express a Characteristic Feature of Diseases in Japanese Medical Records.","authors":"Yoshinori Yamanouchi, Taishi Nakamura, Tokunori Ikeda, Koichiro Usuku","doi":"10.1055/a-2039-3773","DOIUrl":"https://doi.org/10.1055/a-2039-3773","url":null,"abstract":"<p><strong>Background: </strong>Owing to the linguistic situation, Japanese natural language processing (NLP) requires morphological analyses for word segmentation using dictionary techniques.</p><p><strong>Objective: </strong>We aimed to clarify whether it can be substituted with an open-end discovery-based NLP (OD-NLP), which does not use any dictionary techniques.</p><p><strong>Methods: </strong>Clinical texts at the first medical visit were collected for comparison of OD-NLP with word dictionary-based NLP (WD-NLP). Topics were generated in each document using a topic model, which later corresponded to the respective diseases determined in the International Statistical Classification of Diseases and Related Health Problems, 10th Revision. The prediction accuracy and expressivity of each disease were examined in an equivalent number of entities/words after filtration with either term frequency-inverse document frequency (TF-IDF) or dominance value (DMV).</p><p><strong>Results: </strong>In documents from 10,520 observed patients, 169,913 entities and 44,758 words were segmented using OD-NLP and WD-NLP, respectively. Without filtering, accuracy and recall levels were low, and there was no difference in the harmonic mean of the F-measure between NLPs. However, physicians reported OD-NLP contained more meaningful words than WD-NLP. When datasets were created in an equivalent number of entities/words with TF-IDF, the F-measure in OD-NLP was higher than in WD-NLP at lower thresholds. When the threshold increased, the number of datasets created decreased, resulting in increased values of F-measure, although the differences disappeared. 
Two datasets near the maximum threshold showing differences in F-measure were examined to determine whether their topics were associated with diseases. The results showed that more diseases were found in OD-NLP at lower thresholds, indicating that the topics described characteristics of diseases. The superiority remained as much as that of TF-IDF when filtration was changed to DMV.</p><p><strong>Conclusion: </strong>The current findings favor the use of OD-NLP to express characteristics of diseases from Japanese clinical texts and may help in the construction of document summaries and retrieval in clinical settings.</p>","PeriodicalId":49822,"journal":{"name":"Methods of Information in Medicine","volume":"62 3-04","pages":"110-118"},"PeriodicalIF":1.7,"publicationDate":"2023-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ftp.ncbi.nlm.nih.gov/pub/pmc/oa_pdf/2b/3b/10-1055-a-2039-3773.PMC10462427.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"10141870","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"Medicine","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Ibrahim Dalhatu, Chinedu Aniekwe, Adebobola Bashorun, Alhassan Abdulkadir, Emilio Dirlikov, Stephen Ohakanu, Oluwasanmi Adedokun, Ademola Oladipo, Ibrahim Jahun, Lisa Murie, Steven Yoon, Mubarak G Abdu-Aguye, Ahmed Sylvanus, Samuel Indyer, Isah Abbas, Mustapha Bello, Nannim Nalda, Matthias Alagi, Solomon Odafe, Sylvia Adebajo, Otse Ogorry, Murphy Akpu, Ifeanyi Okoye, Kunle Kakanfo, Amobi Andrew Onovo, Gregory Ashefor, Charles Nzelu, Akudo Ikpeazu, Gambo Aliyu, Tedd Ellerbrock, Mary Boyd, Kristen A Stafford, Mahesh Swaminathan
{"title":"From Paper Files to Web-Based Application for Data-Driven Monitoring of HIV Programs: Nigeria's Journey to a National Data Repository for Decision-Making and Patient Care.","authors":"Ibrahim Dalhatu, Chinedu Aniekwe, Adebobola Bashorun, Alhassan Abdulkadir, Emilio Dirlikov, Stephen Ohakanu, Oluwasanmi Adedokun, Ademola Oladipo, Ibrahim Jahun, Lisa Murie, Steven Yoon, Mubarak G Abdu-Aguye, Ahmed Sylvanus, Samuel Indyer, Isah Abbas, Mustapha Bello, Nannim Nalda, Matthias Alagi, Solomon Odafe, Sylvia Adebajo, Otse Ogorry, Murphy Akpu, Ifeanyi Okoye, Kunle Kakanfo, Amobi Andrew Onovo, Gregory Ashefor, Charles Nzelu, Akudo Ikpeazu, Gambo Aliyu, Tedd Ellerbrock, Mary Boyd, Kristen A Stafford, Mahesh Swaminathan","doi":"10.1055/s-0043-1768711","DOIUrl":"https://doi.org/10.1055/s-0043-1768711","url":null,"abstract":"<p><strong>Background: </strong>Timely and reliable data are crucial for clinical, epidemiologic, and program management decision making. Electronic health information systems provide platforms for managing large longitudinal patient records. Nigeria implemented the National Data Repository (NDR) to create a central data warehouse of all people living with human immunodeficiency virus (PLHIV) while providing useful functionalities to aid decision making at different levels of program implementation.</p><p><strong>Objective: </strong>We describe the Nigeria NDR and its development process, including its use for surveillance, research, and national HIV program monitoring toward achieving HIV epidemic control.</p><p><strong>Methods: </strong>Stakeholder engagement meetings were held in 2013 to gather information on data elements and vocabulary standards for reporting patient-level information, technical infrastructure, human capacity requirements, and information flow. Findings from these meetings guided the development of the NDR. 
An implementation guide provided common terminologies and data reporting structures for data exchange between the NDR and the electronic medical record (EMR) systems. Data from the EMR were encoded in extensible markup language and sent to the NDR over secure hypertext transfer protocol after going through a series of validation processes.</p><p><strong>Results: </strong>By June 30, 2021, the NDR had up-to-date records of 1,477,064 (94.4%) patients receiving HIV treatment across 1,985 health facilities, of which 1,266,512 (85.7%) patient records had fingerprint template data to support unique patient identification and record linkage to prevent registration of the same patient under different identities. Data from the NDR were used to support HIV program monitoring, case-based surveillance, and the production of outputs such as the monthly lists of patients who have treatment interruptions and dashboards for monitoring HIV test and start.</p><p><strong>Conclusion: </strong>The NDR enabled the availability of reliable and timely data for surveillance, research, and HIV program monitoring to guide program improvements to accelerate progress toward epidemic control.</p>","PeriodicalId":49822,"journal":{"name":"Methods of Information in Medicine","volume":"62 3-04","pages":"130-139"},"PeriodicalIF":1.7,"publicationDate":"2023-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ftp.ncbi.nlm.nih.gov/pub/pmc/oa_pdf/b5/f9/10-1055-s-0043-1768711.PMC10462428.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"10136836","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"Medicine","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Rare Diseases in Hospital Information Systems-An Interoperable Methodology for Distributed Data Quality Assessments.","authors":"Kais Tahar, Tamara Martin, Yongli Mou, Raphael Verbuecheln, Holm Graessner, Dagmar Krefting","doi":"10.1055/a-2006-1018","DOIUrl":"https://doi.org/10.1055/a-2006-1018","url":null,"abstract":"<p><strong>Background: </strong>Multisite research networks such as the project \"Collaboration on Rare Diseases\" connect various hospitals to obtain sufficient data for clinical research. However, data quality (DQ) remains a challenge for the secondary use of data recorded in different health information systems. High levels of DQ as well as appropriate quality assessment methods are needed to support the reuse of such distributed data.</p><p><strong>Objectives: </strong>The aim of this work is the development of an interoperable methodology for assessing the quality of data recorded in heterogeneous sources to improve the quality of rare disease (RD) documentation and support clinical research.</p><p><strong>Methods: </strong>We first developed a conceptual framework for DQ assessment. Using this theoretical guidance, we implemented a software framework that provides appropriate tools for calculating DQ metrics and for generating local as well as cross-institutional reports. We further applied our methodology to synthetic data distributed across multiple hospitals using Personal Health Train. Finally, we used precision and recall as metrics to validate our implementation.</p><p><strong>Results: </strong>Four DQ dimensions were defined and represented as disjoint ontological categories. Based on these top dimensions, 9 DQ concepts, 10 DQ indicators, and 25 DQ parameters were developed and applied to different data sets. Randomly introduced DQ issues were all identified and reported automatically. 
The generated reports show the resulting DQ indicators and detected DQ issues.</p><p><strong>Conclusion: </strong>We have shown that our approach yields promising results, which can be used for local and cross-institutional DQ assessments. The developed frameworks provide useful methods for interoperable and privacy-preserving assessments of DQ that meet the specified requirements. This study has demonstrated that our methodology is capable of detecting DQ issues such as ambiguity or implausibility of coded diagnoses. It can be used for DQ benchmarking to improve the quality of RD documentation and to support clinical research on distributed data.</p>","PeriodicalId":49822,"journal":{"name":"Methods of Information in Medicine","volume":"62 3-04","pages":"71-89"},"PeriodicalIF":1.7,"publicationDate":"2023-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10462432/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"10138370","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"Medicine","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Giorgos Koliopanos, Francisco Ojeda, Andreas Ziegler
{"title":"A Simple-to-Use R Package for Mimicking Study Data by Simulations.","authors":"Giorgos Koliopanos, Francisco Ojeda, Andreas Ziegler","doi":"10.1055/a-2048-7692","DOIUrl":"https://doi.org/10.1055/a-2048-7692","url":null,"abstract":"<p><strong>Background: </strong>Data protection policies might prohibit the transfer of existing study data to interested research groups. To overcome legal restrictions, simulated data can be transferred that mimic the structure but are different from the existing study data.</p><p><strong>Objectives: </strong>The aim of this work is to introduce the simple-to-use R package Mock Data Generation (modgo) that may be used for simulating data from existing study data for continuous, ordinal categorical, and dichotomous variables.</p><p><strong>Methods: </strong>The core idea is to combine rank inverse normal transformation with the calculation of a correlation matrix for all variables. Data can then be simulated from a multivariate normal and transferred back to the original scale of the variables. Unique features of modgo are that it allows users to change the correlation between variables, perform perturbation analysis, handle multicenter data, and change inclusion/exclusion criteria by selecting specific values of one or a set of variables. Simulation studies on real data demonstrate the validity and flexibility of modgo.</p><p><strong>Results: </strong>modgo mimicked the structure of the original study data. Results of modgo were similar to those from two other existing packages in standard simulation scenarios. modgo's flexibility was demonstrated on several expansions.</p><p><strong>Conclusion: </strong>The R package modgo is useful when existing study data may not be shared. Its perturbation expansion permits simulation of truly anonymized subjects. The expansion to multicenter studies can be used for validating prediction models. 
Additional expansions can support the unraveling of associations even in large study data and can be useful in power calculations.</p>","PeriodicalId":49822,"journal":{"name":"Methods of Information in Medicine","volume":"62 3-04","pages":"119-129"},"PeriodicalIF":1.7,"publicationDate":"2023-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ftp.ncbi.nlm.nih.gov/pub/pmc/oa_pdf/75/40/10-1055-a-2048-7692.PMC10462429.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"10492948","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"Medicine","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
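The Methods summary in the modgo record above (rank inverse normal transformation, a correlation matrix over all variables, simulation from a multivariate normal, and transfer back to the original scale) can be sketched in a few lines. The Python sketch below is an illustration under stated assumptions, not the modgo implementation (modgo is an R package): the function names, the toy `age` and `sbp` columns, the (rank - 0.5)/n offset, and the empirical-quantile back-transformation are all choices made for this example, and constant or perfectly correlated columns are not handled.

```python
import math
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()


def rank_inverse_normal(values):
    """Map values to normal scores via their ranks (ties share the average rank)."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    # (rank - 0.5) / n keeps every probability strictly inside (0, 1)
    return [STD_NORMAL.inv_cdf((r - 0.5) / n) for r in ranks]


def correlation_matrix(columns):
    """Pearson correlation matrix of equally long, non-constant columns."""
    def corr(x, y):
        mx, my = sum(x) / len(x), sum(y) / len(y)
        sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
        sxx = sum((a - mx) ** 2 for a in x)
        syy = sum((b - my) ** 2 for b in y)
        return sxy / math.sqrt(sxx * syy)
    p = len(columns)
    return [[1.0 if i == j else corr(columns[i], columns[j]) for j in range(p)]
            for i in range(p)]


def cholesky(matrix):
    """Lower-triangular factor of a positive definite matrix."""
    p = len(matrix)
    chol = [[0.0] * p for _ in range(p)]
    for i in range(p):
        for j in range(i + 1):
            s = sum(chol[i][k] * chol[j][k] for k in range(j))
            if i == j:
                chol[i][j] = math.sqrt(matrix[i][i] - s)
            else:
                chol[i][j] = (matrix[i][j] - s) / chol[j][j]
    return chol


def simulate(columns, n_sim, seed=0):
    """Simulate n_sim rows that mimic the rank correlation of the input columns."""
    rng = random.Random(seed)
    scores = [rank_inverse_normal(col) for col in columns]
    chol = cholesky(correlation_matrix(scores))
    sorted_cols = [sorted(col) for col in columns]
    p = len(columns)
    simulated = [[] for _ in range(p)]
    for _ in range(n_sim):
        z = [rng.gauss(0.0, 1.0) for _ in range(p)]
        for j in range(p):
            # Correlated standard normal draw for variable j
            y = sum(chol[j][k] * z[k] for k in range(j + 1))
            # Back-transform through the empirical quantiles of the original column
            u = STD_NORMAL.cdf(y)
            idx = min(int(u * len(sorted_cols[j])), len(sorted_cols[j]) - 1)
            simulated[j].append(sorted_cols[j][idx])
    return simulated


# Hypothetical toy data: age in years and systolic blood pressure in mmHg
age = [34, 45, 51, 62, 58, 40, 70, 29]
sbp = [118, 131, 125, 142, 138, 121, 150, 119]
simulated = simulate([age, sbp], n_sim=500, seed=1)
```

Because this back-transformation picks from the sorted original values, every simulated value is one that already occurs in the original column, so the sketch by itself does not anonymize anything.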
Mikel Hernadez, Gorka Epelde, Ane Alberdi, Rodrigo Cilla, Debbie Rankin
{"title":"Synthetic Tabular Data Evaluation in the Health Domain Covering Resemblance, Utility, and Privacy Dimensions.","authors":"Mikel Hernadez, Gorka Epelde, Ane Alberdi, Rodrigo Cilla, Debbie Rankin","doi":"10.1055/s-0042-1760247","DOIUrl":"https://doi.org/10.1055/s-0042-1760247","url":null,"abstract":"<p><strong>Background: </strong>Synthetic tabular data generation is a potentially valuable technology with great promise for data augmentation and privacy preservation. However, prior to adoption, an empirical assessment of generated synthetic tabular data is required across dimensions relevant to the target application to determine its efficacy. A lack of standardized and objective evaluation and benchmarking strategy for synthetic tabular data in the health domain has been found in the literature.</p><p><strong>Objective: </strong>The aim of this paper is to identify key dimensions, per-dimension metrics, and methods for evaluating synthetic tabular data generated with different techniques and configurations for health domain application development and to provide a strategy to orchestrate them.</p><p><strong>Methods: </strong>Based on the literature, the resemblance, utility, and privacy dimensions have been prioritized, and a collection of metrics and methods for their evaluation is orchestrated into a complete evaluation pipeline. This way, a guided and comparative assessment of generated synthetic tabular data can be done, categorizing its quality into three categories (\"<i>Excellent,</i>\" \"<i>Good,</i>\" and \"<i>Poor</i>\"). Six health care-related datasets and four synthetic tabular data generation approaches have been chosen to conduct an analysis and evaluation to verify the utility of the proposed evaluation pipeline.</p><p><strong>Results: </strong>The synthetic tabular data generated with the four selected approaches has maintained resemblance, utility, and privacy for most combinations of dataset and synthetic tabular data generation approach. 
In several datasets, some approaches have outperformed others, while in other datasets, more than one approach has yielded the same performance.</p><p><strong>Conclusion: </strong>The results have shown that the proposed pipeline can effectively be used to evaluate and benchmark the synthetic tabular data generated by various synthetic tabular data generation approaches. Therefore, this pipeline can support the scientific community in selecting the most suitable synthetic tabular data generation approaches for their data and application of interest.</p>","PeriodicalId":49822,"journal":{"name":"Methods of Information in Medicine","volume":"62 S 01","pages":"e19-e38"},"PeriodicalIF":1.7,"publicationDate":"2023-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ftp.ncbi.nlm.nih.gov/pub/pmc/oa_pdf/31/67/10-1055-s-0042-1760247.PMC10306449.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"9789348","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"Medicine","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Aligning Semantic Interoperability Frameworks with the FOXS Stack for FAIR Health Data.","authors":"John Meredith, Nicola Whitehead, Michael Dacey","doi":"10.1055/a-1993-8036","DOIUrl":"https://doi.org/10.1055/a-1993-8036","url":null,"abstract":"<p><strong>Background: </strong>FAIR Guiding Principles present a synergy with the use cases for digital health records, in that clinical data need to be found, accessible within a range of environments, and data must interoperate between systems and subsequently be reused. The use of HL7 FHIR, openEHR, IHE XDS, and SNOMED CT (FOXS) together represents a specification to create an open digital health platform for modern health care applications.</p><p><strong>Objectives: </strong>To describe where logical FOXS components align to the European Open Science Cloud Interoperability Framework (EOSC-IF) reference architecture for semantic interoperability. This should provide a means of defining if FOXS aligns to FAIR principles and to establish the data models and structures that support longitudinal care records as being fit to underpin scientific research.</p><p><strong>Methods: </strong>The EOSC-IF Semantic View is a representation of semantic interoperability where meaning is preserved between systems and users. This was analyzed and cross-referenced with FOXS architectural components, mapping concepts, and objects that describe content such as catalogues and semantic artifacts.</p><p><strong>Results: </strong>The majority of conceptual Semantic View components were featured within the FOXS architecture. Semantic Business Objects are composed of a range of elements such as openEHR archetypes and templates, FHIR resources and profiles, SNOMED CT concepts, and XDS document identifiers. 
Semantic Functional Content comprises catalogues of metadata that were also supported by openEHR and FHIR tools.</p><p><strong>Conclusions: </strong>Despite some elements of EOSC-IF being vague (e.g., FAIR Digital Object), there was a broad conformance to the framework concepts and the components of a FOXS platform. This work supports a health-domain-specific view of semantic interoperability and how this may be achieved to support FAIR data for health research via a standardized framework.</p>","PeriodicalId":49822,"journal":{"name":"Methods of Information in Medicine","volume":"62 S 01","pages":"e39-e46"},"PeriodicalIF":1.7,"publicationDate":"2023-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ftp.ncbi.nlm.nih.gov/pub/pmc/oa_pdf/a3/76/10-1055-a-1993-8036.PMC10306448.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"9786736","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"Medicine","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Henriette Rau, Dana Stahl, Anna-Juliana Reichel, Martin Bialke, Thomas Bahls, Wolfgang Hoffmann
{"title":"We Know What You Agreed To, Don't We?-Evaluating the Quality of Paper-Based Consents Forms and Their Digitalized Equivalent Using the Example of the Baltic Fracture Competence Centre Project.","authors":"Henriette Rau, Dana Stahl, Anna-Juliana Reichel, Martin Bialke, Thomas Bahls, Wolfgang Hoffmann","doi":"10.1055/s-0042-1760249","DOIUrl":"https://doi.org/10.1055/s-0042-1760249","url":null,"abstract":"<p><strong>Introduction: </strong>The informed consent is the legal basis for research with human subjects. Therefore, the consent form (CF) as a legally binding document must be valid, that is, be completely filled in, stating the person's decision clearly, and signed by the respective person. However, especially paper-based CFs might have quality issues and the transformation into machine-readable information could add to low quality. This paper evaluates the quality and arising quality issues of paper-based CFs using the example of the Baltic Fracture Competence Centre (BFCC) fracture registry. It also evaluates the impact of quality assurance (QA) measures including giving site-specific feedback. Finally, it answers the question whether manual data entry of patients' decisions by clinical staff leads to a significant error rate in digitalized paper-based CFs.</p><p><strong>Methods: </strong>Based on defined quality criteria, monthly QA including source data verification was conducted by two individual reviewers since the start of recruitment in December 2017. The analyses are based on the CFs collected from December 2017 until February 2019 (first recruitment period).</p><p><strong>Results: </strong>After conducting QA internally, the sudden increase of quality issues in May 2018 led to site-specific feedback reports and follow-up training regarding the CFs' quality starting in June 2018. Specific criteria and descriptions on how to correct the CFs helped in increasing the quality in a timely manner. 
The most common issues were missing pages, decisions regarding optional modules, and signature(s). Since patients' datasets without valid CFs must be deleted, QA helped in retaining 65 datasets for research so that the final datapool consisted of 840 (99.29%) patients.</p><p><strong>Conclusion: </strong>All quality issues could be assigned to one predefined criterion. Using the example of the BFCC fracture registry, CF-QA proved to significantly increase CF quality and help retain the number of available datasets for research. Consequently, the described quality indicators, criteria, and QA processes can be seen as the best practice approach.</p>","PeriodicalId":49822,"journal":{"name":"Methods of Information in Medicine","volume":"62 S 01","pages":"e10-e18"},"PeriodicalIF":1.7,"publicationDate":"2023-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ftp.ncbi.nlm.nih.gov/pub/pmc/oa_pdf/05/82/10-1055-s-0042-1760249.PMC10306442.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"9789345","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"Medicine","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Khalid O Yusuf, Olga Miljukov, Anne Schoneberg, Sabine Hanß, Martin Wiesenfeldt, Melanie Stecher, Lazar Mitrov, Sina Marie Hopff, Sarah Steinbrecher, Florian Kurth, Thomas Bahmer, Stefan Schreiber, Daniel Pape, Anna-Lena Hofmann, Mirjam Kohls, Stefan Störk, Hans Christian Stubbe, Johannes J Tebbe, Johannes C Hellmuth, Johanna Erber, Lilian Krist, Siegbert Rieg, Lisa Pilgram, Jörg J Vehreschild, Jens-Peter Reese, Dagmar Krefting
{"title":"Consistency as a Data Quality Measure for German Corona Consensus Items Mapped from National Pandemic Cohort Network Data Collections.","authors":"Khalid O Yusuf, Olga Miljukov, Anne Schoneberg, Sabine Hanß, Martin Wiesenfeldt, Melanie Stecher, Lazar Mitrov, Sina Marie Hopff, Sarah Steinbrecher, Florian Kurth, Thomas Bahmer, Stefan Schreiber, Daniel Pape, Anna-Lena Hofmann, Mirjam Kohls, Stefan Störk, Hans Christian Stubbe, Johannes J Tebbe, Johannes C Hellmuth, Johanna Erber, Lilian Krist, Siegbert Rieg, Lisa Pilgram, Jörg J Vehreschild, Jens-Peter Reese, Dagmar Krefting","doi":"10.1055/a-2006-1086","DOIUrl":"https://doi.org/10.1055/a-2006-1086","url":null,"abstract":"<p><strong>Background: </strong>As a national effort to better understand the current pandemic, three cohorts collect sociodemographic and clinical data from coronavirus disease 2019 (COVID-19) patients from different target populations within the German National Pandemic Cohort Network (NAPKON). Furthermore, the German Corona Consensus Dataset (GECCO) was introduced as a harmonized basic information model for COVID-19 patients in clinical routine. To compare the cohort data with other GECCO-based studies, data items are mapped to GECCO. As mapping from one information model to another is complex, an additional consistency evaluation of the mapped items is recommended to detect possible mapping issues or source data inconsistencies.</p><p><strong>Objectives: </strong>The goal of this work is to assure high consistency of research data mapped to the GECCO data model. In particular, it aims at identifying contradictions within interdependent GECCO data items of the German national COVID-19 cohorts to allow investigation of possible reasons for identified contradictions. 
We furthermore aim at enabling other researchers to easily perform data quality evaluation on GECCO-based datasets and adapt to similar data models.</p><p><strong>Methods: </strong>All suitable data items from each of the three NAPKON cohorts are mapped to the GECCO items. A consistency assessment tool (dqGecco) is implemented, following the design of an existing quality assessment framework, retaining its defined consistency taxonomies, including logical and empirical contradictions. Results of the assessment are verified independently on the primary data source.</p><p><strong>Results: </strong>Our consistency assessment tool helped correct the mapping procedure and revealed remaining contradictory value combinations within COVID-19 symptoms, vital signs, and COVID-19 severity. Consistency rates differ between the different indicators and cohorts, ranging from 95.84% up to 100%.</p><p><strong>Conclusion: </strong>An efficient and portable tool capable of discovering inconsistencies in the COVID-19 domain has been developed and applied to three different cohorts. As the GECCO dataset is employed in different platforms and studies, the tool can be directly applied there or adapted to similar information models.</p>","PeriodicalId":49822,"journal":{"name":"Methods of Information in Medicine","volume":"62 S 01","pages":"e47-e56"},"PeriodicalIF":1.7,"publicationDate":"2023-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ftp.ncbi.nlm.nih.gov/pub/pmc/oa_pdf/4d/05/10-1055-a-2006-1086.PMC10306447.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"9842097","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"Medicine","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
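The consistency rates reported in the record above can be read as the share of evaluable records that do not contradict a rule over interdependent items. The Python sketch below illustrates that reading only; it is not dqGecco. The `Rule` class, the item names `fever` and `temp_c`, and the 38.0 degrees Celsius threshold are hypothetical and are not taken from GECCO or NAPKON.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

Record = Dict[str, Optional[object]]


@dataclass
class Rule:
    name: str
    applies: Callable[[Record], bool]      # are all items the rule needs present?
    contradicts: Callable[[Record], bool]  # does the record violate the rule?


def consistency_rate(records: List[Record], rule: Rule) -> Optional[float]:
    """Percentage of evaluable records that do not contradict the rule."""
    evaluable = [r for r in records if rule.applies(r)]
    if not evaluable:
        return None  # nothing to evaluate, so no rate is reported
    violations = sum(1 for r in evaluable if rule.contradicts(r))
    return 100.0 * (len(evaluable) - violations) / len(evaluable)


# Hypothetical rule: fever documented as absent although the measured
# temperature is at or above an assumed threshold of 38.0 degrees Celsius.
fever_rule = Rule(
    name="fever_absent_but_high_temperature",
    applies=lambda r: r.get("fever") is not None and r.get("temp_c") is not None,
    contradicts=lambda r: r["fever"] is False and r["temp_c"] >= 38.0,
)

# Hypothetical records; the last one is not evaluable because an item is missing.
records = [
    {"fever": False, "temp_c": 39.1},  # contradiction
    {"fever": True, "temp_c": 38.6},
    {"fever": False, "temp_c": 36.8},
    {"fever": True, "temp_c": 37.2},
    {"fever": None, "temp_c": 37.0},
]
rate = consistency_rate(records, fever_rule)
```

Records with missing items are excluded from the denominator here; whether missing values should instead count against a completeness indicator is a separate design decision that this sketch leaves open.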
{"title":"Targeted Data Quality Analysis for a Clinical Decision Support System for SIRS Detection in Critically Ill Pediatric Patients.","authors":"Erik Tute, Marcel Mast, Antje Wulff","doi":"10.1055/s-0042-1760238","DOIUrl":"https://doi.org/10.1055/s-0042-1760238","url":null,"abstract":"<p><strong>Background: </strong>Data quality issues can cause false decisions of clinical decision support systems (CDSSs). Analyzing local data quality has the potential to prevent data quality-related failure of CDSS adoption.</p><p><strong>Objectives: </strong>To define a shareable set of applicable measurement methods (MMs) for a targeted data quality assessment determining the suitability of local data for our CDSS.</p><p><strong>Methods: </strong>We derived task-specific MMs using four approaches: (1) a GUI-based data quality analysis using the open source tool <i>openCQA</i>. (2) Analyzing cases of known false CDSS decisions. (3) Data-driven learning on MM-results. (4) A systematic check to find blind spots in our set of MMs based on the <i>HIDQF</i> data quality framework. We expressed the derived data quality-related knowledge about the CDSS using the 5-tuple-formalization for MMs.</p><p><strong>Results: </strong>We identified some task-specific dataset characteristics that a targeted data quality assessment for our use case should inspect. Altogether, we defined 394 MMs organized in 13 data quality knowledge bases.</p><p><strong>Conclusions: </strong>We have created a set of shareable, applicable MMs that can support targeted data quality assessment for CDSS-based systemic inflammatory response syndrome (SIRS) detection in critically ill, pediatric patients. 
With the demonstrated approaches for deriving and expressing task-specific MMs, we intend to help promote targeted data quality assessment as a commonly recognized, routine part of research on data-consuming application systems in health care.</p>","PeriodicalId":49822,"journal":{"name":"Methods of Information in Medicine","volume":"62 S 01","pages":"e1-e9"},"PeriodicalIF":1.7,"publicationDate":"2023-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ftp.ncbi.nlm.nih.gov/pub/pmc/oa_pdf/23/e5/10-1055-s-0042-1760238.PMC10306443.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"10163000","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"Medicine","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}