病人识别和肿瘤识别管理:癌症多中心临床数据仓库的质量项目。

IF 2.5 Q2 MATHEMATICAL & COMPUTATIONAL BIOLOGY

Cancer Informatics Pub Date : 2023-01-01 DOI:10.1177/11769351231172609

Karine Pallier, Olivier Prot, Simone Naldi, Francisco Silva, Thierry Denis, Olivier Giry, Sophie Leobon, Elise Deluche, Nicole Tubiana-Mathieu

{"title":"病人识别和肿瘤识别管理:癌症多中心临床数据仓库的质量项目。","authors":"Karine Pallier, Olivier Prot, Simone Naldi, Francisco Silva, Thierry Denis, Olivier Giry, Sophie Leobon, Elise Deluche, Nicole Tubiana-Mathieu","doi":"10.1177/11769351231172609","DOIUrl":null,"url":null,"abstract":"Background: The Regional Basis of Solid Tumor (RBST), a clinical data warehouse, centralizes information related to cancer patient care in 5 health establishments in 2 French departments.Purpose: To develop algorithms matching heterogeneous data to \"real\" patients and \"real\" tumors with respect to patient identification (PI) and tumor identification (TI).Methods: A graph database programed in java Neo4j was used to build the RBST with data from ~20 000 patients. The PI algorithm using the Levenshtein distance was based on the regulatory criteria identifying a patient. A TI algorithm was built on 6 characteristics: tumor location and laterality, date of diagnosis, histology, primary and metastatic status. Given the heterogeneous nature and semantics of the collected data, the creation of repositories (organ, synonym, and histology repositories) was required. The TI algorithm used the Dice coefficient to match tumors.Results: Patients matched if there was complete agreement of the given name, surname, sex, and date/month/year of birth. These parameters were assigned weights of 28%, 28%, 21%, and 23% (with 18% for year, 2.5% for month, and 2.5% for day), respectively. The algorithm had a sensitivity of 99.69% (95% confidence interval [CI] [98.89%, 99.96%]) and a specificity of 100% (95% CI [99.72%, 100%]). The TI algorithm used repositories, weights were assigned to the diagnosis date and associated organ (37.5% and 37.5%, respectively), laterality (16%) histology (5%), and metastatic status (4%). This algorithm had a sensitivity of 71% (95% CI [62.68%, 78.25%]) and a specificity of 100% (95% CI [94.31%, 100%]).Conclusion: The RBST encompasses 2 quality controls: PI and TI. It facilitates the implementation of transversal structuring and assessments of the performance of the provided care.","PeriodicalId":35418,"journal":{"name":"Cancer Informatics","volume":"22 ","pages":"11769351231172609"},"PeriodicalIF":2.5000,"publicationDate":"2023-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ftp.ncbi.nlm.nih.gov/pub/pmc/oa_pdf/f9/25/10.1177_11769351231172609.PMC10201142.pdf","citationCount":"0","resultStr":"{\"title\":\"Patient Identification and Tumor Identification Management: Quality Program in a Cancer Multicentric Clinical Data Warehouse.\",\"authors\":\"Karine Pallier, Olivier Prot, Simone Naldi, Francisco Silva, Thierry Denis, Olivier Giry, Sophie Leobon, Elise Deluche, Nicole Tubiana-Mathieu\",\"doi\":\"10.1177/11769351231172609\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Background: The Regional Basis of Solid Tumor (RBST), a clinical data warehouse, centralizes information related to cancer patient care in 5 health establishments in 2 French departments.Purpose: To develop algorithms matching heterogeneous data to \\\"real\\\" patients and \\\"real\\\" tumors with respect to patient identification (PI) and tumor identification (TI).Methods: A graph database programed in java Neo4j was used to build the RBST with data from ~20 000 patients. The PI algorithm using the Levenshtein distance was based on the regulatory criteria identifying a patient. A TI algorithm was built on 6 characteristics: tumor location and laterality, date of diagnosis, histology, primary and metastatic status. Given the heterogeneous nature and semantics of the collected data, the creation of repositories (organ, synonym, and histology repositories) was required. The TI algorithm used the Dice coefficient to match tumors.Results: Patients matched if there was complete agreement of the given name, surname, sex, and date/month/year of birth. These parameters were assigned weights of 28%, 28%, 21%, and 23% (with 18% for year, 2.5% for month, and 2.5% for day), respectively. The algorithm had a sensitivity of 99.69% (95% confidence interval [CI] [98.89%, 99.96%]) and a specificity of 100% (95% CI [99.72%, 100%]). The TI algorithm used repositories, weights were assigned to the diagnosis date and associated organ (37.5% and 37.5%, respectively), laterality (16%) histology (5%), and metastatic status (4%). This algorithm had a sensitivity of 71% (95% CI [62.68%, 78.25%]) and a specificity of 100% (95% CI [94.31%, 100%]).Conclusion: The RBST encompasses 2 quality controls: PI and TI. It facilitates the implementation of transversal structuring and assessments of the performance of the provided care.\",\"PeriodicalId\":35418,\"journal\":{\"name\":\"Cancer Informatics\",\"volume\":\"22 \",\"pages\":\"11769351231172609\"},\"PeriodicalIF\":2.5000,\"publicationDate\":\"2023-01-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://ftp.ncbi.nlm.nih.gov/pub/pmc/oa_pdf/f9/25/10.1177_11769351231172609.PMC10201142.pdf\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Cancer Informatics\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1177/11769351231172609\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q2\",\"JCRName\":\"MATHEMATICAL & COMPUTATIONAL BIOLOGY\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Cancer Informatics","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1177/11769351231172609","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"MATHEMATICAL & COMPUTATIONAL BIOLOGY","Score":null,"Total":0}

引用次数: 0

摘要

背景:区域实体瘤基础(RBST)是一个临床数据仓库，集中了法国2个部门5家卫生机构的癌症患者护理相关信息。目的:开发在患者识别(PI)和肿瘤识别(TI)方面将异构数据与“真实”患者和“真实”肿瘤匹配的算法。方法:采用java Neo4j编程的图形数据库构建约2万例患者的RBST数据。使用Levenshtein距离的PI算法基于识别患者的监管标准。TI算法基于6个特征:肿瘤的位置和侧边性、诊断日期、组织学、原发和转移状态。鉴于所收集数据的异构性质和语义，需要创建存储库(器官、同义词和组织学存储库)。TI算法使用Dice系数来匹配肿瘤。结果:如果患者的名字、姓氏、性别和出生日期/月/年完全一致，则患者匹配。这些参数的权重分别为28%、28%、21%和23%(年为18%，月为2.5%，日为2.5%)。该算法的灵敏度为99.69%(95%置信区间[CI][98.89%， 99.96%])，特异性为100% (95% CI[99.72%， 100%])。TI算法使用存储库，将权重分配给诊断日期和相关器官(分别为37.5%和37.5%)、侧边性(16%)、组织学(5%)和转移状态(4%)。该算法的灵敏度为71% (95% CI[62.68%， 78.25%])，特异性为100% (95% CI[94.31%， 100%])。结论:RBST包括PI和TI两种质量控制。它有助于实施横向结构和评估所提供护理的绩效。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

Patient Identification and Tumor Identification Management: Quality Program in a Cancer Multicentric Clinical Data Warehouse.

查看原文本刊更多论文

Patient Identification and Tumor Identification Management: Quality Program in a Cancer Multicentric Clinical Data Warehouse.

Background: The Regional Basis of Solid Tumor (RBST), a clinical data warehouse, centralizes information related to cancer patient care in 5 health establishments in 2 French departments.

Purpose: To develop algorithms matching heterogeneous data to "real" patients and "real" tumors with respect to patient identification (PI) and tumor identification (TI).

Methods: A graph database programed in java Neo4j was used to build the RBST with data from ~20 000 patients. The PI algorithm using the Levenshtein distance was based on the regulatory criteria identifying a patient. A TI algorithm was built on 6 characteristics: tumor location and laterality, date of diagnosis, histology, primary and metastatic status. Given the heterogeneous nature and semantics of the collected data, the creation of repositories (organ, synonym, and histology repositories) was required. The TI algorithm used the Dice coefficient to match tumors.

Results: Patients matched if there was complete agreement of the given name, surname, sex, and date/month/year of birth. These parameters were assigned weights of 28%, 28%, 21%, and 23% (with 18% for year, 2.5% for month, and 2.5% for day), respectively. The algorithm had a sensitivity of 99.69% (95% confidence interval [CI] [98.89%, 99.96%]) and a specificity of 100% (95% CI [99.72%, 100%]). The TI algorithm used repositories, weights were assigned to the diagnosis date and associated organ (37.5% and 37.5%, respectively), laterality (16%) histology (5%), and metastatic status (4%). This algorithm had a sensitivity of 71% (95% CI [62.68%, 78.25%]) and a specificity of 100% (95% CI [94.31%, 100%]).

Conclusion: The RBST encompasses 2 quality controls: PI and TI. It facilitates the implementation of transversal structuring and assessments of the performance of the provided care.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

Cancer Informatics Medicine-Oncology

CiteScore

3.00

自引率

5.00%

发文量

审稿时长

8 weeks

期刊介绍： The field of cancer research relies on advances in many other disciplines, including omics technology, mass spectrometry, radio imaging, computer science, and biostatistics. Cancer Informatics provides open access to peer-reviewed high-quality manuscripts reporting bioinformatics analysis of molecular genetics and/or clinical data pertaining to cancer, emphasizing the use of machine learning, artificial intelligence, statistical algorithms, advanced imaging techniques, data visualization, and high-throughput technologies. As the leading journal dedicated exclusively to the report of the use of computational methods in cancer research and practice, Cancer Informatics leverages methodological improvements in systems biology, genomics, proteomics, metabolomics, and molecular biochemistry into the fields of cancer detection, treatment, classification, risk-prediction, prevention, outcome, and modeling.