Disease Comorbidity Linkages between MEDLINE and Patient Data
Authors: Tejaswi Rohit Anupindi, P. Srinivasan
Venue: 2017 IEEE International Conference on Healthcare Informatics (ICHI)
Publication date: 2017-08-01
DOI: 10.1109/ICHI.2017.48 (https://doi.org/10.1109/ICHI.2017.48)
Citation count: 2
Abstract
This paper analyzes a class of inferred links between MEDLINE and patient data. Records in the two datasets are linked via pairs of associated diseases, with a view to emphasizing disease comorbidities. In MEDLINE, disease pairs are extracted by mining specific patterns, such as MeSH disease term 1/etiology and MeSH disease term 2/complications. Our pattern set extracts 701,780 pairs from a 2017 download of MEDLINE containing close to 27 million records. The patient data, obtained from another study, contains 6,088,553 disease co-occurrence pairs. Our methodology for inferring connections maps ICD9 codes and MeSH terms to UMLS concept IDs and then applies both exact and approximate matching strategies; the approximate strategy uses semantic relations present in the UMLS. We are able to connect 2,478,366 patient disease pairs encoded using 5-digit ICD9 codes to MEDLINE pairs (and therefore to the corresponding documents), and 536,685 MEDLINE disease pairs to patient disease pairs (and therefore implicitly to the corresponding patient records). Although these numbers are large, the percentages are between 43% and 77%, which indicates that other approaches for linking the two datasets would be of interest. Moreover, comorbidity is one particular viewpoint among many options. We suggest that the study of inferred links between biomedical datasets, especially between core datasets, is of great value for enriching the biomedical web of knowledge.
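The linking step described in the abstract (map both vocabularies to UMLS concept IDs, then match pairs exactly and approximately) could be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: every identifier (`icd_a`, `mesh_a`, `CUI_1`, and so on) is a made-up placeholder, disease pairs are treated as unordered, and the `related` table stands in for whichever UMLS semantic relations the approximate strategy uses, which the abstract does not specify. In practice the lookup tables might be populated from a UMLS Metathesaurus release (for example its MRCONSO and MRREL files), but that is also an assumption.

```python
from typing import Dict, FrozenSet, Iterable, Set, Tuple

Pair = FrozenSet[str]  # unordered pair of concept IDs (assumption: order is ignored)


def to_cui_pairs(pairs: Iterable[Tuple[str, str]], code_to_cui: Dict[str, str]) -> Set[Pair]:
    """Map source-vocabulary pairs to concept-ID pairs, dropping unmapped codes."""
    out: Set[Pair] = set()
    for a, b in pairs:
        ca, cb = code_to_cui.get(a), code_to_cui.get(b)
        if ca and cb and ca != cb:
            out.add(frozenset((ca, cb)))
    return out


def expand(cui: str, related: Dict[str, Set[str]]) -> Set[str]:
    """Return the concept itself plus any concepts related to it."""
    return {cui} | related.get(cui, set())


def link_pairs(patient: Set[Pair], medline: Set[Pair],
               related: Dict[str, Set[str]]) -> Tuple[Set[Pair], Set[Pair]]:
    """Return (exactly matched, approximately matched) patient pairs."""
    exact = patient & medline
    approx: Set[Pair] = set()
    for pair in patient - exact:
        a, b = tuple(pair)
        # Try every combination of related concepts for the two diseases.
        candidates = {frozenset((x, y))
                      for x in expand(a, related)
                      for y in expand(b, related) if x != y}
        if candidates & medline:
            approx.add(pair)
    return exact, approx


# Toy placeholder data (hypothetical; not real ICD9, MeSH, or UMLS identifiers).
icd_to_cui = {"icd_a": "CUI_1", "icd_b": "CUI_2", "icd_c": "CUI_3", "icd_d": "CUI_5"}
mesh_to_cui = {"mesh_a": "CUI_1", "mesh_b": "CUI_2", "mesh_c": "CUI_4"}
related = {"CUI_3": {"CUI_4"}}  # e.g. CUI_3 is semantically related to CUI_4

patient_pairs = to_cui_pairs(
    [("icd_a", "icd_b"), ("icd_a", "icd_c"), ("icd_a", "icd_d")], icd_to_cui)
medline_pairs = to_cui_pairs(
    [("mesh_a", "mesh_b"), ("mesh_a", "mesh_c")], mesh_to_cui)

exact, approx = link_pairs(patient_pairs, medline_pairs, related)
linked_fraction = len(exact | approx) / len(patient_pairs)
print(f"exact={len(exact)} approx={len(approx)} linked={linked_fraction:.0%}")
```

In this toy run one patient pair matches a MEDLINE pair directly, one matches only through a related concept, and one stays unlinked. The reverse direction (MEDLINE pairs onto patient pairs) would call `link_pairs` with the two sets swapped.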