João Cardoso, Leyla J Castro, Fajar J Ekaputra, Marie C Jacquemot, Marek Suchánek, Tomasz Miksa, José Borbinha
{"title":"DCSO: towards an ontology for machine-actionable data management plans.","authors":"João Cardoso, Leyla J Castro, Fajar J Ekaputra, Marie C Jacquemot, Marek Suchánek, Tomasz Miksa, José Borbinha","doi":"10.1186/s13326-022-00274-4","DOIUrl":"https://doi.org/10.1186/s13326-022-00274-4","url":null,"abstract":"<p><p>The concept of Data Management Plan (DMP) has emerged as a fundamental tool to help researchers through the systematical management of data. The Research Data Alliance DMP Common Standard (DCS) working group developed a set of universal concepts characterising a DMP so it can be represented as a machine-actionable artefact, i.e., machine-actionable Data Management Plan (maDMP). The technology-agnostic approach of the current maDMP specification: (i) does not explicitly link to related data models or ontologies, (ii) has no standardised way to describe controlled vocabularies, and (iii) is extensible but has no clear mechanism to distinguish between the core specification and its extensions.This paper reports on a community effort to create the DMP Common Standard Ontology (DCSO) as a serialisation of the DCS core concepts, with a particular focus on a detailed description of the components of the ontology. Our initial result shows that the proposed DCSO can become a suitable candidate for a reference serialisation of the DMP Common Standard.</p>","PeriodicalId":15055,"journal":{"name":"Journal of Biomedical Semantics","volume":" ","pages":"21"},"PeriodicalIF":1.9,"publicationDate":"2022-07-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9327208/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"40625989","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Samar Binkheder, Heng-Yi Wu, Sara K Quinney, Shijun Zhang, Md Muntasir Zitu, Chien-Wei Chiang, Lei Wang, Josette Jones, Lang Li
{"title":"Correction: PhenoDEF: a corpus for annotating sentences with information of phenotype definitions in biomedical literature.","authors":"Samar Binkheder, Heng-Yi Wu, Sara K Quinney, Shijun Zhang, Md Muntasir Zitu, Chien-Wei Chiang, Lei Wang, Josette Jones, Lang Li","doi":"10.1186/s13326-022-00275-3","DOIUrl":"https://doi.org/10.1186/s13326-022-00275-3","url":null,"abstract":"","PeriodicalId":15055,"journal":{"name":"Journal of Biomedical Semantics","volume":" ","pages":"20"},"PeriodicalIF":1.9,"publicationDate":"2022-07-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9297579/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"40539574","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Philip van Damme, Jesualdo Tomás Fernández-Breis, Nirupama Benis, Jose Antonio Miñarro-Gimenez, Nicolette F de Keizer, Ronald Cornet
{"title":"Performance assessment of ontology matching systems for FAIR data.","authors":"Philip van Damme, Jesualdo Tomás Fernández-Breis, Nirupama Benis, Jose Antonio Miñarro-Gimenez, Nicolette F de Keizer, Ronald Cornet","doi":"10.1186/s13326-022-00273-5","DOIUrl":"https://doi.org/10.1186/s13326-022-00273-5","url":null,"abstract":"<p><strong>Background: </strong>Ontology matching should contribute to the interoperability aspect of FAIR data (Findable, Accessible, Interoperable, and Reusable). Multiple data sources can use different ontologies for annotating their data and, thus, creating the need for dynamic ontology matching services. In this experimental study, we assessed the performance of ontology matching systems in the context of a real-life application from the rare disease domain. Additionally, we present a method for analyzing top-level classes to improve precision.</p><p><strong>Results: </strong>We included three ontologies (NCIt, SNOMED CT, ORDO) and three matching systems (AgreementMakerLight 2.0, FCA-Map, LogMap 2.0). We evaluated the performance of the matching systems against reference alignments from BioPortal and the Unified Medical Language System Metathesaurus (UMLS). Then, we analyzed the top-level ancestors of matched classes, to detect incorrect mappings without consulting a reference alignment. To detect such incorrect mappings, we manually matched semantically equivalent top-level classes of ontology pairs. AgreementMakerLight 2.0, FCA-Map, and LogMap 2.0 had F1-scores of 0.55, 0.46, 0.55 for BioPortal and 0.66, 0.53, 0.58 for the UMLS respectively. Using vote-based consensus alignments increased performance across the board. Evaluation with manually created top-level hierarchy mappings revealed that on average 90% of the mappings' classes belonged to top-level classes that matched.</p><p><strong>Conclusions: </strong>Our findings show that the included ontology matching systems automatically produced mappings that were modestly accurate according to our evaluation. The hierarchical analysis of mappings seems promising when no reference alignments are available. All in all, the systems show potential to be implemented as part of an ontology matching service for querying FAIR data. Future research should focus on developing methods for the evaluation of mappings used in such mapping services, leading to their implementation in a FAIR data ecosystem.</p>","PeriodicalId":15055,"journal":{"name":"Journal of Biomedical Semantics","volume":" ","pages":"19"},"PeriodicalIF":1.9,"publicationDate":"2022-07-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9284868/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"40597376","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Exploiting document graphs for inter sentence relation extraction","authors":"Hoang-Quynh Le, Duy-Cat Can, Nigel Collier","doi":"10.1186/s13326-022-00267-3","DOIUrl":"https://doi.org/10.1186/s13326-022-00267-3","url":null,"abstract":"","PeriodicalId":15055,"journal":{"name":"Journal of Biomedical Semantics","volume":"13 1","pages":""},"PeriodicalIF":1.9,"publicationDate":"2022-06-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"65845317","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Sanchez-Graillet, Olivia, Witte, Christian, Grimm, Frank, Grautoff, Steffen, Ell, Basil, Cimiano, Philipp
{"title":"Synthesizing evidence from clinical trials with dynamic interactive argument trees","authors":"Sanchez-Graillet, Olivia, Witte, Christian, Grimm, Frank, Grautoff, Steffen, Ell, Basil, Cimiano, Philipp","doi":"10.1186/s13326-022-00270-8","DOIUrl":"https://doi.org/10.1186/s13326-022-00270-8","url":null,"abstract":"Evidence-based medicine propagates that medical/clinical decisions are made by taking into account high-quality evidence, most notably in the form of randomized clinical trials. Evidence-based decision-making requires aggregating the evidence available in multiple trials to reach –by means of systematic reviews– a conclusive recommendation on which treatment is best suited for a given patient population. However, it is challenging to produce systematic reviews to keep up with the ever-growing number of published clinical trials. Therefore, new computational approaches are necessary to support the creation of systematic reviews that include the most up-to-date evidence.We propose a method to synthesize the evidence available in clinical trials in an ad-hoc and on-demand manner by automatically arranging such evidence in the form of a hierarchical argument that recommends a therapy as being superior to some other therapy along a number of key dimensions corresponding to the clinical endpoints of interest. The method has also been implemented as a web tool that allows users to explore the effects of excluding different points of evidence, and indicating relative preferences on the endpoints. Through two use cases, our method was shown to be able to generate conclusions similar to the ones of published systematic reviews. To evaluate our method implemented as a web tool, we carried out a survey and usability analysis with medical professionals. The results show that the tool was perceived as being valuable, acknowledging its potential to inform clinical decision-making and to complement the information from existing medical guidelines. The method presented is a simple but yet effective argumentation-based method that contributes to support the synthesis of clinical trial evidence. A current limitation of the method is that it relies on a manually populated knowledge base. This problem could be alleviated by deploying natural language processing methods to extract the relevant information from publications.","PeriodicalId":15055,"journal":{"name":"Journal of Biomedical Semantics","volume":"22 1","pages":""},"PeriodicalIF":1.9,"publicationDate":"2022-06-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138538451","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Olivia Sanchez-Graillet, Christian Witte, Frank Grimm, P. Cimiano
{"title":"An annotated corpus of clinical trial publications supporting schema-based relational information extraction","authors":"Olivia Sanchez-Graillet, Christian Witte, Frank Grimm, P. Cimiano","doi":"10.1186/s13326-022-00271-7","DOIUrl":"https://doi.org/10.1186/s13326-022-00271-7","url":null,"abstract":"","PeriodicalId":15055,"journal":{"name":"Journal of Biomedical Semantics","volume":" ","pages":""},"PeriodicalIF":1.9,"publicationDate":"2022-05-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"43016385","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Lucas Emanuel Silva E Oliveira, Ana Carolina Peters, Adalniza Moura Pucca da Silva, Caroline Pilatti Gebeluca, Yohan Bonescki Gumiel, Lilian Mie Mukai Cintho, Deborah Ribeiro Carvalho, Sadid Al Hasan, Claudia Maria Cabral Moro
{"title":"SemClinBr - a multi-institutional and multi-specialty semantically annotated corpus for Portuguese clinical NLP tasks.","authors":"Lucas Emanuel Silva E Oliveira, Ana Carolina Peters, Adalniza Moura Pucca da Silva, Caroline Pilatti Gebeluca, Yohan Bonescki Gumiel, Lilian Mie Mukai Cintho, Deborah Ribeiro Carvalho, Sadid Al Hasan, Claudia Maria Cabral Moro","doi":"10.1186/s13326-022-00269-1","DOIUrl":"https://doi.org/10.1186/s13326-022-00269-1","url":null,"abstract":"<p><strong>Background: </strong>The high volume of research focusing on extracting patient information from electronic health records (EHRs) has led to an increase in the demand for annotated corpora, which are a precious resource for both the development and evaluation of natural language processing (NLP) algorithms. The absence of a multipurpose clinical corpus outside the scope of the English language, especially in Brazilian Portuguese, is glaring and severely impacts scientific progress in the biomedical NLP field.</p><p><strong>Methods: </strong>In this study, a semantically annotated corpus was developed using clinical text from multiple medical specialties, document types, and institutions. In addition, we present, (1) a survey listing common aspects, differences, and lessons learned from previous research, (2) a fine-grained annotation schema that can be replicated to guide other annotation initiatives, (3) a web-based annotation tool focusing on an annotation suggestion feature, and (4) both intrinsic and extrinsic evaluation of the annotations.</p><p><strong>Results: </strong>This study resulted in SemClinBr, a corpus that has 1000 clinical notes, labeled with 65,117 entities and 11,263 relations. In addition, both negation cues and medical abbreviation dictionaries were generated from the annotations. The average annotator agreement score varied from 0.71 (applying strict match) to 0.92 (considering a relaxed match) while accepting partial overlaps and hierarchically related semantic types. The extrinsic evaluation, when applying the corpus to two downstream NLP tasks, demonstrated the reliability and usefulness of annotations, with the systems achieving results that were consistent with the agreement scores.</p><p><strong>Conclusion: </strong>The SemClinBr corpus and other resources produced in this work can support clinical NLP studies, providing a common development and evaluation resource for the research community, boosting the utilization of EHRs in both clinical practice and biomedical research. To the best of our knowledge, SemClinBr is the first available Portuguese clinical corpus.</p>","PeriodicalId":15055,"journal":{"name":"Journal of Biomedical Semantics","volume":"13 1","pages":"13"},"PeriodicalIF":1.9,"publicationDate":"2022-05-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9080187/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"10252310","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Queralt-Rosinach, Núria, Kaliyaperumal, Rajaram, Bernabé, César H., Long, Qinqin, Joosten, Simone A., van der Wijk, Henk Jan, Flikkenschild, Erik L.A., Burger, Kees, Jacobsen, Annika, Mons, Barend, Roos, Marco
{"title":"Applying the FAIR principles to data in a hospital: challenges and opportunities in a pandemic","authors":"Queralt-Rosinach, Núria, Kaliyaperumal, Rajaram, Bernabé, César H., Long, Qinqin, Joosten, Simone A., van der Wijk, Henk Jan, Flikkenschild, Erik L.A., Burger, Kees, Jacobsen, Annika, Mons, Barend, Roos, Marco","doi":"10.1186/s13326-022-00263-7","DOIUrl":"https://doi.org/10.1186/s13326-022-00263-7","url":null,"abstract":"The COVID-19 pandemic has challenged healthcare systems and research worldwide. Data is collected all over the world and needs to be integrated and made available to other researchers quickly. However, the various heterogeneous information systems that are used in hospitals can result in fragmentation of health data over multiple data ‘silos’ that are not interoperable for analysis. Consequently, clinical observations in hospitalised patients are not prepared to be reused efficiently and timely. There is a need to adapt the research data management in hospitals to make COVID-19 observational patient data machine actionable, i.e. more Findable, Accessible, Interoperable and Reusable (FAIR) for humans and machines. We therefore applied the FAIR principles in the hospital to make patient data more FAIR. In this paper, we present our FAIR approach to transform COVID-19 observational patient data collected in the hospital into machine actionable digital objects to answer medical doctors’ research questions. With this objective, we conducted a coordinated FAIRification among stakeholders based on ontological models for data and metadata, and a FAIR based architecture that complements the existing data management. We applied FAIR Data Points for metadata exposure, turning investigational parameters into a FAIR dataset. We demonstrated that this dataset is machine actionable by means of three different computational activities: federated query of patient data along open existing knowledge sources across the world through the Semantic Web, implementing Web APIs for data query interoperability, and building applications on top of these FAIR patient data for FAIR data analytics in the hospital. Our work demonstrates that a FAIR research data management plan based on ontological models for data and metadata, open Science, Semantic Web technologies, and FAIR Data Points is providing data infrastructure in the hospital for machine actionable FAIR Digital Objects. This FAIR data is prepared to be reused for federated analysis, linkable to other FAIR data such as Linked Open Data, and reusable to develop software applications on top of them for hypothesis generation and knowledge discovery.","PeriodicalId":15055,"journal":{"name":"Journal of Biomedical Semantics","volume":"90 1","pages":""},"PeriodicalIF":1.9,"publicationDate":"2022-04-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138538450","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Defining health data elements under the HL7 development framework for metadata management","authors":"Yang, Zhe, Jiang, Kun, Lou, Miaomiao, Gong, Yang, Zhang, Lili, Liu, Jing, Bao, Xinyu, Liu, Danhong, Yang, Peng","doi":"10.1186/s13326-022-00265-5","DOIUrl":"https://doi.org/10.1186/s13326-022-00265-5","url":null,"abstract":"Health data from different specialties or domains generallly have diverse formats and meanings, which can cause semantic communication barriers when these data are exchanged among heterogeneous systems. As such, this study is intended to develop a national health concept data model (HCDM) and develop a corresponding system to facilitate healthcare data standardization and centralized metadata management. Based on 55 data sets (4640 data items) from 7 health business domains in China, a bottom-up approach was employed to build the structure and metadata for HCDM by referencing HL7 RIM. According to ISO/IEC 11179, a top-down approach was used to develop and standardize the data elements. HCDM adopted three-level architecture of class, attribute and data type, and consisted of 6 classes and 15 sub-classes. Each class had a set of descriptive attributes and every attribute was assigned a data type. 100 initial data elements (DEs) were extracted from HCDM and 144 general DEs were derived from corresponding initial DEs. Domain DEs were transformed by specializing general DEs using 12 controlled vocabularies which developed from HL7 vocabularies and actual health demands. A model-based system was successfully established to evaluate and manage the NHDD. HCDM provided a unified metadata reference for multi-source data standardization and management. This approach of defining health data elements was a feasible solution in healthcare information standardization to enable healthcare interoperability in China.","PeriodicalId":15055,"journal":{"name":"Journal of Biomedical Semantics","volume":"24 1","pages":""},"PeriodicalIF":1.9,"publicationDate":"2022-03-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138538405","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Kaliyaperumal, Rajaram, Wilkinson, Mark D., Moreno, Pablo Alarcón, Benis, Nirupama, Cornet, Ronald, dos Santos Vieira, Bruna, Dumontier, Michel, Bernabé, César Henrique, Jacobsen, Annika, Le Cornec, Clémence M. A., Godoy, Mario Prieto, Queralt-Rosinach, Núria, Schultze Kool, Leo J., Swertz, Morris A., van Damme, Philip, van der Velde, K. Joeri, Lalout, Nawel, Zhang, Shuxin, Roos, Marco
{"title":"Semantic modelling of common data elements for rare disease registries, and a prototype workflow for their deployment over registry data","authors":"Kaliyaperumal, Rajaram, Wilkinson, Mark D., Moreno, Pablo Alarcón, Benis, Nirupama, Cornet, Ronald, dos Santos Vieira, Bruna, Dumontier, Michel, Bernabé, César Henrique, Jacobsen, Annika, Le Cornec, Clémence M. A., Godoy, Mario Prieto, Queralt-Rosinach, Núria, Schultze Kool, Leo J., Swertz, Morris A., van Damme, Philip, van der Velde, K. Joeri, Lalout, Nawel, Zhang, Shuxin, Roos, Marco","doi":"10.1186/s13326-022-00264-6","DOIUrl":"https://doi.org/10.1186/s13326-022-00264-6","url":null,"abstract":"The European Platform on Rare Disease Registration (EU RD Platform) aims to address the fragmentation of European rare disease (RD) patient data, scattered among hundreds of independent and non-coordinating registries, by establishing standards for integration and interoperability. The first practical output of this effort was a set of 16 Common Data Elements (CDEs) that should be implemented by all RD registries. Interoperability, however, requires decisions beyond data elements - including data models, formats, and semantics. Within the European Joint Programme on Rare Diseases (EJP RD), we aim to further the goals of the EU RD Platform by generating reusable RD semantic model templates that follow the FAIR Data Principles. Through a team-based iterative approach, we created semantically grounded models to represent each of the CDEs, using the SemanticScience Integrated Ontology as the core framework for representing the entities and their relationships. Within that framework, we mapped the concepts represented in the CDEs, and their possible values, into domain ontologies such as the Orphanet Rare Disease Ontology, Human Phenotype Ontology and National Cancer Institute Thesaurus. Finally, we created an exemplar, reusable ETL pipeline that we will be deploying over these non-coordinating data repositories to assist them in creating model-compliant FAIR data without requiring site-specific coding nor expertise in Linked Data or FAIR. Within the EJP RD project, we determined that creating reusable, expert-designed templates reduced or eliminated the requirement for our participating biomedical domain experts and rare disease data hosts to understand OWL semantics. This enabled them to publish highly expressive FAIR data using tools and approaches that were already familiar to them.","PeriodicalId":15055,"journal":{"name":"Journal of Biomedical Semantics","volume":"32 1","pages":""},"PeriodicalIF":1.9,"publicationDate":"2022-03-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138538468","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}