Extracting Cancer Treatments from Clinical Text written in Spanish: A Deep Learning Approach
O. S. Pabón, Alberto Blázquez-Herranz, M. Torrente, A. R. González, M. Provencio, Ernestina Menasalvas Ruiz
2021 IEEE 8th International Conference on Data Science and Advanced Analytics (DSAA), published 2021-10-06
DOI: 10.1109/DSAA53316.2021.9564137 (https://doi.org/10.1109/DSAA53316.2021.9564137)
Citation count: 7
Abstract
Extracting accurate information about cancer patients' treatments is crucial for supporting clinical research and treatment planning and for improving clinical care outcomes. However, treatment information resides in unstructured clinical text, which makes structuring the data especially challenging. Although several approaches have been proposed to extract treatments from clinical text, most have focused on English. In this paper, we propose a deep learning-based approach to extract cancer treatments from clinical text written in Spanish. The approach uses a Bidirectional Long Short-Term Memory (BiLSTM) neural network with a Conditional Random Field (CRF) layer to perform Named Entity Recognition. An annotated corpus of clinical text about lung cancer patients is used to train the BiLSTM-based model. In our tests, the model achieved an F1-score of 90%, suggesting that the approach is feasible for extracting cancer treatments from clinical narratives.
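To illustrate the role a CRF layer plays on top of a BiLSTM in Named Entity Recognition, the following is a minimal sketch of Viterbi decoding over BIO tags, followed by extraction of the tagged spans. It is not the authors' implementation. The tag set, the example sentence, and every score are hypothetical values invented for illustration; in a trained model the emission scores would come from the BiLSTM and the transition scores would be learned CRF parameters.

```python
from typing import List

# Hypothetical BIO tag set with a single TREATMENT entity type (an assumption).
TAGS = ["O", "B-TREATMENT", "I-TREATMENT"]

# Hypothetical example sentence and made-up per-token tag scores.
TOKENS = ["Inicia", "quimioterapia", "con", "carboplatino"]
EMISSIONS = [
    [2.0, 0.1, 0.0],
    [0.2, 1.5, 1.0],
    [1.8, 0.0, 0.3],
    [0.1, 1.0, 1.4],  # locally prefers I-TREATMENT
]
# TRANSITIONS[i][j]: score of moving from tag i to tag j.
# The large negative value forbids the invalid BIO move O -> I-TREATMENT.
TRANSITIONS = [
    [0.5, 0.0, -1e4],  # from O
    [0.0, -1.0, 0.5],  # from B-TREATMENT
    [0.0, -1.0, 0.3],  # from I-TREATMENT
]


def viterbi_decode(emissions: List[List[float]],
                   transitions: List[List[float]]) -> List[int]:
    """Return the highest-scoring tag index sequence for one sentence."""
    n_tags = len(transitions)
    # best[j]: score of the best path that ends in tag j at the current token.
    best = list(emissions[0])
    backpointers: List[List[int]] = []
    for scores in emissions[1:]:
        new_best, pointers = [], []
        for j in range(n_tags):
            # Previous tag that maximizes the path score into tag j.
            prev = max(range(n_tags), key=lambda i: best[i] + transitions[i][j])
            pointers.append(prev)
            new_best.append(best[prev] + transitions[prev][j] + scores[j])
        best = new_best
        backpointers.append(pointers)
    # Walk the back-pointers from the best final tag to recover the path.
    path = [max(range(n_tags), key=lambda j: best[j])]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


def extract_spans(tokens: List[str], tags: List[str]) -> List[str]:
    """Collect the token spans labelled B-/I- into entity strings."""
    spans, current = [], []
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            if current:
                spans.append(" ".join(current))
            current = [token]
        elif tag.startswith("I-") and current:
            current.append(token)
        else:
            if current:
                spans.append(" ".join(current))
            current = []
    if current:
        spans.append(" ".join(current))
    return spans


predicted = [TAGS[i] for i in viterbi_decode(EMISSIONS, TRANSITIONS)]
print(predicted)
print(extract_spans(TOKENS, predicted))
```

With these made-up scores, choosing the best tag for each token independently would label the last token I-TREATMENT directly after an O, which is not a valid BIO sequence. The transition scores steer the decoder to B-TREATMENT instead, which is the kind of sequence-level consistency a CRF layer adds.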