João Vitor Andrioli de Souza, Yohan Bonescki Gumiel, Lucas E. S. Oliveira, C. Moro
Anais do Simpósio Brasileiro de Computação Aplicada à Saúde (SBCAS 2019). Published 2019-06-11. DOI: 10.5753/SBCAS.2019.6269 (https://doi.org/10.5753/SBCAS.2019.6269)
Named Entity Recognition for Clinical Portuguese Corpus with Conditional Random Fields and Semantic Groups
Considering the difficulties of extracting entities from Electronic Health Record (EHR) texts in Portuguese, we explore the Conditional Random Fields (CRF) algorithm to build a Named Entity Recognition (NER) system based on a corpus of clinical Portuguese data annotated by experts. We present the challenges and methods involved in classifying Abbreviations, Disorders, Procedures and Chemicals within the texts. With a meaningful set of features and the best-performing parameters, the results demonstrate that the method is promising and may support other biomedical tasks. Nonetheless, further experiments with more features, different architectures and more sophisticated preprocessing steps are needed.
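The abstract does not list the features or the tagging scheme used, so the following is only a minimal sketch of how per-token features for a CRF-based NER system are commonly prepared. The feature names, the BIO labelling scheme, the example sentence and the helper functions `token_features` and `sentence_features` are all assumptions made for illustration, not the authors' setup. Only the four entity classes come from the abstract.

```python
from typing import Dict, List


def token_features(tokens: List[str], i: int) -> Dict[str, object]:
    """Build a feature dict for the token at position i.

    Hypothetical feature set: the paper's actual features are not
    given in the abstract.
    """
    word = tokens[i]
    features: Dict[str, object] = {
        "lower": word.lower(),
        "suffix3": word[-3:].lower(),
        "is_upper": word.isupper(),  # all-caps tokens are often abbreviations
        "is_title": word.istitle(),
        "has_digit": any(ch.isdigit() for ch in word),
        "length": len(word),
    }
    # Context features: neighbouring tokens, with sentence-boundary flags.
    if i > 0:
        features["prev_lower"] = tokens[i - 1].lower()
    else:
        features["BOS"] = True
    if i < len(tokens) - 1:
        features["next_lower"] = tokens[i + 1].lower()
    else:
        features["EOS"] = True
    return features


def sentence_features(tokens: List[str]) -> List[Dict[str, object]]:
    """One feature dict per token, in order."""
    return [token_features(tokens, i) for i in range(len(tokens))]


# Invented example sentence (not from the corpus), with assumed BIO labels
# over two of the entity classes named in the abstract.
tokens = ["Paciente", "com", "HAS", "em", "uso", "de", "losartana"]
labels = ["O", "O", "B-Abbreviation", "O", "O", "O", "B-Chemical"]

X = sentence_features(tokens)
for tok, lab, feats in zip(tokens, labels, X):
    print(tok, lab, sorted(feats))
```

Feature dicts of this shape, paired with one label per token, are the kind of input that typical CRF toolkits accept for training. Which features and hyperparameters work best on clinical Portuguese text is the empirical question the paper addresses.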