{"title":"Named Entity Recognition on COVID-19 Scientific Papers","authors":"A. Dao, Akiko Aizawa, Yuji Matsumoto","doi":"10.1145/3582768.3582786","journal":{"name":"Proceedings of the 2022 6th International Conference on Natural Language Processing and Information Retrieval"},"publicationDate":"2022-12-16","platform":"Semanticscholar","ListUrlMain":"https://doi.org/10.1145/3582768.3582786"}
Named Entity Recognition on COVID-19 Scientific Papers
Text mining techniques, especially named entity recognition (NER), play a vital role in helping researchers keep track of the hundreds of thousands of papers in the COVID-19 literature. Although a few studies have performed NER on COVID-19 scientific papers, very little is currently known about how current entity recognition models behave in this new domain. This ongoing study therefore analyzes the performance and limitations of current NER models on the CORD-19 dataset. By examining three NER models, the study shows that NER performance improves as the similarity between the testing data and the pretraining data increases. When few manually annotated resources for COVID-19 NER exist, our analysis suggests that, for training purposes, enhancing the dictionary used for seed annotation is effective and does not necessarily require costly human annotation.
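The abstract does not spell out how seed annotation is carried out, so the following is only a minimal sketch of one common reading: dictionary lookup with greedy longest matching that emits BIO tags, which could then serve as noisy training labels. The dictionary entries, entity labels, and the function names `tokenize` and `seed_annotate` are all hypothetical and are not taken from the paper.

```python
import re

# Hypothetical seed dictionary: lowercase surface form -> entity type.
# Entries and labels are illustrative assumptions, not the paper's resources.
SEED_DICTIONARY = {
    "sars-cov-2": "VIRUS",
    "covid-19": "DISEASE",
    "spike protein": "PROTEIN",
}


def tokenize(text):
    """Split text into word tokens, keeping hyphenated terms in one token."""
    return re.findall(r"[A-Za-z0-9]+(?:-[A-Za-z0-9]+)*", text)


def seed_annotate(tokens, dictionary):
    """Assign BIO tags by greedy longest-match lookup in the dictionary."""
    # Index entries by their token tuple so multi-word terms can match.
    entries = {tuple(term.split()): label for term, label in dictionary.items()}
    max_len = max((len(term) for term in entries), default=0)
    lowered = [token.lower() for token in tokens]
    tags = ["O"] * len(tokens)

    i = 0
    while i < len(tokens):
        matched = False
        # Try the longest candidate span first, then shrink it.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            label = entries.get(tuple(lowered[i:i + n]))
            if label is not None:
                tags[i] = "B-" + label
                for j in range(i + 1, i + n):
                    tags[j] = "I-" + label
                i += n
                matched = True
                break
        if not matched:
            i += 1
    return tags


if __name__ == "__main__":
    sentence = "The spike protein of SARS-CoV-2 causes COVID-19."
    tokens = tokenize(sentence)
    for token, tag in zip(tokens, seed_annotate(tokens, SEED_DICTIONARY)):
        print(token, tag)
```

Under this reading, enhancing the dictionary means adding entries (for example, a hypothetical `"remdesivir": "DRUG"`), which increases the number of tokens that receive a non-`O` seed label without any manual annotation of the text itself.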