Mufei Li, Yan Zhuang, Ke Chen, Lin Han, Xiangfeng Li, Yongtao Wei, Xiangdong Zhu, Mingli Yang, Guangfu Yin, Jiangli Lin, Xingdong Zhang
Title: Enhancing named entity recognition with a novel BERT-BiLSTM-CRF-RC joint training model for biomedical materials database
Journal: Materials Genome Engineering Advances, vol. 3, no. 1
Published: 2025-03-16
DOI: 10.1002/mgea.70001
URL: https://onlinelibrary.wiley.com/doi/10.1002/mgea.70001
Citations: 0
Abstract
In this study, we propose a novel joint training model for named entity recognition (NER) that combines BERT, BiLSTM, CRF, and a reading comprehension (RC) mechanism. Traditional BERT-BiLSTM-CRF models lack specialized vocabulary, so they often detect entity boundaries inaccurately and split named entities into excessive fragments. Our model addresses these issues by integrating an RC mechanism, which refines fragmented results by identifying entity boundaries more precisely without relying on an expert-annotated dictionary. Segmentation issues are further mitigated by a segmented technique that combines voting with positive-sample coverage. We applied this model to develop a database for mesoporous bioactive glass (MBG). In addition, a classifier was developed to automatically detect whether a paragraph contains pertinent information. For this study, 200 articles were retrieved using MBG-related keywords, and the data were split into a training set and a test set in a 9:1 ratio. A total of 492 paragraphs were automatically extracted for training and 50 for testing the model. The results show that our joint training model achieves an NER accuracy of 92.8%, which is 4.3 percentage points higher than the 88.5% accuracy of the traditional BERT-BiLSTM-CRF model.
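To make the fragmentation problem concrete, the following minimal Python sketch decodes BIO-style tag sequences into entity spans and compares a fragmented prediction with an intact reference. It is an illustration only, not the authors' implementation: the BIO scheme, the `MAT` label, the example tokens, and the function names `bio_to_spans` and `exact_matches` are all assumptions introduced here, since the abstract does not specify the tagging format or the evaluation procedure.

```python
from typing import List, Tuple

# (start index, end index exclusive, label); a hypothetical span format.
Span = Tuple[int, int, str]


def bio_to_spans(tags: List[str]) -> List[Span]:
    """Decode a BIO tag sequence into labelled spans.

    A stray I- tag with no open span of the same label starts a new span
    (a lenient convention chosen for this sketch).
    """
    spans: List[Span] = []
    start = None
    label = None
    # The trailing "O" is a sentinel that closes any span still open.
    for i, tag in enumerate(tags + ["O"]):
        continues = start is not None and tag == "I-" + label
        if continues:
            continue
        if start is not None:
            spans.append((start, i, label))
            start, label = None, None
        if tag != "O":
            start, label = i, tag[2:]
    return spans


def exact_matches(gold_tags: List[str], pred_tags: List[str]) -> int:
    """Count predicted spans whose boundaries and label both match the reference."""
    return len(set(bio_to_spans(gold_tags)) & set(bio_to_spans(pred_tags)))


# Hypothetical example: "mesoporous bioactive glass" should be one entity.
tokens = ["mesoporous", "bioactive", "glass", "scaffolds"]
gold = ["B-MAT", "I-MAT", "I-MAT", "O"]
pred = ["B-MAT", "O", "B-MAT", "O"]  # fragmented prediction

print(bio_to_spans(gold))         # [(0, 3, 'MAT')]
print(bio_to_spans(pred))         # [(0, 1, 'MAT'), (2, 3, 'MAT')]
print(exact_matches(gold, pred))  # 0
```

In this toy case the tagger finds two of the three entity tokens, yet neither predicted span matches the reference exactly. This is the kind of boundary error that the RC mechanism and the voting step described above are intended to reduce.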