生物医学命名实体识别使用改进的绿色蟒蛇辅助Bi-GRU-based分层ResNet模型。

IF 2.9 3区生物学 Q2 BIOCHEMICAL RESEARCH METHODS

BMC Bioinformatics Pub Date : 2025-01-30 DOI:10.1186/s12859-024-06008-w

Ram Chandra Bhushan, Rakesh Kumar Donthi, Yojitha Chilukuri, Ulligaddala Srinivasarao, Polisetty Swetha

{"title":"生物医学命名实体识别使用改进的绿色蟒蛇辅助Bi-GRU-based分层ResNet模型。","authors":"Ram Chandra Bhushan, Rakesh Kumar Donthi, Yojitha Chilukuri, Ulligaddala Srinivasarao, Polisetty Swetha","doi":"10.1186/s12859-024-06008-w","DOIUrl":null,"url":null,"abstract":"Background: Biomedical text mining is a technique that extracts essential information from scientific articles using named entity recognition (NER). Traditional NER methods rely on dictionaries, rules, or curated corpora, which may not always be accessible. To overcome these challenges, deep learning (DL) methods have emerged. However, DL-based NER methods may need help identifying long-distance relationships within text and require significant annotated datasets.Results: This research has proposed a novel model to address the challenges in natural language processing. The Improved Green anaconda-assisted Bi-GRU based Hierarchical ResNet BNER model (IGa-BiHR BNERM) is the model. IGa-BiHR BNERM model has shown promising results in accurately identifying named entities. The MACCROBAT dataset was obtained from Kaggle and underwent several pre-processing steps such as Stop Word Filtering, WordNet processing, Removal of non-alphanumeric characters, stemming Segmentation, and Tokenization, which is standardized and improves its quality. The pre-processed text was fed into a feature extraction model like the Robustly Optimized BERT -Whole Word Masking model. This model provides word embeddings with semantic information. Then, the BNER process utilized an Improved Green Anaconda-assisted Bi-GRU-based Hierarchical ResNet BNER model (IGa-BiHR BNERM).Conclusion: To improve the training phase of the IGa-BiHR BNERM, the Improved Green Anaconda Optimization technique was used to select optimal weight parameter coefficients for training the model parameters. After the model was tested using the MACCROBAT dataset, it outperformed previous models with a tremendous accuracy rate of 99.11%. This model effectively and accurately identifies biomedical names within the text, significantly advancing this field.","PeriodicalId":8958,"journal":{"name":"BMC Bioinformatics","volume":"26 1","pages":"34"},"PeriodicalIF":2.9000,"publicationDate":"2025-01-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11780922/pdf/","citationCount":"0","resultStr":"{\"title\":\"Biomedical named entity recognition using improved green anaconda-assisted Bi-GRU-based hierarchical ResNet model.\",\"authors\":\"Ram Chandra Bhushan, Rakesh Kumar Donthi, Yojitha Chilukuri, Ulligaddala Srinivasarao, Polisetty Swetha\",\"doi\":\"10.1186/s12859-024-06008-w\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Background: Biomedical text mining is a technique that extracts essential information from scientific articles using named entity recognition (NER). Traditional NER methods rely on dictionaries, rules, or curated corpora, which may not always be accessible. To overcome these challenges, deep learning (DL) methods have emerged. However, DL-based NER methods may need help identifying long-distance relationships within text and require significant annotated datasets.Results: This research has proposed a novel model to address the challenges in natural language processing. The Improved Green anaconda-assisted Bi-GRU based Hierarchical ResNet BNER model (IGa-BiHR BNERM) is the model. IGa-BiHR BNERM model has shown promising results in accurately identifying named entities. The MACCROBAT dataset was obtained from Kaggle and underwent several pre-processing steps such as Stop Word Filtering, WordNet processing, Removal of non-alphanumeric characters, stemming Segmentation, and Tokenization, which is standardized and improves its quality. The pre-processed text was fed into a feature extraction model like the Robustly Optimized BERT -Whole Word Masking model. This model provides word embeddings with semantic information. Then, the BNER process utilized an Improved Green Anaconda-assisted Bi-GRU-based Hierarchical ResNet BNER model (IGa-BiHR BNERM).Conclusion: To improve the training phase of the IGa-BiHR BNERM, the Improved Green Anaconda Optimization technique was used to select optimal weight parameter coefficients for training the model parameters. After the model was tested using the MACCROBAT dataset, it outperformed previous models with a tremendous accuracy rate of 99.11%. This model effectively and accurately identifies biomedical names within the text, significantly advancing this field.\",\"PeriodicalId\":8958,\"journal\":{\"name\":\"BMC Bioinformatics\",\"volume\":\"26 1\",\"pages\":\"34\"},\"PeriodicalIF\":2.9000,\"publicationDate\":\"2025-01-30\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11780922/pdf/\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"BMC Bioinformatics\",\"FirstCategoryId\":\"99\",\"ListUrlMain\":\"https://doi.org/10.1186/s12859-024-06008-w\",\"RegionNum\":3,\"RegionCategory\":\"生物学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q2\",\"JCRName\":\"BIOCHEMICAL RESEARCH METHODS\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"BMC Bioinformatics","FirstCategoryId":"99","ListUrlMain":"https://doi.org/10.1186/s12859-024-06008-w","RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"BIOCHEMICAL RESEARCH METHODS","Score":null,"Total":0}

引用次数: 0

摘要

背景：生物医学文本挖掘是一种利用命名实体识别（NER）从科学文章中提取重要信息的技术。传统的NER方法依赖于字典、规则或策划的语料库，这些语料库可能并不总是可访问的。为了克服这些挑战，深度学习（DL）方法应运而生。然而，基于dl的NER方法可能需要帮助识别文本中的远距离关系，并且需要大量带注释的数据集。结果：本研究提出了一种新的模型来解决自然语言处理中的挑战。基于改进的绿色蟒蛇辅助Bi-GRU的分层ResNet BNER模型（IGa-BiHR BNERM）是该模型。IGa-BiHR BNERM模型在准确识别命名实体方面显示出良好的结果。MACCROBAT数据集从Kaggle中获取，经过停止词过滤、WordNet处理、去除非字母数字字符、词干分割和Tokenization等预处理步骤，标准化并提高了数据质量。预处理后的文本被输入到一个特征提取模型中，如鲁棒优化BERT -全词掩蔽模型。该模型提供带有语义信息的词嵌入。然后，BNER过程利用改进的绿色蟒蛇辅助Bi-GRU-based分层ResNet BNER模型（IGa-BiHR BNERM）。结论：为了改善IGa-BiHR BNERM的训练阶段，采用改进的绿蟒蛇优化技术选择最优的权重参数系数进行模型参数的训练。在使用MACCROBAT数据集进行测试后，该模型的准确率达到了99.11%，大大优于之前的模型。该模型有效准确地识别文本中的生物医学名称，显著推进了该领域的发展。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

Biomedical named entity recognition using improved green anaconda-assisted Bi-GRU-based hierarchical ResNet model.

Background: Biomedical text mining is a technique that extracts essential information from scientific articles using named entity recognition (NER). Traditional NER methods rely on dictionaries, rules, or curated corpora, which may not always be accessible. To overcome these challenges, deep learning (DL) methods have emerged. However, DL-based NER methods may need help identifying long-distance relationships within text and require significant annotated datasets.

Results: This research has proposed a novel model to address the challenges in natural language processing. The Improved Green anaconda-assisted Bi-GRU based Hierarchical ResNet BNER model (IGa-BiHR BNERM) is the model. IGa-BiHR BNERM model has shown promising results in accurately identifying named entities. The MACCROBAT dataset was obtained from Kaggle and underwent several pre-processing steps such as Stop Word Filtering, WordNet processing, Removal of non-alphanumeric characters, stemming Segmentation, and Tokenization, which is standardized and improves its quality. The pre-processed text was fed into a feature extraction model like the Robustly Optimized BERT -Whole Word Masking model. This model provides word embeddings with semantic information. Then, the BNER process utilized an Improved Green Anaconda-assisted Bi-GRU-based Hierarchical ResNet BNER model (IGa-BiHR BNERM).

Conclusion: To improve the training phase of the IGa-BiHR BNERM, the Improved Green Anaconda Optimization technique was used to select optimal weight parameter coefficients for training the model parameters. After the model was tested using the MACCROBAT dataset, it outperformed previous models with a tremendous accuracy rate of 99.11%. This model effectively and accurately identifies biomedical names within the text, significantly advancing this field.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

BMC Bioinformatics 生物-生化研究方法

CiteScore

5.70

自引率

3.30%

发文量

506

审稿时长

4.3 months

期刊介绍： BMC Bioinformatics is an open access, peer-reviewed journal that considers articles on all aspects of the development, testing and novel application of computational and statistical methods for the modeling and analysis of all kinds of biological data, as well as other areas of computational biology. BMC Bioinformatics is part of the BMC series which publishes subject-specific journals focused on the needs of individual research communities across all areas of biology and medicine. We offer an efficient, fair and friendly peer review service, and are committed to publishing all sound science, provided that there is some advance in knowledge presented by the work.