IF 7.3 | Zone 2, Computer Science | Q1, COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE
Jin Zhang, Ziyue Zhang, Lobsang Yeshi, Dorje Tashi, Xiangshi Wang, Yuqing Cai, Yongbin Yu, Xiangxiang Wang, Nyima Tashi, Gadeng Luosang
Journal: CAAI Transactions on Intelligence Technology, vol. 10, no. 4, pp. 1148-1158
Publication type: Journal Article
Published: 2025-06-24
DOI: 10.1049/cit2.70029
URL: https://ietresearch.onlinelibrary.wiley.com/doi/10.1049/cit2.70029
PDF: https://ietresearch.onlinelibrary.wiley.com/doi/epdf/10.1049/cit2.70029
Citations: 0


Tibetan Medical Named Entity Recognition Based on Syllable-Word-Sentence Embedding Transformer


Tibetan medical named entity recognition (Tibetan MNER) extracts specific types of medical entities from unstructured Tibetan medical texts and provides important data support for work related to Tibetan medicine. However, existing Tibetan MNER methods often struggle to capture multi-level semantic information comprehensively: they neither extract multi-granularity features sufficiently nor filter out irrelevant information effectively, which ultimately reduces entity recognition accuracy. This paper proposes an improved embedding representation method called syllable–word–sentence embedding. Features at different granularities are fused using unscaled dot-product attention, which focuses on key features, and the resulting syllable–word–sentence embedding is integrated into the transformer, enhancing the specificity and diversity of the feature representations. The model thereby leverages multi-level, multi-granularity semantic information to improve Tibetan MNER performance. We evaluate the proposed model on datasets from different domains. On the Tibetan news dataset we constructed, the model identifies three types of entities with an F1 score of 93.59%, an improvement of 1.24% over the vanilla FLAT. On the Tibetan medical dataset we developed, it identifies five kinds of medical entities with an F1 score of 71.39%, a 1.34% improvement over the vanilla FLAT.
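
The fusion step described in the abstract can be illustrated with a small sketch. The abstract does not give the exact formulation, so everything below is an assumption made for illustration: the function name `fuse_granularities`, the choice of query vector, the use of the same vectors as keys and values, and the toy 4-dimensional embeddings are all hypothetical. The only property taken from the text is that the attention scores are plain dot products, without the usual 1/sqrt(d) scaling.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def fuse_granularities(query, features):
    """Fuse feature vectors with unscaled dot-product attention.

    query:    list[float] of length d (hypothetical choice of query)
    features: list of list[float], each of length d; used here as both
              keys and values (an assumption, not the paper's design)
    Returns (fused_vector, attention_weights).
    """
    # Unscaled: raw dot products, no division by sqrt(d).
    scores = [dot(query, f) for f in features]
    weights = softmax(scores)
    d = len(query)
    fused = [sum(w * f[i] for w, f in zip(weights, features)) for i in range(d)]
    return fused, weights


# Toy embeddings for one position at three granularities (made-up values).
syllable = [1.0, 0.0, 0.0, 0.0]
word = [0.0, 1.0, 0.0, 0.0]
sentence = [0.0, 0.0, 1.0, 0.0]
query = [2.0, 0.0, 0.0, 0.0]

fused, weights = fuse_granularities(query, [syllable, word, sentence])
print("attention weights:", weights)
print("fused embedding:", fused)
```

In this toy case the query aligns with the syllable-level vector, so that granularity receives the largest weight and dominates the fused embedding. Omitting the scaling factor leaves the scores larger in magnitude, which tends to make the softmax sharper than in the scaled variant.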

Source journal
CAAI Transactions on Intelligence Technology (COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE)
CiteScore: 11.00
Self-citation rate: 3.90%
Articles published: 134
Review time: 35 weeks
Journal description: CAAI Transactions on Intelligence Technology is a leading venue for original research on the theoretical and experimental aspects of artificial intelligence technology. It is a fully open access journal co-published by the Institution of Engineering and Technology (IET) and the Chinese Association for Artificial Intelligence (CAAI), providing research that is openly accessible to read and share worldwide.