{"title":"Tibetan Medical Named Entity Recognition Based on Syllable-Word-Sentence Embedding Transformer","authors":"Jin Zhang, Ziyue Zhang, Lobsang Yeshi, Dorje Tashi, Xiangshi Wang, Yuqing Cai, Yongbin Yu, Xiangxiang Wang, Nyima Tashi, Gadeng Luosang","doi":"10.1049/cit2.70029","DOIUrl":null,"url":null,"abstract":"<p>Tibetan medical named entity recognition (Tibetan MNER) involves extracting specific types of medical entities from unstructured Tibetan medical texts. Tibetan MNER provide important data support for the work related to Tibetan medicine. However, existing Tibetan MNER methods often struggle to comprehensively capture multi-level semantic information, failing to sufficiently extract multi-granularity features and effectively filter out irrelevant information, which ultimately impacts the accuracy of entity recognition. This paper proposes an improved embedding representation method called syllable–word–sentence embedding. By leveraging features at different granularities and using un-scaled dot-product attention to focus on key features for feature fusion, the syllable–word–sentence embedding is integrated into the transformer, enhancing the specificity and diversity of feature representations. The model leverages multi-level and multi-granularity semantic information, thereby improving the performance of Tibetan MNER. We evaluate our proposed model on datasets from various domains. The results indicate that the model effectively identified three types of entities in the Tibetan news dataset we constructed, achieving an F1 score of 93.59%, which represents an improvement of 1.24% compared to the vanilla FLAT. Additionally, results from the Tibetan medical dataset we developed show that it is effective in identifying five kinds of medical entities, with an F1 score of 71.39%, which is a 1.34% improvement over the vanilla FLAT.</p>","PeriodicalId":46211,"journal":{"name":"CAAI Transactions on Intelligence Technology","volume":"10 4","pages":"1148-1158"},"PeriodicalIF":7.3000,"publicationDate":"2025-06-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ietresearch.onlinelibrary.wiley.com/doi/epdf/10.1049/cit2.70029","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"CAAI Transactions on Intelligence Technology","FirstCategoryId":"94","ListUrlMain":"https://ietresearch.onlinelibrary.wiley.com/doi/10.1049/cit2.70029","RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
引用次数: 0
Abstract
Tibetan medical named entity recognition (Tibetan MNER) involves extracting specific types of medical entities from unstructured Tibetan medical texts. Tibetan MNER provide important data support for the work related to Tibetan medicine. However, existing Tibetan MNER methods often struggle to comprehensively capture multi-level semantic information, failing to sufficiently extract multi-granularity features and effectively filter out irrelevant information, which ultimately impacts the accuracy of entity recognition. This paper proposes an improved embedding representation method called syllable–word–sentence embedding. By leveraging features at different granularities and using un-scaled dot-product attention to focus on key features for feature fusion, the syllable–word–sentence embedding is integrated into the transformer, enhancing the specificity and diversity of feature representations. The model leverages multi-level and multi-granularity semantic information, thereby improving the performance of Tibetan MNER. We evaluate our proposed model on datasets from various domains. The results indicate that the model effectively identified three types of entities in the Tibetan news dataset we constructed, achieving an F1 score of 93.59%, which represents an improvement of 1.24% compared to the vanilla FLAT. Additionally, results from the Tibetan medical dataset we developed show that it is effective in identifying five kinds of medical entities, with an F1 score of 71.39%, which is a 1.34% improvement over the vanilla FLAT.
期刊介绍:
CAAI Transactions on Intelligence Technology is a leading venue for original research on the theoretical and experimental aspects of artificial intelligence technology. We are a fully open access journal co-published by the Institution of Engineering and Technology (IET) and the Chinese Association for Artificial Intelligence (CAAI) providing research which is openly accessible to read and share worldwide.