LFF-POS:一种处理低资源词性标注中词汇外词的语言融合方法

IF 1.9 Q2 MULTIDISCIPLINARY SCIENCES
MethodsX Pub Date : 2025-09-10 DOI:10.1016/j.mex.2025.103615
Muhammad Alfian , Umi Laili Yuhana , Daniel Siahaan , Harum Munazharoh , Eric Pardede
{"title":"LFF-POS:一种处理低资源词性标注中词汇外词的语言融合方法","authors":"Muhammad Alfian ,&nbsp;Umi Laili Yuhana ,&nbsp;Daniel Siahaan ,&nbsp;Harum Munazharoh ,&nbsp;Eric Pardede","doi":"10.1016/j.mex.2025.103615","DOIUrl":null,"url":null,"abstract":"<div><div>Accurate part-of-speech (POS) tagging is needed for classroom learning evaluation in order to improve the quality of education. However, accurate POS tagging is hampered by the limited amount of training data and the high proportion of out-of-vocabulary (OOV) tokens. We present LFF-POS, a linguistic feature fusion method that overcomes these limitations for Indonesian. The procedure consists of four sequential steps: (1) tokenizing raw text; (2) extracting three complementary features; (3) merging the resulting vectors; (4) applying self-attention; and (4) training a BiLSTM sequence labeler. By combining the three features, LFF-POS improves tagging accuracy without relying on an external lexicon. Experimental results show that the combined features are able to improve the proposed model's ability to handle OOV words and achieve higher POS Tagging accuracy compared to baseline and existing methods.</div><div>OOV cannot be recognized by the model, thus reducing the accuracy of the POS Tagging model</div><div>This study aims to overcome OOV by combining linguistic features such as orthography, morphology, and characters to improve word representation</div><div>The LFF-POS has been proven to improve POS Tagging performance, especially OOV F1 Score by ±14% over baseline.</div></div>","PeriodicalId":18446,"journal":{"name":"MethodsX","volume":"15 ","pages":"Article 103615"},"PeriodicalIF":1.9000,"publicationDate":"2025-09-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"LFF-POS: A linguistic fusion method to handle out-of-vocabulary words in low-resource part-of-speech tagging\",\"authors\":\"Muhammad Alfian ,&nbsp;Umi Laili Yuhana ,&nbsp;Daniel Siahaan ,&nbsp;Harum Munazharoh ,&nbsp;Eric Pardede\",\"doi\":\"10.1016/j.mex.2025.103615\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><div>Accurate part-of-speech (POS) tagging is needed for classroom learning evaluation in order to improve the quality of education. However, accurate POS tagging is hampered by the limited amount of training data and the high proportion of out-of-vocabulary (OOV) tokens. We present LFF-POS, a linguistic feature fusion method that overcomes these limitations for Indonesian. The procedure consists of four sequential steps: (1) tokenizing raw text; (2) extracting three complementary features; (3) merging the resulting vectors; (4) applying self-attention; and (4) training a BiLSTM sequence labeler. By combining the three features, LFF-POS improves tagging accuracy without relying on an external lexicon. Experimental results show that the combined features are able to improve the proposed model's ability to handle OOV words and achieve higher POS Tagging accuracy compared to baseline and existing methods.</div><div>OOV cannot be recognized by the model, thus reducing the accuracy of the POS Tagging model</div><div>This study aims to overcome OOV by combining linguistic features such as orthography, morphology, and characters to improve word representation</div><div>The LFF-POS has been proven to improve POS Tagging performance, especially OOV F1 Score by ±14% over baseline.</div></div>\",\"PeriodicalId\":18446,\"journal\":{\"name\":\"MethodsX\",\"volume\":\"15 \",\"pages\":\"Article 103615\"},\"PeriodicalIF\":1.9000,\"publicationDate\":\"2025-09-10\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"MethodsX\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S2215016125004595\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q2\",\"JCRName\":\"MULTIDISCIPLINARY SCIENCES\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"MethodsX","FirstCategoryId":"1085","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S2215016125004595","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"MULTIDISCIPLINARY SCIENCES","Score":null,"Total":0}
引用次数: 0

摘要

为了提高教学质量,课堂学习评价需要准确的词性标注。然而,准确的词性标注受到训练数据数量有限和词汇外(OOV)标记比例高的阻碍。我们提出了LFF-POS,一种语言特征融合方法,克服了印尼语的这些限制。该过程由四个连续步骤组成:(1)对原始文本进行标记;(2)提取三个互补特征;(3)合并得到的矢量;(4)自我关注;(4)训练一个BiLSTM序列标记器。通过结合这三个特性,LFF-POS在不依赖外部词典的情况下提高了标注准确性。实验结果表明,与基线和现有方法相比,组合特征能够提高模型对OOV词的处理能力,并获得更高的词性标注精度。该研究旨在通过结合语言特征(如正字法、形态学和字符)来改善单词表示来克服OOV。LFF-POS已被证明可以提高POS标注性能,特别是OOV F1得分比基线提高了±14%。
本文章由计算机程序翻译,如有差异,请以英文原文为准。

LFF-POS: A linguistic fusion method to handle out-of-vocabulary words in low-resource part-of-speech tagging

LFF-POS: A linguistic fusion method to handle out-of-vocabulary words in low-resource part-of-speech tagging
Accurate part-of-speech (POS) tagging is needed for classroom learning evaluation in order to improve the quality of education. However, accurate POS tagging is hampered by the limited amount of training data and the high proportion of out-of-vocabulary (OOV) tokens. We present LFF-POS, a linguistic feature fusion method that overcomes these limitations for Indonesian. The procedure consists of four sequential steps: (1) tokenizing raw text; (2) extracting three complementary features; (3) merging the resulting vectors; (4) applying self-attention; and (4) training a BiLSTM sequence labeler. By combining the three features, LFF-POS improves tagging accuracy without relying on an external lexicon. Experimental results show that the combined features are able to improve the proposed model's ability to handle OOV words and achieve higher POS Tagging accuracy compared to baseline and existing methods.
OOV cannot be recognized by the model, thus reducing the accuracy of the POS Tagging model
This study aims to overcome OOV by combining linguistic features such as orthography, morphology, and characters to improve word representation
The LFF-POS has been proven to improve POS Tagging performance, especially OOV F1 Score by ±14% over baseline.
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
MethodsX
MethodsX Health Professions-Medical Laboratory Technology
CiteScore
3.60
自引率
5.30%
发文量
314
审稿时长
7 weeks
期刊介绍:
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:604180095
Book学术官方微信