Semantic Role Labeling for Information Extraction on Indonesian Texts: A Literature Review

Amelia Devi Putri Ariyanto, C. Fatichah, Diana Purwitasari
Published in: 2023 International Seminar on Intelligent Technology and Its Applications (ISITIA)
Publication date: 2023-07-26
DOI: 10.1109/ISITIA59021.2023.10221008 (https://doi.org/10.1109/ISITIA59021.2023.10221008)
Citations: 0

Abstract

Semantic Role Labeling (SRL) is one of the sub-tasks of information extraction. SRL determines the semantic role of each entity in a sentence by examining the meaning of the predicate, which helps recover the sentence structure by identifying the relationships between predicates and their arguments. SRL is developed less often than Named Entity Recognition (NER) for information extraction because the SRL annotation process is complicated and the labeling results are sometimes ambiguous. For event extraction, however, NER alone is insufficient: location entities identified by NER can still be inaccurate, because the resulting geographic coordinates may point to locations that are irrelevant to the actual event. SRL, on the other hand, can detect locations precisely and in depth according to the actual event. Although annotation is complicated, SRL can be adapted to the required domain and its ontology so that location entities are extracted down to the event level. This research offers a comprehensive analysis of the development of SRL for extracting information from Indonesian texts. Indonesian is a low-resource language whose characteristics differ from those of English, and the literature on it is very limited, which makes it an interesting subject of study. The reviewed papers were collected from IEEE, Science Direct, and Google Scholar for the period 2013 to 2023; 15 papers matched the research objectives. The results show that most papers use Indonesian-language news articles as their dataset, because news articles are written in formal language, which usually has good structure. The SRL methods used are mostly rule-based. A weakness of rule-based methods is that the rules depend heavily on a particular language or problem domain. Further work can therefore use a transformer-based deep learning approach to perform SRL on Indonesian-language texts.
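To make the rule-based idea concrete, here is a minimal Python sketch of how a rule-based labeler might assign roles around a known predicate. It is not taken from any of the reviewed papers. The function `label_roles`, the table `PREPOSITION_ROLES`, the PropBank-style label names, and the example sentence are all assumptions made for illustration. The rules are tied to two Indonesian prepositions ("di" for location, "pada" for time), which also shows why such rules do not transfer easily to another language or domain.

```python
from typing import Dict, List

# Hypothetical toy rules (an assumption for illustration): Indonesian
# prepositions mapped to PropBank-style modifier labels.
PREPOSITION_ROLES = {"di": "ARGM-LOC", "pada": "ARGM-TMP"}


def label_roles(tokens: List[str], predicate_index: int) -> Dict[str, str]:
    """Assign a role to every token around a predicate whose position is given.

    Rules: tokens before the predicate form ARG0, tokens after it form ARG1,
    and a known preposition after the predicate switches the current role to
    the modifier label it introduces.
    """
    roles: Dict[str, List[str]] = {}
    current = "ARG0"
    for i, token in enumerate(tokens):
        if i == predicate_index:
            roles.setdefault("V", []).append(token)
            current = "ARG1"
            continue
        lowered = token.lower()
        if i > predicate_index and lowered in PREPOSITION_ROLES:
            # The preposition only signals the role; it is not kept in the span.
            current = PREPOSITION_ROLES[lowered]
            continue
        roles.setdefault(current, []).append(token)
    return {role: " ".join(words) for role, words in roles.items()}


# "The police arrested a thief in Surabaya on Monday."
sentence = "Polisi menangkap pencuri di Surabaya pada Senin".split()
frame = label_roles(sentence, predicate_index=1)
print(frame)
```

In this sketch the location "Surabaya" is attached to the arrest event through the predicate, rather than being tagged as a standalone place name as an NER system would do.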