Online Incremental Learning Based on Crowdsourcing For Indonesian Ontology Relation Extraction

IF 3.4 Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE
Eunike Andriani Kardinata, Nur Aini Rakhmawati
Journal: Inteligencia Artificial-Iberoamerical Journal of Artificial Intelligence
Published: 2023-09-06
DOI: 10.4114/intartif.vol26iss72pp124-136 (https://doi.org/10.4114/intartif.vol26iss72pp124-136)
Citation count: 0

Abstract

An ontology is a structured representation of knowledge. Ontologies are widely used and developed in information retrieval because they represent knowledge in a form that both machines and humans can understand. As ontologies grow in scale and complexity, identifying extra-logical errors becomes more challenging. Ontology development methods mostly rely on machine learning, which risks missing extra-logical errors. Crowdsourcing addresses this by dividing a large job into several small jobs and recruiting a crowd to complete them. To take advantage of crowdsourcing, data processing that is usually done offline in batches is converted to online, incremental processing. Online incremental learning updates the model iteratively as soon as a change is made, while ensuring that previously acquired knowledge is retained. This study built an interactive medium to present the initial relationship between concept pairs. Crowdsourcing participants were asked to validate the relationship repeatedly until a specified accuracy value was reached. The study found that the crowdsourcing process improved the model used for relation extraction, raising the F1-score from 87.2% to 89.8%. Improvements from crowdsourcing matched improvements made by experts; thus, crowdsourcing can correct extra-logical errors as well as an expert can. In addition, offline incremental learning using Random Forest yielded higher model accuracy than online incremental learning using Mondrian Forest: the Random Forest model reached a final accuracy of 90.6%, while the Mondrian Forest model reached 89.7%. From these results, it was concluded that online incremental learning does not provide better results than offline incremental learning for improving the meronymy relation extraction process.
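The validation step in the abstract can be illustrated with a minimal sketch: crowd judgments on each concept pair are combined, and the result is scored with F1 against a reference labeling. Everything here is an assumption for illustration, not the authors' implementation: the majority-vote rule, the helper names `aggregate_votes` and `f1_score`, and the toy Indonesian concept pairs are all hypothetical, and the abstract does not say how judgments were actually combined.

```python
from collections import Counter

def aggregate_votes(votes):
    """Majority label per concept pair (assumed aggregation rule)."""
    return {pair: Counter(labels).most_common(1)[0][0]
            for pair, labels in votes.items()}

def f1_score(gold, predicted, positive=True):
    """F1 for the positive class, computed over the pairs in `gold`."""
    tp = sum(1 for p in gold if gold[p] == positive and predicted.get(p) == positive)
    fp = sum(1 for p in gold if gold[p] != positive and predicted.get(p) == positive)
    fn = sum(1 for p in gold if gold[p] == positive and predicted.get(p) != positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Hypothetical toy data: is the first concept a part of the second (meronymy)?
votes = {
    ("roda", "mobil"): [True, True, False],     # wheel / car
    ("atap", "rumah"): [True, True, True],      # roof / house
    ("kucing", "hewan"): [False, True, False],  # cat / animal (not part-of)
    ("daun", "pohon"): [False, False, True],    # leaf / tree (crowd errs here)
}
gold = {
    ("roda", "mobil"): True,
    ("atap", "rumah"): True,
    ("kucing", "hewan"): False,
    ("daun", "pohon"): True,
}

crowd_labels = aggregate_votes(votes)
score = f1_score(gold, crowd_labels)
```

On this toy data the crowd misses one true meronymy pair, so precision is 1 and recall is 2/3, giving an F1 of about 0.8. In a loop like the one the abstract describes, validation rounds would repeat until such a score reaches the chosen threshold.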
Source journal
CiteScore: 2.00
Self-citation rate: 0.00%
Articles published: 15
Review time: 8 weeks
Journal description: Inteligencia Artificial is a quarterly journal promoted and sponsored by the Spanish Association for Artificial Intelligence. The journal publishes high-quality original research papers reporting theoretical or applied advances in all branches of Artificial Intelligence. In particular, the journal welcomes: new approaches, techniques, or methods to solve AI problems, which should include demonstrations of effectiveness or improvement over existing methods (these demonstrations must be reproducible); integration of different technologies or approaches to solve broad problems or problems spanning different areas; and AI applications, which should describe in detail the problem or scenario and the proposed solution, emphasize its novelty, and present an evaluation of the AI techniques applied. In addition to rapid publication and dissemination of unsolicited contributions, the journal is also committed to producing monographs, surveys, or special issues on topics, methods, or techniques of special relevance to the AI community. Inteligencia Artificial welcomes submissions written in English, Spanish, or Portuguese, but each contribution should include at least a title, summary, and keywords in English.