Event extraction on Indonesian news article using multiclass categorization

M. L. Khodra
2015 2nd International Conference on Advanced Informatics: Concepts, Theory and Applications (ICAICTA), published 2015-08-01. DOI: 10.1109/ICAICTA.2015.7335365 (https://doi.org/10.1109/ICAICTA.2015.7335365)
Citations: 8

Abstract

Event extraction identifies who did what, when, where, why, and how, a set of questions known as 5W1H. We investigate event extraction on Indonesian news articles as a multiclass categorization problem and apply a statistical learning-based approach that treats event extraction as a sequence labeling problem under the BIO (Begin, Inside, Outside) labeling scheme. Each token of the input text is classified into one of 13 predefined classes. Our contributions are a 5W1H corpus and the best technique for building an event extraction model. Our experiments show that C4.5 performs better than AdaBoostM1, although AdaBoostM1 identifies minority labels better than C4.5. In addition, C4.5 with all features gave the best F-measure, 0.666.
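The BIO encoding described in the abstract can be illustrated with a minimal Python sketch. It assumes that the 13 classes are a Begin and an Inside tag for each of the six 5W1H elements plus one Outside tag (6 x 2 + 1 = 13); the abstract gives the count but does not list the labels. The slot names, the `spans_to_bio` helper, and the example sentence are all hypothetical and are not taken from the paper's corpus.

```python
from typing import List, Tuple

# Assumed slot names: the abstract names the six 5W1H questions
# but does not spell out the label strings.
SLOTS = ["WHO", "WHAT", "WHEN", "WHERE", "WHY", "HOW"]

# One B- and one I- tag per slot, plus a single O tag: 6 * 2 + 1 = 13.
LABELS = [f"{prefix}-{slot}" for slot in SLOTS for prefix in ("B", "I")] + ["O"]


def spans_to_bio(n_tokens: int, spans: List[Tuple[int, int, str]]) -> List[str]:
    """Turn (start, end_exclusive, slot) token spans into one BIO label per token."""
    labels = ["O"] * n_tokens
    for start, end, slot in spans:
        if slot not in SLOTS:
            raise ValueError(f"unknown slot: {slot}")
        if not 0 <= start < end <= n_tokens:
            raise ValueError(f"span out of range: {(start, end)}")
        labels[start] = f"B-{slot}"          # first token of the span
        for i in range(start + 1, end):
            labels[i] = f"I-{slot}"          # remaining tokens of the span
    return labels


# Hypothetical example sentence, not drawn from the paper's corpus.
tokens = ["Presiden", "meresmikan", "jembatan", "baru", "di", "Bandung", "kemarin"]
spans = [(0, 1, "WHO"), (1, 4, "WHAT"), (5, 6, "WHERE"), (6, 7, "WHEN")]
bio = spans_to_bio(len(tokens), spans)

for token, label in zip(tokens, bio):
    print(f"{token}\t{label}")
```

Under this encoding, each token's label becomes the target of an ordinary multiclass classifier, which is how a sequence labeling task can be handled by learners such as C4.5 or AdaBoostM1.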