Event information extraction from Indonesian tweets using conditional random field

2015 2nd International Conference on Advanced Informatics: Concepts, Theory and Applications (ICAICTA). Pub Date: 2015-11-23. DOI: 10.1109/ICAICTA.2015.7335383

F. Muhammad, M. L. Khodra


Citations: 5

Abstract

Information extraction is the process of deriving structured information from unstructured or semi-structured text. This research aims to build an information extraction system specialized for events in Indonesian tweets. The system consists of two main parts. The first part filters relevant tweets from irrelevant ones; it uses only a rule-based approach with an additional bag-of-words feature and achieves a best accuracy of 86%. The second part performs the extraction. In our experiments, the best configuration for the extractor module combines the multi-token tokenization method, the full feature set, and a first-order conditional random field. This combination yields an average per-token accuracy of 74%.
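The two-stage design described in the abstract can be illustrated with a minimal Python sketch. Everything concrete here is an assumption made for illustration, not taken from the paper: the cue words in `EVENT_CUES`, the label set `LABELS`, and the toy emission and transition scores are all hypothetical, and the abstract does not state the actual rules, features, or labels. The first function mimics a keyword-rule relevance filter over a bag of words; the second decodes the best label sequence for a first-order linear-chain model (each label depends only on the previous label) with the Viterbi algorithm; the third computes the per-token accuracy metric.

```python
from typing import Dict, List, Tuple

# Hypothetical cue words; the paper's real rules and vocabulary are not given.
EVENT_CUES = {"seminar", "konser", "workshop", "lomba"}
# Hypothetical label set for event slots; "O" marks tokens outside any slot.
LABELS = ["O", "NAME", "DATE", "PLACE"]


def is_relevant(tweet: str) -> bool:
    """Stage 1 sketch: keep a tweet if its bag of words contains an event cue."""
    tokens = {tok.strip(".,!?#@").lower() for tok in tweet.split()}
    return bool(tokens & EVENT_CUES)


def viterbi(emissions: List[Dict[str, float]],
            transitions: Dict[Tuple[str, str], float]) -> List[str]:
    """Stage 2 sketch: best label path under first-order (previous-label) scores."""
    if not emissions:
        return []
    # best[i][y] = (score of the best path ending at token i with label y, backpointer)
    best = [{y: (emissions[0].get(y, 0.0), None) for y in LABELS}]
    for i in range(1, len(emissions)):
        row = {}
        for y in LABELS:
            prev, score = max(
                ((p, best[i - 1][p][0] + transitions.get((p, y), 0.0)) for p in LABELS),
                key=lambda pair: pair[1],
            )
            row[y] = (score + emissions[i].get(y, 0.0), prev)
        best.append(row)
    # Backtrack from the highest-scoring final label.
    label = max(LABELS, key=lambda y: best[-1][y][0])
    path = [label]
    for i in range(len(emissions) - 1, 0, -1):
        label = best[i][label][1]
        path.append(label)
    return path[::-1]


def token_accuracy(gold: List[str], pred: List[str]) -> float:
    """Fraction of tokens whose predicted label matches the gold label."""
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


# Toy scores for the tokens "Seminar AI di Bandung" (illustrative values only).
toy_emissions = [{"NAME": 2.0}, {"NAME": 1.0, "O": 0.8}, {"O": 2.0}, {"PLACE": 2.5}]
toy_transitions = {("NAME", "NAME"): 0.5, ("O", "PLACE"): 0.5}
predicted = viterbi(toy_emissions, toy_transitions)
```

In a trained conditional random field the emission and transition scores would come from learned feature weights rather than hand-set numbers; the decoding step is the same.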

