Authors: F. Muhammad, M. L. Khodra
Published in: 2015 2nd International Conference on Advanced Informatics: Concepts, Theory and Applications (ICAICTA), 2015-11-23
DOI: https://doi.org/10.1109/ICAICTA.2015.7335383
Event information extraction from Indonesian tweets using conditional random field
Information extraction is the process of deriving structured information from unstructured or semi-structured text. This research aims to build an information extraction system specialized for events in Indonesian tweets. The system consists of two main parts. The first part filters relevant tweets from irrelevant ones; it uses only a rule-based approach with an additional bag-of-words feature and achieves a best accuracy of 86%. The second part performs the extraction. In our experiments, the best configuration for the extractor module combines the multi-token tokenization method, the full feature set, and a first-order conditional random field. This combination yields an average per-token accuracy of 74%.
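To make the extraction step more concrete, the following is a minimal sketch of how a first-order (linear-chain) conditional random field assigns one label per token using Viterbi decoding. It is not the authors' implementation. The label set, the feature function `token_features`, the hand-set weights in `EMISSION` and `TRANSITION`, and the example tweet are all hypothetical. In a real system the weights would be learned from annotated tweets, and the abstract does not list the paper's actual features or labels.

```python
# Hypothetical label set; the abstract does not list the paper's labels.
LABELS = ["O", "NAME", "TIME", "PLACE"]


def token_features(tokens, i):
    """Illustrative features for token i (an assumption, not the paper's feature set)."""
    tok = tokens[i]
    feats = ["word=" + tok.lower()]
    if tok[:1].isupper():
        feats.append("is_capitalized")
    if any(ch.isdigit() for ch in tok):
        feats.append("has_digit")
    if i > 0:
        feats.append("prev=" + tokens[i - 1].lower())
    return feats


# Hand-set toy weights for (feature, label) pairs; a trained CRF would learn these.
EMISSION = {
    ("has_digit", "TIME"): 2.0,
    ("prev=jam", "TIME"): 1.5,
    ("prev=di", "PLACE"): 2.5,
    ("is_capitalized", "NAME"): 1.0,
    ("is_capitalized", "PLACE"): 0.5,
    ("word=di", "O"): 2.0,
    ("word=jam", "O"): 1.5,
}

# First-order transition weights for (previous label, current label) pairs.
TRANSITION = {
    ("NAME", "NAME"): 1.0,
    ("PLACE", "PLACE"): 1.0,
}


def viterbi(tokens):
    """Return the highest-scoring label sequence under the toy weights."""
    if not tokens:
        return []

    def emit(i, label):
        return sum(EMISSION.get((f, label), 0.0) for f in token_features(tokens, i))

    # score[i][y]: best score of any label sequence for tokens[0..i] ending in y.
    score = [{y: emit(0, y) for y in LABELS}]
    back = []  # back[i - 1][y]: best previous label when token i is labeled y
    for i in range(1, len(tokens)):
        row, pointers = {}, {}
        for y in LABELS:
            best_prev = max(
                LABELS,
                key=lambda p: score[i - 1][p] + TRANSITION.get((p, y), 0.0),
            )
            row[y] = (
                score[i - 1][best_prev]
                + TRANSITION.get((best_prev, y), 0.0)
                + emit(i, y)
            )
            pointers[y] = best_prev
        score.append(row)
        back.append(pointers)

    # Follow the back-pointers from the best final label.
    path = [max(LABELS, key=lambda y: score[-1][y])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return path
```

With these toy weights, `viterbi(["Konser", "Musik", "di", "Bandung", "jam", "19.00"])` returns `["NAME", "NAME", "O", "PLACE", "O", "TIME"]`. Because the model is first-order, each transition score depends only on the immediately preceding label, which is what lets the dynamic program find the best sequence exactly.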