Information Extraction of Traffic Condition from Social Media using Bidirectional LSTM-CNN

2018 International Seminar on Research of Information Technology and Intelligent Systems (ISRITI). Pub Date: 2018-11-01. DOI: 10.1109/ISRITI.2018.8864265

M. R. Alifi, S. Supangkat


Citations: 4

Abstract

Twitter has become a source of real-time traffic information in Indonesia. This information is generally used for short-term needs, such as finding out where congestion is occurring at a given moment. However, it can also be collected and processed for long-term needs, such as mapping congestion-prone points at certain times, and the processed information is very useful for city stakeholders. Information extraction is needed to turn the originally unstructured text from social media into structured data. Named entity recognition (NER) can be applied to obtain the entities that represent traffic conditions. This study classifies entities into 11 classes with the BIO encoding scheme, namely B_TIME, I_TIME, B_LOCT, I_LOCT, B_COND, I_COND, B_CAUS, I_CAUS, B_WEAT, I_WEAT, B_MISC, I_MISC, and O. The defined classes represent entities of time, location, condition, cause, weather, miscellaneous, and others. This study proposes a model architecture based on a deep learning approach: a bidirectional LSTM handles the word level, while a CNN handles the character level. The combination of these two deep learning methods, together with word embeddings, yields an F-measure of 0.789.
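To make the BIO encoding scheme concrete, the following is a minimal Python sketch of how a tagged tweet could be decoded into entity spans and how an F-measure could be computed over those spans. Only the tag names come from the abstract. The example tweet, its labels, the helper names `bio_to_spans` and `f_measure`, and the choice of exact-match, entity-level scoring are assumptions made for illustration; the abstract does not say how the reported F-measure of 0.789 was computed.

```python
from typing import List, Tuple

# (entity type, start token index, end token index exclusive)
Span = Tuple[str, int, int]


def bio_to_spans(tags: List[str]) -> List[Span]:
    """Group B_/I_ tags into entity spans; tokens tagged 'O' are skipped."""
    spans: List[Span] = []
    current_type, start = None, 0
    for i, tag in enumerate(tags + ["O"]):  # sentinel 'O' flushes the last span
        prefix, _, etype = tag.partition("_")
        continues = prefix == "I" and etype == current_type
        if current_type is not None and not continues:
            spans.append((current_type, start, i))
            current_type = None
        if prefix == "B" or (prefix == "I" and current_type is None):
            # Assumption: a stray I_ tag without a preceding B_ opens a new span.
            current_type, start = etype, i
    return spans


def f_measure(gold: List[Span], pred: List[Span]) -> float:
    """Exact-match, entity-level F-measure (an assumed scoring scheme)."""
    gold_set, pred_set = set(gold), set(pred)
    true_positives = len(gold_set & pred_set)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(pred_set)
    recall = true_positives / len(gold_set)
    return 2 * precision * recall / (precision + recall)


# Invented example: "congested at Jalan Sudirman this morning because of rain"
tokens = ["macet", "di", "Jalan", "Sudirman", "pagi", "ini", "karena", "hujan"]
gold_tags = ["B_COND", "O", "B_LOCT", "I_LOCT", "B_TIME", "I_TIME", "O", "B_WEAT"]
# Hypothetical model output that misses the second token of the time entity.
pred_tags = ["B_COND", "O", "B_LOCT", "I_LOCT", "B_TIME", "O", "O", "B_WEAT"]

gold_spans = bio_to_spans(gold_tags)
pred_spans = bio_to_spans(pred_tags)

for etype, start, end in gold_spans:
    print(etype, " ".join(tokens[start:end]))
print("F-measure:", f_measure(gold_spans, pred_spans))
```

In this sketch a predicted entity counts as correct only if both its type and its token boundaries match the gold annotation, so the truncated time entity is scored as an error. A token-level score would be more lenient for the same prediction.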

