Information Extraction of Traffic Condition from Social Media using Bidirectional LSTM-CNN

M. R. Alifi, S. Supangkat
2018 International Seminar on Research of Information Technology and Intelligent Systems (ISRITI), November 2018. DOI: 10.1109/ISRITI.2018.8864265
Citations: 4

Abstract

Twitter is a social media platform that has become a source of real-time information on traffic conditions in Indonesia. This information is generally used for short-term needs, such as finding where congestion is occurring at the moment. However, it can also be collected and processed for long-term needs, such as mapping congestion-prone points at certain times, which is very useful for city stakeholders. Information extraction is needed to turn the originally unstructured text of social media posts into structured information, and named entity recognition (NER) can be applied to obtain entities that represent traffic conditions. This study labels tokens with the BIO encoding scheme using the classes B_TIME, I_TIME, B_LOCT, I_LOCT, B_COND, I_COND, B_CAUS, I_CAUS, B_WEAT, I_WEAT, B_MISC, I_MISC, and O, which represent time, location, condition, cause, weather, miscellaneous, and other (non-entity) tokens. The study proposes a deep learning model architecture in which a bidirectional LSTM handles the word level and a CNN handles the character level. Combining these two deep learning methods with word embeddings yields an F-measure of 0.789.
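The label set above follows the usual BIO convention: a B_ label marks the first token of an entity, an I_ label marks a continuation of the same entity, and O marks tokens outside any entity. The sketch below shows how annotated spans map to these labels and how an F-measure can be computed from predicted labels. It is a minimal illustration, not the authors' code: the function names are hypothetical, the example tweet and its annotation are invented, and because the abstract does not say how the F-measure was computed, exact-match entity-level scoring (the convention used in CoNLL-style NER evaluation) is assumed.

```python
from typing import List, Optional, Tuple

# The six entity types named in the abstract. Each gets a B_ (begin) and an
# I_ (inside) label; "O" marks tokens outside any entity: 6 x 2 + 1 = 13 labels.
ENTITY_TYPES = ["TIME", "LOCT", "COND", "CAUS", "WEAT", "MISC"]
LABELS = [f"{p}_{t}" for t in ENTITY_TYPES for p in ("B", "I")] + ["O"]

# A span is (first token index, end index exclusive, entity type).
Span = Tuple[int, int, str]


def spans_to_bio(n_tokens: int, spans: List[Span]) -> List[str]:
    """Encode non-overlapping entity spans as one BIO label per token."""
    tags = ["O"] * n_tokens
    for start, end, etype in spans:
        tags[start] = f"B_{etype}"
        for i in range(start + 1, end):
            tags[i] = f"I_{etype}"
    return tags


def bio_to_spans(tags: List[str]) -> List[Span]:
    """Decode BIO labels into spans; a stray I_ tag opens a new entity."""
    spans: List[Span] = []
    start: Optional[int] = None
    etype: Optional[str] = None
    for i, tag in enumerate(tags + ["O"]):  # trailing "O" closes the last span
        prefix, _, t = tag.partition("_")
        continues = prefix == "I" and t == etype
        if start is not None and not continues:
            spans.append((start, i, etype))
            start, etype = None, None
        if prefix == "B" or (prefix == "I" and not continues):
            start, etype = i, t
    return spans


def entity_f1(gold: List[List[str]], pred: List[List[str]]) -> float:
    """Assumed metric: exact-match entity-level F-measure over sentences."""
    tp = n_gold = n_pred = 0
    for g_tags, p_tags in zip(gold, pred):
        g, p = set(bio_to_spans(g_tags)), set(bio_to_spans(p_tags))
        tp += len(g & p)  # an entity counts only if span and type both match
        n_gold += len(g)
        n_pred += len(p)
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_gold if n_gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Invented tweet: "heavy congestion on Sudirman street this morning
    # because of rain"; the annotation below is illustrative only.
    tokens = "Macet parah di jalan Sudirman pagi ini karena hujan".split()
    gold = spans_to_bio(len(tokens), [(0, 2, "COND"), (3, 5, "LOCT"),
                                      (5, 7, "TIME"), (8, 9, "WEAT")])
    # A hypothetical prediction that misses the time expression "pagi ini".
    pred = gold[:5] + ["O", "O"] + gold[7:]
    print(list(zip(tokens, gold)))
    print(round(entity_f1([gold], [pred]), 3))  # 0.857 (P = 1.0, R = 0.75)
```

Decoding here treats a stray I_ label as the start of a new entity; a stricter scheme would count it as an error, which can shift the score slightly, so the reported 0.789 should only be compared against results computed with the same convention.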