Information Extraction of Traffic Condition from Social Media using Bidirectional LSTM-CNN
M. R. Alifi, S. Supangkat
2018 International Seminar on Research of Information Technology and Intelligent Systems (ISRITI), published 2018-11-01
DOI: 10.1109/ISRITI.2018.8864265 (https://doi.org/10.1109/ISRITI.2018.8864265)
Citations: 4
Abstract
Twitter has become a source of real-time traffic information in Indonesia. The information posted there is generally used for short-term needs, such as finding out where congestion is at a given moment. However, the same information can also be collected and processed for long-term needs, such as mapping congestion-prone points at certain times, and the processed information is very useful to city stakeholders. Information extraction is needed to turn unstructured social media text into structured data. Named entity recognition (NER) can be applied to obtain the entities that represent traffic conditions. This study classifies entities into 11 classes with the BIO encoding scheme, namely B_TIME, I_TIME, B_LOCT, I_LOCT, B_COND, I_COND, B_CAUS, I_CAUS, B_WEAT, I_WEAT, B_MISC, I_MISC, and O. The defined classes represent entities of time, location, condition, cause, weather, miscellaneous, and others. This study proposes a model architecture based on deep learning: a bidirectional LSTM handles the word level, while a CNN handles the character level. Combined with word embeddings, these two deep learning methods yield an F-measure of 0.789.
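The BIO encoding scheme mentioned in the abstract can be illustrated with a minimal Python sketch. It is not the authors' code: the helper names `spans_to_bio` and `bio_to_spans`, the span format, and the example sentence are assumptions made for illustration, and only the tag names (B_TIME, I_LOCT, O, and so on) come from the abstract.

```python
from typing import List, Tuple

# Entity types named in the abstract; tags are built as B_<type> / I_<type>.
ENTITY_TYPES = ["TIME", "LOCT", "COND", "CAUS", "WEAT", "MISC"]

# Hypothetical span format: (start token index, end index exclusive, entity type).
Span = Tuple[int, int, str]


def spans_to_bio(n_tokens: int, spans: List[Span]) -> List[str]:
    """Turn entity spans into one BIO tag per token."""
    tags = ["O"] * n_tokens  # every token starts outside any entity
    for start, end, etype in spans:
        if etype not in ENTITY_TYPES:
            raise ValueError(f"unknown entity type: {etype}")
        tags[start] = f"B_{etype}"  # first token of the entity
        for i in range(start + 1, end):
            tags[i] = f"I_{etype}"  # remaining tokens of the entity
    return tags


def bio_to_spans(tags: List[str]) -> List[Span]:
    """Recover entity spans from BIO tags; stray I_ tags are ignored."""
    spans: List[Span] = []
    start, etype = None, None
    for i, tag in enumerate(tags + ["O"]):  # sentinel closes a trailing entity
        inside = tag.startswith("I_") and etype == tag[2:]
        if start is not None and not inside:
            spans.append((start, i, etype))
            start, etype = None, None
        if tag.startswith("B_"):
            start, etype = i, tag[2:]
    return spans


# Invented example sentence, not taken from the paper's dataset.
tokens = ["Heavy", "traffic", "on", "Jalan", "Sudirman", "this", "morning"]
spans = [(0, 2, "COND"), (3, 5, "LOCT"), (5, 7, "TIME")]
tags = spans_to_bio(len(tokens), spans)
for token, tag in zip(tokens, tags):
    print(f"{token}\t{tag}")
```

In a setup like the one the abstract describes, a sequence tagger would predict one such tag per word, and a decoder along the lines of `bio_to_spans` would turn the predictions back into structured entities.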