Traffic information extraction from a blogging platform using knowledge-based approaches and bootstrapping

Jorge Leonid Aching Samatelo, Thiago B. F. de Oliveira, A. Bazzan
Published in: 2014 IEEE Symposium on Computational Intelligence in Vehicles and Transportation Systems (CIVTS), 2014-12-01
DOI: 10.1109/CIVTS.2014.7009471 (https://doi.org/10.1109/CIVTS.2014.7009471)
Citations: 2

Abstract

In this paper we propose a strategy for using messages posted on a blogging platform for real-time sensing of traffic-related information. Specifically, we use data from a Portuguese-language blog managed by a Brazilian daily newspaper in its online edition. We propose a framework based on two modules to infer the location and traffic condition from unstructured, non-georeferenced short posts in Portuguese. The first module performs named-entity recognition (NER). It automatically recognizes three classes of named entities (NEs) in the input post: LOCATION, STATUS and DATE. Here, a bootstrapping approach is used to expand the initially given list of locations, identifying new locations as well as spelling variants and typographical errors of known locations. The second module performs relation extraction (RE). It extracts binary and ternary relations between these entities to obtain relevant traffic information. In our experiments, the NER module yielded an F-measure of 96%, while the RE module reached 87%. The results also show that our bootstrapping approach identifies 1,058 new locations when 10,000 short posts are analyzed.
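
As an illustration only, the two-module idea in the abstract (expanding a location list, then pairing a LOCATION with a STATUS) might be sketched as below. The abstract does not say how candidates are found or how variants are matched, so this sketch assumes a regular-expression trigger pattern and string similarity from Python's difflib. The seed list, status terms, cutoff value and all function names are hypothetical and are not taken from the paper.

```python
import difflib
import re

# Hypothetical seed gazetteer and status lexicon (not from the paper).
SEED_LOCATIONS = ["Avenida Ipiranga", "Avenida Bento Goncalves"]
STATUS_TERMS = ["congestionado", "lento", "livre", "bloqueado"]

# Assumed trigger pattern: a street-type word followed by capitalized words.
TRIGGER = re.compile(r"\b(?:Avenida|Rua|Estrada)\s+[A-Z]\w+(?:\s+[A-Z]\w+)*")


def find_candidates(post):
    """Return location-like strings found in a post."""
    return TRIGGER.findall(post)


def classify(candidate, gazetteer, cutoff=0.85):
    """Label a candidate as known, a variant of a known entry, or new."""
    if candidate in gazetteer:
        return "known", candidate
    close = difflib.get_close_matches(candidate, gazetteer, n=1, cutoff=cutoff)
    if close:
        return "variant", close[0]
    return "new", None


def bootstrap(posts, seeds):
    """Grow the gazetteer from posts; map misspellings to canonical names."""
    gazetteer = list(seeds)
    variants = {}
    for post in posts:
        for candidate in find_candidates(post):
            kind, canonical = classify(candidate, gazetteer)
            if kind == "variant":
                variants[candidate] = canonical
            elif kind == "new":
                gazetteer.append(candidate)
    return gazetteer, variants


def extract_relation(post, gazetteer, variants):
    """Return a binary (LOCATION, STATUS) relation, or None if incomplete."""
    candidates = find_candidates(post)
    statuses = [term for term in STATUS_TERMS if term in post.lower()]
    if not candidates or not statuses:
        return None
    location = variants.get(candidates[0], candidates[0])
    if location not in gazetteer:
        return None
    return location, statuses[0]


posts = [
    "Avenida Ipirnga congestionado agora",  # typo of a seed location
    "Rua Ramiro Barcelos lento",            # location absent from the seeds
]
gazetteer, variants = bootstrap(posts, SEED_LOCATIONS)
for post in posts:
    print(extract_relation(post, gazetteer, variants))
```

In this sketch a misspelled name is mapped back to its canonical entry rather than added as a new location, which mirrors the abstract's distinction between new locations and spelling variants of known ones. A ternary relation would add a DATE entity in the same way.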