Toponym Recognition in Social Media for Estimating the Location of Events
Authors: M. Sagcan, P. Senkul
Venue: 2015 IEEE International Conference on Data Mining Workshop (ICDMW)
DOI: 10.1109/ICDMW.2015.167 (https://doi.org/10.1109/ICDMW.2015.167)
Published: 2015-11-14
Citations: 11
Abstract
The prominence of social media platforms such as Twitter and Facebook has produced a huge collection of data on which event detection yields useful results. An important dimension of event detection is estimating the location of detected events. Social media provides a variety of location clues, such as geographical annotations from smart devices, the location field in the user profile, and the content of the message. Among these clues, message content requires more processing effort, yet it is generally more informative. In this paper, we focus on extracting location names from social media messages, i.e., toponym recognition. We propose a hybrid system that uses both rule-based and machine-learning-based techniques to extract toponyms from tweets. Conditional Random Fields (CRF) are used as the machine learning tool, and features such as Part-of-Speech tags and a conjunction window are defined to construct a CRF model for toponym recognition. In the rule-based part, regular expressions are used to define some of the toponym recognition patterns and to provide a simple level of normalization that handles the informality of the text. Experimental results show that the proposed method achieves a higher toponym recognition ratio than previous studies.
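The abstract does not list the actual rules or feature templates, so the following is only a minimal sketch of how the two parts of such a hybrid system might look. The normalization patterns, the preposition-based toponym rule, the function names, and the reading of "conjunction window" as pairing the current token's POS tag with each neighbour's tag are all assumptions made for illustration, not the authors' implementation.

```python
import re

# Hypothetical normalization patterns (not taken from the paper).
_URL_OR_MENTION = re.compile(r"https?://\S+|@\w+")
_HASHTAG = re.compile(r"#(\w+)")
_ELONGATION = re.compile(r"(.)\1{2,}")  # any character repeated 3+ times

# Hypothetical rule: capitalized word(s) right after a locative preposition.
_PREP_TOPONYM = re.compile(
    r"\b(?:in|at|near|from)\s+((?:[A-Z][\w'-]*)(?:\s+[A-Z][\w'-]*)*)"
)


def normalize(tweet):
    """Light regex cleanup of informal tweet text."""
    text = _URL_OR_MENTION.sub(" ", tweet)   # drop URLs and @mentions
    text = _HASHTAG.sub(r"\1", text)         # keep the hashtag word, drop '#'
    text = _ELONGATION.sub(r"\1\1", text)    # cap repeated characters at two
    return re.sub(r"\s+", " ", text).strip()


def rule_candidates(text):
    """Return toponym candidates matched by the illustrative rule."""
    return [m.group(1) for m in _PREP_TOPONYM.finditer(text)]


def token_features(tokens, pos_tags, i, window=1):
    """Feature dict for token i, of the kind a CRF toolkit could consume.

    Includes the token's own form and POS tag, the words and tags of its
    neighbours inside the window, and POS-tag conjunctions (an assumed
    interpretation of the abstract's "conjunction window").
    """
    feats = {
        "word.lower": tokens[i].lower(),
        "word.istitle": tokens[i].istitle(),
        "pos": pos_tags[i],
    }
    for off in range(-window, window + 1):
        if off == 0:
            continue
        j = i + off
        if 0 <= j < len(tokens):
            feats[f"word[{off:+d}]"] = tokens[j].lower()
            feats[f"pos[{off:+d}]"] = pos_tags[j]
            feats[f"pos[0]|pos[{off:+d}]"] = f"{pos_tags[i]}|{pos_tags[j]}"
        else:
            feats[f"pad[{off:+d}]"] = True   # neighbour falls outside the tweet
    return feats


if __name__ == "__main__":
    clean = normalize("Huuuge fire in Kadikoy Istanbul right now!!! @user http://t.co/x #fire")
    print(clean)
    print(rule_candidates(clean))
    print(token_features(["fire", "in", "Istanbul"], ["NN", "IN", "NNP"], 2))
```

In a full pipeline the per-token feature dictionaries would be passed, together with gold labels, to a CRF trainer, and the rule-based candidates would be merged with the CRF predictions. How the paper combines the two outputs is not stated in the abstract.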