Structured Information Extraction from Natural Disaster Events on Twitter

Web-KR '14 | Pub Date: 2014-11-03 | DOI: 10.1145/2663792.2663794

Sandeep Panem, Manish Gupta, Vasudeva Varma


Citations: 27

Abstract


Structured Information Extraction from Natural Disaster Events on Twitter

As soon as a natural disaster happens, users are eager to learn more about it. However, search engines currently respond to queries about such events with a ten-blue-links interface. The relevance of results for such queries can be significantly improved if users are shown a structured summary of the fresh events related to them. This would not only reduce the number of clicks needed to reach the relevant information but also keep users updated with finer-grained, attribute-level information. Twitter is a rich source of such fine-grained structured information for fresh natural disaster events, which are often reported on Twitter much earlier than in other news media. However, extracting structured information from tweets is challenging because (1) tweets are noisy and ambiguous; (2) there is no well-defined schema for the various types of natural disaster events; (3) it is not trivial to extract attribute-value pairs and facts from unstructured text; and (4) it is difficult to find good mappings between extracted attributes and attributes in the event schema.

We propose algorithms to extract attribute-value pairs and devise novel mechanisms to map such pairs to manually generated schemas for natural disaster events. Besides the tweet text, we also leverage text from URLs linked in the tweets to fill such schemas. Our schemas are temporal in nature, and the values are updated whenever fresh information flows in from human sensors on Twitter. Evaluation on ~58,000 tweets for 20 events shows that our system can fill such event schemas with an F1 of ~0.6.
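The abstract does not spell out the extraction or mapping algorithms, so the following is only a minimal sketch of the general idea under our own assumptions: attribute-value pairs are pulled from text with two toy regular expressions, mapped to a schema through a hand-written alias table, and stored with timestamps so that fresher reports overwrite older ones. The schema attributes, the alias table, the patterns, the sample tweets, and the attribute-level F1 helper are all hypothetical illustrations, not the authors' method or evaluation protocol.

```python
import re
from dataclasses import dataclass, field

# Hypothetical earthquake schema; the paper's manually generated schemas
# are not listed in the abstract.
EARTHQUAKE_SCHEMA = ("magnitude", "location", "deaths", "injured")

# Hypothetical mapping from surface words to schema attributes (an assumption).
ATTRIBUTE_ALIASES = {
    "magnitude": "magnitude",
    "killed": "deaths",
    "dead": "deaths",
    "deaths": "deaths",
    "injured": "injured",
    "hurt": "injured",
}

NUM = r"(\d+(?:\.\d+)?)"
WORDS = "|".join(ATTRIBUTE_ALIASES)
# Toy patterns: "<number> [people] <word>" and "<word> [of] <number>".
VALUE_FIRST = re.compile(rf"{NUM}\s+(?:people\s+)?({WORDS})\b", re.I)
ATTR_FIRST = re.compile(rf"\b({WORDS})\s+(?:of\s+)?{NUM}", re.I)


def extract_pairs(text):
    """Return (schema_attribute, numeric_value) pairs found in a piece of text."""
    pairs = []
    for m in VALUE_FIRST.finditer(text):
        pairs.append((ATTRIBUTE_ALIASES[m.group(2).lower()], float(m.group(1))))
    for m in ATTR_FIRST.finditer(text):
        pairs.append((ATTRIBUTE_ALIASES[m.group(1).lower()], float(m.group(2))))
    return pairs


@dataclass
class EventSchema:
    """Temporal schema: each attribute keeps the value with the latest timestamp."""
    attributes: tuple = EARTHQUAKE_SCHEMA
    values: dict = field(default_factory=dict)  # attribute -> (timestamp, value)

    def update(self, timestamp, pairs):
        for attr, value in pairs:
            if attr not in self.attributes:
                continue  # ignore pairs that do not map to the schema
            current = self.values.get(attr)
            if current is None or timestamp >= current[0]:
                self.values[attr] = (timestamp, value)

    def snapshot(self):
        return {attr: value for attr, (_, value) in self.values.items()}


def f1_score(predicted, gold):
    """Attribute-level F1: a filled slot counts as correct if it equals the gold value."""
    correct = sum(1 for attr, value in predicted.items() if gold.get(attr) == value)
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Invented sample tweets as (timestamp, text); not taken from the paper's data.
tweets = [
    (1, "Magnitude 7.8 earthquake hits the region, 120 dead, 300 injured"),
    (2, "Update: 150 people killed after the quake"),
]
schema = EventSchema()
for ts, text in tweets:
    schema.update(ts, extract_pairs(text))
print(schema.snapshot())
```

In this sketch the second tweet replaces the earlier death toll because it carries a later timestamp, which is one simple way to read "the values are updated whenever fresh information flows in". Text fetched from URLs linked in tweets could be passed through the same `extract_pairs` function; a realistic system would need far more robust extraction and attribute mapping than a fixed alias table.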
