K. Pu, Oktie Hassanzadeh, Richard Drake, Renée J. Miller
{"title":"具有结构化实体的文本流的在线注释","authors":"K. Pu, Oktie Hassanzadeh, Richard Drake, Renée J. Miller","doi":"10.1145/1871437.1871446","DOIUrl":null,"url":null,"abstract":"We propose a framework and algorithm for annotating unbounded text streams with entities of a structured database. The algorithm allows one to correlate unstructured and dirty text streams from sources such as emails, chats and blogs, to entities stored in structured databases. In contrast to previous work on entity extraction, our emphasis is on performing entity annotation in a completely online fashion. The algorithm continuously extracts important phrases and assigns to them top-k relevant entities. Our algorithm does so with a guarantee of constant time and space complexity for each additional word in the text stream, thus infinite text streams can be annotated. Our framework allows the online annotation algorithm to adapt to changing stream rate by self-adjusting multiple run-time parameters to reduce or improve the quality of annotation for fast or slow streams, respectively. The framework also allows the online annotation algorithm to incorporate query feedback to learn the user preference and personalize the annotation for individual users.","PeriodicalId":310611,"journal":{"name":"Proceedings of the 19th ACM international conference on Information and knowledge management","volume":"44 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2010-10-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"14","resultStr":"{\"title\":\"Online annotation of text streams with structured entities\",\"authors\":\"K. Pu, Oktie Hassanzadeh, Richard Drake, Renée J. Miller\",\"doi\":\"10.1145/1871437.1871446\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"We propose a framework and algorithm for annotating unbounded text streams with entities of a structured database. The algorithm allows one to correlate unstructured and dirty text streams from sources such as emails, chats and blogs, to entities stored in structured databases. In contrast to previous work on entity extraction, our emphasis is on performing entity annotation in a completely online fashion. The algorithm continuously extracts important phrases and assigns to them top-k relevant entities. Our algorithm does so with a guarantee of constant time and space complexity for each additional word in the text stream, thus infinite text streams can be annotated. Our framework allows the online annotation algorithm to adapt to changing stream rate by self-adjusting multiple run-time parameters to reduce or improve the quality of annotation for fast or slow streams, respectively. The framework also allows the online annotation algorithm to incorporate query feedback to learn the user preference and personalize the annotation for individual users.\",\"PeriodicalId\":310611,\"journal\":{\"name\":\"Proceedings of the 19th ACM international conference on Information and knowledge management\",\"volume\":\"44 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2010-10-26\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"14\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the 19th ACM international conference on Information and knowledge management\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/1871437.1871446\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 19th ACM international conference on Information and knowledge management","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/1871437.1871446","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Online annotation of text streams with structured entities
We propose a framework and algorithm for annotating unbounded text streams with entities of a structured database. The algorithm allows one to correlate unstructured and dirty text streams from sources such as emails, chats and blogs, to entities stored in structured databases. In contrast to previous work on entity extraction, our emphasis is on performing entity annotation in a completely online fashion. The algorithm continuously extracts important phrases and assigns to them top-k relevant entities. Our algorithm does so with a guarantee of constant time and space complexity for each additional word in the text stream, thus infinite text streams can be annotated. Our framework allows the online annotation algorithm to adapt to changing stream rate by self-adjusting multiple run-time parameters to reduce or improve the quality of annotation for fast or slow streams, respectively. The framework also allows the online annotation algorithm to incorporate query feedback to learn the user preference and personalize the annotation for individual users.