Detecting hot topics in technology news streams
Bo You, Ming Liu, Bingquan Liu, Xiaolong Wang
2012 International Conference on Machine Learning and Cybernetics, 2012-07-15
DOI: 10.1109/ICMLC.2012.6359678
https://doi.org/10.1109/ICMLC.2012.6359678
Detecting hot topics at a fine granularity in technology news streams is an interesting and important problem, given the large volume of reports and the relatively narrow range of topics they cover. This paper proposes a three-phase method. In the first phase, a document-topic distribution vector is generated and keywords are extracted for each document using the pachinko allocation topic model. In the second phase, the documents are clustered with affinity propagation, based on the document-topic distribution vectors obtained in the first phase. In the last phase, actual events, denoted by combinations of keywords within each cluster, are identified using frequent pattern mining algorithms. We evaluate our approach on a collection of technology news reports gathered from various sites over a fixed time period. The results show that the method is effective.
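
To make the last phase more concrete, here is a minimal sketch of mining frequent keyword combinations inside a single cluster. The abstract does not say which frequent pattern mining algorithm is used, so an Apriori-style search is assumed here purely for illustration. The function name `frequent_keyword_sets`, the `min_support` threshold, and the sample keywords are all hypothetical and do not come from the paper.

```python
from collections import Counter
from itertools import combinations


def frequent_keyword_sets(docs, min_support):
    """Return {frozenset(keywords): document count} for every keyword
    combination found in at least `min_support` documents.

    Apriori-style search; an assumed stand-in for the unspecified
    frequent pattern mining algorithm.
    """
    transactions = [frozenset(d) for d in docs]

    # Level 1: keep single keywords that are frequent on their own.
    counts = Counter(k for t in transactions for k in t)
    current = {frozenset([k]): c for k, c in counts.items() if c >= min_support}
    frequent = dict(current)

    size = 2
    while current:
        # Build size-k candidates by joining frequent (k-1)-sets, and keep
        # only those whose (k-1)-subsets are all frequent (Apriori pruning).
        candidates = set()
        for a, b in combinations(list(current), 2):
            union = a | b
            if len(union) == size and all(
                frozenset(sub) in current
                for sub in combinations(union, size - 1)
            ):
                candidates.add(union)

        # Count how many documents contain each surviving candidate.
        current = {}
        for cand in candidates:
            support = sum(1 for t in transactions if cand <= t)
            if support >= min_support:
                current[cand] = support
        frequent.update(current)
        size += 1

    return frequent


# Hypothetical keyword lists for the documents of one cluster.
cluster_docs = [
    ["apple", "iphone", "launch"],
    ["apple", "iphone", "launch", "price"],
    ["apple", "iphone", "patent"],
    ["samsung", "patent", "lawsuit"],
]

patterns = frequent_keyword_sets(cluster_docs, min_support=2)

# The largest frequent combination is a candidate "event" for the cluster.
largest = max(patterns, key=len)
print(sorted(largest), patterns[largest])  # ['apple', 'iphone', 'launch'] 2
```

In this toy cluster the combination of "apple", "iphone", and "launch" occurs in two documents, so it would be reported as one candidate event. In a real run, the keyword lists would come from the first phase and the grouping of documents from the second.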