Constructing and visualizing topic forests for text streams
Authors: Takayasu Fushimi, T. Satoh
Published in: Proceedings of the 7th International Conference on Web Intelligence, Mining and Semantics (2017-08-23)
DOI: https://doi.org/10.1145/3106426.3106455
Large volumes of text, such as news and blog articles, web pages, and scientific literature, are posted on the web over time; such collections are generally called time-series documents or text streams. For each document, other strongly or weakly relevant texts exist. In some domains this relevance is explicit: citations among scientific papers, trackbacks among blog articles, and hyperlinks among Wikipedia articles or web pages. The relevance among news articles, however, is not always clearly specified. One easy way to build a similarity network is to calculate the similarity among news articles and link similar ones; however, incorporating the posting times of articles into such a network is difficult. To overcome this problem, we propose a framework that consists of two parts: 1) tree structures called Topic Forests and 2) their visualization. Topic Forests are constructed by linking cohesive texts semantically and temporally while preserving their posting order. We give users effective access to text streams by embedding Topic Forests in polar coordinates with a technique called Polar Coordinate Embedding. In experimental evaluations on real text streams of news articles, we confirm that Topic Forests maintain semantic and temporal cohesiveness and that Polar Coordinate Embedding achieves effective accessibility.
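
The abstract does not spell out the construction or the embedding, so the following Python sketch is only one possible reading of the idea, not the authors' algorithm. It assumes bag-of-words cosine similarity, a hypothetical similarity `threshold`, and a rule that links each article to its most similar earlier article, which yields a forest whose edges always respect posting order. The polar layout is likewise an assumption: radius encodes normalized posting time, and angle follows depth-first order so that articles in the same tree sit close together. The names `build_forest` and `polar_layout` are illustrative.

```python
import math
from collections import Counter


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def build_forest(docs, threshold=0.3):
    """Link each document to its most similar EARLIER document.

    docs: list of (numeric_timestamp, text) pairs in any order.
    Returns (ordered, parent): the documents sorted by time, and
    parent[i], the index of document i's parent in `ordered`, or
    None when no earlier document reaches `threshold` (a tree root).
    The threshold value is an assumption made for this sketch.
    """
    ordered = sorted(docs, key=lambda d: d[0])
    vecs = [Counter(text.lower().split()) for _, text in ordered]
    parent = []
    for i, vec in enumerate(vecs):
        best, best_sim = None, threshold
        for j in range(i):  # only earlier documents are candidates
            sim = cosine(vec, vecs[j])
            if sim >= best_sim:  # ties go to the more recent candidate
                best, best_sim = j, sim
        parent.append(best)
    return ordered, parent


def polar_layout(ordered, parent):
    """Assign (radius, angle) to every document (assumed layout rule).

    radius: posting time normalized to [0, 1].
    angle:  depth-first rank spread over the full circle, so nodes of
            one tree occupy a contiguous angular range.
    """
    n = len(ordered)
    if n == 0:
        return []
    children = [[] for _ in range(n)]
    roots = []
    for i, p in enumerate(parent):
        (roots if p is None else children[p]).append(i)

    # Iterative depth-first traversal over all trees in the forest.
    order, stack = [], list(reversed(roots))
    while stack:
        node = stack.pop()
        order.append(node)
        stack.extend(reversed(children[node]))

    t0, t1 = ordered[0][0], ordered[-1][0]
    span = (t1 - t0) or 1  # avoid division by zero for a single timestamp
    coords = [None] * n
    for rank, node in enumerate(order):
        radius = (ordered[node][0] - t0) / span
        angle = 2 * math.pi * rank / n
        coords[node] = (radius, angle)
    return coords


docs = [
    (1, "earthquake hits coastal city"),
    (2, "election campaign begins today"),
    (3, "earthquake rescue teams reach coastal city"),
    (4, "election campaign debate scheduled"),
]
ordered, parent = build_forest(docs)
coords = polar_layout(ordered, parent)
print(parent)  # [None, None, 0, 1]
```

On this toy stream the two earthquake articles form one tree and the two election articles form another. Because a parent is always chosen among earlier articles, no cycle can appear, and the radius of a child is never smaller than that of its parent.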