TOPIC ISLANDS™ - a wavelet-based text visualization system
N. Miller, Pak Chung Wong, Mary Brewster, Harlan Foote
Proceedings Visualization '98 (Cat. No.98CB36276), 1998-10-18
DOI: 10.5555/288216.288247
Citations: 89
Abstract
We present a novel approach to visualizing and exploring unstructured text. The underlying technology, called TOPIC-O-GRAPHY™, applies wavelet transforms to a custom digital signal constructed from the words within a document. The resulting multiresolution wavelet energy is used to analyze characteristics of the narrative flow in the frequency domain, such as theme changes, which are then related to the overall thematic content of the document using statistical methods. The thematic characteristics of a document can be analyzed at varying degrees of detail, ranging from section-sized text partitions to partitions consisting of only a few words. Using this technology, we are developing a visualization system prototype, TOPIC ISLANDS, that lets users browse a document, generate fuzzy document outlines, summarize text by level of detail and according to user interests, define meaningful subdocuments, query text content, and obtain summaries of topic evolution.
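To make the idea of multiresolution wavelet energy concrete, the following is a minimal sketch, not the authors' method. The abstract does not say how the word signal is constructed or which wavelet is used, so everything here is an assumption made for illustration: `word_signal` is a hypothetical encoding that marks each token 1.0 if it belongs to a chosen topic vocabulary and 0.0 otherwise, and the transform is a plain orthonormal Haar decomposition.

```python
import math


def word_signal(tokens, topic_words):
    """Hypothetical encoding (an assumption, not from the paper):
    1.0 where a token belongs to the topic vocabulary, else 0.0."""
    return [1.0 if t.lower() in topic_words else 0.0 for t in tokens]


def haar_step(signal):
    """One level of the orthonormal Haar transform.

    Returns (approximation, detail), each half the input length.
    """
    s = 1.0 / math.sqrt(2.0)
    pairs = range(0, len(signal) - 1, 2)
    approx = [(signal[i] + signal[i + 1]) * s for i in pairs]
    detail = [(signal[i] - signal[i + 1]) * s for i in pairs]
    return approx, detail


def multires_energy(signal):
    """Detail energy per resolution level, finest level first.

    Also returns the final approximation coefficient. The signal
    length must be a power of two in this simplified sketch.
    """
    n = len(signal)
    if n == 0 or n & (n - 1):
        raise ValueError("signal length must be a power of two")
    energies = []
    approx = list(signal)
    while len(approx) > 1:
        approx, detail = haar_step(approx)
        energies.append(sum(d * d for d in detail))
    return energies, approx[0]


if __name__ == "__main__":
    # Four on-topic words followed by four off-topic words:
    # a single theme change in the middle of the text.
    tokens = "wavelet signal wavelet signal cat dog cat dog".split()
    signal = word_signal(tokens, {"wavelet", "signal"})
    energies, coarse = multires_energy(signal)
    for level, energy in enumerate(energies, start=1):
        print(f"level {level}: detail energy = {energy:.3f}")
```

In this toy input the theme changes once, at the midpoint, so the two finer levels carry no detail energy and all of it (about 2) appears at the coarsest level. A text whose topic alternates every few words would instead put its energy at the finer levels, which is the sense in which energy by level can indicate the scale of theme changes.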