Lin-Shan Lee, Shun-Chuan Chen, Yuan Ho, Jia-fu Chen, Ming Li, T. Li
An initial prototype system for Chinese spoken document understanding and organization for indexing/browsing and retrieval applications
2004 International Symposium on Chinese Spoken Language Processing, 2004-12-15
DOI: 10.1109/CHINSL.2004.1409653 (https://doi.org/10.1109/CHINSL.2004.1409653)
Citations: 2
Abstract
The most attractive form of future network content will be multimedia. When voice information is included, it usually carries the core concepts of the content. Thus, a spoken document associated with multimedia content can very possibly serve as the key for indexing/browsing and retrieval. However, unlike written documents, multimedia or voice information is very often just audio/video signals. These are very difficult to index, browse or retrieve, since users cannot go through each of them from beginning to end during browsing. A possible approach is to segment the audio/video signals automatically into short paragraphs, each with a central concept or topic, and then automatically generate a title and/or a summary for each, in either speech or text form. The topics and central concepts described in the segmented short paragraphs may then be further analyzed and organized into graphic structures describing the relationships among them. Hence, the multimedia content can be indexed automatically and much more efficiently, and browsed and retrieved by the user based on the title, summary and graphic structure. We refer to this as the understanding and organization of spoken documents. An initial prototype system for such functions, with broadcast news taken as the example multimedia content, is presented. The graphic structures used to describe the relationships among the topics and central concepts are two-dimensional tree structures developed based on probabilistic latent semantic analysis.
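For readers unfamiliar with probabilistic latent semantic analysis (PLSA), the model named in the abstract, the following is a minimal sketch of its standard EM training loop in pure Python. It is not the authors' implementation, and it does not reproduce their two-dimensional tree structure. The function name `plsa`, its parameters, and the toy count matrix are all assumptions made for illustration; each row of `counts` stands in for the term counts of one segmented news story.

```python
import random


def plsa(counts, n_topics, n_iter=50, seed=0):
    """Fit PLSA by EM (illustrative sketch, hypothetical helper).

    counts: list of documents, each a list of term counts over a shared vocabulary.
    Returns (p_w_z, p_z_d): P(word | topic) and P(topic | document).
    """
    rng = random.Random(seed)
    n_docs, n_words = len(counts), len(counts[0])

    def normalize(row):
        total = sum(row)
        return [x / total for x in row]

    # Random, strictly positive initialisation of both distributions.
    p_w_z = [normalize([rng.random() + 1e-3 for _ in range(n_words)])
             for _ in range(n_topics)]
    p_z_d = [normalize([rng.random() + 1e-3 for _ in range(n_topics)])
             for _ in range(n_docs)]

    for _ in range(n_iter):
        # Accumulators start slightly above zero so no probability collapses to 0.
        new_w_z = [[1e-12] * n_words for _ in range(n_topics)]
        new_z_d = [[1e-12] * n_topics for _ in range(n_docs)]
        for d in range(n_docs):
            for w in range(n_words):
                n = counts[d][w]
                if n == 0:
                    continue
                # E-step: posterior P(z | d, w) is proportional to P(w | z) * P(z | d).
                post = [p_w_z[z][w] * p_z_d[d][z] for z in range(n_topics)]
                total = sum(post)
                for z in range(n_topics):
                    weighted = n * post[z] / total
                    # M-step accumulation: expected counts per topic.
                    new_w_z[z][w] += weighted
                    new_z_d[d][z] += weighted
        p_w_z = [normalize(row) for row in new_w_z]
        p_z_d = [normalize(row) for row in new_z_d]
    return p_w_z, p_z_d


# Toy data (invented for this example): four "stories" over a four-word vocabulary.
toy_counts = [[4, 3, 0, 0], [5, 2, 0, 0], [0, 0, 3, 4], [0, 0, 2, 5]]
word_given_topic, topic_given_doc = plsa(toy_counts, n_topics=2)
for d, row in enumerate(topic_given_doc):
    print(d, [round(p, 3) for p in row])
```

Each row of `topic_given_doc` is a distribution over latent topics for one document. A system like the one described could, under these assumptions, use such distributions to place related stories near each other, but how the prototype actually builds its tree structures is not stated in the abstract.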