An initial prototype system for Chinese spoken document understanding and organization for indexing/browsing and retrieval applications

2004 International Symposium on Chinese Spoken Language Processing | Pub Date: 2004-12-15 | DOI: 10.1109/CHINSL.2004.1409653

Lin-Shan Lee, Shun-Chuan Chen, Yuan Ho, Jia-fu Chen, Ming Li, T. Li


Citations: 2

Abstract

Multimedia is expected to be the most attractive form of future network content. When such content includes voice information, the speech usually carries its core concepts, so the spoken document associated with multimedia content can very likely serve as the key for indexing, browsing and retrieval. Unlike written documents, however, multimedia and voice information often exist only as audio/video signals, which are difficult to index, browse or retrieve because users cannot play each of them from beginning to end while browsing. One possible approach is to segment the audio/video signals automatically into short paragraphs, each with a central concept or topic, and then to generate a title and/or a summary for each paragraph automatically, in either speech or text form. The topics and central concepts of the segmented paragraphs can then be analyzed further and organized into graphic structures that describe the relationships among them. The multimedia content can thus be indexed automatically and much more efficiently, and users can browse and retrieve it by title, summary and graphic structure. We refer to this as the understanding and organization of spoken documents. We present an initial prototype system for these functions, using broadcast news as the example multimedia content. The graphic structures used to describe the relationships among the topics and central concepts are two-dimensional tree structures based on probabilistic latent semantic analysis.
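The abstract names probabilistic latent semantic analysis (PLSA) as the basis for organizing topics but gives no implementation details. The following is a minimal sketch of standard PLSA fitted by expectation-maximization on a toy document-by-term count matrix; it is not the authors' system and does not reproduce their two-dimensional tree structure. The function name `plsa`, its parameters, and the toy counts are all assumptions made for illustration.

```python
import random


def plsa(counts, num_topics, iterations=50, seed=0):
    """Fit PLSA by EM. `counts` is a document-by-term matrix (list of lists).

    Returns (p_z_d, p_w_z): P(topic | document) per document and
    P(term | topic) per topic. Illustrative sketch only.
    """
    rng = random.Random(seed)
    num_docs, num_terms = len(counts), len(counts[0])

    def normalize(row):
        # Turn non-negative weights into a probability distribution;
        # fall back to uniform if the row carries no mass.
        total = sum(row)
        if total == 0.0:
            return [1.0 / len(row)] * len(row)
        return [x / total for x in row]

    # Random positive initialization of both distributions.
    p_z_d = [normalize([rng.random() + 1e-3 for _ in range(num_topics)])
             for _ in range(num_docs)]
    p_w_z = [normalize([rng.random() + 1e-3 for _ in range(num_terms)])
             for _ in range(num_topics)]

    for _ in range(iterations):
        new_z_d = [[0.0] * num_topics for _ in range(num_docs)]
        new_w_z = [[0.0] * num_terms for _ in range(num_topics)]
        for d in range(num_docs):
            for w in range(num_terms):
                n = counts[d][w]
                if n == 0:
                    continue
                # E-step: posterior P(topic | document, term).
                joint = [p_z_d[d][z] * p_w_z[z][w] for z in range(num_topics)]
                total = sum(joint)
                if total == 0.0:
                    continue
                for z in range(num_topics):
                    resp = n * joint[z] / total
                    # Accumulate expected counts for the M-step.
                    new_z_d[d][z] += resp
                    new_w_z[z][w] += resp
        # M-step: renormalize expected counts into probabilities.
        p_z_d = [normalize(row) for row in new_z_d]
        p_w_z = [normalize(row) for row in new_w_z]
    return p_z_d, p_w_z


if __name__ == "__main__":
    # Hypothetical counts: 4 news segments (rows) over 4 terms (columns).
    toy_counts = [[4, 3, 0, 0], [3, 5, 0, 0], [0, 0, 4, 2], [0, 0, 3, 4]]
    doc_topics, _ = plsa(toy_counts, num_topics=2)
    for index, row in enumerate(doc_topics):
        print(index, max(range(len(row)), key=row.__getitem__))
```

In a pipeline like the one described, each row would correspond to one automatically segmented paragraph, and the resulting P(topic | document) values could serve as the input for grouping paragraphs by topic. How the paper maps such topics onto a two-dimensional tree is not stated in the abstract and is left open here.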



