An initial prototype system for Chinese spoken document understanding and organization for indexing/browsing and retrieval applications

Lin-Shan Lee, Shun-Chuan Chen, Yuan Ho, Jia-fu Chen, Ming Li, T. Li
Published in: 2004 International Symposium on Chinese Spoken Language Processing (2004-12-15). DOI: 10.1109/CHINSL.2004.1409653
Citations: 2

Abstract

The most attractive form of future network content will be multimedia. When such content includes voice, the speech usually carries its core concepts, so the spoken document associated with the content can serve as the key for indexing, browsing and retrieval. Unlike written documents, however, multimedia and voice information usually exist only as audio/video signals. These are difficult to index, browse or retrieve, because users cannot play each one from beginning to end while browsing. One possible approach is to segment the audio/video signals automatically into short paragraphs, each with a central concept or topic, and then to generate a title and/or a summary for each paragraph automatically, in either speech or text form. The topics and central concepts of the segmented paragraphs can then be analyzed further and organized into graphic structures that describe the relationships among them. The multimedia content can thus be indexed automatically and much more efficiently, and users can browse and retrieve it based on the titles, summaries and graphic structures. We refer to this as the understanding and organization of spoken documents. We present an initial prototype system for these functions, using broadcast news as the example multimedia content. The graphic structures used to describe the relationships among the topics and central concepts are two-dimensional tree structures developed based on probabilistic latent semantic analysis.
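To make the last step more concrete, the following is a minimal sketch of plain probabilistic latent semantic analysis (PLSA) fitted with the EM algorithm. It is not the authors' implementation: the abstract does not describe how the two-dimensional tree structures are built on top of PLSA, and that part is not shown here. The function names `plsa` and `normalize`, the toy term counts, the number of topics and the iteration count are all assumptions chosen for illustration. Each row of the count matrix stands in for one automatically segmented paragraph.

```python
import random


def normalize(row):
    """Scale a list of non-negative numbers so that it sums to 1."""
    total = sum(row)
    if total == 0.0:
        return [1.0 / len(row)] * len(row)  # fall back to a uniform distribution
    return [x / total for x in row]


def plsa(counts, num_topics, iterations=100, seed=0):
    """Fit PLSA with EM (illustrative sketch, hypothetical helper).

    counts[d][w] is how often term w occurs in segment d.
    Returns (p_z_d, p_w_z): P(topic | segment) and P(term | topic).
    """
    rng = random.Random(seed)
    num_docs, num_terms = len(counts), len(counts[0])
    # Random initialisation of both distributions.
    p_z_d = [normalize([rng.random() + 0.01 for _ in range(num_topics)])
             for _ in range(num_docs)]
    p_w_z = [normalize([rng.random() + 0.01 for _ in range(num_terms)])
             for _ in range(num_topics)]

    for _ in range(iterations):
        acc_z_d = [[0.0] * num_topics for _ in range(num_docs)]
        acc_w_z = [[0.0] * num_terms for _ in range(num_topics)]
        for d in range(num_docs):
            for w in range(num_terms):
                n = counts[d][w]
                if n == 0:
                    continue
                # E-step: P(z | d, w) is proportional to P(z | d) * P(w | z).
                post = normalize([p_z_d[d][z] * p_w_z[z][w]
                                  for z in range(num_topics)])
                # Accumulate expected counts for the M-step.
                for z in range(num_topics):
                    acc_z_d[d][z] += n * post[z]
                    acc_w_z[z][w] += n * post[z]
        # M-step: re-estimate both distributions from the expected counts.
        p_z_d = [normalize(row) for row in acc_z_d]
        p_w_z = [normalize(row) for row in acc_w_z]
    return p_z_d, p_w_z


# Toy term counts (assumed data): segments 0-1 use terms 0-2,
# segments 2-3 use terms 3-5.
toy_counts = [
    [4, 3, 2, 0, 0, 0],
    [3, 4, 1, 0, 0, 0],
    [0, 0, 0, 5, 2, 3],
    [0, 0, 0, 2, 4, 3],
]
p_z_d, p_w_z = plsa(toy_counts, num_topics=2)
for d, row in enumerate(p_z_d):
    print(d, [round(p, 3) for p in row])
```

The per-segment topic distributions in `p_z_d` are the kind of quantity a topic map could be built from, for example by grouping segments whose dominant topics coincide. How the prototype actually arranges topics into a two-dimensional tree is left to the paper itself.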