Exploring Video Hyperlinking in Broadcast Media
Maria Eskevich, Quoc-Minh Bui, Hoang-An Le, B. Huet
DOI: 10.1145/2802558.2814647

Abstract: Multimedia content is produced by professionals and individual users daily and in constantly growing quantity, which calls for navigation systems that give access to this data at different levels of granularity, supporting both further discovery of a topic of interest and browsing in each user's individual way. In this paper we describe our approach to enabling users to browse through a multimedia collection. We implement a hyperlinking approach that uses fine-grained segmentation of the visual content based on scene segmentation, together with available metadata, transcripts, and information about extracted visual concepts. The approach was tested in the MediaEval Search and Hyperlinking 2014 evaluation task, where it demonstrated its effectiveness at accurately locating relevant content in a large media archive.
{"title":"SAIVT-BNEWS: An Australian Broadcast News Video Dataset for Entity Extraction, and More","authors":"David Dean","doi":"10.1145/2802558.2814653","DOIUrl":"https://doi.org/10.1145/2802558.2814653","url":null,"abstract":"Recently QUT have released a set of annotated broadcast news videos (SAIVT-BNEWS) that we have made available at our website (https://www.qut.edu.au/research/saivt). This presentation will outline the dataset itself, covering 50 or so short news clips surrounding a single political event with many entities appearing in multuple records, and cover interesting research that QUT has, is currently, and is interested in performing on this dataset in the future. This presentation will cover existing published research, including image processing tasks like face detection and clustering; and speech processing tasks (including the use of visual speech) like speech detection, speaker recognition, and speaker diarisation. We have also started very interesting research on fusing multiple sources of information, including metadata, OCR, faces, speech, and scene detection to improve the performance of many techniques, but with a focus on improving the automatic extraction of entities (people, places, companies and organisations) from large volumes of audio-visual data, and this will also be addressed in this talk. As this dataset is publicly available for free to all researchers, QUT hopes that other researchers will make use of, and improve upon this dataset as well.","PeriodicalId":115369,"journal":{"name":"Proceedings of the Third Edition Workshop on Speech, Language & Audio in Multimedia","volume":"8 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-10-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127864010","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Evaluation Data and Benchmarks for Cascaded Speech Recognition and Entity Extraction","authors":"Liyuan Zhou, H. Suominen, L. Hanlen","doi":"10.1145/2802558.2814646","DOIUrl":"https://doi.org/10.1145/2802558.2814646","url":null,"abstract":"During clinical handover, clinicians exchange information about the patients and the state of clinical management. To improve care safety and quality, both handover and its documentation have been standardized. Speech recognition and entity extraction provide a way to help health service providers to follow these standards by implementing the handover process as a structured form, whose headings guide the handover narrative, and the documentation process as proofing and sign-off of the automatically filled-out form. In this paper, we evaluate such systems. The form considers the sections of Handover nurse, Patient introduction, My shift, Medication, Appointments, and Future care, divided in 49 mutually exclusive headings to fill out with speech recognized and extracted entities. Our system correctly recognizes 10,244 out of 14,095 spoken words and regardless of 6,692 erroneous words, its error percentage is significantly smaller than for systems submitted to the CLEF eHealth Evaluation Lab 2015. In the extraction of 35 entities with training data (i.e., 14 headings were not present in the 101 expert-annotated training documents with 8,487 words in total), the system correctly extracts 2,375 out of 3,793 words in 50 test documents after calibration on 3,937 words in 50 validation documents. This translates to over 90% F1 in extracting information for the patient's age, current bed, current room, and given name and over 70% F1 for patient's admission reason/diagnosis and last name. F1 for filtering out irrelevant information is 78%. We have made the data publicly available for 201 handover cases together with processing results and code and proposed the extraction task for CLEF eHealth 2016.","PeriodicalId":115369,"journal":{"name":"Proceedings of the Third Edition Workshop on Speech, Language & Audio in Multimedia","volume":"127 1-2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-10-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123574156","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Proceedings of the Third Edition Workshop on Speech, Language & Audio in Multimedia","authors":"G. Gravier, M. Larson, G. Jones, R. Ordelman","doi":"10.1145/2802558","DOIUrl":"https://doi.org/10.1145/2802558","url":null,"abstract":"Welcome to SLAM 2015 in Brisbane, Australia! \u0000 \u0000SLAM 2015 is the third edition of the series of SLAM workshops, with worldwide leading protagonists in the field of speech, language and audio processing applied to multimedia material or in a multimedia context. From the very beginning, the workshop is steered and patronized by the Special Interest Group of the International Speech Communication Association on Speech and Language in Multimedia. This year's edition follows this tradition. \u0000 \u0000SLAM is by nature interdisciplinary, existing at the intersection of multiple scientific communities: music and audio processing, speech processing, natural language processing and, of course, multimedia. After collocating the first two editions of SLAM with Interspeech, the premier international conference in the field of speech communication, we're very proud to hold SLAM 2015 with ACM Multimedia. This is in logical continuation from the preceding editions and reflects the fact that the focus of SLAM goes far beyond speech processing to genuinely account for the multiple facets of multimedia. Our long-term goal is to establish SLAM as a regular workshop, alternating between major speech and language conferences and major multimedia conferences, as a bridge between these domains. This year's edition is a first step in this direction and we are very grateful to ACM Multimedia General and Workshop chairs for their support in the development of SLAM in spite of possible interferences with the main conference. \u0000 \u0000The program in 2015 covers a wide range of problems related to SLAM topics, with contributions related to music, speech, language but also computer vision. To emphasize the links between audio, speech, language and multimedia, the workshop features a special session on video hyperlinking, as recently introduced in international benchmark initiatives such as MediaEval or TRECVid. The multimodal nature of the video hyperlinking task makes it an emblematic case study where the speech and language modalities are perfectly complemented by audio and vision. The session gathers contributions where audio and natural language processing are used for video hyperlinking, possibly in conjunction with image processing and computer vision. A panel discussion focused on discussing the past, present and future of hyperlinking will conclude the workshop. This panel will aim at an understanding of which approaches are most promising and how they can be evaluated. 
The goal is to shape research directions at the crossroad of the scientific communities involved in SLAM and to nurture future implementations of video hyperlinking benchmarks.","PeriodicalId":115369,"journal":{"name":"Proceedings of the Third Edition Workshop on Speech, Language & Audio in Multimedia","volume":"177 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-10-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133174847","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Acoustic Adaptation in Cross Database Audio Visual SHMM Training for Phonetic Spoken Term Detection
Shahram Kalantari, David Dean, S. Sridharan, H. Ghaemmaghami, C. Fookes
DOI: 10.1145/2802558.2814648

Abstract: Visual information in the form of the speaker's lip movements has been shown to improve the performance of speech recognition and search applications. In our previous work, we proposed cross-database training of synchronous hidden Markov models (SHMMs) to make use of large, publicly available external audio databases in addition to the relatively small given audio-visual database. In this work, the cross-database training approach is improved by an additional audio adaptation step, which enables the audio-visual SHMMs to benefit from the audio observations of the external audio models before the visual modality is added to them. The proposed approach outperforms the baseline cross-database training approach in clean and noisy environments in terms of phone recognition accuracy as well as spoken term detection (STD) accuracy.
{"title":"Audio Information for Hyperlinking of TV Content","authors":"P. Galuscáková, Pavel Pecina","doi":"10.1145/2802558.2814643","DOIUrl":"https://doi.org/10.1145/2802558.2814643","url":null,"abstract":"In this paper, we explore the use of audio information in the retrieval of multimedia content. Specifically, we focus on linking similar segments in a collection consisting of 4,000 hours of BBC TV programmes. We provide a description of our system submitted to the Hyperlinking Sub-task of the Search and Hyperlinking Task in the MediaEval 2014 Benchmark, in which it scored best. We explore three automatic transcripts and compare them to available subtitles. We confirm the relationship between retrieval performance and transcript quality. The performance of the retrieval is further improved by extending transcripts by metadata and context, by combining different transcripts, using the highest confident words of the transcripts, and by utilizing acoustic similarity.","PeriodicalId":115369,"journal":{"name":"Proceedings of the Third Edition Workshop on Speech, Language & Audio in Multimedia","volume":"10 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-10-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131261269","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Convenient Discovery of Archived Video Using Audiovisual Hyperlinking
R. Ordelman, Robin Aly, Maria Eskevich, B. Huet, G. Jones
DOI: 10.1145/2802558.2814652

Abstract: This paper overviews ongoing work that aims to support end-users in conveniently exploring and exploiting large audiovisual archives by deploying multiple multimodal linking approaches. We present ongoing work on multimodal video hyperlinking, from the perspective of unconstrained link anchor identification based on the identification of named entities, and recent attempts to implement and validate the concept of outside-in linking, which relates current events to archive content. Although these concepts are not new, current work is yielding novel insights, more mature technology, benchmark evaluations, and dedicated workshops, which together open many interesting research questions at various levels that require closer collaboration between research communities.
{"title":"Predicting Music Popularity Patterns based on Musical Complexity and Early Stage Popularity","authors":"Junghyuk Lee, Jong-Seok Lee","doi":"10.1145/2802558.2814645","DOIUrl":"https://doi.org/10.1145/2802558.2814645","url":null,"abstract":"This paper investigates the problem of predicting popularity of music. In particular, we consider musical complexity as a cue that can be extracted from the audio signal and used for popularity prediction. In addition, we examine the effectiveness of the early stage popularity for long-term popularity prediction. We formulate the popularity prediction problem as a classification problem predicting popularity evolution patterns in a music ranking chart, such as the highest rank of a song over the whole time period, the growth/declination rate in the chart, the duration for which the song appears in the chart, etc. We conduct an experiment with the data collected from the Billboard Rock Songs Chart for about five years. It is found that the two types of features are effective for predicting popularity patterns when used together.","PeriodicalId":115369,"journal":{"name":"Proceedings of the Third Edition Workshop on Speech, Language & Audio in Multimedia","volume":"40 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-10-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134064836","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Hierarchical Topic Models for Language-based Video Hyperlinking
A. Simon, R. Bois, G. Gravier, P. Sébillot, E. Morin, Marie-Francine Moens
DOI: 10.1145/2802558.2814642

Abstract: We investigate video hyperlinking based on speech transcripts, leveraging a hierarchical topical structure to address two essential aspects of hyperlinking, namely serendipity control and link justification. We propose and compare different approaches exploiting a hierarchy of topic models as an intermediate representation for comparing the transcripts of video segments. These hierarchical representations offer a basis for characterizing the hyperlinks, thanks to the knowledge of the topics that contributed to the creation of the links, and for controlling serendipity by giving more weight to either general or specific topics. Experiments are performed on BBC videos from the Search and Hyperlinking task at MediaEval. Link precision similar to that of direct text comparison is achieved, while exhibiting different targets and offering potential control of serendipity.
{"title":"Score Propagation Based on Similarity Shot Graph for Improving Visual Object Retrieval","authors":"J. M. Barrios, J. M. Saavedra","doi":"10.1145/2802558.2814644","DOIUrl":"https://doi.org/10.1145/2802558.2814644","url":null,"abstract":"The Visual Object Retrieval problem consists in locating the occurrences of a specific entity in an image/video dataset. In this work, we focus on discovering new occurrences of an entity by propagating the detection scores of already computed candidates to other video segments. The score propagation follows the edges of a pre-computed Similarity Shot Graph (SSG). The SSG connects video segments that are similar according to some criterion. Four methods for creating the SSG are presented: two based on computing and comparing low-level visual features, one based on comparing text transcriptions, and other based on computing and comparing high-level concepts. The score propagation is evaluated on the INS 2014 dataset. The results show that the detection performance can be slightly improved by the proposed algorithm. However, the performance is variable and depends on the properties of the SSG and the target entity. It is part of the future work to automatically decide the kind of SSG that will be used to propagate scores given a set of detection candidates.","PeriodicalId":115369,"journal":{"name":"Proceedings of the Third Edition Workshop on Speech, Language & Audio in Multimedia","volume":"162 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-10-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121701409","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}