Proceedings of the 2nd International Workshop on AI for Smart TV Content Production, Access and Delivery: Latest Articles

Video Analysis for Interactive Story Creation: The Sandmännchen Showcase

Proceedings of the 2nd International Workshop on AI for Smart TV Content Production, Access and Delivery Pub Date : 2020-10-12 DOI: 10.1145/3422839.3423061

Miggi Zwicklbauer, W. Lamm, Martin Gordon, Konstantinos Apostolidis, Basil Philipp, V. Mezaris

Citations: 5

Neural Style Transfer Based Voice Mimicking for Personalized Audio Stories

Proceedings of the 2nd International Workshop on AI for Smart TV Content Production, Access and Delivery Pub Date : 2020-10-12 DOI: 10.1145/3422839.3423063

Syeda Maryam Fatima, Marina Shehzad, Syed Sami Murtuza, S. S. Raza

Abstract: This paper demonstrates CNN-based neural style transfer on an audio dataset to make storytelling a personalized experience: users record a few sentences, which are then used to mimic their voice. The user recordings are converted to spectrograms, whose style is transferred to the spectrogram of a base voice narrating the story. This neural style transfer is similar to style transfer on images. The approach stands out because it needs only a small dataset and therefore takes less time to train the model. The project is intended specifically for children who prefer digital interaction and are increasingly leaving behind the storytelling culture, and for working parents who are not able to spend enough time with their children. By using a parent's initial recording to narrate a given story, it is designed to bridge storytelling and screen time, engaging children's interest through the implicit ethical themes of the stories and connecting children to their loved ones while ensuring an innocuous and meaningful learning experience.
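The two steps named in the abstract above (converting audio to spectrograms, then comparing "style" between spectrograms) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract describes a CNN, whereas this sketch computes the Gram-matrix style loss familiar from image style transfer directly on raw magnitude spectrograms, treating frequency bins as channels. The function names, frame length, hop size, and the synthetic sine-wave "voices" are all assumptions made for illustration.

```python
import numpy as np


def spectrogram(signal, frame_len=256, hop=128):
    """Magnitude spectrogram from a Hann-windowed short-time Fourier transform.

    frame_len and hop are illustrative values, not taken from the paper.
    Returns an array of shape (frequency_bins, time_frames).
    """
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack(
        [signal[i * hop : i * hop + frame_len] * window for i in range(n_frames)]
    )
    # rfft per frame gives (n_frames, frame_len // 2 + 1); transpose to (freq, time).
    return np.abs(np.fft.rfft(frames, axis=1)).T


def gram_matrix(features):
    """Correlations between channels (here: frequency bins), averaged over time."""
    return features @ features.T / features.shape[1]


def style_loss(generated, style):
    """Mean squared difference between the Gram matrices of two feature maps."""
    return float(np.mean((gram_matrix(generated) - gram_matrix(style)) ** 2))


# Hypothetical stand-ins for a user recording and a base narrator voice.
sample_rate = 8000
t = np.arange(sample_rate) / sample_rate
user_voice = np.sin(2 * np.pi * 220 * t)
base_voice = np.sin(2 * np.pi * 440 * t)

user_spec = spectrogram(user_voice)
base_spec = spectrogram(base_voice)

# The loss is zero when comparing a spectrogram with itself and positive otherwise.
same_loss = style_loss(user_spec, user_spec)
cross_loss = style_loss(base_spec, user_spec)
print(user_spec.shape, same_loss, cross_loss > 0)
```

In a full style-transfer setup, a loss of this kind would typically be computed on CNN feature maps and minimized with respect to the generated spectrogram, which would then be inverted back to audio; those steps are outside this sketch.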

Citations: 2

And, Action! Towards Leveraging Multimodal Patterns for Storytelling and Content Analysis

Proceedings of the 2nd International Workshop on AI for Smart TV Content Production, Access and Delivery Pub Date : 2020-10-12 DOI: 10.1145/3422839.3423060

Natalie Parde

Abstract: Humans perform intelligent tasks by productively leveraging relevant information from numerous sensory and experiential inputs, and recent scientific and hardware advances have made it increasingly possible for machines to attempt this as well. However, improved resource availability does not automatically give rise to humanlike performance in complex tasks [1]. In this talk, I discuss recent work towards three tasks that benefit from an elegant synthesis of linguistic and visual input: visual storytelling, visual question answering (VQA), and affective content analysis. I focus primarily on visual storytelling, a burgeoning task with the goal of generating coherent, sensible narratives for sequences of input images [2]. I analyze recent work in this area, and then introduce a novel visual storytelling approach that employs a hierarchical context-based network, with a co-attention mechanism that jointly attends to patterns in visual (image) and linguistic (description) input. Following this, I describe ongoing work in VQA, another inherently multimodal task with the goal of producing accurate, sensible answers to questions about images. I explore a formulation in which the VQA model generates unconstrained, free-form text, providing preliminary evidence that harnessing the linguistic patterns latent in language models results in competitive task performance [3]. Finally, I introduce some intriguing new work that investigates the utility of linguistic patterns in a task that is not inherently multimodal: analyzing the affective content of images. I close by suggesting some exciting future directions for each of these tasks as they pertain to multimodal media analysis.

Citations: 1

Session details: Session 1: Video Analytics and Storytelling

Proceedings of the 2nd International Workshop on AI for Smart TV Content Production, Access and Delivery Pub Date : 2020-10-12 DOI: 10.1145/3429509

V. Mezaris

Citations: 0

Session details: Keynote & Invited Talks

Proceedings of the 2nd International Workshop on AI for Smart TV Content Production, Access and Delivery Pub Date : 2020-10-12 DOI: 10.1145/3429508

Raphael Troncy

Citations: 0

Named Entity Recognition for Spoken Finnish

Proceedings of the 2nd International Workshop on AI for Smart TV Content Production, Access and Delivery Pub Date : 2020-10-12 DOI: 10.1145/3422839.3423066

Dejan Porjazovski, Juho Leinonen, M. Kurimo

Citations: 6

Proceedings of the 2nd International Workshop on AI for Smart TV Content Production, Access and Delivery

Proceedings of the 2nd International Workshop on AI for Smart TV Content Production, Access and Delivery Pub Date : 2020-10-12 DOI: 10.1145/3422839

Citations: 1

Predicting Your Future Audience's Popular Topics to Optimize TV Content Marketing Success

Proceedings of the 2nd International Workshop on AI for Smart TV Content Production, Access and Delivery Pub Date : 2020-10-12 DOI: 10.1145/3422839.3423062

L. Nixon

Citations: 6

Session details: Session 2: Video Annotation and Summarization

Proceedings of the 2nd International Workshop on AI for Smart TV Content Production, Access and Delivery Pub Date : 2020-10-12 DOI: 10.1145/3429510

Jorma T. Laaksonen

Citations: 0

Avoid Crowding in the Battlefield: Semantic Placement of Social Messages in Entertainment Programs

Proceedings of the 2nd International Workshop on AI for Smart TV Content Production, Access and Delivery Pub Date : 2020-10-12 DOI: 10.1145/3422839.3423065

Yashaswi Rauthan, Vatsala Singh, Rishabh Agrawal, Satej Kadlay, N. Pedanekar, Shirish S. Karande, Manasi Malik, Iaphi Tariang

Citations: 1