Development of a Filipino Speaker Diarization in Meeting Room Conversations
Angelica H. De La Cruz, Rodolfo C. Raga
2019 International Conference on Asian Language Processing (IALP), 2019-11-01
DOI: 10.1109/IALP48816.2019.9037733 (https://doi.org/10.1109/IALP48816.2019.9037733)
Citations: 0
Abstract
Speaker diarization is the process of determining which speaker is active at each point in an audio stream. It was first used for speech recognition and over time became useful in other applications such as video captioning and speech transcription. Recently, deep learning techniques have been applied to speaker diarization with considerable success. However, deep learning is conventionally data intensive, and large training samples can be difficult and expensive to collect, especially for resource-scarce languages. This study investigated a speaker diarization approach for meeting room conversations in the Filipino language. To compensate for the lack of resources, a one-shot learning strategy was explored using a Siamese neural network. Among the experiments conducted, the lowest diarization error rate obtained was 46%. There are, however, more parameters that can be tuned to improve the diarization results. To the best of our knowledge, no prior work on speaker diarization dedicated to the Filipino language has been done.
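For readers unfamiliar with the metric, the diarization error rate (DER) is commonly defined as the sum of missed speech, false-alarm speech, and speaker-confusion time, divided by the total reference speech time. The abstract does not state how the reported 46% was scored, so the following is only a minimal frame-level sketch under stated assumptions: one label per fixed-length frame, `None` for non-speech, no overlapping speech, no forgiveness collar, and hypothesis labels already mapped to reference speakers. The function name `frame_der` and the sample labels are hypothetical.

```python
from typing import Optional, Sequence


def frame_der(reference: Sequence[Optional[str]],
              hypothesis: Sequence[Optional[str]]) -> float:
    """Frame-level diarization error rate.

    Assumptions (not taken from the paper): one label per frame, None marks
    non-speech, no overlapping speakers, and hypothesis speaker labels are
    already mapped onto reference speaker labels.
    """
    if len(reference) != len(hypothesis):
        raise ValueError("reference and hypothesis must have the same length")

    missed = false_alarm = confusion = speech = 0
    for ref, hyp in zip(reference, hypothesis):
        if ref is not None:
            speech += 1              # frame contains reference speech
            if hyp is None:
                missed += 1          # speech labeled as non-speech
            elif hyp != ref:
                confusion += 1       # speech attributed to the wrong speaker
        elif hyp is not None:
            false_alarm += 1         # non-speech labeled as speech

    if speech == 0:
        raise ValueError("reference contains no speech frames")
    return (missed + false_alarm + confusion) / speech


# Illustrative labels only, not data from the study.
reference = ["A", "A", "B", "B", None, "A"]
hypothesis = ["A", "B", "B", None, "A", "A"]
print(frame_der(reference, hypothesis))
```

In this toy example there is one confused frame, one missed frame, and one false-alarm frame against five reference speech frames. Because false alarms are counted in the numerator but not in the denominator, DER can exceed 100% on poor output.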