Authors: Ö. Mercan, Umut Özdil, Sükrü Ozan
Published in: 2022 30th Signal Processing and Communications Applications Conference (SIU), 2022-05-15
DOI: https://doi.org/10.1109/SIU55565.2022.9864728
Çok Dilli Sesten Metne Çeviri Modelinin İnce Ayar Yapılarak Türkçe Dilindeki Başarısının Arttırılması (Increasing Performance in Turkish by Finetuning of Multilingual Speech-to-Text Model)
This study aims to automatically transcribe phone calls between a company's customers and its customer representatives. The dataset combines audio files taken from open-source platforms with recordings of company personnel reading short texts on a variety of topics. In addition to the already labelled data, approximately 28 thousand unlabelled recordings were labelled, and a total of 37,534 audio recordings were prepared for training the speech-to-text model. Wav2Vec2-XLSR-53, a model pre-trained on 53 languages, was fine-tuned on this Turkish dataset. The fine-tuned model gave successful speech-to-text results on data that was not used in training or validation. The model has been shared as open source on Hugging Face so that it can be used and tested on similar speech-to-text problems.
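The abstract does not say which metric was used to judge results on the held-out data. Word error rate (WER) is a common choice for speech-to-text, so the following is a minimal sketch of how such an evaluation could be computed, assuming WER and plain whitespace tokenisation. The function name `word_error_rate` and the Turkish example sentences are illustrative and do not come from the paper.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words.

    Assumption: words are separated by whitespace and compared exactly
    (no lowercasing or punctuation stripping).
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words processed
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # reference word missing from hypothesis
                curr[j - 1] + 1,     # extra word in hypothesis
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


# Made-up transcript pair: one of five reference words is missing.
reference = "merhaba size nasıl yardımcı olabilirim"
hypothesis = "merhaba nasıl yardımcı olabilirim"
print(word_error_rate(reference, hypothesis))
```

In this made-up pair the hypothesis drops one of the five reference words, so the score is one error over five words. Lower is better, and a value of 0 means the transcript matches the reference word for word.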