Upload Cricket Match Video to Generate Audio Commentary by YOLOv8 and Transformer
Karnati Sai Shashank, N. P. Prasad, K. S. Reddy, L. Rao
2023 International Conference on Sustainable Computing and Smart Systems (ICSCSS), 2023-06-14
DOI: 10.1109/ICSCSS57650.2023.10169522
Abstract
The goal of this work is to automatically generate audio commentary for an uploaded cricket match video. A YOLOv8 model extracts features from the video frames, and a Transformer-LSTM network then generates the commentary as text, which is converted to audio. The proposed model accepts variable-length input and produces sequential outputs. In addition, the model can use timing information to predict the pitch and length of the bowler's delivery, the batsman's shot selection, and the outcome of the ball. However, no standard dataset exists for these tasks, so this study covers the work from data collection through classification.
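
To make the data flow concrete, the following is a minimal sketch of the step between classification and speech: turning the four per-ball predictions named in the abstract into a line of commentary text. It is not the authors' method. A fixed string template stands in for the Transformer-LSTM generator, and the `BallEvent` type, its field values, and the function names are all hypothetical, since the abstract does not list the label classes.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class BallEvent:
    """Per-ball predictions; all example values are assumed, not from the paper."""
    pitch: str    # where the ball pitches, e.g. "outside off"
    length: str   # delivery length, e.g. "full", "short"
    shot: str     # batsman's shot selection, e.g. "cover drive"
    outcome: str  # result of the ball, e.g. "four", "no run"


def describe(event: BallEvent) -> str:
    """Template-based stand-in for the learned text generator."""
    return (
        f"{event.length.capitalize()} delivery pitching {event.pitch}, "
        f"batsman plays a {event.shot}: {event.outcome}."
    )


def commentate(events: list[BallEvent]) -> list[str]:
    """Variable-length input: one commentary line per ball, in order."""
    return [describe(e) for e in events]
```

Each returned string would then be passed to a text-to-speech stage to produce the audio track; that stage is omitted here because the abstract does not say which one is used.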