Authors: Jivin Varghese, Pakshal Ranawat, Ruvin Rodrigues, Phiroj Shaikh
Venue: 2022 IEEE 3rd Global Conference for Advancement in Technology (GCAT)
Publication date: 2022-10-07
DOI: 10.1109/GCAT55367.2022.9971977 (https://doi.org/10.1109/GCAT55367.2022.9971977)
Audio Synthesis Translation and Auto-Summarization (ASTA)
Time has become a scarce resource in recent years. Demand for automation has always been high, since it can greatly reduce the time spent on menial tasks. This project automates text translation, summarization, and speech synthesis in order to reduce the time required to read books. In this paper, we present multiple machine learning models that convert text into speech and into summarized text in Devanagari script. The main objective is to examine existing architectures for text translation and summarization, and to provide a robust system that converts PDF files to audio files, summarizes the PDF, and translates the summarized English narrative into Devanagari text. The architecture, called Audio Synthesis Translation and Auto-summarization (ASTA), uses multiple models, including an RNN sequence-to-sequence model, NMT, Tacotron 2, and WaveGlow. In addition, we use Google Vision OCR to extract text from the PDF. The system integrates these models and runs them as a single pipeline.
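The pipelined structure described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the names `AstaPipeline`, `AstaResult`, and the `fake_*` stand-ins are hypothetical, and the stage order (audio from the full extracted text, translation applied to the summary) is an assumption read from the abstract. In a real system each callable would wrap the corresponding model (OCR, Tacotron 2 with WaveGlow, the RNN sequence-to-sequence summarizer, NMT).

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class AstaResult:
    text: str                # text extracted from the PDF
    audio: bytes             # synthesized speech for the full text
    summary: str             # English summary
    translated_summary: str  # summary rendered in Devanagari script


@dataclass
class AstaPipeline:
    # Each stage is injected as a plain callable so models can be swapped.
    extract_text: Callable[[bytes], str]  # OCR stage
    synthesize: Callable[[str], bytes]    # text-to-speech stage
    summarize: Callable[[str], str]       # summarization stage
    translate: Callable[[str], str]       # translation stage

    def run(self, pdf_bytes: bytes) -> AstaResult:
        text = self.extract_text(pdf_bytes)
        audio = self.synthesize(text)         # assumed: audio covers the full text
        summary = self.summarize(text)
        translated = self.translate(summary)  # assumed: only the summary is translated
        return AstaResult(text, audio, summary, translated)


# Hypothetical stand-ins so the sketch runs without any model installed.
def fake_ocr(pdf_bytes: bytes) -> str:
    return pdf_bytes.decode("utf-8")


def fake_tts(text: str) -> bytes:
    return text.encode("utf-8")


def fake_summarize(text: str) -> str:
    # Keeps only the first sentence as a placeholder "summary".
    return text.split(".")[0] + "."


def fake_translate(text: str) -> str:
    # Tags the text instead of translating it.
    return "[hi] " + text
```

Passing the stages in as callables keeps the orchestration independent of any particular model, so a single stage can be replaced or tested in isolation.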