{"title":"Sphinx-Based Evaluation of Efficient Acoustic Modeling Parameters for LibriSpeech Corpus","authors":"S. Sharan, A. Dev, Poonam Bansal, Shweta A. Bansal, S. Agrawal","doi":"10.1109/AIST55798.2022.10064750","DOIUrl":"https://doi.org/10.1109/AIST55798.2022.10064750","url":null,"abstract":"In this paper we assess the efficient acoustic modeling parameters, i.e., the number of senones and the number of Gaussian densities, for an Automatic Speech Recognition (ASR) system built on the well-known audiobook corpus \"LibriSpeech\" using the open-source tool Sphinx. Sphinx is a Hidden Markov Model (HMM) based, offline, large-vocabulary, language- and speaker-independent continuous ASR system with support for low-resource handheld devices. We trained the acoustic model while varying these parameters and examined model quality using Word Error Rate (WER). The best WER observed is 9.5%, achieved with 2000 senones and 64 Gaussian densities.","PeriodicalId":360351,"journal":{"name":"2022 4th International Conference on Artificial Intelligence and Speech Technology (AIST)","volume":"22 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-12-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126834060","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
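The WER metric used in the record above is the word-level edit distance between a reference transcript and the recognizer's hypothesis, normalized by the reference length. A minimal, self-contained sketch of that standard computation (not the authors' Sphinx evaluation code):

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word Error Rate: (substitutions + deletions + insertions) / reference word count."""
    ref, hyp = reference.split(), hypothesis.split()
    # Levenshtein distance over words via dynamic programming.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,          # deletion
                          d[i][j - 1] + 1,          # insertion
                          d[i - 1][j - 1] + cost)   # substitution or match
    return d[len(ref)][len(hyp)] / len(ref)
```

For example, `wer("the cat sat on the mat", "the cat sit on mat")` counts one substitution and one deletion against six reference words, giving 2/6.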
{"title":"Role of CBIR In a Different fields-An Empirical Review","authors":"Md Abu Hanif, Harpreet Kaur, Manik Rakhra, Ashutosh Kumar Singh","doi":"10.1109/AIST55798.2022.10064825","DOIUrl":"https://doi.org/10.1109/AIST55798.2022.10064825","url":null,"abstract":"Owing to its many applications in remote sensing, agriculture, healthcare, e-commerce, artificial intelligence (AI), and machine learning (ML), as well as other fields, Content-Based Image Retrieval (CBIR) continues to be a popular research area. It is frequently used to search a sizable image library and retrieve images that are meaningfully similar to the query image (QI). A crucial part of a CBIR model is the dimensionality reduction technique, which seeks to capture both high- and low-level characteristics. Because of the growing need to search clinical images for diagnostic applications, image archiving, and communication networks such as PACS, the medical sector is extending CBIR to Content-Based Medical Image Retrieval (CBMIR) alongside generic computer vision, enabling effective search of hospital PACS. Recent developments in deep learning (DL) models allow efficient building of CBIR models across industries. Meanwhile, productivity in the agriculture sector has decreased over the past few decades, with an increase in plant diseases found to be the biggest factor. This review describes the CBIR methodology as applied to identification and categorization tasks in agriculture, medicine, artificial intelligence, and machine learning, and demonstrates how CBIR can be used across these fields.","PeriodicalId":360351,"journal":{"name":"2022 4th International Conference on Artificial Intelligence and Speech Technology (AIST)","volume":"98 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-12-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131993241","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
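The core CBIR loop the review above surveys — extract a global feature from each image, then rank the library by distance to the query's feature — can be illustrated with a deliberately simple feature. The normalized intensity histogram and the function names below are illustrative assumptions, not any of the surveyed methods:

```python
import numpy as np

def hist_feature(img, bins=8):
    """Normalized grayscale intensity histogram: a simple global CBIR feature."""
    h, _ = np.histogram(img, bins=bins, range=(0, 256))
    return h / h.sum()

def retrieve(query_img, library, k=2):
    """Return indices of the k library images whose features are closest to the query's."""
    q = hist_feature(query_img)
    dists = [np.linalg.norm(q - hist_feature(img)) for img in library]
    return [int(i) for i in np.argsort(dists)[:k]]
```

Real systems replace the histogram with richer descriptors (texture, shape, or deep features), but the retrieve-by-feature-distance structure stays the same.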
{"title":"A Survey on ASR Systems for Dysarthric Speech","authors":"K. Bharti, P. Das","doi":"10.1109/AIST55798.2022.10065162","DOIUrl":"https://doi.org/10.1109/AIST55798.2022.10065162","url":null,"abstract":"Automatic Speech Recognition (ASR) has recently become widespread, with many applications and assistive uses, but people with disordered speech derive little benefit from it. Speech technologies could nonetheless assist people with speech disorders in daily life. Dysarthria is a neurological speech disorder caused by significant injury in the left hemisphere of the brain, and dysarthric speakers have difficulty moving their speech-related muscles. Because of this strain on their speech muscles, individuals with dysarthria can generate only limited speech data for analysis. Recognizing the speech of dysarthria sufferers therefore requires a robust technique that can cope with extreme irregularity and scarce training data. This survey gives a brief account of dysarthric speech characteristics and behavior, and presents several attempts that have been made to build robust ASR systems for dysarthric speech.","PeriodicalId":360351,"journal":{"name":"2022 4th International Conference on Artificial Intelligence and Speech Technology (AIST)","volume":"42 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-12-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132560728","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Evaluation of Deep Learning Approaches for Detection of Brain Tumours using MRI","authors":"Samriddha Sinha, Amar Saraswat, Shweta A. Bansal","doi":"10.1109/AIST55798.2022.10064794","DOIUrl":"https://doi.org/10.1109/AIST55798.2022.10064794","url":null,"abstract":"A brain tumour, an abnormal growth of brain cells, is a significant health problem that can be fatal if not detected and treated in time, and it remains among the most serious causes of death in humans. Early tumour detection is therefore essential for arranging therapy as soon as possible, and identifying brain tumour boundaries is one of the most important tasks in neurosurgery. Segmentation of brain tumours in Magnetic Resonance Imaging (MRI) is currently the field's dominant research topic, since finding the precise dimensions and position of a tumour is a very helpful procedure. Content-Based Image Retrieval (CBIR) techniques are now widely used in the automatic diagnosis of disease from MR imaging, mammography, and other sources. Deep learning feature extraction combined with innovative edge detection methods can bring accuracy noticeably closer to the manual results of a human evaluator, in keeping with the goal of sustainable development through innovation. This paper provides an in-depth survey of the techniques used by many researchers and concludes that, among the available automated segmentation techniques, the Fuzzy C-Means algorithm is the best strategy for identifying the region of interest.","PeriodicalId":360351,"journal":{"name":"2022 4th International Conference on Artificial Intelligence and Speech Technology (AIST)","volume":"21 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-12-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132262253","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
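The Fuzzy C-Means algorithm favored in the survey above assigns each pixel a soft membership in every cluster and alternates membership and center updates until they stabilize. A minimal NumPy sketch of the standard algorithm; the parameter names and the conventional fuzzifier default m=2 are generic choices, not taken from the paper:

```python
import numpy as np

def fuzzy_c_means(X, c=2, m=2.0, n_iter=50, seed=0):
    """Standard Fuzzy C-Means: returns cluster centers and soft memberships U (n x c)."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    U = rng.random((n, c))
    U /= U.sum(axis=1, keepdims=True)                # memberships sum to 1 per point
    for _ in range(n_iter):
        W = U ** m                                   # fuzzified weights
        centers = (W.T @ X) / W.sum(axis=0)[:, None] # weighted cluster means
        dist = np.linalg.norm(X[:, None, :] - centers[None], axis=2) + 1e-12
        inv = dist ** (-2.0 / (m - 1.0))
        U = inv / inv.sum(axis=1, keepdims=True)     # membership update
    return centers, U
```

For tumour segmentation the rows of `X` would be pixel intensities (or small feature vectors), and the region of interest is read off from the highest-membership cluster.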
{"title":"Attention based Multi Modal Learning for Audio Visual Speech Recognition","authors":"L. Kumar, D. Renuka, S. Rose, M.C. Shunmugapriya","doi":"10.1109/AIST55798.2022.10065019","DOIUrl":"https://doi.org/10.1109/AIST55798.2022.10065019","url":null,"abstract":"In recent years, multimodal fusion using deep learning has proliferated in tasks such as emotion recognition and speech recognition by drastically enhancing overall system performance. However, existing unimodal audio speech recognition systems face various challenges in handling ambient noise and varied pronunciations, and are inaccessible to hearing-impaired people. To address these limitations of audio-based speech recognizers, this paper explores an intermediate-level fusion framework using multimodal information from audio as well as visual movements. We analyzed the performance of the transformer-based audio-visual model on noisy audio. We assessed the model on two benchmark datasets, namely LRS2 and Grid. Overall, we found that multimodal learning for speech offers a better WER than other baseline systems.","PeriodicalId":360351,"journal":{"name":"2022 4th International Conference on Artificial Intelligence and Speech Technology (AIST)","volume":"37 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-12-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128663423","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Recognition Of Handwritten English Character Using Convolutional Neural Network","authors":"Sapna Katoch, Manik Rakhra, Dalwinder Singh","doi":"10.1109/AIST55798.2022.10064860","DOIUrl":"https://doi.org/10.1109/AIST55798.2022.10064860","url":null,"abstract":"In the domain of computer vision and image processing, handwritten character recognition (HCR) is one of the most active and difficult research fields. It can be used as a reading tool for bank cheques, for identifying characters on forms, and for many other purposes. Optical character recognition (OCR) of printed documents extends to documents produced by hand, and it is used to simplify character transcription from a broad range of file types, such as image and word-processor files. Researchers have made tremendous progress in HCR by exploiting vast amounts of raw data and recent breakthroughs in Deep Learning and Machine Learning algorithms. The fundamental purpose of this paper is to evaluate several techniques for handwriting recognition, including touch input through a mobile screen as well as input from an image file. In this work, a Convolutional Neural Network (CNN) is used to identify characters in a test dataset; the CNN's capacity to detect characters from an image dataset and its recognition accuracy are examined. The CNN recognizes characters by comparing and contrasting their shapes and distinguishing characteristics. The A_Z Handwritten dataset was used to test our CNN implementation, and the model achieves 100% accuracy in recognizing the characters.","PeriodicalId":360351,"journal":{"name":"2022 4th International Conference on Artificial Intelligence and Speech Technology (AIST)","volume":"40 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-12-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133247803","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Towards Manipuri Tonal Contrast Disambiguation Using Acoustic Features","authors":"Thiyam Susma Devi, P. Das","doi":"10.1109/AIST55798.2022.10065089","DOIUrl":"https://doi.org/10.1109/AIST55798.2022.10065089","url":null,"abstract":"Manipuri is a low-resource tonal language of the Tibeto-Burman language family. Preliminary studies confirm that there are two tones in Manipuri: a level tone and a falling tone. For such tonal languages, features that distinctly characterize tone are essential for developing robust speech recognition systems. Existing tone-based methods have not studied or analyzed Manipuri tones in this context. Therefore, in this work, we carry out an acoustic feature analysis of Manipuri speech samples. Firstly, we extend the existing ManiTo dataset, containing 3000 samples of isolated Manipuri tonal-contrast words, with an additional 3000 samples. Secondly, we extract ten selected features from each utterance in the given speech samples. These features are then analyzed for their ability to distinguish the two tones. The results validate that our selected features can efficiently differentiate the tones of the Manipuri language.","PeriodicalId":360351,"journal":{"name":"2022 4th International Conference on Artificial Intelligence and Speech Technology (AIST)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-12-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122000953","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
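Tonal contrasts such as Manipuri's level versus falling tone are carried largely by the pitch (F0) contour, one of the usual acoustic features for tone analysis. The abstract does not list the ten features used, so the following generic autocorrelation-based F0 estimator is purely illustrative:

```python
import numpy as np

def estimate_f0(signal, sr, fmin=75.0, fmax=400.0):
    """Rough F0 estimate by picking the autocorrelation peak in a plausible pitch range."""
    sig = np.asarray(signal, dtype=float)
    sig = sig - sig.mean()                 # remove DC offset
    # Autocorrelation at non-negative lags.
    corr = np.correlate(sig, sig, mode="full")[len(sig) - 1:]
    lo, hi = int(sr // fmax), int(sr // fmin)  # lag bounds from the pitch range
    lag = lo + int(np.argmax(corr[lo:hi + 1]))
    return sr / lag
```

Tracking this estimate over successive frames yields the F0 contour whose slope separates a level tone from a falling one.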
{"title":"Evaluation of Contact Lens Data Acquisition Approaches using Enhancement Techniques","authors":"Nur Alifah Megat Abd Mana, Lim Chee Chin, H. Yazid, C. Y. Fook","doi":"10.1109/AIST55798.2022.10065211","DOIUrl":"https://doi.org/10.1109/AIST55798.2022.10065211","url":null,"abstract":"Contact lenses can help improve the quality of human life, and the inspection process plays a big role in producing good-quality contact lens products. However, detecting defects in contact lenses on the production line is challenging; the transparent silicone hydrogel type is one of the most difficult in which to detect internal defects. The primary purpose of this paper is to examine the differences in image quality between four data acquisition approaches, evaluated with two image enhancement techniques: Gaussian blurring and Contrast Limited Adaptive Histogram Equalization (CLAHE). Acquiring a clear, good-quality image requires a specific experimental setup consisting of a high-resolution camera lens along with the right camera-stand position and camera angle. On the performance metrics for both enhancement techniques, Approach 2 outperformed the other approaches: with Gaussian blurring it showed the highest PSNR (29.02321), the lowest MSE (81.42533), and the lowest AMBE (-0.55510), while with CLAHE it showed the highest PSNR (28.50377), the lowest MSE (91.77044), and the lowest AMBE (-0.05532). This indicates that Approach 2 provides a better-quality image with less noise.","PeriodicalId":360351,"journal":{"name":"2022 4th International Conference on Artificial Intelligence and Speech Technology (AIST)","volume":"431 3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-12-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123440700","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
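The MSE, PSNR, and AMBE metrics quoted in the record above follow standard definitions. A minimal sketch of those definitions (not the authors' evaluation code); note the paper reports negative AMBE values, so the signed mean-brightness difference is used here rather than its absolute value:

```python
import numpy as np

def mse(orig, proc):
    """Mean squared error between two images of equal shape."""
    return float(np.mean((orig.astype(float) - proc.astype(float)) ** 2))

def psnr(orig, proc, peak=255.0):
    """Peak signal-to-noise ratio in dB for 8-bit images (peak = 255)."""
    m = mse(orig, proc)
    return float("inf") if m == 0 else 10.0 * np.log10(peak ** 2 / m)

def ambe(orig, proc):
    """Signed mean brightness error: mean(processed) - mean(original)."""
    return float(proc.astype(float).mean() - orig.astype(float).mean())
```

Higher PSNR and lower MSE indicate the enhanced image stays closer to the reference, while AMBE near zero indicates the mean brightness is preserved.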
{"title":"Transformed Deep Spatio Temporal-Features with Fused Distance for Efficient Video Retrieval","authors":"A. Banerjee, Ela Kumar, Ravinder M","doi":"10.1109/AIST55798.2022.10064821","DOIUrl":"https://doi.org/10.1109/AIST55798.2022.10064821","url":null,"abstract":"For the goal of video retrieval, this research proposes applying wavelet transforms to deep spatiotemporal features. Since a level-1 wavelet transform extracts two components from any signal or feature vector, the component-wise similarities between the query video feature and each prototype video feature are calculated. These component distances are fused into the final dissimilarity used to determine top-1 and top-5 accuracy. The results demonstrate that the proposed technique performs better than a baseline strategy. As a further improvement, fast learning networks trained on the training sets of both datasets could be employed to better classify the query and prototype feature vectors, which would enhance retrieval accuracy.","PeriodicalId":360351,"journal":{"name":"2022 4th International Conference on Artificial Intelligence and Speech Technology (AIST)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-12-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123899578","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
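The level-1 decomposition plus fused distance described in the record above can be sketched with a Haar transform: the feature vector splits into approximation and detail halves, and the per-component Euclidean distances are fused by a weighted sum. The Haar basis, the equal weights, and the even-length assumption are illustrative choices made here, not details given in the abstract:

```python
import numpy as np

def haar_level1(f):
    """Level-1 Haar transform of an even-length feature vector:
    returns (approximation, detail) halves."""
    f = np.asarray(f, dtype=float)
    even, odd = f[0::2], f[1::2]
    return (even + odd) / np.sqrt(2), (even - odd) / np.sqrt(2)

def fused_distance(query, proto, w_approx=0.5, w_detail=0.5):
    """Fuse the approximation- and detail-component distances into one dissimilarity."""
    qa, qd = haar_level1(query)
    pa, pd = haar_level1(proto)
    return w_approx * np.linalg.norm(qa - pa) + w_detail * np.linalg.norm(qd - pd)
```

Ranking prototype videos by `fused_distance` against the query feature then yields the top-1 / top-5 retrieval lists.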
{"title":"Basic design for the implementation of automatic surveillance system on helmet detection","authors":"Mogalraj Kushal Dath, Manik Rakhra, Dalwinder Singh, Ashutosh Kumar Singh, Rajesh Banala","doi":"10.1109/AIST55798.2022.10065367","DOIUrl":"https://doi.org/10.1109/AIST55798.2022.10065367","url":null,"abstract":"Deep learning has lately received acclaim for its success in certain fields, such as digital image pattern recognition and feature extraction. Researchers have used these methods to solve a variety of problems, such as detecting traffic violations, specifically motorcycle riders not wearing helmets, in video surveillance. In this paper, we propose a basic implementation and the design steps for detecting whether two-wheeler riders are wearing helmets, using a compatible and fast deep learning approach known as the Single Shot Detector (SSD) on a Linux operating system. We created a customized dataset of images by taking screenshots of CCTV surveillance video from a legal source. Traffic police can use this system to monitor the vehicles passing through specific surveillance nodes. With further implementation, vehicle number plates could automatically be logged in a database, which may help narrow down options during a crime investigation.","PeriodicalId":360351,"journal":{"name":"2022 4th International Conference on Artificial Intelligence and Speech Technology (AIST)","volume":"38 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-12-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129714764","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}