2022 4th International Conference on Artificial Intelligence and Speech Technology (AIST): Latest Publications

Sphinx-Based Evaluation of Efficient Acoustic Modeling Parameters for LibriSpeech Corpus
2022 4th International Conference on Artificial Intelligence and Speech Technology (AIST) | Pub Date: 2022-12-09 | DOI: 10.1109/AIST55798.2022.10064750
S. Sharan, A. Dev, Poonam Bansal, Shweta A. Bansal, S. Agrawal
Abstract: In this paper we assess the key acoustic modeling parameters, namely the number of senones and the number of Gaussian densities, for an Automatic Speech Recognition (ASR) system built on the well-known audiobook corpus LibriSpeech, using the open-source tool Sphinx. Sphinx is a Hidden Markov Model (HMM)-based, offline, large-vocabulary, language- and speaker-independent continuous ASR system with support for low-resource handheld devices. We trained the acoustic model with varying parameter values and examined the quality of the models using Word Error Rate (WER). The best model achieved a WER of 9.5% with 2000 senones and 64 Gaussian densities.
Citations: 0
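The abstract above reports model quality as Word Error Rate (WER). As background, WER is the word-level edit distance between a reference transcript and the recognizer's hypothesis, normalized by reference length; a minimal illustrative sketch, not the Sphinx implementation:

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word Error Rate: (substitutions + deletions + insertions) / reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # Levenshtein distance over words via dynamic programming.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution or match
    return d[len(ref)][len(hyp)] / len(ref)
```

For example, `wer("the cat sat on the mat", "the cat sat on mat")` counts one deletion over six reference words, giving about 0.167.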
Role of CBIR In a Different fields-An Empirical Review
2022 4th International Conference on Artificial Intelligence and Speech Technology (AIST) | Pub Date: 2022-12-09 | DOI: 10.1109/AIST55798.2022.10064825
Md Abu Hanif, Harpreet Kaur, Manik Rakhra, Ashutosh Kumar Singh
Abstract: Owing to its many applications in remote sensing, agriculture, healthcare, e-commerce, artificial intelligence (AI), and machine learning (ML), Content-Based Image Retrieval (CBIR) remains a popular research area. It is frequently used to search a sizable image library and retrieve images that are meaningfully similar to the query image (QI). A crucial part of a CBIR model is the dimensionality reduction step, which seeks to capture both high- and low-level features. Driven by the growing need to search clinical images for diagnostic applications and image archiving over communication networks, the medical sector is extending CBIR beyond standard computer vision into Content-Based Medical Image Retrieval (CBMIR), used to search hospital Picture Archiving and Communication Systems (PACS) effectively. Recent developments in deep learning (DL) allow CBIR models to be built efficiently across all industries. In the past few decades, productivity in the agriculture sector has decreased, and an increase in plant diseases was found to be the biggest factor. This research describes the CBIR methodology as used for the identification and categorization of agricultural, medical, artificial intelligence, and machine learning objects, and demonstrates how CBIR can be applied across these industries.
Citations: 1
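As the abstract notes, CBIR retrieves images similar to a query image based on their features. A minimal sketch of the idea, assuming grayscale intensity histograms as the feature and histogram intersection as the similarity; the paper itself surveys far richer features and models:

```python
def intensity_histogram(pixels, bins=8):
    """Normalized grayscale histogram as a crude CBIR feature (pixel values 0-255)."""
    hist = [0] * bins
    for p in pixels:
        hist[min(p * bins // 256, bins - 1)] += 1
    total = len(pixels)
    return [h / total for h in hist]

def histogram_intersection(h1, h2):
    """Similarity in [0, 1]; 1 means identical intensity distributions."""
    return sum(min(a, b) for a, b in zip(h1, h2))

def retrieve(query_pixels, library, top_k=1):
    """Rank library images (name -> pixel list) by similarity to the query."""
    q = intensity_histogram(query_pixels)
    scored = [(histogram_intersection(q, intensity_histogram(px)), name)
              for name, px in library.items()]
    return [name for _, name in sorted(scored, reverse=True)[:top_k]]
```

A query that is mostly dark will rank a dark library image first, since their histograms overlap most.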
A Survey on ASR Systems for Dysarthric Speech
2022 4th International Conference on Artificial Intelligence and Speech Technology (AIST) | Pub Date: 2022-12-09 | DOI: 10.1109/AIST55798.2022.10065162
K. Bharti, P. Das
Abstract: Automatic Speech Recognition (ASR) has recently spread widely, with many applications and assistive uses, but orally challenged people, such as those with disordered speech, gain little benefit from it. Speech technologies are very useful on a daily basis to assist people with speech disorders. Dysarthria is a neurological speech disorder caused by significant injury to the left hemisphere of the brain; dysarthric people have difficulty moving their speech-related muscles. Because of the strain on these muscles, individuals with dysarthria can generate only limited speech data for analysis. To recognize the speech of dysarthria sufferers, a robust technique is needed that can cope with extreme irregularity and scarce training data. This survey gives a brief account of dysarthric speech characteristics and behavior, and presents several attempts that have been made to build robust ASR systems for dysarthric speech.
Citations: 1
Evaluation of Deep Learning Approaches for Detection of Brain Tumours using MRI
2022 4th International Conference on Artificial Intelligence and Speech Technology (AIST) | Pub Date: 2022-12-09 | DOI: 10.1109/AIST55798.2022.10064794
Samriddha Sinha, Amar Saraswat, Shweta A. Bansal
Abstract: A brain tumour, an abnormal growth of brain cells, is a significant health problem that can be fatal if not detected and treated in time, and is among the most serious causes of death in humans. Early tumour detection is therefore essential for arranging therapy as soon as possible, and identifying brain tumour boundaries is one of the most important tasks in neurosurgery. A brain tumour detection technique can identify early-stage tumours, and Magnetic Resonance Imaging (MRI) segmentation of brain tumours is the field's dominant research topic these days; finding the precise size and position of a tumour is a very helpful step in monitoring. Content-Based Image Retrieval (CBIR) techniques are now widely used in the automatic diagnosis of disease from MR imaging, mammography, and other sources. This gap can be addressed using deep learning feature extraction and innovative edge detection methods, bringing accuracy noticeably closer to the manual results of a human evaluator, as part of the goal of sustainable development through innovation. This paper provides an in-depth survey of the techniques used by many researchers and concludes that, among the available automated segmentation techniques, the Fuzzy C-Means algorithm is the best strategy for identifying the region of interest.
Citations: 0
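The survey above singles out Fuzzy C-Means as the preferred segmentation strategy. As background, Fuzzy C-Means assigns each pixel a soft membership to every cluster rather than a hard label; a minimal 1-D sketch over scalar intensities with deterministic initialization (real MRI segmentation operates on full images and often adds spatial constraints):

```python
def fuzzy_c_means(data, c=2, m=2.0, iters=50):
    """Fuzzy C-Means on scalar intensities.

    Returns (centers, memberships); memberships[i][k] is the degree to which
    data[i] belongs to cluster k (each row sums to 1).
    """
    # Spread the initial centers across the intensity range (deterministic).
    lo, hi = min(data), max(data)
    centers = [lo + k * (hi - lo) / (c - 1) for k in range(c)]
    u = [[0.0] * c for _ in data]
    for _ in range(iters):
        # Update memberships from inverse distances to the current centers.
        for i, x in enumerate(data):
            dists = [abs(x - ck) or 1e-12 for ck in centers]
            for k in range(c):
                u[i][k] = 1.0 / sum((dists[k] / dj) ** (2.0 / (m - 1.0))
                                    for dj in dists)
        # Update each center as the membership-weighted mean of the data.
        for k in range(c):
            weights = [u[i][k] ** m for i in range(len(data))]
            centers[k] = sum(w * x for w, x in zip(weights, data)) / sum(weights)
    return centers, u
```

On two well-separated intensity groups, the centers converge near the group means and each point's membership strongly favors its own cluster.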
Attention based Multi Modal Learning for Audio Visual Speech Recognition
2022 4th International Conference on Artificial Intelligence and Speech Technology (AIST) | Pub Date: 2022-12-09 | DOI: 10.1109/AIST55798.2022.10065019
L. Kumar, D. Renuka, S. Rose, M.C. Shunmugapriya
Abstract: In recent years, multimodal fusion using deep learning has proliferated in tasks such as emotion recognition and speech recognition, drastically enhancing overall system performance. Existing unimodal audio speech recognition systems, however, struggle with ambient noise and varied pronunciations, and are inaccessible to hearing-impaired people. To address these limitations of audio-based speech recognizers, this paper exploits an intermediate-level fusion framework using multimodal information from audio as well as visual movements. We analyzed the performance of the transformer-based audio-visual model on noisy audio, assessing it on two benchmark datasets, LRS2 and Grid. Overall, we found that multimodal learning for speech offers a better WER than other baseline systems.
Citations: 0
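The abstract describes attention-based fusion of audio and visual streams. As background, the building block of such transformer fusion is scaled dot-product cross-modal attention, where audio frames attend over visual frames; a minimal sketch under toy dimensions, not the paper's actual model:

```python
import math

def softmax(xs):
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def cross_modal_attention(audio, visual):
    """Each audio frame (query) attends over visual frames (keys and values).

    audio:  list of d-dim audio feature vectors
    visual: list of d-dim visual feature vectors
    Returns one fused d-dim vector per audio frame.
    """
    d = len(audio[0])
    fused = []
    for q in audio:
        # Scaled dot-product scores against every visual frame.
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in visual]
        weights = softmax(scores)
        # Weighted sum of visual frames gives the fused representation.
        fused.append([sum(w * v[j] for w, v in zip(weights, visual))
                      for j in range(d)])
    return fused
```

An audio frame aligned with one visual frame receives nearly all of its attention weight from that frame.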
Recognition Of Handwritten English Character Using Convolutional Neural Network
2022 4th International Conference on Artificial Intelligence and Speech Technology (AIST) | Pub Date: 2022-12-09 | DOI: 10.1109/AIST55798.2022.10064860
Sapna Katoch, Manik Rakhra, Dalwinder Singh
Abstract: Handwritten character recognition is one of the most active and difficult research fields in computer vision and image processing. It can be used as a reading tool for bank checks, for identifying characters on forms, and for a slew of other purposes; optical character recognition (OCR) of printed documents is similar to recognizing documents handwritten by a human. OCR is put to use to simplify character translation from a broad range of file types, such as image and word-processing files. Researchers have made tremendous progress in handwritten character recognition (HCR) by using vast amounts of raw data and new breakthroughs in deep learning and machine learning algorithms. The fundamental purpose of this paper is to provide a solution for several handwriting recognition techniques, including touch input through a mobile screen as well as input from an image file. In this work, a Convolutional Neural Network (CNN) is used to identify characters in a test dataset, and we examine CNNs' capacity to detect characters from an image dataset and their recognition accuracy. The CNN recognizes characters by comparing and contrasting their shapes and distinguishing characteristics. The A_Z Handwritten dataset was used to test our CNN implementation's handwriting accuracy, and the model achieved a 100% result in recognizing the characters.
Citations: 0
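As background to the CNN approach above, the core operation is a 2-D convolution whose kernels respond to local shape features such as strokes and edges; a minimal sketch of one convolution plus ReLU, not the paper's actual network:

```python
def conv2d(image, kernel):
    """Valid 2-D convolution (strictly, cross-correlation, as in most CNN libraries)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [[sum(image[i + a][j + b] * kernel[a][b]
                 for a in range(kh) for b in range(kw))
             for j in range(out_w)]
            for i in range(out_h)]

def relu(fmap):
    """Rectified linear activation applied element-wise to a feature map."""
    return [[max(0.0, v) for v in row] for row in fmap]
```

Sliding a vertical-edge kernel such as `[[-1, 1], [-1, 1]]` over a character bitmap yields a feature map that peaks along vertical strokes; a trained CNN stacks many such learned kernels.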
Towards Manipuri Tonal Contrast Disambiguation Using Acoustic Features
2022 4th International Conference on Artificial Intelligence and Speech Technology (AIST) | Pub Date: 2022-12-09 | DOI: 10.1109/AIST55798.2022.10065089
Thiyam Susma Devi, P. Das
Abstract: Manipuri is a low-resource tonal language of the Tibeto-Burman family. Preliminary studies confirm that Manipuri has two tones: a level tone and a falling tone. For such tonal languages, features that distinctly characterize the tones are essential for developing robust speech recognition systems, yet existing tone-based methods have not studied or analyzed Manipuri tones in this context. In this work we therefore carried out an acoustic feature analysis of Manipuri speech samples. First, we extend the existing ManiTo dataset, containing 3000 samples of isolated Manipuri tonal contrast words, with an additional 3000 samples. Second, we extract ten selected features from each utterance in the speech samples and analyze their ability to distinguish the two tones. The results validate that our selected features can efficiently differentiate the tones of the Manipuri language.
Citations: 0
Evaluation of Contact Lens Data Acquisition Approaches using Enhancement Techniques
2022 4th International Conference on Artificial Intelligence and Speech Technology (AIST) | Pub Date: 2022-12-09 | DOI: 10.1109/AIST55798.2022.10065211
Nur Alifah Megat Abd Mana, Lim Chee Chin, H. Yazid, C. Y. Fook
Abstract: Contact lenses can help improve the quality of human life, and the inspection process plays a big role in producing good-quality contact lens products. Detecting defects in contact lenses on the production line is challenging, however, and the transparent silicone hydrogel lens is among the types in which defects are hardest to detect. The primary purpose of this paper is to examine the differences in image quality between four data acquisition approaches using two image enhancement techniques, Gaussian blurring and Contrast Limited Adaptive Histogram Equalization (CLAHE). Acquiring a clear, good-quality image requires a specific experimental setup consisting of a high-resolution camera lens and the right camera stand position and camera angle. On the performance metrics for both enhancement techniques, Approach 2 outperformed the other approaches: with Gaussian blurring it showed the highest PSNR (29.02321), the lowest MSE (81.42533), and the lowest AMBE (-0.55510), while with CLAHE it showed the highest PSNR (28.50377), the lowest MSE (91.77044), and the lowest AMBE (-0.05532). This proves that Approach 2 provides a better-quality image due to less noise.
Citations: 0
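The comparison above ranks acquisition approaches by MSE, PSNR, and AMBE. As background, these metrics compare the original and enhanced image pixel by pixel; a minimal sketch for 8-bit grayscale data (AMBE is computed signed here to match the negative values reported above, though it is often defined as an absolute difference):

```python
import math

def mse(original, enhanced):
    """Mean squared error between two equal-length pixel sequences."""
    return sum((o - e) ** 2 for o, e in zip(original, enhanced)) / len(original)

def psnr(original, enhanced, peak=255.0):
    """Peak signal-to-noise ratio in dB; higher means less distortion."""
    err = mse(original, enhanced)
    return float("inf") if err == 0 else 10 * math.log10(peak ** 2 / err)

def ambe(original, enhanced):
    """Mean brightness error (signed): enhanced mean minus original mean."""
    return sum(enhanced) / len(enhanced) - sum(original) / len(original)
```

Since PSNR is inversely tied to MSE, an approach with the lowest MSE necessarily has the highest PSNR, consistent with the pattern reported in the abstract.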
Transformed Deep Spatio Temporal-Features with Fused Distance for Efficient Video Retrieval
2022 4th International Conference on Artificial Intelligence and Speech Technology (AIST) | Pub Date: 2022-12-09 | DOI: 10.1109/AIST55798.2022.10064821
A. Banerjee, Ela Kumar, Ravinder M
Abstract: This research proposes wavelet transformations on deep spatiotemporal features for video retrieval. Because a level-1 wavelet extracts two components from any signal or feature vector, component-wise similarities between the query video feature and each prototype video feature are calculated, and these differences are fused into the final dissimilarity used to determine top-1 and top-5 accuracy. The results demonstrate that the proposed technique performs better than a baseline strategy. As a direction for improvement, fast learning networks trained on the training sets of both datasets could be employed to better classify the query and prototype feature vectors, which would further enhance retrieval accuracy.
Citations: 0
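The abstract describes a level-1 wavelet split of each feature vector into two components, a per-component distance, and a fused dissimilarity. A minimal sketch assuming the Haar wavelet, Euclidean per-component distances, and an unweighted sum as the fusion rule (the paper does not spell out the filter or fusion weights):

```python
import math

def haar_level1(vec):
    """Level-1 Haar split: pairwise sums (approximation) and differences (detail)."""
    approx = [(vec[i] + vec[i + 1]) / math.sqrt(2) for i in range(0, len(vec) - 1, 2)]
    detail = [(vec[i] - vec[i + 1]) / math.sqrt(2) for i in range(0, len(vec) - 1, 2)]
    return approx, detail

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def fused_distance(query_feat, proto_feat):
    """Distance per wavelet component, fused by summation (assumed fusion rule)."""
    qa, qd = haar_level1(query_feat)
    pa, pd = haar_level1(proto_feat)
    return euclidean(qa, pa) + euclidean(qd, pd)
```

Ranking prototype videos by `fused_distance` against the query feature then yields the top-1 and top-5 candidates.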
Basic design for the implementation of automatic surveillance system on helmet detection
2022 4th International Conference on Artificial Intelligence and Speech Technology (AIST) | Pub Date: 2022-12-09 | DOI: 10.1109/AIST55798.2022.10065367
Mogalraj Kushal Dath, Manik Rakhra, Dalwinder Singh, Ashutosh Kumar Singh, Rajesh Banala
Abstract: Deep learning has lately received acclaim for its success in fields such as digital image pattern recognition and feature extraction, and researchers have used these methods to solve a variety of problems, including detecting traffic violations in video surveillance, specifically motorcycle riders not wearing helmets. In this paper, we propose a basic implementation and the design steps for detecting whether two-wheeler riders are wearing helmets, using a compatible and fast deep learning approach known as the Single Shot Detector (SSD) on a Linux operating system. We created a customized image dataset by taking screenshots of CCTV surveillance video from a legal source. Traffic police can use this system to monitor the vehicles passing through specific surveillance nodes. With further implementation, vehicle number plates could automatically be logged in a database, which may help narrow down options during a crime investigation.
Citations: 1