"Improved Feed Forward Attention Mechanism in Bidirectional Recurrent Neural Networks for Robust Sequence Classification"
Sai Bharath Chandra Gutha, M. Shaik, Tejas Udayakumar, Ajit Ashok Saunshikhar
2020 International Conference on Signal Processing and Communications (SPCOM), July 2020. DOI: https://doi.org/10.1109/SPCOM50965.2020.9179606
Abstract: Feed Forward Attention (FFA) in Recurrent Neural Networks (RNNs) is a popular attention mechanism for classifying sequential data. In Bidirectional RNNs (BiRNNs), FFA concatenates the hidden states from the forward and backward layers to compute unscaled logits and normalized attention weights at each time step, and softmax is applied to the weighted sum of logits to compute posterior probabilities. Such concatenation corresponds to adding the individual unnormalized attention weights and unscaled logits from the forward and backward layers. In this paper, we present a novel attention mechanism, the Improved Feed Forward Attention Mechanism (IFFA), which computes the probabilities and normalized attention weights separately for the forward and backward layers without concatenating the hidden states. Weighted probabilities are then computed at each time step and averaged across time. Our experimental results show IFFA outperforming FFA on diverse classification tasks such as speech accent, emotion, and whisper classification.
{"title":"Clustering tendency assessment for datasets having inter-cluster density variations","authors":"Dheeraj Kumar, J. Bezdek","doi":"10.1109/SPCOM50965.2020.9179608","DOIUrl":"https://doi.org/10.1109/SPCOM50965.2020.9179608","url":null,"abstract":"Clustering tendency assessment, i.e., determining if a dataset has any inherent clusters, and if so, how many clusters, k to seek is a crucial pre-clustering task. The visual assessment of tendency (VAT) and improved visual assessment of tendency (iVAT) algorithms provide a visual way to assess cluster tendency of a dataset by reordering the pairwise dissimilarity matrix so that potential clusters are displayed as dark blocks along the diagonal in the image of the reordered dissimilarity matrix. VAT and iVAT, being distance-based schemes, fail to perform well for datasets consisting of clusters characterized by different density levels. In this paper, we introduce two new members of the VAT family of algorithms: Locally Scaled VAT (LSVAT) and Locally Scaled iVAT (LS-iVAT), which produces better iVAT images for data having inter-cluster density variations. Numerical experiments comparing the proposed novel approach with baseline VAT/iVAT as well as spectral clustering and density-based clustering algorithms establish that LS-VAT and LS-iVAT are superior to the comparable algorithms in terms of clustering quality.","PeriodicalId":208527,"journal":{"name":"2020 International Conference on Signal Processing and Communications (SPCOM)","volume":"23 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114672661","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
"End-to-end audio-scene classification from raw audio: Multi time-frequency resolution CNN architecture for efficient representation learning"
T. V. Kumar, R. Sundar, Tilak Purohit, V. Ramasubramanian
2020 International Conference on Signal Processing and Communications (SPCOM), July 2020. DOI: https://doi.org/10.1109/SPCOM50965.2020.9179600
Abstract: We propose and study a novel multi-temporal CNN architecture for end-to-end 'audio-scene classification' (ASC) from the raw audio signal. Conventional CNNs use a fixed-size kernel (whether for image or 1-d signal classification), which corresponds to applying a filter bank in which each filter has a fixed time-frequency resolution (i.e., a fixed-duration impulse response and a fixed-bandwidth frequency response), importantly with a specific time-frequency trade-off. In contrast, to allow for multiple time-frequency resolutions, we use a multi-temporal CNN architecture with multiple kernel branches (up to 12), each of a different length, thereby realizing multiple filter banks with different time-frequency resolutions that process the input raw audio signal and create feature maps (e.g., ranging from very narrow-band to very wide-band spectrographic maps in steps of fine time-frequency resolution) corresponding to different time-frequency trade-offs. Applying this architecture to end-to-end audio-scene classification is shown to offer consistent and significant performance gains (e.g., 11-15% absolute accuracy for the multi-temporal case of 12 branches) over the conventional single-temporal CNN, and also to outperform state-of-the-art results for this task.
{"title":"Classification of Social Signals Using Deep LSTM-based Recurrent Neural Networks","authors":"Himanshu Joshi, Ananya Verma, Amrita Mishra","doi":"10.1109/SPCOM50965.2020.9179516","DOIUrl":"https://doi.org/10.1109/SPCOM50965.2020.9179516","url":null,"abstract":"Non-linguistic speech cues aid expression of various emotions in human communication. In this work, we demonstrate the application of deep long short-term memory (LSTM) recurrent neural networks for frame-wise detection and classification of laughter and filler vocalizations in speech data. Further, we propose a novel approach to perform classification by incorporating cluster information as an additional feature wherein the clusters in the dataset are extracted via a k-means clustering algorithm. Extensive simulation results demonstrate that the proposed approach achieves significant improvement over the conventional LSTM-based classification methods. Also, the performance of deep LSTM models obtained by stacking LSTMs, is studied. Lastly, for classification of the temporally correlated speech data considered in this work, a comparison with popular machine learning-based techniques validates the superiority of the proposed LSTM-based scheme.","PeriodicalId":208527,"journal":{"name":"2020 International Conference on Signal Processing and Communications (SPCOM)","volume":"12 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114594738","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Classifying Cultural Music using Melodic Features","authors":"Amruta Vidwans, Prateek Verma, P. Rao","doi":"10.1109/SPCOM50965.2020.9179597","DOIUrl":"https://doi.org/10.1109/SPCOM50965.2020.9179597","url":null,"abstract":"We present melody based classification of musical styles by exploiting pitch and energy based characteristics computed on the audio signal. Three prominent musical styles were chosen which have improvisation as an integral part with similar melodic principles, theme, and structure of concerts namely, Hindustani, Carnatic and Turkish music. Listeners of one or more of these genres can discriminate these entirely based on the melodic style. The resynthesized melody of music pieces that share the underlying raga/makam, removing any singer cues, was used to validate our hypothesis that style distinction is embedded in the melody. Our automatic method is based on finding a set of highly discriminatory features, motivated by musicological knowledge, to capture distinct characteristics of the melodic contour. The nature of transitions in the pitch contour, presence of microtonal notes and the dynamic variations in the vocal energy are exploited. The automatically classified style labels are found to correlate well with the judgments of human listeners. The melody based features when combined with timbre based features, were found to improve the classification performance on the music metadata based genre labels.","PeriodicalId":208527,"journal":{"name":"2020 International Conference on Signal Processing and Communications (SPCOM)","volume":"152 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114638986","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Low Complexity Detector with Near-ML Performance for Generalized Differential Spatial Modulation","authors":"Deepak Jose, S. Sameer","doi":"10.1109/SPCOM50965.2020.9179552","DOIUrl":"https://doi.org/10.1109/SPCOM50965.2020.9179552","url":null,"abstract":"Detection of most differential modulation schemes involves the past and present received symbols only. Differential spatial modulation (DSM) is a scheme where an extra degree of freedom for transmitting information is available in the form of transmit antennas. A variant of this scheme called generalized differential scheme for spatial modulation (GD-SM) has a power allocation strategy to improve the error performance as compared to DSM. But the maximum likelihood (ML) detector for this scheme is computationally intensive for higher order modulation. We propose a low complexity detection strategy that makes use of the correlation between the channel coefficients during successive time slots to detect the activated antenna followed by the decoding of the M-ary phase shift keying (MPSK) symbol transmitted through that antenna. The proposed detector achieves a tremendous reduction in complexity close to 83% compared to the ML detector, but with a negligible penalty in error performance.","PeriodicalId":208527,"journal":{"name":"2020 International Conference on Signal Processing and Communications (SPCOM)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122317618","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Multi-target hybrid CTC-Attentional Decoder for joint phoneme-grapheme recognition","authors":"Shreekantha Nadig, V. Ramasubramanian, Sachit Rao","doi":"10.1109/SPCOM50965.2020.9179603","DOIUrl":"https://doi.org/10.1109/SPCOM50965.2020.9179603","url":null,"abstract":"In traditional Automatic Speech Recognition (ASR) systems, such as HMM-based architectures, words are predicted using either phonemes or graphemes as sub-word units. In this paper, we explore such joint phoneme-grapheme decoding using an Encoder-Decoder network with hybrid Connectionist Temporal Classification (CTC) and Attention mechanism. The Encoder network is shared between two Attentional Decoders which individually learn to predict phonemes and graphemes from a unique Encoder representation. This Encoder and multi-decoder network is trained in a multi-task setting to minimize the prediction error for both phoneme and grapheme sequences. We also implement the phoneme decoder at an intermediate layer of Encoder and demonstrate performance benefits to such an architecture. By carrying out various experiments on different architectural choices, we demonstrate, using the TIMIT and Librispeech 100 hours datasets, that with this approach, an improvement in performance than the baseline independent phoneme and grapheme recognition systems can be achieved.","PeriodicalId":208527,"journal":{"name":"2020 International Conference on Signal Processing and Communications (SPCOM)","volume":"236 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115459723","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Luminance Channel Based Camera Model Identification","authors":"Nayan Moni Baishya, P. Bora","doi":"10.1109/SPCOM50965.2020.9179564","DOIUrl":"https://doi.org/10.1109/SPCOM50965.2020.9179564","url":null,"abstract":"Camera model identification is an active research problem because of its importance in investigating the source and the authenticity of an image. Traditional camera model identification methods are based on strategies to extract the low-level traces left by the image acquisition pipeline of a camera on an image. One such intrinsic and camera-specific trace is the sensor pattern noise (SPN). The SPN is roughly approximated from the noise-residual obtained by performing high-pass filtering on an image. The noise-residual of an image also contains information about other types of noises. The extraction of the noise-residuals is generally performed on a single primary color channel, like the green channel of an image. However, the performance of a channel in the YCbCr color space is never explored. In this paper, we have proposed a novel camera model identification method based on convolutional neural network, where the noise-residuals are extracted from the luminance (Y) channel of the images. A constrained convolutional layer learns data-driven high-pass filters to extract the noise-residuals and the following layers learn a feature representation for the classification task. We have conducted experiments with multiple class combinations from the Dresden image database. The experimental results show the effectiveness of the Y channel for camera model identification both in terms of classification accuracy and convergence of the network.","PeriodicalId":208527,"journal":{"name":"2020 International Conference on Signal Processing and Communications (SPCOM)","volume":"10 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116144963","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"SPCOM 2020 Contents","authors":"","doi":"10.1109/spcom50965.2020.9179531","DOIUrl":"https://doi.org/10.1109/spcom50965.2020.9179531","url":null,"abstract":"","PeriodicalId":208527,"journal":{"name":"2020 International Conference on Signal Processing and Communications (SPCOM)","volume":"19 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115094572","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
"Semi-supervised learning for acoustic model retraining: Handling speech data with noisy transcript"
Abhijith Madan, Ayush Khopkar, Shreekantha Nadig, M. SrinivasaRaghavanK., Dhanya Eledath, V. Ramasubramanian
2020 International Conference on Signal Processing and Communications (SPCOM), July 2020. DOI: https://doi.org/10.1109/SPCOM50965.2020.9179517
Abstract: We address the problem of retraining a seed acoustic model from a large corpus with noisy labeling. We propose an iterative selection of the corpus data, based on a forced-alignment likelihood and a fuzzy string-matching score, to retrain the acoustic model in order of increasing transcript noise, yielding a succession of enhanced acoustic models that offer progressively lower error rates on held-out test data. We report results in terms of phoneme error rate (PER) on a large broadcast-news corpus from a national broadcast network containing transcribed speech in multiple languages, demonstrating the strong utility of this approach for training acoustic models from noisy transcripts.