{"title":"Generalized convolution concept based on DCT","authors":"P. Korohoda, A. Dabrowski","doi":"10.5281/ZENODO.38260","DOIUrl":"https://doi.org/10.5281/ZENODO.38260","url":null,"abstract":"A generalized approach to the so-called product filtering of digital signals valid for a wide class of linear invertible transformations is presented in this paper. Product type of digital filtering consists in multiplication of the transformed signal with some selectivity function in the transform domain and is in this paper interpreted as a generalized convolution process in the primary domain. Our considerations are based on the observation that the block-wise product filtering of digital signals can be performed by means of multiplication of a block of samples of the transformed signal with some function in a domain of any invertible transformation just in the same way as it is usually done in the frequency domain after the Fourier transformation. The only (sufficient) condition for a suitable forward transformation is the existence of the inverse transformation. The presented idea of the generalized product filtering and the generalized convolution has been confronted with a family of the DCT transformations and the Karhunen-Loeve transformation. For the DCT-III the convolution formula has been derived.","PeriodicalId":347658,"journal":{"name":"2004 12th European Signal Processing Conference","volume":"33 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-09-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128101737","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Decision time horizon for music genre classification using short time features","authors":"P. Ahrendt, Anders Meng, J. Larsen","doi":"10.5281/ZENODO.38612","DOIUrl":"https://doi.org/10.5281/ZENODO.38612","url":null,"abstract":"In this paper music genre classification has been explored with special emphasis on the decision time horizon and ranking of tapped-delay-line short-time features. Late information fusion as e.g. majority voting is compared with techniques of early information fusion such as dynamic PCA (DPCA). The most frequently suggested features in the literature were employed including mel-frequency cepstral coefficients (MFCC), linear prediction coefficients (LPC), zero-crossing rate (ZCR), and MPEG-7 features. To rank the importance of the short time features consensus sensitivity analysis is applied. A Gaussian classifier (GC) with full covariance structure and a linear neural network (NN) classifier are used.","PeriodicalId":347658,"journal":{"name":"2004 12th European Signal Processing Conference","volume":"77 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-09-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121526203","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Tree crown extraction using marked point processes","authors":"G. Perrin, X. Descombes, J. Zerubia","doi":"10.5281/ZENODO.38194","DOIUrl":"https://doi.org/10.5281/ZENODO.38194","url":null,"abstract":"In this paper we aim at extracting tree crowns from remotely sensed images. Our approach is to consider that these images are some realizations of a marked point process. The first step is to define the geometrical objects that design the trees, and the density of the process. Then, we use a Reversible Jump MCMC dynamics and a simulated annealing to get the maximum a posteriori estimator of the tree crown distribution on the image. Transitions of the Markov chain are managed by some specific proposition kernels. Results are shown on aerial images of poplars provided by IFN.","PeriodicalId":347658,"journal":{"name":"2004 12th European Signal Processing Conference","volume":"38 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-09-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123237549","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Performance analysis of the Aurora large vocabulary baseline system","authors":"N. Parihar, J. Picone, D. Pearce, H. Hirsch","doi":"10.5281/ZENODO.38362","DOIUrl":"https://doi.org/10.5281/ZENODO.38362","url":null,"abstract":"In this paper, we present the design and analysis of the baseline recognition system used for ETSI Aurora large vocabulary (ALV) evaluation. The experimental paradigm is presented along with the results from a number of experiments designed to minimize the computational requirements for the system. The ALV baseline system achieved a WER of 14.0% on the standard 5K Wall Street Journal task, and required 4 xRT for training and 15 xRT for decoding (on an 800 MHz Pentium processor). It is shown that increasing the sampling frequency from 8 kHz to 16 kHz improves performance significantly only for the noisy test conditions. Utterance detection resulted in significant improvements only on the noisy conditions for the mismatched training conditions. Use of the DSR standard VQ-based compression algorithm did not result in a significant degradation. The model mismatch and microphone mismatch resulted in a relative increase in WER by 300% and 200%, respectively.","PeriodicalId":347658,"journal":{"name":"2004 12th European Signal Processing Conference","volume":"43 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-09-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131492864","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Finite set DSP, with applications to DNA sequences","authors":"R. Pearson, G. Gonye, M. Gabbouj","doi":"10.5281/ZENODO.38278","DOIUrl":"https://doi.org/10.5281/ZENODO.38278","url":null,"abstract":"Regular substructures in DNA sequences are important in a number of biological problems including promoter analysis, the detection of recurring anomalies in tumor cells, and the study of certain genetic diseases like fragile-X mental retardation. This paper considers signal processing problems relevant to the analysis of regular or semi-regular structure in DNA sequences that must address the fundamental issue of working with unordered, finite value sets.","PeriodicalId":347658,"journal":{"name":"2004 12th European Signal Processing Conference","volume":"45 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-09-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126395211","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Automatic and accurate pitch marking of speech signal using an expert system based on logical combinations of different algorithms outputs","authors":"K. Ashouri, M. Savoji","doi":"10.5281/ZENODO.38173","DOIUrl":"https://doi.org/10.5281/ZENODO.38173","url":null,"abstract":"An expert system comprising a new pitch marking algorithm based on the estimation of the ideal excitation signal, using energy equalization of harmonics of the fundamental frequency present in speech, and three other competent tools is devised and explained in this paper. This expert system uses simple logical combinations of these tools' outputs. The behaviour of a human expert is taken into account in developing the post-processing that is necessary to complete each tool and to further improve the results of their combinations. It is noted that, in most cases, combining the results of the new tool and the Childers method, itself based on what goes on behind hand marking by a human expert, is satisfactory. However, accurate and complete pitch marking is best achieved with all four outputs at the expense of some higher processing time.","PeriodicalId":347658,"journal":{"name":"2004 12th European Signal Processing Conference","volume":"65 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-09-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126450392","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Linear OFDM precoder design for multiuser wireless communications using cutoff rate optimization","authors":"Y. Rong, S. Vorobyov, A. Gershman","doi":"10.5281/ZENODO.38562","DOIUrl":"https://doi.org/10.5281/ZENODO.38562","url":null,"abstract":"Multiuser wireless communications based on orthogonal frequency division multiplexing (OFDM) technique have two pronounced advantages. First of all, the equalizer design at the receiver is facilitated by converting the frequency selective fading channel into parallel flat fading channels. Moreover, by providing each user with a non-intersecting fraction of the available number of subcarriers, multiple-access interference (MAI) can be mitigated. However, a serious drawback in this communication scheme is that some subcarriers may be subject to deep fading in the frequency domain. In this paper, a linear precoding technique is proposed in order to solve this problem. The design of our linear precoder is based on the cutoff rate criterion and, in contrast to other existing precoding techniques, only the knowledge of the average relative power and the multipath delays is required at the transmitter.","PeriodicalId":347658,"journal":{"name":"2004 12th European Signal Processing Conference","volume":"3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-09-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114969285","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Blind restoration of binary signals using a line spectrum fitting approach","authors":"J. Vía, I. Santamaría, M. Lázaro","doi":"10.5281/ZENODO.38321","DOIUrl":"https://doi.org/10.5281/ZENODO.38321","url":null,"abstract":"In this paper we present a new blind equalization algorithm that exploits the parallelism between the probability density function (PDF) of a random variable and a power spectral density (PSD). By using the PDF/PSD analogy, instead of minimizing the distance between the PDF of the input signal and the PDF at the output of the equalizer (an information-theoretic criterion), we solve a line spectrum fitting problem (a second-order statistics criterion) in a transformed domain. For a binary input, we use the fact that the ideal autocorrelation matrix in the transformed domain has rank 2 to develop batch and online projection-based algorithms. Numerical simulations demonstrate the performance of the proposed technique in comparison to batch cumulant-based methods as well as to conventional online blind algorithms such as the constant modulus algorithm (CMA).","PeriodicalId":347658,"journal":{"name":"2004 12th European Signal Processing Conference","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-09-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130727531","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Combination of phone N-grams for a MPEG-7-based spoken document retrieval system","authors":"N. Moreau, Hyoung‐Gook Kim, T. Sikora","doi":"10.5281/ZENODO.38262","DOIUrl":"https://doi.org/10.5281/ZENODO.38262","url":null,"abstract":"In this paper, we present a phone-based approach of spoken document retrieval (SDR), developed in the framework of the emerging MPEG-7 standard. The audio part of MPEG-7 aims at standardizing the indexing of audio documents. It encloses a SpokenContent tool that provides a description framework of the semantic content of speech signals. In the context of MPEG-7, we propose an indexing and retrieval method that uses phonetic information only and a vector space IR model. Different strategies based on the use of phone N-gram indexing terms are experimented.","PeriodicalId":347658,"journal":{"name":"2004 12th European Signal Processing Conference","volume":"24 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-09-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134497077","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Tracking of extended size targets in H.264 compressed video using the probabilistic data association filter","authors":"Vimal Thilak, C. Creusere","doi":"10.5281/ZENODO.38608","DOIUrl":"https://doi.org/10.5281/ZENODO.38608","url":null,"abstract":"Object detection and tracking play a significant role in critical applications such as video monitoring and remote surveillance. These systems employ compression to efficiently utilize the available bandwidth. An example of an efficient compression solution to low bit rate video applications is the recently proposed H.264/AVC video coding standard. In particular, H.264/AVC has been optimized for transmission over wireless channels making it an attractive candidate for use in remote surveillance systems. In this paper, we propose an algorithm that exploits motion vectors generated by the H.264 encoder for object detection and tracking. Experimental results demonstrate the effectiveness of the proposed method to detect and track objects in real video sequences.","PeriodicalId":347658,"journal":{"name":"2004 12th European Signal Processing Conference","volume":"253 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-09-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134511627","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}