Interpolated distanced bigram language models for robust word clustering
N. Bassiou, Constantine Kotropoulos
NSIP 2005. Abstracts. IEEE-Eurasip Nonlinear Signal and Image Processing, 2005. Published 2005-05-18.
DOI: https://doi.org/10.1109/NSIP.2005.1502228
Citations: 4
Abstract
Summary form only given. Two methods for interpolating the distanced bigram language model are examined, both of which take into account pairs of words that appear at varying distances within a context. The language models under study yield a lower perplexity than the baseline bigram model. A word clustering algorithm based on mutual information, with robust estimates of the mean vector and the covariance matrix, is employed in the proposed interpolated language model. The word clusters obtained with this language model prove more meaningful than the word clusters derived using the baseline bigram.
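The abstract does not specify the two interpolation schemes, so the following is only a minimal sketch of the general idea, under stated assumptions: distanced bigram estimates P_d(w | v), where v occurs d positions before w, are combined by a fixed-weight linear interpolation and scored by perplexity. The function names, the add-one smoothing, the weight values, and the toy corpus are all illustrative assumptions and are not taken from the paper.

```python
import math
from collections import Counter


def train_distanced_bigrams(tokens, max_distance):
    """Count pairs (tokens[i-d], tokens[i]) for each distance d = 1..max_distance."""
    pair_counts = {d: Counter() for d in range(1, max_distance + 1)}
    left_counts = {d: Counter() for d in range(1, max_distance + 1)}
    for d in range(1, max_distance + 1):
        for i in range(d, len(tokens)):
            pair_counts[d][(tokens[i - d], tokens[i])] += 1
            left_counts[d][tokens[i - d]] += 1
    return pair_counts, left_counts


def distanced_prob(model, vocab_size, d, left, word):
    """Add-one smoothed estimate of P_d(word | left) (smoothing is an assumption)."""
    pair_counts, left_counts = model
    return (pair_counts[d][(left, word)] + 1) / (left_counts[d][left] + vocab_size)


def interpolated_prob(model, vocab_size, weights, history, word):
    """Linear interpolation over distances: sum_d lambda_d * P_d(word | history[-d])."""
    # Near the start of the text some distances have no context word;
    # use the available ones and renormalise their weights.
    usable = [(d, lam) for d, lam in weights.items() if d <= len(history)]
    total = sum(lam for _, lam in usable)
    return sum(
        lam / total * distanced_prob(model, vocab_size, d, history[-d], word)
        for d, lam in usable
    )


def perplexity(model, vocab_size, weights, tokens):
    """Perplexity 2^(-mean log2 probability) over tokens[1:]."""
    log_sum = 0.0
    for i in range(1, len(tokens)):
        log_sum += math.log2(
            interpolated_prob(model, vocab_size, weights, tokens[:i], tokens[i])
        )
    return 2 ** (-log_sum / (len(tokens) - 1))


if __name__ == "__main__":
    # Toy corpus and hand-picked weights, for illustration only.
    corpus = "the cat sat on the mat the cat ate the fish".split()
    vocab_size = len(set(corpus))
    model = train_distanced_bigrams(corpus, max_distance=2)
    print("bigram only :", perplexity(model, vocab_size, {1: 1.0}, corpus))
    print("interpolated:", perplexity(model, vocab_size, {1: 0.7, 2: 0.3}, corpus))
```

Setting the weights to `{1: 1.0}` reduces the sketch to an ordinary smoothed bigram model, which gives a baseline to compare against. In practice the interpolation weights would be estimated on held-out data rather than fixed by hand, and perplexity would be measured on text not used for training.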