{"title":"A word-to-phrase statistical translation model","authors":"Marcello Federico, N. Bertoldi","doi":"10.1145/1115686.1115687","DOIUrl":"https://doi.org/10.1145/1115686.1115687","url":null,"abstract":"This article addresses the development of statistical models for phrase-based machine translation (MT) which extend a popular word-alignment model proposed by IBM in the early 90s. A novel decoding algorithm is directly derived from the optimization criterion which defines the statistical MT approach. Efficiency in decoding is achieved by applying dynamic programming, pruning strategies, and word reordering constraints. It is known that translation performance can be boosted by exploiting phrase (or multiword) translation pairs automatically extracted from a parallel corpus. New phrase-based models are obtained by introducing extra multiwords in the target language vocabulary and by estimating the corresponding parameters from either: (i) a word-based model, (ii) phrase-based statistics computed on the parallel corpus, or (iii) the interpolation of the two previous estimates. Word-based and phrase-based MT models are evaluated on a traveling domain task in two translation directions: Chinese-English (12k-word vocabulary) and Italian-English (16k-word vocabulary). Phrase-based models show Bleu score improvements over the word-based model by 19% and 13% relative, respectively.","PeriodicalId":412532,"journal":{"name":"ACM Trans. Speech Lang. Process.","volume":"80 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2005-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125971731","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Towards efficient human machine speech communication: The speech graffiti project","authors":"Stefanie Tomko, T. Harris, Arthur R. Toth, James Sanders, Alexander I. Rudnicky, R. Rosenfeld","doi":"10.1145/1075389.1075391","DOIUrl":"https://doi.org/10.1145/1075389.1075391","url":null,"abstract":"This research investigates the design and performance of the Speech Graffiti interface for spoken interaction with simple machines. Speech Graffiti is a standardized interface designed to address issues inherent in the current state-of-the-art in spoken dialog systems such as high word-error rates and the difficulty of developing natural language systems. This article describes the general characteristics of Speech Graffiti, provides examples of its use, and describes other aspects of the system such as the development toolkit. We also present results from a user study comparing Speech Graffiti with a natural language dialog system. These results show that users rated Speech Graffiti significantly better in several assessment categories. Participants completed approximately the same number of tasks with both systems, and although Speech Graffiti users often took more turns to complete tasks than natural language interface users, they completed tasks in slightly less time.","PeriodicalId":412532,"journal":{"name":"ACM Trans. Speech Lang. Process.","volume":"68 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2005-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133435328","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Automatic summarization of voicemail messages using lexical and prosodic features","authors":"K. Koumpis, S. Renals","doi":"10.1145/1075389.1075390","DOIUrl":"https://doi.org/10.1145/1075389.1075390","url":null,"abstract":"This aticle presents trainable methods for extracting principal content words from voicemail messages. The short text summaries generated are suitable for mobile messaging applications. The system uses a set of classifiers to identify the summary words with each word described by a vector of lexical and prosodic features. We use an ROC-based algorithm, Parcel, to select input features (and classifiers). We have performed a series of objective and subjective evaluations using unseen data from two different speech recognition systems as well as human transcriptions of voicemail speech.","PeriodicalId":412532,"journal":{"name":"ACM Trans. Speech Lang. Process.","volume":"2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2005-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129388217","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Web-based models for natural language processing","authors":"Mirella Lapata, Frank Keller","doi":"10.1145/1075389.1075392","DOIUrl":"https://doi.org/10.1145/1075389.1075392","url":null,"abstract":"Previous work demonstrated that Web counts can be used to approximate bigram counts, suggesting that Web-based frequencies should be useful for a wide variety of Natural Language Processing (NLP) tasks. However, only a limited number of tasks have so far been tested using Web-scale data sets. The present article overcomes this limitation by systematically investigating the performance of Web-based models for several NLP tasks, covering both syntax and semantics, both generation and analysis, and a wider range of n-grams and parts of speech than have been previously explored. For the majority of our tasks, we find that simple, unsupervised models perform better when n-gram counts are obtained from the Web rather than from a large corpus. In some cases, performance can be improved further by using backoff or interpolation techniques that combine Web counts and corpus counts. However, unsupervised Web-based models generally fail to outperform supervised state-of-the-art models trained on smaller corpora. We argue that Web-based models should therefore be used as a baseline for, rather than an alternative to, standard supervised models.","PeriodicalId":412532,"journal":{"name":"ACM Trans. Speech Lang. Process.","volume":"213 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2005-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124196593","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Voice fonts for individuality representation and transformation","authors":"Ashish Verma, Arun Kumar","doi":"10.1145/1075389.1075393","DOIUrl":"https://doi.org/10.1145/1075389.1075393","url":null,"abstract":"Speaker individuality transformation is used to modify the speech signal's characteristics so that it sounds as if it is spoken by another speaker. Previous methods for individuality transformation use mapping functions which depend upon a pair of speakers. We introduce the paradigm of voice fonts to represent the individuality of a speaker, independent of other speakers. Several objective and subjective tests are conducted to evaluate the performance of the approaches proposed for the voice fonts paradigm. The results show that the voice fonts paradigm enables independent representation of a speaker's individuality and produces equally good quality of transformed speech compared to previous approaches. This independent representation will be useful in important applications which were not possible with previous methods.","PeriodicalId":412532,"journal":{"name":"ACM Trans. Speech Lang. Process.","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2005-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130459547","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An information-theoretic measure to evaluate parsing difficulty across treebanks","authors":"A. Corazza, A. Lavelli, G. Satta","doi":"10.1145/2407736.2407737","DOIUrl":"https://doi.org/10.1145/2407736.2407737","url":null,"abstract":"With the growing interest in statistical parsing, special attention has recently been devoted to the problem of comparing different treebanks to assess which languages or domains are more difficult to parse relative to a given model. A common methodology for comparing parsing difficulty across treebanks is based on the use of the standard labeled precision and recall measures. As an alternative, in this article we propose an information-theoretic measure, called the expected conditional cross-entropy (ECC). One important advantage with respect to standard performance measures is that ECC can be directly expressed as a function of the parameters of the model. We evaluate ECC across several treebanks for English, French, German, and Italian, and show that ECC is an effective measure of parsing difficulty, with an increase in ECC always accompanied by a degradation in parsing accuracy.","PeriodicalId":412532,"journal":{"name":"ACM Trans. Speech Lang. Process.","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130916406","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Contextual and active learning-based affect-sensing from virtual drama improvisation","authors":"Li Zhang","doi":"10.1145/2407736.2407738","DOIUrl":"https://doi.org/10.1145/2407736.2407738","url":null,"abstract":"Affect interpretation from open-ended drama improvisation is a challenging task. This article describes experiments in using latent semantic analysis to identify discussion themes and potential target audiences for those improvisational inputs without strong affect indicators. A context-based affect-detection is also implemented using a supervised neural network with the consideration of emotional contexts of most intended audiences, sentence types, and interpersonal relationships. In order to go beyond the constraints of predefined scenarios and improve the system's robustness, min-margin-based active learning is implemented. This active learning algorithm also shows great potential in dealing with imbalanced affect classifications. Evaluation results indicated that the context-based affect detection achieved an averaged precision of 0.826 and an averaged recall of 0.813 for affect detection of the test inputs from the Crohn's disease scenario using three emotion labels: positive, negative, and neutral, and an averaged precision of 0.868 and an average recall of 0.876 for the test inputs from the school bullying scenario. Moreover, experimental evaluation on a benchmark data set for active learning demonstrated that active learning was able to greatly reduce human annotation efforts for the training of affect detection, and also showed promising robustness in dealing with open-ended example inputs beyond the improvisation of the chosen scenarios.","PeriodicalId":412532,"journal":{"name":"ACM Trans. Speech Lang. Process.","volume":"8 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125119688","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A statistical model for near-synonym choice","authors":"D. Inkpen","doi":"10.1145/1187415.1187417","DOIUrl":"https://doi.org/10.1145/1187415.1187417","url":null,"abstract":"We present an unsupervised statistical method for automatic choice of near-synonyms when the context is given. The method uses the Web as a corpus to compute scores based on mutual information. Our evaluation experiments show that this method performs better than two previous methods on the same task. We also describe experiments in using supervised learning for this task. We present an application to an intelligent thesaurus. This work is also useful in machine translation and natural language generation.","PeriodicalId":412532,"journal":{"name":"ACM Trans. Speech Lang. Process.","volume":"19 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127898013","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Unsupervised models for morpheme segmentation and morphology learning","authors":"Mathias Creutz, K. Lagus","doi":"10.1145/1187415.1187418","DOIUrl":"https://doi.org/10.1145/1187415.1187418","url":null,"abstract":"We present a model family called Morfessor for the unsupervised induction of a simple morphology from raw text data. The model is formulated in a probabilistic maximum a posteriori framework. Morfessor can handle highly inflecting and compounding languages where words can consist of lengthy sequences of morphemes. A lexicon of word segments, called morphs, is induced from the data. The lexicon stores information about both the usage and form of the morphs. Several instances of the model are evaluated quantitatively in a morpheme segmentation task on different sized sets of Finnish as well as English data. Morfessor is shown to perform very well compared to a widely known benchmark algorithm, in particular on Finnish data.","PeriodicalId":412532,"journal":{"name":"ACM Trans. Speech Lang. Process.","volume":"157 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126899569","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Author verification by linguistic profiling: An exploration of the parameter space","authors":"H. V. Halteren","doi":"10.1145/1187415.1187416","DOIUrl":"https://doi.org/10.1145/1187415.1187416","url":null,"abstract":"This article explores the effects of parameter settings in linguistic profiling, a technique in which large numbers of counts of linguistic features are used as a text profile which can then be compared to average profiles for groups of texts. Although the technique proves to be quite effective for authorship verification, with the best overall parameter settings yielding an equal error rate of 3% on a test corpus of student essays, the optimal parameters vary greatly depending on author and evaluation criterion.","PeriodicalId":412532,"journal":{"name":"ACM Trans. Speech Lang. Process.","volume":"17 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122291081","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}