{"title":"Analyzing the BBC Voices data: Contemporary English dialect areas and their characteristic lexical variants","authors":"Martijn Wieling, Clive Upton, Ann Thompson","doi":"10.1093/llc/fqt009","DOIUrl":"https://doi.org/10.1093/llc/fqt009","url":null,"abstract":"This study investigates data from the BBC Voices project which contains a large amount of vernacular data collected by the BBC between 2004 and 2005. This project was designed primarily to collect information on vernacular speech around the United Kingdom for broadcasting purposes. As part of the project, a web-based questionnaire was created, to which tens of thousands of people supplied their way of denoting thirty-eight concepts which were known to exhibit marked lexical variation. Along with their variants, those responding to the online prompts provided information on their age, gender, and (significantly for this study) their location, this being recorded by means of their postcode. In this study we focus on the relative frequency of the top-ten variants for all concepts in every postcode area. By using hierarchical spectral partitioning of bipartite graphs, we are able to identify four contemporary geographical dialect areas together with their characteristic lexical variants. Even though these variants can be said to characterize their respective geographical area, they also occur in other areas, and not all people in a certain region use the characteristic variant. This supports the view that dialect regions are not clearly defined by strict borders, but are fuzzy at best. Introduction In 2004 and 2005, the British Broadcasting Corporation conducted a large-scale survey in order to obtain a contemporary view of English dialectal variation. People visiting a specially constructed website were invited to offer their variants for thirty-eight concepts that were known to exhibit marked lexical variation. Along with their lexical use, informants were asked to provide details of their age, gender, and geographical (post-coded) location. Upwards of 29,000 people participated in this project (“BBC Voices”) to a greater or lesser degree, resulting in a substantial electronic dataset as a consequence. As dialectologists we are interested in investigating geographical structure which might be present in our data. Given the enormous size of the Voices lexical dataset (containing more than 700,000 responses in total), we use quantitative methods from dialectometry to provide an aggregate view of the contemporary English dialectal landscape. Dialectometry originated in the 1970s (Séguy, 1973) to provide a more objective method of identifying dialect differences than by “cherry-picking” the features which support the analysis one wishes to settle on (Nerbonne, 2009). Unfortunately, dialectometry has not been received very favorably by some traditional dialectologists, as aggregate analyses obscure the importance of individual linguistic features, on which they are required to focus for their often philologically-directed purposes. Consequently, there have been a number of attempts to develop quantitative methods which enable the identification of characteristic linguistic variables. For example, Shackleton (2007) uses cluster analysis and principal com","PeriodicalId":235034,"journal":{"name":"Lit. Linguistic Comput.","volume":"96 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129100732","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Language and gender in Congressional speech","authors":"Bei Yu","doi":"10.1093/llc/fqs073","DOIUrl":"https://doi.org/10.1093/llc/fqs073","url":null,"abstract":"This study draws from a large corpus of Congressional speeches from the 101st to the 110th Congress (1989–2008), to examine gender differences in language use in a setting of political debates. Female legislators’ speeches demonstrated characteristics of both a feminine language style (e.g. more use of emotion words, fewer articles) and a masculine one (e.g. more nouns and long words, fewer personal pronouns). A trend analysis found that these gender differences have consistently existed in the Congressional speeches over the past 20 years, regardless of the topic of debate. The findings lend support to the argument that gender differences in language use persist in professional settings like the floor of Congress.","PeriodicalId":235034,"journal":{"name":"Lit. Linguistic Comput.","volume":"5 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128344726","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Quantitative Methods in Corpus-Based Translation Studies","authors":"Sara Laviosa","doi":"10.1093/llc/fqt003","DOIUrl":"https://doi.org/10.1093/llc/fqt003","url":null,"abstract":"Firstly, Lidun Hareide and Knut Hofland describe through practical advice the compilation process of The Norwegian Spanish Parallel Corpus (NSPC) created at the University of Bergen (Norway), as well as preliminary findings from ongoing and planned research based on it. The corpus is primarily constructed for research in Translation Studies, and is built to be roughly comparable to the Spanish-English P-ACTRES corpus.","PeriodicalId":235034,"journal":{"name":"Lit. Linguistic Comput.","volume":"32 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127321394","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Noise Channels. Glitch and Error in Digital Culture. Peter Krapp","authors":"I. Moradi","doi":"10.1093/llc/fqt004","DOIUrl":"https://doi.org/10.1093/llc/fqt004","url":null,"abstract":"","PeriodicalId":235034,"journal":{"name":"Lit. Linguistic Comput.","volume":"126 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127397629","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"The Secret Life of Pronouns. What Our Words Say About Us","authors":"J. Nerbonne","doi":"10.1093/llc/fqt006","DOIUrl":"https://doi.org/10.1093/llc/fqt006","url":null,"abstract":"","PeriodicalId":235034,"journal":{"name":"Lit. Linguistic Comput.","volume":"16 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127619359","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Facilitating Access to the Web of Data. A Guide for Librarians. David Stuart","authors":"Timo Borst","doi":"10.1093/llc/fqt005","DOIUrl":"https://doi.org/10.1093/llc/fqt005","url":null,"abstract":"","PeriodicalId":235034,"journal":{"name":"Lit. Linguistic Comput.","volume":"10 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126657980","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An accurate word sense disambiguation system based on weighted lexical features","authors":"A. Rezapour, S. M. Fakhrahmad, M. Sadreddini, M. Z. Jahromi","doi":"10.1093/llc/fqs074","DOIUrl":"https://doi.org/10.1093/llc/fqs074","url":null,"abstract":"One of the major challenges in the process of machine translation is word sense disambiguation (WSD), which is defined as choosing the correct meaning of a multi-meaning word in a text. Supervised learning methods are usually used to solve this problem. The disambiguation task is performed using the statistics of the translated documents (as training data) or dual corpora of source and target languages. In this article, we present a supervised learning method for WSD, which is based on the K-nearest neighbor algorithm. As the first step, we extract two sets of features: the set of words that have occurred frequently in the text and the set of words surrounding the ambiguous word. In order to improve the classification accuracy, we perform a feature selection process and then propose a feature weighting strategy to tune the classifier. In order to show that the proposed schemes are not language dependent, we apply the suggested schemes to two sets of data, i.e. English and Persian corpora. The evaluation results show that the feature selection and feature weighting strategies have a significant effect on the accuracy of the classification system. The results are also encouraging compared with the state of the art.","PeriodicalId":235034,"journal":{"name":"Lit. Linguistic Comput.","volume":"43 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132137058","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Transparent aggregation of variables with Individual Differences Scaling","authors":"T. Ruette, D. Speelman","doi":"10.1093/llc/fqt011","DOIUrl":"https://doi.org/10.1093/llc/fqt011","url":null,"abstract":"Although the aggregation of many linguistic variables has provided new insights into the structure of language varieties, aggregation studies have been criticized for obscuring the behavior of individual input variables. Previous solutions to this criticism consisted of extensive post-hoc calculations, simple correlation measures, or highly complex algorithms. We think that these solutions can be improved. Therefore, the current article proposes a creative use of Individual Differences Scaling (INDSCAL) as an alternative, more straightforward solution. INDSCAL is a branch of Multidimensional Scaling, which is currently the preferred dimension reduction technique for most aggregation studies. The link to the existing methodology and the simplicity of its rationale are the main advantages of INDSCAL. The article introduces INDSCAL by means of a non-linguistic example, a discussion of the mathematical properties, and a case study on the lexical convergence between Belgian and Netherlandic Dutch in a corpus of language from 1950 and 1990. The case study shows how INDSCAL reproduces the results of a typical aggregation study, but elegantly keeps open the possibility of investigating the behavior of individual variables.","PeriodicalId":235034,"journal":{"name":"Lit. Linguistic Comput.","volume":"8 2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124359294","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Corpus Linguistics: Method, Theory and Practice. Tony McEnery and Andrew Hardie","authors":"Paul Thompson","doi":"10.1093/llc/fqt010","DOIUrl":"https://doi.org/10.1093/llc/fqt010","url":null,"abstract":"","PeriodicalId":235034,"journal":{"name":"Lit. Linguistic Comput.","volume":"40 2-3","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"120896308","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}