{"title":"Bipartite spectral graph partitioning to co-cluster varieties and sound correspondences in dialectology","authors":"Martijn Wieling, J. Nerbonne","doi":"10.3115/1708124.1708129","DOIUrl":"https://doi.org/10.3115/1708124.1708129","url":null,"abstract":"In this study we used bipartite spectral graph partitioning to simultaneously cluster varieties and sound correspondences in Dutch dialect data. While clustering geographical varieties with respect to their pronunciation is not new, the simultaneous identification of the sound correspondences giving rise to the geographical clustering presents a novel opportunity in dialectometry. Earlier methods aggregated sound differences and clustered on the basis of aggregate differences. The determination of the significant sound correspondences which co-varied with cluster membership was carried out on a post hoc basis. Bipartite spectral graph clustering simultaneously seeks groups of individual sound correspondences which are associated, even while seeking groups of sites which share sound correspondences. We show that the application of this method results in clear and sensible geographical groupings and discuss the concomitant sound correspondences.","PeriodicalId":359354,"journal":{"name":"Workshop on Graph-based Methods for Natural Language Processing","volume":"33 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-08-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133989859","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Social (distributed) language modeling, clustering and dialectometry","authors":"David Ellis","doi":"10.3115/1708124.1708126","DOIUrl":"https://doi.org/10.3115/1708124.1708126","url":null,"abstract":"We present ongoing work in a scalable, distributed implementation of over 200 million individual language models, each capturing a single user's dialect in a given language (multilingual users have several models). These have a variety of practical applications, ranging from spam detection to speech recognition, and dialectometrical methods on the social graph. Users should be able to view any content in their language (even if it is spoken by a small population), and to browse our site with appropriately translated interface (automatically generated, for locales with little crowd-sourced community effort).","PeriodicalId":359354,"journal":{"name":"Workshop on Graph-based Methods for Natural Language Processing","volume":"104 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-08-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117186511","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Cohesion Graph Based Approach for Unsupervised Recognition of Literal and Non-literal Use of Multiword Expressions","authors":"Linlin Li, C. Sporleder","doi":"10.3115/1708124.1708139","DOIUrl":"https://doi.org/10.3115/1708124.1708139","url":null,"abstract":"We present a graph-based model for representing the lexical cohesion of a discourse. In the graph structure, vertices correspond to the content words of a text and edges connecting pairs of words encode how closely the words are related semantically. We show that such a structure can be used to distinguish literal and non-literal usages of multi-word expressions.","PeriodicalId":359354,"journal":{"name":"Workshop on Graph-based Methods for Natural Language Processing","volume":"6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-08-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121491508","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Quantitative analysis of treebanks using frequent subtree mining methods","authors":"S. Martens","doi":"10.3115/1708124.1708140","DOIUrl":"https://doi.org/10.3115/1708124.1708140","url":null,"abstract":"The first task of statistical computational linguistics, or any other type of data-driven processing of language, is the extraction of counts and distributions of phenomena. This is much more difficult for the type of complex structured data found in treebanks and in corpora with sophisticated annotation than for tokenized texts. Recent developments in data mining, particularly in the extraction of frequent subtrees from treebanks, offer some solutions. We have applied a modified version of the TreeMiner algorithm to a small treebank and present some promising results.","PeriodicalId":359354,"journal":{"name":"Workshop on Graph-based Methods for Natural Language Processing","volume":"192 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-08-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123262043","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
S. Sinha, R. K. Pan, N. Yadav, M. Vahia, I. Mahadevan
{"title":"Network analysis reveals structure indicative of syntax in the corpus of undeciphered Indus civilization inscriptions","authors":"S. Sinha, R. K. Pan, N. Yadav, M. Vahia, I. Mahadevan","doi":"10.3115/1708124.1708128","DOIUrl":"https://doi.org/10.3115/1708124.1708128","url":null,"abstract":"Archaeological excavations in the sites of the Indus Valley civilization (2500-1900 BCE) in Pakistan and northwestern India have unearthed a large number of artifacts with inscriptions made up of hundreds of distinct signs. To date, there is no generally accepted decipherment of these sign sequences, and there have been suggestions that the signs could be non-linguistic. Here we apply complex network analysis techniques on the data-base of available Indus inscriptions, with the aim of detecting patterns indicative of syntactic structure in this sign system. Our results show the presence of regularities, e.g., in the segmentation trees of the sequences, that suggest the existence of a grammar underlying the construction of the sequences.","PeriodicalId":359354,"journal":{"name":"Workshop on Graph-based Methods for Natural Language Processing","volume":"19 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-08-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126612173","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Daniel Ramage, Anna N. Rafferty, Christopher D. Manning
{"title":"Random Walks for Text Semantic Similarity","authors":"Daniel Ramage, Anna N. Rafferty, Christopher D. Manning","doi":"10.3115/1708124.1708131","DOIUrl":"https://doi.org/10.3115/1708124.1708131","url":null,"abstract":"Many tasks in NLP stand to benefit from robust measures of semantic similarity for units above the level of individual words. Rich semantic resources such as WordNet provide local semantic information at the lexical level. However, effectively combining this information to compute scores for phrases or sentences is an open problem. Our algorithm aggregates local relatedness information via a random walk over a graph constructed from an underlying lexical resource. The stationary distribution of the graph walk forms a \"semantic signature\" that can be compared to another such distribution to get a relat-edness score for texts. On a paraphrase recognition task, the algorithm achieves an 18.5% relative reduction in error rate over a vector-space baseline. We also show that the graph walk similarity between texts has complementary value as a feature for recognizing textual entailment, improving on a competitive baseline system.","PeriodicalId":359354,"journal":{"name":"Workshop on Graph-based Methods for Natural Language Processing","volume":"6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-08-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133100810","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Classifying Japanese Polysemous Verbs based on Fuzzy C-means Clustering","authors":"Yoshimi Suzuki, Fumiyo Fukumoto","doi":"10.3115/1708124.1708132","DOIUrl":"https://doi.org/10.3115/1708124.1708132","url":null,"abstract":"This paper presents a method for classifying Japanese polysemous verbs using an algorithm to identify overlapping nodes with more than one cluster. The algorithm is a graph-based unsupervised clustering algorithm, which combines a generalized modularity function, spectral mapping, and fuzzy clustering technique. The modularity function for measuring cluster structure is calculated based on the frequency distributions over verb frames with selectional preferences. Evaluations are made on two sets of verbs including polysemies.","PeriodicalId":359354,"journal":{"name":"Workshop on Graph-based Methods for Natural Language Processing","volume":"236 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-08-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133389490","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Measuring semantic relatedness with vector space models and random walks","authors":"Amac Herdagdelen, K. Erk, Marco Baroni","doi":"10.3115/1708124.1708134","DOIUrl":"https://doi.org/10.3115/1708124.1708134","url":null,"abstract":"Both vector space models and graph random walk models can be used to determine similarity between concepts. Noting that vectors can be regarded as local views of a graph, we directly compare vector space models and graph random walk models on standard tasks of predicting human similarity ratings, concept categorization, and semantic priming, varying the size of the dataset from which vector space and graph are extracted.","PeriodicalId":359354,"journal":{"name":"Workshop on Graph-based Methods for Natural Language Processing","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-08-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131359164","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Graph-based Event Coreference Resolution","authors":"Zheng Chen, Heng Ji","doi":"10.3115/1708124.1708135","DOIUrl":"https://doi.org/10.3115/1708124.1708135","url":null,"abstract":"In this paper, we address the problem of event coreference resolution as specified in the Automatic Content Extraction (ACE) program. In contrast to entity coreference resolution, event coreference resolution has not received great attention from researchers. In this paper, we first demonstrate the diverse scenarios of event coreference by an example. We then model event coreference resolution as a spectral graph clustering problem and evaluate the clustering algorithm on ground truth event mentions using ECM F-Measure. We obtain the ECM-F scores of 0.8363 and 0.8312 respectively by using two methods for computing coreference matrices.","PeriodicalId":359354,"journal":{"name":"Workshop on Graph-based Methods for Natural Language Processing","volume":"20 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-08-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122961688","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Ranking and Semi-supervised Classification on Large Scale Graphs Using Map-Reduce","authors":"D. Rao, David Yarowsky","doi":"10.3115/1708124.1708137","DOIUrl":"https://doi.org/10.3115/1708124.1708137","url":null,"abstract":"Label Propagation, a standard algorithm for semi-supervised classification, suffers from scalability issues involving memory and computation when used with large-scale graphs from real-world datasets. In this paper we approach Label Propagation as solution to a system of linear equations which can be implemented as a scalable parallel algorithm using the map-reduce framework. In addition to semi-supervised classification, this approach to Label Propagation allows us to adapt the algorithm to make it usable for ranking on graphs and derive the theoretical connection between Label Propagation and PageRank. We provide empirical evidence to that effect using two natural language tasks -- lexical relat-edness and polarity induction. The version of the Label Propagation algorithm presented here scales linearly in the size of the data with a constant main memory requirement, in contrast to the quadratic cost of both in traditional approaches.","PeriodicalId":359354,"journal":{"name":"Workshop on Graph-based Methods for Natural Language Processing","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-08-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130863153","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}