NEWS@ACLPub Date : 2015-07-01DOI: 10.18653/v1/W15-3903
G. Kondrak
{"title":"How do you spell that? A journey through word representations","authors":"G. Kondrak","doi":"10.18653/v1/W15-3903","DOIUrl":"https://doi.org/10.18653/v1/W15-3903","url":null,"abstract":"Languages are made up of words, which in turn consist of smaller units such as letters, phonemes, morphemes and syllables. Words exist independently of writing, as abstract entities shared among the speakers of a language. Those abstract entities have various representations, which in turn may have different realizations. Orthographic forms, phonetic transcriptions, alternative transliterations, and even sound-wave spectrograms are all related by referring to the same abstract word and they all convey information about its pronunciation. In this talk, I will discuss the lessons learned and insights gained from a number of research projects related to the transliteration task in which I participated.","PeriodicalId":189654,"journal":{"name":"NEWS@ACL","volume":"48 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122465569","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
NEWS@ACLPub Date : 2015-07-01DOI: 10.18653/v1/W15-3913
Yu-Chun Wang, Chun-Kai Wu, Richard Tzong-Han Tsai
{"title":"NCU IISR English-Korean and English-Chinese Named Entity Transliteration Using Different Grapheme Segmentation Approaches","authors":"Yu-Chun Wang, Chun-Kai Wu, Richard Tzong-Han Tsai","doi":"10.18653/v1/W15-3913","DOIUrl":"https://doi.org/10.18653/v1/W15-3913","url":null,"abstract":"This paper describes our approach to English-Korean and English-Chinese transliteration task of NEWS 2015. We use different grapheme segmentation approaches on source and target languages to train several transliteration models based on the M2M-aligner and DirecTL+, a string transduction model. Then, we use two reranking techniques based on string similarity and web co-occurrence to select the best transliteration among the prediction results from the different models. Our English-Korean standard and non-standard runs achieve 0.4482 and 0.5067 in top-1 accuracy respectively, and our English-Chinese standard runs achieves 0.2925 in top-1 accuracy.","PeriodicalId":189654,"journal":{"name":"NEWS@ACL","volume":" 10","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"120832463","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
NEWS@ACLPub Date : 2015-07-01DOI: 10.18653/v1/W15-3906
Livy Real, Alexandre Rademaker
{"title":"HAREM and Klue: how to put two tagsets for named entities annotation together","authors":"Livy Real, Alexandre Rademaker","doi":"10.18653/v1/W15-3906","DOIUrl":"https://doi.org/10.18653/v1/W15-3906","url":null,"abstract":"This paper describes an undergoing experiment to compare two tagsets for Named Entities (NE) annotation. We compared Klue 2 tagset, developed by IBM Research, with HAREM tagset, developed for tagging the Portuguese corpora used in Second HAREM competition. From this report, we expected to evaluate our methodology for comparison and to survey the problems that arise from it.","PeriodicalId":189654,"journal":{"name":"NEWS@ACL","volume":"3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127392030","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
NEWS@ACLPub Date : 2015-05-19DOI: 10.18653/v1/W15-3904
C. D. Santos, Victor Guimarães
{"title":"Boosting Named Entity Recognition with Neural Character Embeddings","authors":"C. D. Santos, Victor Guimarães","doi":"10.18653/v1/W15-3904","DOIUrl":"https://doi.org/10.18653/v1/W15-3904","url":null,"abstract":"Most state-of-the-art named entity recognition (NER) systems rely on handcrafted features and on the output of other NLP tasks such as part-of-speech (POS) tagging and text chunking. In this work we propose a language-independent NER system that uses automatically learned features only. Our approach is based on the CharWNN deep neural network, which uses word-level and character-level representations (embeddings) to perform sequential classification. We perform an extensive number of experiments using two annotated corpora in two different languages: HAREM I corpus, which contains texts in Portuguese; and the SPA CoNLL-2002 corpus, which contains texts in Spanish. Our experimental results shade light on the contribution of neural character embeddings for NER. Moreover, we demonstrate that the same neural network which has been successfully applied to POS tagging can also achieve state-of-the-art results for language-independet NER, using the same hyperparameters, and without any handcrafted features. For the HAREM I corpus, CharWNN outperforms the state-of-the-art system by 7.9 points in the F1-score for the total scenario (ten NE classes), and by 7.2 points in the F1 for the selective scenario (five NE classes).","PeriodicalId":189654,"journal":{"name":"NEWS@ACL","volume":"176 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-05-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116130742","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
NEWS@ACLPub Date : 1900-01-01DOI: 10.18653/v1/W17-2710
Natalie Ahn
{"title":"Inducing Event Types and Roles in Reverse: Using Function to Discover Theme","authors":"Natalie Ahn","doi":"10.18653/v1/W17-2710","DOIUrl":"https://doi.org/10.18653/v1/W17-2710","url":null,"abstract":"With growing interest in automated event extraction, there is an increasing need to overcome the labor costs of hand-written event templates, entity lists, and annotated corpora. In the last few years, more inductive approaches have emerged, seeking to discover unknown event types and roles in raw text. The main recent efforts use probabilistic generative models, as in topic modeling, which are formally concise but do not always yield stable or easily interpretable results. We argue that event schema induction can benefit from greater structure in the process and in linguistic features that distinguish words’ functions and themes. To maximize our use of limited data, we reverse the typical schema induction steps and introduce new similarity measures, building an intuitive process for inducing the structure of unknown events.","PeriodicalId":189654,"journal":{"name":"NEWS@ACL","volume":"37 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129159638","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
NEWS@ACLPub Date : 1900-01-01DOI: 10.18653/v1/W18-2401
J. Andrew
{"title":"Automatic Extraction of Entities and Relation from Legal Documents","authors":"J. Andrew","doi":"10.18653/v1/W18-2401","DOIUrl":"https://doi.org/10.18653/v1/W18-2401","url":null,"abstract":"In recent years, the journalists and computer sciences speak to each other to identify useful technologies which would help them in extracting useful information. This is called “computational Journalism”. In this paper, we present a method that will enable the journalists to automatically identifies and annotates entities such as names of people, organizations, role and functions of people in legal documents; the relationship between these entities are also explored. The system uses a combination of both statistical and rule based technique. The statistical method used is Conditional Random Fields and for the rule based technique, document and language specific regular expressions are used.","PeriodicalId":189654,"journal":{"name":"NEWS@ACL","volume":"10 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129536922","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}