NER@ACL | Pub Date: 2003-07-12 | DOI: 10.3115/1119384.1119391
S. Strassel, A. Mitchell
{"title":"Multilingual Resources for Entity Extraction","authors":"S. Strassel, A. Mitchell","doi":"10.3115/1119384.1119391","DOIUrl":"https://doi.org/10.3115/1119384.1119391","url":null,"abstract":"Progress in human language technology requires increasing amounts of data and annotation in a growing variety of languages. Research in Named Entity extraction is no exception. Linguistic Data Consortium is creating annotated corpora to support information extraction in English, Chinese, Arabic, and other languages for a variety of US Government-sponsored programs. This paper covers the scope of annotation and research tasks within these programs, describes some of the challenges of multilingual corpus development for entity extraction, and concludes with a description of the corpora developed to support this research.","PeriodicalId":237242,"journal":{"name":"NER@ACL","volume":"17 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-07-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117047082","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
NER@ACL | Pub Date: 2003-07-12 | DOI: 10.3115/1119384.1119393
Youzheng Wu, Jun Zhao, Bo Xu
{"title":"Chinese Named Entity Recognition Combining Statistical Model with Human Knowledge","authors":"Youzheng Wu, Jun Zhao, Bo Xu","doi":"10.3115/1119384.1119393","DOIUrl":"https://doi.org/10.3115/1119384.1119393","url":null,"abstract":"Named Entity Recognition is one of the key techniques in the fields of natural language processing, information retrieval, question answering and so on. Unfortunately, Chinese Named Entity Recognition (NER) is more difficult for the lack of capitalization information and the uncertainty in word segmentation. In this paper, we present a hybrid algorithm which can combine a class-based statistical model with various types of human knowledge very well. In order to avoid data sparseness problem, we employ a back-off model and [Abstract contained text which could not be captured.], a Chinese thesaurus, to smooth the parameters in the model. The F-measure of person names, location names, and organization names on the newswire test data for the 1999 IEER evaluation in Mandarin is 86.84%, 84.40% and 76.22% respectively.","PeriodicalId":237242,"journal":{"name":"NER@ACL","volume":"9 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-07-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131657093","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
NER@ACL | Pub Date: 2003-07-12 | DOI: 10.3115/1119384.1119385
Hsin-Hsi Chen, Changhua Yang, Ying Lin
{"title":"Learning Formulation and Transformation Rules for Multilingual Named Entities","authors":"Hsin-Hsi Chen, Changhua Yang, Ying Lin","doi":"10.3115/1119384.1119385","DOIUrl":"https://doi.org/10.3115/1119384.1119385","url":null,"abstract":"This paper investigates three multilingual named entity corpora, including named people, named locations and named organizations. Frequency-based approaches with and without dictionary are proposed to extract formulation rules of named entities for individual languages, and transformation rules for mapping among languages. We consider the issues of abbreviation and compound keyword at a distance. Keywords specify not only the types of named entities, but also tell out which parts of a named entity should be meaning-translated and which part should be phoneme-transliterated. An application of the results on cross language information retrieval is also shown.","PeriodicalId":237242,"journal":{"name":"NER@ACL","volume":"64 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-07-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126419895","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
NER@ACL | Pub Date: 2003-07-12 | DOI: 10.3115/1119384.1119389
D. Maynard, V. Tablan, H. Cunningham
{"title":"NE Recognition Without Training Data on a Language You Don't Speak","authors":"D. Maynard, V. Tablan, H. Cunningham","doi":"10.3115/1119384.1119389","DOIUrl":"https://doi.org/10.3115/1119384.1119389","url":null,"abstract":"In this paper we describe an experiment to adapt a named entity recognition system from English to Cebuano as part of the TIDES surprise language program. With 4 person-days of effort, and with no previous knowledge of which language would be involved, no knowledge of the language in question once it was announced, and no training data available, we adapted the ANNIE system for Cebuano and achieved an F-measure of 77.5%.","PeriodicalId":237242,"journal":{"name":"NER@ACL","volume":"35 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-07-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122886762","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
NER@ACL | Pub Date: 2003-07-12 | DOI: 10.3115/1119384.1119387
T. Kumano, H. Kashioka, Hideki Tanaka, T. Fukusima
{"title":"Construction and Analysis of Japanese-English Broadcast News Corpus with Named Entity Tags","authors":"T. Kumano, H. Kashioka, Hideki Tanaka, T. Fukusima","doi":"10.3115/1119384.1119387","DOIUrl":"https://doi.org/10.3115/1119384.1119387","url":null,"abstract":"We are aiming to acquire named entity (NE) translation knowledge from nonparallel, content-aligned corpora, by utilizing NE extraction techniques. For this research, we are constructing a Japanese-English broadcast news corpus with NE tags. The tags represent not only NE class information but also coreference information within the same monolingual document and between corresponding Japanese-English document pairs. Analysis of about 1,100 annotated article pairs has shown that if NE occurrence information, such as classes, number of occurrence and occurrence order, is given for each language, it may provide a good clue for corresponding NEs across languages.","PeriodicalId":237242,"journal":{"name":"NER@ACL","volume":"26 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-07-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115086629","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
NER@ACL | Pub Date: 2003-07-12 | DOI: 10.3115/1119384.1119388
Lluís Màrquez i Villodre, A. Gispert, X. Carreras, Lluís Padró
{"title":"Low-cost Named Entity Classification for Catalan: Exploiting Multilingual Resources and Unlabeled Data","authors":"Lluís Màrquez i Villodre, A. Gispert, X. Carreras, Lluís Padró","doi":"10.3115/1119384.1119388","DOIUrl":"https://doi.org/10.3115/1119384.1119388","url":null,"abstract":"This work studies Named Entity Classification (NEC) for Catalan without making use of large annotated resources of this language. Two views are explored and compared, namely exploiting solely the Catalan resources, and a direct training of bilingual classification models (Spanish and Catalan), given that a large collection of annotated examples is available for Spanish. The empirical results obtained on real data point out that multilingual models clearly outperform monolingual ones, and that the resulting Catalan NEC models are easier to improve by bootstrapping on unlabelled data.","PeriodicalId":237242,"journal":{"name":"NER@ACL","volume":"91 10","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-07-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"120825242","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
NER@ACL | Pub Date: 2003-07-12 | DOI: 10.3115/1119384.1119386
Fei Huang, S. Vogel, A. Waibel
{"title":"Automatic Extraction of Named Entity Translingual Equivalence Based on Multi-Feature Cost Minimization","authors":"Fei Huang, S. Vogel, A. Waibel","doi":"10.3115/1119384.1119386","DOIUrl":"https://doi.org/10.3115/1119384.1119386","url":null,"abstract":"Translingual equivalence refers to the relationship between expressions of the same meaning from different languages. Identifying translingual equivalence of named entities (NE) can significantly contribute to multilingual natural language processing, such as crosslingual information retrieval, crosslingual information extraction and statistical machine translation. In this paper we present an integrated approach to extract NE translingual equivalence from a parallel Chinese-English corpus. Starting from a bilingual corpus where NEs are automatically tagged for each language, NE pairs are aligned in order to minimize the overall multi-feature alignment cost. An NE transliteration model is presented and iteratively trained using named entity pairs extracted from a bilingual dictionary. The transliteration cost, combined with the named entity tagging cost and word-based translation cost, constitute the multi-feature alignment cost. These features are derived from several information sources using unsupervised and partly supervised methods. A greedy search algorithm is applied to minimize the alignment cost. Experiments show that the proposed approach extracts NE translingual equivalence with 81% F-score and improves the translation score from 7.68 to 7.74.","PeriodicalId":237242,"journal":{"name":"NER@ACL","volume":"45 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-07-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130950645","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
NER@ACL | Pub Date: 2003-07-12 | DOI: 10.3115/1119384.1119392
Paola Virga, S. Khudanpur
{"title":"Transliteration of Proper Names in Cross-Lingual Information Retrieval","authors":"Paola Virga, S. Khudanpur","doi":"10.3115/1119384.1119392","DOIUrl":"https://doi.org/10.3115/1119384.1119392","url":null,"abstract":"We address the problem of transliterating English names using Chinese orthography in support of cross-lingual speech and text processing applications. We demonstrate the application of statistical machine translation techniques to \"translate\" the phonemic representation of an English name, obtained by using an automatic text-to-speech system, to a sequence of initials and finals, commonly used sub-word units of pronunciation for Chinese. We then use another statistical translation model to map the initial/final sequence to Chinese characters. We also present an evaluation of this module in retrieval of Mandarin spoken documents from the TDT corpus using English text queries.","PeriodicalId":237242,"journal":{"name":"NER@ACL","volume":"33 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-07-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131857925","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
NER@ACL | Pub Date: 2003-07-12 | DOI: 10.3115/1119384.1119390
Kuniko Saito, M. Nagata
{"title":"Multi-Language Named-Entity Recognition System based on HMM","authors":"Kuniko Saito, M. Nagata","doi":"10.3115/1119384.1119390","DOIUrl":"https://doi.org/10.3115/1119384.1119390","url":null,"abstract":"We introduce a multi-language named-entity recognition system based on HMM. Japanese, Chinese, Korean and English versions have already been implemented. In principle, it can analyze any other language if we have training data of the target language. This system has a common analytical engine and it can handle any language simply by changing the lexical analysis rules and statistical language model. In this paper, we describe the architecture and accuracy of the named-entity system, and report preliminary experiments on automatic bilingual named-entity dictionary construction using the Japanese and English named-entity recognizer.","PeriodicalId":237242,"journal":{"name":"NER@ACL","volume":"4 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-07-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124291136","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}