{"title":"Content Selection Operators for Multidocument Summarization Based on Cross-Document Structure Theory","authors":"M. L. C. Jorge, T. Pardo","doi":"10.1109/STIL.2009.15","DOIUrl":"https://doi.org/10.1109/STIL.2009.15","url":null,"abstract":"This paper presents an analysis of content selection techniques for multidocument summarization based on the multidocument discourse theory CST (Cross-document Structure Theory). We approach the task of content selection by using CST-based operators and focus specifically on redundancy treatment, which is an important and pervasive problem in multidocument summarization. Our experiments with Brazilian Portuguese news texts show that CST improves summary quality by exploring relations among texts. In particular, redundancy is reduced by identifying common information among texts, especially when the compression rate is low.","PeriodicalId":265848,"journal":{"name":"2009 Seventh Brazilian Symposium in Information and Human Language Technology","volume":"9 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-09-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134181903","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Automatic Fusion of Similar Sentences in Portuguese","authors":"E. M. Seno, M. G. V. Nunes","doi":"10.1109/STIL.2009.27","DOIUrl":"https://doi.org/10.1109/STIL.2009.27","url":null,"abstract":"This paper presents a Portuguese sentence fusion model. Sentence fusion is a text-to-text generation task which takes a set of similar sentences as input and combines these into a single output sentence. This process is of extreme relevance in many NLP applications, for instance, to treat redundancies in Multidocument Summarization by fusing information from a set of related sentences into a new one. We present three intrinsic evaluations of the model and the results obtained suggest that it has potential.","PeriodicalId":265848,"journal":{"name":"2009 Seventh Brazilian Symposium in Information and Human Language Technology","volume":"42 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-09-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133611349","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Portuguese Temporal Expressions Recognition: From TE Characterization to an Effective TER Module Implementation","authors":"Caroline Hagège, J. Baptista, N. Mamede","doi":"10.1109/STIL.2009.12","DOIUrl":"https://doi.org/10.1109/STIL.2009.12","url":null,"abstract":"Taking into account the temporal dimension conveyed in texts is a challenge to natural language processing. At the same time this task is of great importance for a wide range of natural language processing applications. The goal of this paper is twofold. First a characterization of Portuguese temporal expressions as they appear in texts is presented. This classification is intended to meet the requirements of high inter-agreement between annotators of temporal expressions. Second, relying on this characterization, an effective temporal expression annotation tool is described. Results from its evaluation are reported.","PeriodicalId":265848,"journal":{"name":"2009 Seventh Brazilian Symposium in Information and Human Language Technology","volume":"3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-09-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123639904","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Clause Identification Using Entropy Guided Transformation Learning","authors":"Eraldo Rezende Fernandes, B. '. Pires, C. D. Santos, R. Milidiú","doi":"10.1109/STIL.2009.10","DOIUrl":"https://doi.org/10.1109/STIL.2009.10","url":null,"abstract":"Entropy Guided Transformation Learning (ETL) is a machine learning strategy that extends Transformation Based Learning by providing automatic template generation. In this work, we propose an ETL approach to the clause identification task. We use the English language corpus of the CoNLL'2001 shared task. The achieved performance is not competitive yet, since the F1 of the ETL based system is 80.55, whereas the state-of-the-art system performance is 85.03. Nevertheless, our modeling strategy is very simple, when compared to the state-of-the-art approaches. These first findings indicate that the ETL approach is a promising one for this task. One can enhance its performance by incorporating problem specific knowledge. Additional features can be easily introduced in the ETL model.","PeriodicalId":265848,"journal":{"name":"2009 Seventh Brazilian Symposium in Information and Human Language Technology","volume":"36 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-09-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122519706","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Fine-Tuning in Portuguese-English Statistical Machine Translation","authors":"Wilker Aziz, T. Pardo, Ivandré Paraboni","doi":"10.1109/STIL.2009.16","DOIUrl":"https://doi.org/10.1109/STIL.2009.16","url":null,"abstract":"In previous work we have shown results of a first experiment in Statistical Machine Translation (SMT) for Brazilian Portuguese and American English using state-of-the-art phrase-based models. In this paper we compare a number of training and decoding parameter choices for fine-tuning the system as an attempt to obtain optimal results for this language pair.","PeriodicalId":265848,"journal":{"name":"2009 Seventh Brazilian Symposium in Information and Human Language Technology","volume":"27 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-09-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125882159","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Semantic Relation Extraction by Analysis of Terms Correlation in Documents","authors":"Sérgio William Botero, I. Ricarte","doi":"10.1109/STIL.2009.18","DOIUrl":"https://doi.org/10.1109/STIL.2009.18","url":null,"abstract":"Ontologies are important to organize and describe information, but are hard to create and maintain, which motivates the development of tools to help in this task. This article presents a strategy to extract, from a corpus of documents in a given domain, semantic elements expressing proximity relations between terms and concepts to help the construction of domain ontologies. The technique presented here, ACT, is based on linguistic processing, machine learning, and biclustering. Results show that concepts obtained by ACT are at least as good as those from similar techniques, such as LSI and NMF. Compared with those techniques, it has the additional advantage of allowing supervision by a domain expert.","PeriodicalId":265848,"journal":{"name":"2009 Seventh Brazilian Symposium in Information and Human Language Technology","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-09-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132782411","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"From Factorial to Quadratic Time Complexity for Sentence Realization Using Nearest Neighbour Algorithm","authors":"Karthik Gali, Sriram Venkatapathy, Taraka Rama","doi":"10.1109/STIL.2009.38","DOIUrl":"https://doi.org/10.1109/STIL.2009.38","url":null,"abstract":"Sentence Realization is the task of generating a well-formed sentence from a bag of words. Sentence Realization is a major step in many Natural Language Processing applications like Machine Translation (MT), Summarization and Dialogue Systems. In this paper, we explore a graph based Nearest Neighbour Algorithm for the task of Sentence Realization. The task of Sentence Realization involves formation of a well-formed sentence from a bag of lexical items. These lexical items may be attached syntactically with one another. The level of syntactic information varies from application to application. Our aim is to achieve a quality sentence realiser using as little syntactic information as possible and with minimal computational complexity. As such, our experiments assume only basic syntactic information, such as unlabeled dependency relationships between the lexical items. Graph based algorithms for Natural Language applications such as Parsing (McDonald et al. 2005), Summarization (Mihalcea and Tarau 2005) and Word sense disambiguation (Mihalcea 2005) have been well explored. For the task of Sentence Realization, graph based algorithms have yet to be explored. This paper is a novel effort in that direction.","PeriodicalId":265848,"journal":{"name":"2009 Seventh Brazilian Symposium in Information and Human Language Technology","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-09-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134387608","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"SAHARA: An Online Service for HAREM Named Entity Recognition Evaluation","authors":"Hugo Gonçalo Oliveira, Nuno Cardoso","doi":"10.1109/STIL.2009.31","DOIUrl":"https://doi.org/10.1109/STIL.2009.31","url":null,"abstract":"This paper presents SAHARA, an online service for the evaluation platform of Second HAREM. SAHARA allows a fast evaluation of any NER system that conforms with HAREM guidelines, making it easier to perform post-hoc evaluations and keep track of the overall performance of NER systems.","PeriodicalId":265848,"journal":{"name":"2009 Seventh Brazilian Symposium in Information and Human Language Technology","volume":"21 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-09-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117094318","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Testbed for Portuguese Natural Language Generation","authors":"E. M. D. Novais, Rafael L. de Oliveira, D. B. Pereira, Thiago Dias Tadeu, Ivandré Paraboni","doi":"10.1109/STIL.2009.17","DOIUrl":"https://doi.org/10.1109/STIL.2009.17","url":null,"abstract":"We present a data-text aligned corpus for Brazilian Portuguese Natural Language Generation (NLG) called SINotas, which we believe to be the first of its kind. SINotas provides a testbed for research on various aspects of trainable, corpus-based NLG, and it is the basis of a simple NLG application under development in the education domain.","PeriodicalId":265848,"journal":{"name":"2009 Seventh Brazilian Symposium in Information and Human Language Technology","volume":"33 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-09-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125062204","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Evaluating the Performance of a Centroid-Based Probabilistic Neural Network","authors":"P. M. Ciarelli, E. Oliveira","doi":"10.1109/STIL.2009.32","DOIUrl":"https://doi.org/10.1109/STIL.2009.32","url":null,"abstract":"This article proposes a technique that uses centroids together with a Probabilistic Neural Network to minimize some disadvantages of this network, such as the storage space for the neural network weights and a time complexity that is linear in the number of training samples. In the experiments carried out, memory usage and classification time were drastically reduced. In addition, the quality of the results was considerably improved by the a priori probability when it was used with these centroids.","PeriodicalId":265848,"journal":{"name":"2009 Seventh Brazilian Symposium in Information and Human Language Technology","volume":"24 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-09-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114922805","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}