2009 Seventh Brazilian Symposium in Information and Human Language Technology: Latest Papers

Contextual Information for Named Entity Recognition in Biomedical Texts
R. Goulart, Vera Lúcia Strube de Lima
Abstract: This article presents a study on named entity (NE) recognition using contextual information present in a biomedical corpus. Related work indicates that the use of context (the words surrounding a word) can assist NE recognition. This work presents experimental results that evaluate the impact of different context settings on NE recognition using machine learning.
DOI: 10.1109/STIL.2009.28 (https://doi.org/10.1109/STIL.2009.28)
Published: 2009-09-08
Citations: 0
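
The idea of "context" in the abstract above (the words surrounding a word) can be illustrated with a minimal sketch. The function name `context_features`, the feature keys, and the `<PAD>` symbol are assumptions made for illustration; the paper's actual features and context settings are not given in this listing.

```python
from typing import Dict, List


def context_features(tokens: List[str], index: int, window: int = 2) -> Dict[str, str]:
    """Return the target word plus up to `window` words on each side.

    Hypothetical helper: the paper's real feature set is not described here.
    """
    features = {"word": tokens[index]}
    for offset in range(-window, window + 1):
        if offset == 0:
            continue
        position = index + offset
        # Positions that fall outside the sentence get a padding symbol.
        inside = 0 <= position < len(tokens)
        features[f"word[{offset:+d}]"] = tokens[position] if inside else "<PAD>"
    return features


sentence = ["The", "p53", "protein", "binds", "DNA"]
features = context_features(sentence, 1, window=1)
print(features)
```

Varying `window` corresponds to trying different context settings; each resulting feature dictionary could then be passed to any token-level classifier.
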
Statistical Machine Translation: Little Changes Big Impacts
Helena de Medeiros Caseli, Israel Aono Nunes
Abstract: In this paper we describe experiments carried out to test the impact of automatic casing and punctuation changes when training and testing statistical translation models. The experiments concern translation between English and Brazilian Portuguese, but since the superficial changes investigated are language independent, we believe that the conclusions can be applied to many other language pairs. The experiments were designed to set a baseline scenario for future training and testing of more complex statistical translation models, such as factored ones. The experiments show that case and punctuation changes have a significant impact on automatic translation results.
DOI: 10.1109/STIL.2009.24 (https://doi.org/10.1109/STIL.2009.24)
Published: 2009-09-08
Citations: 8
Evaluation of Stopwords Removal on the Statistical Approach for Automatic Term Extraction
Í. Braga
Abstract: The construction of terminological products is important to the organization and spreading of knowledge. This task can be leveraged by the automatic extraction of terms, which has been considered a Natural Language Processing problem. In this paper, the interaction between the statistical approach to term extraction and the process of stopword removal is investigated. Experiments conducted on two corpora show that stopword removal improves performance when extracting bigram terms, regardless of whether the removal is done before or after the application of a statistical metric. As a result of this investigation, it is possible to recommend more appropriate statistical metrics for the case where stopwords can be removed and for the case where this removal cannot be done.
DOI: 10.1109/STIL.2009.8 (https://doi.org/10.1109/STIL.2009.8)
Published: 2009-09-08
Citations: 6
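
The difference between removing stopwords before and after scoring bigram candidates can be sketched as follows. Raw bigram frequency stands in for the statistical metric here, which is an assumption; the abstract does not name the metrics the paper compares, and the function names are hypothetical.

```python
from collections import Counter
from typing import List, Set


def bigram_counts(tokens: List[str]) -> Counter:
    """Count adjacent token pairs (raw frequency as a stand-in metric)."""
    return Counter(zip(tokens, tokens[1:]))


def remove_before(tokens: List[str], stopwords: Set[str]) -> Counter:
    """Drop stopwords from the token stream, then score bigrams."""
    kept = [token for token in tokens if token not in stopwords]
    return bigram_counts(kept)


def remove_after(tokens: List[str], stopwords: Set[str]) -> Counter:
    """Score all bigrams first, then discard those containing a stopword."""
    counts = bigram_counts(tokens)
    return Counter({bg: n for bg, n in counts.items() if not (set(bg) & stopwords)})


tokens = "automatic extraction of terms and automatic extraction of terms".split()
stopwords = {"of", "and"}
print(remove_before(tokens, stopwords))
print(remove_after(tokens, stopwords))
```

Removing stopwords first creates new adjacent pairs such as ("extraction", "terms"), while filtering afterwards only keeps pairs that were adjacent in the original text, so the two orderings can yield different candidate lists.
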
Multimedia Collections of Indigenous Languages: An Organization Proposal
Ellison Cleyton Barbosa dos Santos
Abstract: This paper describes the Sistema de Informação do Acervo de Línguas Indígenas (SIALI), a database designed to organize the storage of linguistic and ethnographic media data. The database was implemented in MS Access and offers a personalized mechanism for controlling the organization and storage of data, based on library techniques. The physical design presented here identifies the core configuration required in a database, including organization, storage, management, retrieval of data, and other features that are important for a database on storage media.
DOI: 10.1109/STIL.2009.7 (https://doi.org/10.1109/STIL.2009.7)
Published: 2009-09-08
Citations: 0
A Token Classification Approach to Dependency Parsing
R. Milidiú, C. M. P. Crestana, C. D. Santos
Abstract: The dependency-based syntactic parsing task consists in identifying a head word for each word in an input sentence. Hence, its output is a rooted tree where the nodes are the words in the sentence. State-of-the-art dependency parsing systems use transition-based or graph-based models. We present a token classification approach to dependency parsing, where any classification algorithm can be used. To evaluate its effectiveness, we apply the Entropy Guided Transformation Learning algorithm to the CoNLL 2006 corpus, using the Unlabelled Attachment Score as the accuracy metric. Our results show that the generated models are close to the average CoNLL system performance. Additionally, these findings indicate that the token classification approach is a promising one.
DOI: 10.1109/STIL.2009.29 (https://doi.org/10.1109/STIL.2009.29)
Published: 2009-09-08
Citations: 10
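
The Unlabelled Attachment Score mentioned in the abstract above is the fraction of words whose predicted head matches the gold head. The sketch below shows that computation under a simple assumed encoding: each sentence is a list of head positions, 1-based, with 0 marking the root. The function name and the example sentence are illustrative only, and corpus-specific conventions (such as whether punctuation is scored) are not modeled.

```python
from typing import Sequence


def unlabelled_attachment_score(gold_heads: Sequence[int],
                                predicted_heads: Sequence[int]) -> float:
    """Fraction of tokens whose predicted head index equals the gold head index."""
    if len(gold_heads) != len(predicted_heads):
        raise ValueError("gold and predicted head lists must have the same length")
    if not gold_heads:
        return 0.0
    correct = sum(g == p for g, p in zip(gold_heads, predicted_heads))
    return correct / len(gold_heads)


# "She reads old books": heads are 1-based token positions, 0 is the root.
gold = [2, 0, 4, 2]
predicted = [2, 0, 2, 2]  # "old" is wrongly attached to "reads"
print(unlabelled_attachment_score(gold, predicted))  # 0.75
```

Because the output is one head index per word, head prediction can be framed as assigning a class to each token, which is the framing the abstract describes.
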
Challenges to the Creation of a Frame-Based Lexicon for the Portuguese Language: A study of the Judgement and Assessing Frames
A. Bertoldi, R. Chishman
Abstract: This paper presents a comparative study of the Judgment and Assessing frames in English and Portuguese. The aim is to verify the possibility of using the FrameNet frames to construct a lexical database for Brazilian Portuguese. The research corpus is composed of 50 legal documents, totaling 1,055,535 tokens and 39,108 types. Through a contrastive method, the Judgment and Assessing frames were selected and translation equivalents for the English lexical units were established. The points considered in this research were the polysemy and the semantic relations of words. Polysemy is the main difficulty in applying FrameNet frames to the description of Portuguese.
DOI: 10.1109/STIL.2009.40 (https://doi.org/10.1109/STIL.2009.40)
Published: 2009-09-08
Citations: 0
The C-ORAL-BRASIL Corpus: Methodological Basis for the Treatment of Spontaneous Speech
M. Mittmann, Tommaso Raso, Heliana Mello
Abstract: This paper highlights the primary methods employed in the C-ORAL-BRASIL compiling process, i.e., recording, transcribing and segmenting oral texts. C-ORAL-BRASIL is a Brazilian Portuguese corpus of spontaneous speech, designed for the study of informational structure. It is representative of diaphasic variation, seeking to cover as many different communicative situations as possible. This paper presents and exemplifies the processes of transcription and segmentation of speech into prosodic units as employed in our ongoing research. It concludes with illustrations of some questions that the corpus will enable us to answer.
DOI: 10.1109/STIL.2009.22 (https://doi.org/10.1109/STIL.2009.22)
Published: 2009-09-08
Citations: 0
Evaluating the Extraction of Semantic Relations between Portuguese Words by Means of a Dictionary
Hugo Gonçalo Oliveira, Diana Santos, P. Gomes
Abstract: This paper presents PAPEL, a lexical resource for Portuguese, consisting of relations between terms, extracted by (semi) automatic means from a general dictionary. After a short overview of the building process, a quantitative overview is given together with some examples. Evaluation is then presented and discussed: for synonymy, we used a public thesaurus, Tep; for the other relations, we queried Portuguese corpora through the AC/DC interface.
DOI: 10.1109/STIL.2009.30 (https://doi.org/10.1109/STIL.2009.30)
Published: 2009-09-08
Citations: 1
Studying Portuguese as Used: the AC/DC service
L. Costa, Diana Santos, Paulo Rocha
Abstract: The AC/DC service has been giving access to Portuguese corpora through the Web since 1999. This paper describes the tasks related to processing the texts and making them publicly available. It also provides an overview of the interface with which users can query the corpora and concludes by pointing to future directions.
DOI: 10.1109/STIL.2009.25 (https://doi.org/10.1109/STIL.2009.25)
Published: 2009-09-08
Citations: 0
Text Classification by Literary Period Using PPM-C Data Compression
B. Barufaldi, E. F. Santana, José Rogério B. B. Filho, J. V. D. Poel, Milton Marques Júnior, L. Batista
Abstract: Methods and techniques for data compression have been used for pattern recognition, including automatic text classification. The performance of Prediction by Partial Matching (PPM) as a text classifier has already been demonstrated in many works, including authorship attribution for Portuguese texts. The classes involved in the classification process need not be restricted to a single author. By including two or more authors in one class, one can define a literary style. This work presents a literary style classifier for texts from Brazilian literature using the PPM-C statistical model.
DOI: 10.1109/STIL.2009.39 (https://doi.org/10.1109/STIL.2009.39)
Published: 2009-09-08
Citations: 2