{"title":"Using Language Models to Identify Language Impairment in Spanish-English Bilingual Children","authors":"T. Solorio, Yang Liu","doi":"10.3115/1572306.1572337","DOIUrl":"https://doi.org/10.3115/1572306.1572337","url":null,"abstract":"Children diagnosed with Specific Language Impairment (SLI) experience a delay in acquisition of certain language skills, with no evidence of hearing impediments, or other cognitive, behavioral, or overt neurological problems (Leonard, 1991; Paradis et al., 2005/6). Standardized tests, such as the Test for Early Grammatical Impairment, have shown to have great predictive value for assessing English speaking monolingual children. Diagnosing bilingual children with SLI is far more complicated due to the following factors: lack of standardized tests, lack of bilingual clinicians, and more importantly, the lack of a deep understanding of bilingualism and its implications on language disorders. In addition, bilingual children often exhibit code-switching patterns that will make the assessment task even more challenging. In this paper, we present preliminary results from using language models to help discriminating bilingual children with SLI from Typically-Developing (TD) bilingual children.","PeriodicalId":200974,"journal":{"name":"Workshop on Biomedical Natural Language Processing","volume":"31 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-06-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127179751","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
György Szarvas, V. Vincze, Richárd Farkas, J. Csirik
{"title":"The BioScope corpus: annotation for negation, uncertainty and their scope in biomedical texts","authors":"György Szarvas, V. Vincze, Richárd Farkas, J. Csirik","doi":"10.3115/1572306.1572314","DOIUrl":"https://doi.org/10.3115/1572306.1572314","url":null,"abstract":"This article reports on a corpus annotation project that has produced a freely available resource for research on handling negation and uncertainty in biomedical texts (we call this corpus the BioScope corpus). The corpus consists of three parts, namely medical free texts, biological full papers and biological scientific abstracts. The dataset contains annotations at the token level for negative and speculative keywords and at the sentence level for their linguistic scope. The annotation process was carried out by two independent linguist annotators and a chief annotator -- also responsible for setting up the annotation guidelines -- who resolved cases where the annotators disagreed. We will report our statistics on corpus size, ambiguity levels and the consistency of annotations.","PeriodicalId":200974,"journal":{"name":"Workshop on Biomedical Natural Language Processing","volume":"45 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-06-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132863059","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Extracting Protein-Protein Interaction based on Discriminative Training of the Hidden Vector State Model","authors":"Deyu Zhou, Yulan He","doi":"10.3115/1572306.1572328","DOIUrl":"https://doi.org/10.3115/1572306.1572328","url":null,"abstract":"The knowledge about gene clusters and protein interactions is important for biological researchers to unveil the mechanism of life. However, large quantity of the knowledge often hides in the literature, such as journal articles, reports, books and so on. Many approaches focusing on extracting information from unstructured text, such as pattern matching, shallow and deep parsing, have been proposed especially for extracting protein-protein interactions (Zhou and He, 2008).","PeriodicalId":200974,"journal":{"name":"Workshop on Biomedical Natural Language Processing","volume":"11 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-06-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127658485","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Mapping Clinical Notes to Medical Terminologies at Point of Care","authors":"Yefeng Wang, J. Patrick","doi":"10.3115/1572306.1572330","DOIUrl":"https://doi.org/10.3115/1572306.1572330","url":null,"abstract":"Clinicians write the reports in natural language which contains a large amount of informal medical term. Automating conversion of text into clinical terminologies allows reliable retrieval and analysis of the clinical notes. We have created an algorithm that maps medical expressions in clinical notes into a medical terminology. This algorithm indexes medical terms into an augmented lexicon. It performs lexical searches in text and finds the longest possible matches in the target terminology, SNOMED CT. The mapping system was run on a collection of 470,000 clinical notes from an Intensive Care Service (ICS). The evaluation on a small part of the copus shows the precision is 70.4%.","PeriodicalId":200974,"journal":{"name":"Workshop on Biomedical Natural Language Processing","volume":"96 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-06-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124647498","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An Approach to Reducing Annotation Costs for BioNLP","authors":"Michael Bloodgood, K. Vijay-Shanker","doi":"10.13016/M2VC7V","DOIUrl":"https://doi.org/10.13016/M2VC7V","url":null,"abstract":"There is a broad range of BioNLP tasks for which active learning (AL) can significantly reduce annotation costs and a specific AL algorithm we have developed is particularly effective in reducing annotation costs for these tasks. We have previously developed an AL algorithm called ClosestInitPA that works best with tasks that have the following characteristics: redundancy in training material, burdensome annotation costs, Support Vector Machines (SVMs) work well for the task, and imbalanced datasets (i.e. when set up as a binary classification problem, one class is substantially rarer than the other). Many BioNLP tasks have these characteristics and thus our AL algorithm is a natural approach to apply to BioNLP tasks.","PeriodicalId":200974,"journal":{"name":"Workshop on Biomedical Natural Language Processing","volume":"120 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-06-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116358059","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A. Airola, Sampo Pyysalo, Jari Björne, T. Pahikkala, Filip Ginter, T. Salakoski
{"title":"A Graph Kernel for Protein-Protein Interaction Extraction","authors":"A. Airola, Sampo Pyysalo, Jari Björne, T. Pahikkala, Filip Ginter, T. Salakoski","doi":"10.3115/1572306.1572308","DOIUrl":"https://doi.org/10.3115/1572306.1572308","url":null,"abstract":"In this paper, we propose a graph kernel based approach for the automated extraction of protein-protein interactions (PPI) from scientific literature. In contrast to earlier approaches to PPI extraction, the introduced all-dependency-paths kernel has the capability to consider full, general dependency graphs. We evaluate the proposed method across five publicly available PPI corpora providing the most comprehensive evaluation done for a machine learning based PPI-extraction system. Our method is shown to achieve state-of-the-art performance with respect to comparable evaluations, achieving 56.4 F-score and 84.8 AUC on the AImed corpus. Further, we identify several pitfalls that can make evaluations of PPI-extraction systems incomparable, or even invalid. These include incorrect cross-validation strategies and problems related to comparing F-score results achieved on different evaluation resources.","PeriodicalId":200974,"journal":{"name":"Workshop on Biomedical Natural Language Processing","volume":"120 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-06-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133947708","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Species Disambiguation for Biomedical Term Identification","authors":"Xinglong Wang, Michael Matthews","doi":"10.3115/1572306.1572320","DOIUrl":"https://doi.org/10.3115/1572306.1572320","url":null,"abstract":"An important task in information extraction (IE) from biomedical articles is term identification (TI), which concerns linking entity mentions (e.g., terms denoting proteins) in text to unambiguous identifiers in standard databases (e.g., RefSeq). Previous work on TI has focused on species-specific documents. However, biomedical documents, especially full-length articles, often talk about entities across a number of species, in which case resolving species ambiguity becomes an indispensable part of ti. This paper describes our rule-based and machine-learning based approaches to species disambiguation and demonstrates that performance of TI can be improved by over 20% if the correct species are known. We also show that using the species predicted by the automatic species taggers can improve TI by a large margin.","PeriodicalId":200974,"journal":{"name":"Workshop on Biomedical Natural Language Processing","volume":"15 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-06-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125193902","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Mining the Biomedical Literature for Genic Information","authors":"Catalina O. Tudor, K. Vijay-Shanker, C. Schmidt","doi":"10.3115/1572306.1572311","DOIUrl":"https://doi.org/10.3115/1572306.1572311","url":null,"abstract":"eGIFT (Extracting Gene Information From Text) is an intelligent system which is intended to aid scientists in surveying literature relevant to genes of interest. From a gene specific set of abstracts retrieved from PubMed, eGIFT determines the most important terms associated with the given gene. Annotators using eGIFT can quickly find articles describing gene functions and individuals scientists surveying the results of high-throughput experiments can quickly extract information important to their hits.","PeriodicalId":200974,"journal":{"name":"Workshop on Biomedical Natural Language Processing","volume":"405 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-06-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131892109","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Temporal Annotation of Clinical Text","authors":"D. Mowery, H. Harkema, W. Chapman","doi":"10.3115/1572306.1572332","DOIUrl":"https://doi.org/10.3115/1572306.1572332","url":null,"abstract":"We developed a temporal annotation schema that provides a structured method to capture contextual and temporal features of clinical conditions found in clinical reports. In this poster we describe the elements of the annotation schema and provide results of an initial annotation study on a document set comprising six different types of clinical reports.","PeriodicalId":200974,"journal":{"name":"Workshop on Biomedical Natural Language Processing","volume":"18 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-06-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125260656","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
P. Horn, B. Bakker, G. Geleijnse, J. Korst, S. Kurkin
{"title":"Determining causal and non-causal relationships in biomedical text by classifying verbs using a Naive Bayesian Classifier","authors":"P. Horn, B. Bakker, G. Geleijnse, J. Korst, S. Kurkin","doi":"10.3115/1572306.1572335","DOIUrl":"https://doi.org/10.3115/1572306.1572335","url":null,"abstract":"Since scientific journals are still the most important means of documenting biological findings, biomedical articles are the best source of information we have on protein-protein interactions. The mining of this information will provide us with specific knowledge of the presence and types of interactions, and the circumstances in which they occur.","PeriodicalId":200974,"journal":{"name":"Workshop on Biomedical Natural Language Processing","volume":"53 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-06-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124225340","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}