International Workshop on Health Text Mining and Information Analysis: Latest Publications

Defining and Learning Refined Temporal Relations in the Clinical Narrative
Pub Date: 2020-11-01 | DOI: 10.18653/v1/2020.louhi-1.12
Kristin Wright-Bettner, Chen Lin, T. Miller, Steven Bethard, Dmitriy Dligach, Martha Palmer, James H. Martin, G. Savova
Abstract: We present refinements over existing temporal relation annotations in the Electronic Medical Record clinical narrative. We refined the THYME corpus annotations to more faithfully represent nuanced temporality and nuanced temporal-coreferential relations. The main contributions are in re-defining the CONTAINS and OVERLAP relations into CONTAINS, CONTAINS-SUBEVENT, OVERLAP and NOTED-ON. We demonstrate that these refinements lead to substantial gains in learnability for state-of-the-art transformer models compared to previously reported results on the original THYME corpus. We thus establish a baseline for the automatic extraction of these refined temporal relations. Although our study is done on clinical narrative, we believe it addresses far-reaching challenges that are corpus- and domain-agnostic.
Citations: 8
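A minimal sketch of how a transformer classifier over the refined relation set might be framed, assuming a standard pairwise formulation with inline argument markers; the base checkpoint ("bert-base-uncased"), the marker strings, the extra NONE class, and the example sentence are illustrative assumptions, not the authors' exact configuration.

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

# Refined relation labels from the abstract, plus a NONE class for unrelated pairs
# (the NONE class is an assumption for this sketch).
LABELS = ["NONE", "CONTAINS", "CONTAINS-SUBEVENT", "OVERLAP", "NOTED-ON"]

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=len(LABELS)
)

# Mark the two candidate arguments in the text; relation-extraction systems often add
# such markers as special tokens, but here they are left as plain text to stay short.
text = "The <e1> biopsy </e1> was performed during the <e2> admission </e2> on March 3."
enc = tokenizer(text, return_tensors="pt", truncation=True)

with torch.no_grad():
    logits = model(**enc).logits
print(LABELS[int(logits.argmax(dim=-1))])  # untrained head: the prediction is arbitrary
```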
Normalization of Long-tail Adverse Drug Reactions in Social Media
Pub Date: 2020-11-01 | DOI: 10.18653/v1/2020.louhi-1.6
Emmanouil Manousogiannis, S. Mesbah, A. Bozzon, Robert-Jan Sips, Zoltan Szlanik, Selene Baez
Abstract: The automatic mapping of Adverse Drug Reaction (ADR) reports from user-generated content to concepts in a controlled medical vocabulary provides valuable insights for monitoring public health. While state-of-the-art deep learning-based sequence classification techniques achieve impressive performance for medical concepts with large amounts of training data, they show their limits with long-tail concepts that have few training samples. This hinders their adaptability to changes in layman’s terminology and the constant emergence of new informal medical terms. Our objective in this paper is to tackle the problem of normalizing long-tail ADR mentions in user-generated content. We exploit the implicit semantics of rare ADRs, for which we have few training samples, in order to detect the most similar class for a given ADR. The evaluation results demonstrate that our proposed approach addresses the limitations of existing techniques when the amount of training data is limited.
Citations: 1
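One way to read "detect the most similar class" is nearest-concept matching between a mention representation and representations of the candidate concepts. The sketch below uses TF-IDF character n-grams and cosine similarity purely as a stand-in; the concept list and mention are invented, and the paper's actual semantic representation of rare ADRs is richer than this.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical concept inventory and user-generated mention; not the paper's data.
concepts = ["Headache", "Nausea", "Insomnia", "Muscle pain", "Dizziness"]
mention = "felt really dizzy and lightheaded after the second dose"

# Character n-gram TF-IDF as a simple stand-in for mention/concept representations.
vec = TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 4))
concept_matrix = vec.fit_transform(concepts)
mention_vec = vec.transform([mention])

scores = cosine_similarity(mention_vec, concept_matrix)[0]
print(concepts[scores.argmax()])  # "Dizziness", thanks to character overlap with "dizzy"
```

This toy matcher succeeds only because "dizzy" shares a stem with "Dizziness"; for genuinely colloquial mentions the lexical signal disappears, which is exactly the long-tail problem the paper targets with semantic representations.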
Evaluation of Machine Translation Methods applied to Medical Terminologies
Pub Date: 2020-11-01 | DOI: 10.18653/v1/2020.louhi-1.7
Konstantinos Skianis, Y. Briand, F. Desgrippes
Abstract: Medical terminology resources and standards play vital roles in clinical data exchange, significantly enabling service interoperability within national healthcare information networks. Health and medical science are constantly evolving, which requires terminology editions to advance as well. In this paper, we present our evaluation of the latest machine translation techniques applied to medical terminologies. Experiments were conducted with selected statistical and neural machine translation methods. The devised procedure is tested on a validated sample of ICD-11 and ICF terminologies translated from English to French, with promising results.
Citations: 6
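The abstract does not name the evaluation metrics, so the sketch below simply assumes a corpus-level comparison against validated French references using sacrebleu; the example terms are invented, and for short terminology entries a character-based metric such as chrF may be a better fit than BLEU.

```python
import sacrebleu

# Invented ICD-11-style hypothesis/reference pairs (EN -> FR); not the paper's data.
hypotheses = ["Hypertension essentielle", "Diabète de type 2"]
references = [["Hypertension artérielle essentielle", "Diabète sucré de type 2"]]

bleu = sacrebleu.corpus_bleu(hypotheses, references)
chrf = sacrebleu.corpus_chrf(hypotheses, references)
print(f"BLEU: {bleu.score:.1f}  chrF: {chrf.score:.1f}")
```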
Identifying Personal Experience Tweets of Medication Effects Using Pre-trained RoBERTa Language Model and Its Updating
Pub Date: 2020-11-01 | DOI: 10.18653/v1/2020.louhi-1.14
Minghao Zhu, Yoonkyoung Song, Ge Jin, Keyuan Jiang
Abstract: Post-market surveillance, the practice of monitoring the safe use of pharmaceutical drugs, is an important part of pharmacovigilance. Being able to collect personal experience related to pharmaceutical product use could help us gain insight into how the human body reacts to different medications. Twitter, a popular social media service, is considered an important alternative data source for collecting personal experience information about medications. Identifying personal experience tweets is a challenging classification task in natural language processing. In this study, we utilized three methods based on Facebook's Robustly Optimized BERT Pretraining Approach (RoBERTa) to predict personal experience tweets related to medication use: the first combines the pre-trained RoBERTa model with a classifier, the second combines a RoBERTa model further pre-trained on a corpus of unlabeled tweets with a classifier, and the third combines a RoBERTa model trained from scratch on our unlabeled tweets with a classifier. Our results show that all of these approaches outperform the published methods (Word Embedding + LSTM) in classification performance (p < 0.05), and that updating the pre-trained language model with medication-related tweets improves performance further.
Citations: 6
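A minimal sketch of the first variant described in the abstract (pre-trained RoBERTa with a classification head), using the Hugging Face transformers library; the checkpoint name and the example tweet are illustrative. The second and third variants would first run masked-language-model training on unlabeled medication tweets before this step.

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

# Pre-trained RoBERTa with a randomly initialized two-class head; the head (and
# optionally the encoder) would then be fine-tuned on labeled tweets.
tokenizer = AutoTokenizer.from_pretrained("roberta-base")
model = AutoModelForSequenceClassification.from_pretrained("roberta-base", num_labels=2)

tweet = "Started the new med yesterday and the headache is finally gone."
enc = tokenizer(tweet, return_tensors="pt", truncation=True)
with torch.no_grad():
    probs = torch.softmax(model(**enc).logits, dim=-1)
print(probs)  # after fine-tuning: [P(not personal experience), P(personal experience)]
```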
Detecting Foodborne Illness Complaints in Multiple Languages Using English Annotations Only
Pub Date: 2020-10-11 | DOI: 10.18653/v1/2020.louhi-1.15
Ziyi Liu, Giannis Karamanolakis, Daniel J. Hsu, L. Gravano
Abstract: Health departments have been deploying text classification systems for the early detection of foodborne illness complaints in social media documents such as Yelp restaurant reviews. Current systems have been successfully applied to documents in English and, as a result, a promising direction is to increase coverage and recall by considering documents in additional languages, such as Spanish or Chinese. Training previous systems for more languages, however, would be expensive, as it would require the manual annotation of many documents for each new target language. To address this challenge, we consider cross-lingual learning and train multilingual classifiers using only the annotations for English-language reviews. Recent zero-shot approaches based on pre-trained multilingual BERT (mBERT) have been shown to effectively align languages for aspects such as sentiment. Interestingly, we show that those approaches are less effective for capturing the nuances of foodborne illness, our public health application of interest. To improve performance without extra annotations, we create artificial training documents in the target language through machine translation and train mBERT jointly on the source (English) and target language. Furthermore, we show that translating labeled documents to multiple languages leads to additional performance improvements for some target languages. We demonstrate the benefits of our approach through extensive experiments with Yelp restaurant reviews in seven languages. Our classifiers identify foodborne illness complaints in multilingual reviews from the Yelp Challenge dataset, which highlights the potential of our general approach for deployment in health departments.
Citations: 0
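The translate-train idea from the abstract can be sketched as follows: labeled English reviews are paired with machine-translated copies (here pre-written Spanish strings stand in for MT output), and mBERT is fine-tuned on the combined set. The checkpoint name and toy reviews are assumptions for illustration.

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# English annotations only; the Spanish texts are placeholders standing in for
# machine-translated copies of the same labeled reviews (translate-train).
labeled_en = [("I got food poisoning after eating here.", 1),
              ("Great tacos and friendly staff.", 0)]
translated_es = [("Me intoxiqué con la comida después de comer aquí.", 1),
                 ("Tacos excelentes y personal amable.", 0)]

train_data = labeled_en + translated_es  # joint source + target training set

tokenizer = AutoTokenizer.from_pretrained("bert-base-multilingual-cased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-multilingual-cased", num_labels=2
)
enc = tokenizer([text for text, _ in train_data], padding=True, truncation=True,
                return_tensors="pt")
labels = [label for _, label in train_data]
# From here, fine-tune on (enc, labels) with any standard classification training loop.
print(enc["input_ids"].shape, labels)
```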
Specializing Static and Contextual Embeddings in the Medical Domain Using Knowledge Graphs: Let’s Keep It Simple
DOI: 10.18653/v1/2022.louhi-1.9
Hicham El Boukkouri, Olivier Ferret, T. Lavergne, Pierre Zweigenbaum
Abstract: Domain adaptation of word embeddings has mainly been explored in the context of retraining general models on large specialized corpora. While this usually yields good results, we argue that knowledge graphs, which are used less frequently, could also be utilized to enhance existing representations with specialized knowledge. In this work, we aim to shed some light on whether such knowledge injection can be achieved using a basic set of tools: graph-level embeddings and concatenation. To that end, we adopt an incremental approach in which we first demonstrate that static embeddings can indeed be improved through concatenation with in-domain node2vec representations. We then validate this approach on contextual models and generalize it further by proposing a variant of BERT that incorporates knowledge embeddings within its hidden states through the same process of concatenation. We show that this variant outperforms plain retraining on several specialized tasks, and we discuss how this simple approach could be improved further. Both our code and pre-trained models are open-sourced for future research. The experiments in this work target the medical domain and the English language.
Citations: 0
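The first step of the incremental approach, concatenating a static word vector with the node2vec embedding of the concept it links to, can be sketched with plain NumPy; the dimensions, the word-to-concept mapping, and the zero-vector fallback for unlinked words are illustrative assumptions.

```python
import numpy as np

# Toy dimensions; in practice the word vectors are real static embeddings and the
# graph vectors are node2vec embeddings of linked knowledge-graph concepts.
D_WORD, D_GRAPH = 4, 3
word_vecs = {"aspirin": np.random.rand(D_WORD), "headache": np.random.rand(D_WORD)}
graph_vecs = {"C0004057": np.random.rand(D_GRAPH)}   # node2vec vector for one concept
word_to_concept = {"aspirin": "C0004057"}            # "headache" is left unlinked here

def specialized_embedding(word):
    # Concatenate the static vector with the graph vector of the linked concept,
    # falling back to zeros when no concept is linked.
    graph_part = graph_vecs.get(word_to_concept.get(word), np.zeros(D_GRAPH))
    return np.concatenate([word_vecs[word], graph_part])

print(specialized_embedding("aspirin").shape)   # (7,)
print(specialized_embedding("headache").shape)  # (7,)
```

The contextual variant described in the abstract applies the same concatenation idea inside BERT's hidden states rather than to static vectors.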
Improving information fusion on multimodal clinical data in classification settings
DOI: 10.18653/v1/2022.louhi-1.18
Sneha Jha, Erik Mayer, Mauricio Barahona
Abstract: Clinical data often exists in different forms across the lifetime of a patient's interaction with the healthcare system: structured, unstructured or semi-structured data in the form of laboratory readings, clinical notes, diagnostic codes, imaging and audio data of various kinds, and other observational data. Formulating a representation model that aggregates information from these heterogeneous sources may allow us to jointly model data with more predictive signal than noise and help inform our model with useful constraints learned from better data. Multimodal fusion approaches help produce representations combined from heterogeneous modalities, which can be used for clinical prediction tasks. Representations produced through different fusion techniques require different training strategies. We investigate the advantage of adding narrative clinical text to structured modalities for classification tasks in the clinical domain. We show that while there is a competitive advantage in combined representations of clinical data, the approach can be helped by training guidance customized to each modality. We show empirical results across binary/multiclass settings, single/multitask settings and unified/multimodal learning rate settings for early and late information fusion of clinical data.
Citations: 0
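The distinction between early and late fusion that the abstract evaluates can be sketched with a toy setup; the synthetic features, the logistic-regression models, and the probability-averaging rule are illustrative assumptions rather than the paper's architecture.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)
n = 400
X_struct = rng.normal(size=(n, 5))   # stand-in for structured features (labs, codes)
X_text = rng.normal(size=(n, 8))     # stand-in for embeddings of clinical notes
y = (X_struct[:, 0] + X_text[:, 0] > 0).astype(int)

# Early fusion: concatenate modality features and train a single classifier.
X_all = np.hstack([X_struct, X_text])
early = LogisticRegression().fit(X_all, y)
early_acc = accuracy_score(y, early.predict(X_all))

# Late fusion: one classifier per modality, then combine predicted probabilities.
clf_s = LogisticRegression().fit(X_struct, y)
clf_t = LogisticRegression().fit(X_text, y)
late_prob = (clf_s.predict_proba(X_struct)[:, 1] + clf_t.predict_proba(X_text)[:, 1]) / 2
late_acc = accuracy_score(y, (late_prob > 0.5).astype(int))

# Train-set accuracy only, just to show the two pipelines run end to end.
print(f"early fusion: {early_acc:.2f}  late fusion: {late_acc:.2f}")
```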
Assessing the Limits of Straightforward Models for Nested Named Entity Recognition in Spanish Clinical Narratives
DOI: 10.18653/v1/2022.louhi-1.2
Matías Rojas, C. Carrino, Aitor Gonzalez-Agirre, J. Dunstan, Marta Villegas
Abstract: Nested Named Entity Recognition (NER) is an information extraction task that aims to identify entities that may be nested within other entity mentions. Despite the availability of several corpora with nested entities in the Spanish clinical domain, most previous work has overlooked them due to the lack of models and of a clear annotation scheme for dealing with the task. To fill this gap, this paper provides an empirical study of straightforward methods for tackling the nested NER task on two Spanish clinical datasets, Clinical Trials and the Chilean Waiting List. We assess the advantages and limitations of two sequence labeling approaches: one based on Multiple LSTM-CRF architectures and another on Joint labeling models. To better understand the differences between these models, we compute task-specific metrics that adequately measure the ability of models to detect nested entities and perform a fine-grained comparison across models. Our experimental results show that employing domain-specific language models trained from scratch significantly improves the performance obtained with strong domain-specific and general-domain baselines, achieving state-of-the-art results on both datasets. Specifically, we obtained F1 scores of 89.21 and 83.16 on Clinical Trials and the Chilean Waiting List, respectively. Interestingly, we observe that the task-specific metrics and analysis properly reflect the limitations of the models when recognizing nested entities. Finally, we perform a case study on an aggregated NER dataset created from several clinical corpora in Spanish. We highlight how entity length and the simultaneous recognition of inner and outer entities are the most critical variables for the nested NER task.
Citations: 0
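One of the two approaches named in the abstract, joint labeling, can be illustrated with plain Python: the BIO tags of every nesting level are fused into a single tag per token, so an ordinary sequence labeler can emit nested mentions in one pass. The sentence, entity types, and tag-joining convention below are invented for illustration.

```python
# Joint labeling: collapse the label of each nesting level into one composite tag
# per token, then train a standard (flat) sequence labeler over the composite tags.
tokens = ["cáncer", "de", "mama", "izquierda"]
outer  = ["B-Disease", "I-Disease", "I-Disease", "I-Disease"]   # outer mention
inner  = ["O", "O", "B-Body_Part", "I-Body_Part"]               # nested inner mention

joint = [f"{o}+{i}" for o, i in zip(outer, inner)]
print(joint)
# ['B-Disease+O', 'I-Disease+O', 'I-Disease+B-Body_Part', 'I-Disease+I-Body_Part']
```

The alternative assessed in the paper, Multiple LSTM-CRF, instead trains several LSTM-CRF taggers (for example, one per entity type) and merges their predictions.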
Parameter Efficient Transfer Learning for Suicide Attempt and Ideation Detection
DOI: 10.18653/v1/2022.louhi-1.13
Bhanu Pratap Singh Rawat, Hong Yu
Abstract: Pre-trained language models (LMs) have been deployed as the state-of-the-art natural language processing (NLP) approach for multiple clinical applications. Model generalisability is important in the clinical domain because of the low availability of resources. In this study, we evaluated transfer learning techniques for an important clinical application: detecting suicide attempt (SA) and suicide ideation (SI) in electronic health records (EHRs). Using the annotation guideline provided by the authors of ScAN, we annotated two EHR datasets from different hospitals. We then fine-tuned ScANER, a publicly available SA and SI detection model, to evaluate five different parameter-efficient transfer learning techniques, such as adapter-based learning and soft-prompt tuning, on the two datasets. Without any fine-tuning, ScANER achieves macro F1-scores of 0.85 and 0.87 for SA and SI evidence detection across the two datasets. We observed that by fine-tuning less than ~2% of ScANER's parameters, we were able to further improve the macro F1-score for SA-SI evidence detection by 3% and 5% on the two EHR datasets. Our results show that parameter-efficient transfer learning methods can help improve the performance of publicly available clinical models on new hospital datasets with few annotations.
Citations: 0
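Adapter-based learning, one of the parameter-efficient techniques named in the abstract, can be sketched as a small bottleneck module added to a frozen encoder; the hidden size, bottleneck width, and placement below are generic choices, not ScANER's actual configuration.

```python
import torch
import torch.nn as nn

class Adapter(nn.Module):
    """Bottleneck adapter with a residual connection; during transfer, only these
    parameters are updated while the pre-trained encoder stays frozen."""
    def __init__(self, hidden_size: int, bottleneck: int = 64):
        super().__init__()
        self.down = nn.Linear(hidden_size, bottleneck)
        self.up = nn.Linear(bottleneck, hidden_size)

    def forward(self, hidden_states):
        return hidden_states + self.up(torch.relu(self.down(hidden_states)))

adapter = Adapter(hidden_size=768)
x = torch.randn(1, 10, 768)          # (batch, sequence, hidden) from a frozen encoder
print(adapter(x).shape)              # torch.Size([1, 10, 768])

trainable = sum(p.numel() for p in adapter.parameters())
print(trainable)  # ~99k parameters per layer, far below a full BERT-base encoder,
                  # roughly in the "<2% of parameters" regime the abstract describes
```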
Integration of Heterogeneous Knowledge Sources for Biomedical Text Processing
DOI: 10.18653/v1/2022.louhi-1.25
Parsa Bagherzadeh, S. Bergler
Abstract: Recently, research into bringing outside knowledge sources into current neural NLP models has been increasing. Most approaches that leverage external knowledge sources require laborious and non-trivial designs, as well as tailoring the system through intensive ablation of different knowledge sources, an effort that discourages users from using quality ontological resources. In this paper, we show that multiple large heterogeneous knowledge sources (KSs) can be easily integrated using a decoupled approach, allowing for an automatic ablation of irrelevant KSs while keeping the overall parameter space tractable. We experiment with BERT and pre-trained graph embeddings, and show that they interoperate well without performance degradation, even when some sources do not contribute to the task.
Citations: 0
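As a rough illustration of decoupled integration, the sketch below concatenates a contextual text vector with embeddings from several knowledge sources and gives each source a learnable gate, which is one plausible way to let training shrink sources that do not help the task; the vector sizes, source names, and gating mechanism are assumptions, not necessarily the authors' design.

```python
import torch
import torch.nn as nn

# Stand-ins: a contextual [CLS]-style vector from BERT and fixed graph embeddings
# from two hypothetical knowledge sources; a real pipeline would look these up per input.
cls_vec = torch.randn(1, 768)
ks_vecs = {"umls": torch.randn(1, 200), "go": torch.randn(1, 200)}

class GatedConcat(nn.Module):
    """Concatenate the text vector with each KS embedding scaled by a learnable
    sigmoid gate, so training can down-weight sources that do not help."""
    def __init__(self, sources):
        super().__init__()
        self.gates = nn.ParameterDict({s: nn.Parameter(torch.zeros(1)) for s in sources})

    def forward(self, text_vec, ks_vecs):
        parts = [text_vec]
        for name, vec in ks_vecs.items():
            parts.append(torch.sigmoid(self.gates[name]) * vec)
        return torch.cat(parts, dim=-1)

fusion = GatedConcat(ks_vecs.keys())
print(fusion(cls_vec, ks_vecs).shape)  # torch.Size([1, 1168])
```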