Leili Tavabi, Trang Tran, Kalin Stefanov, Brian Borsari, Joshua D Woolley, Stefan Scherer, Mohammad Soleymani
{"title":"Analysis of Behavior Classification in Motivational Interviewing.","authors":"Leili Tavabi, Trang Tran, Kalin Stefanov, Brian Borsari, Joshua D Woolley, Stefan Scherer, Mohammad Soleymani","doi":"10.18653/v1/2021.clpsych-1.13","DOIUrl":"10.18653/v1/2021.clpsych-1.13","url":null,"abstract":"<p><p>Analysis of client and therapist behavior in counseling sessions can provide helpful insights for assessing the quality of the session and consequently, the client's behavioral outcome. In this paper, we study the automatic classification of standardized behavior codes (i.e. annotations) used for assessment of psychotherapy sessions in Motivational Interviewing (MI). We develop models and examine the classification of client behaviors throughout MI sessions, comparing the performance by models trained on large pretrained embeddings (RoBERTa) versus interpretable and expert-selected features (LIWC). Our best performing model using the pretrained RoBERTa embeddings beats the baseline model, achieving an F1 score of 0.66 in the subject-independent 3-class classification. Through statistical analysis on the classification results, we identify prominent LIWC features that may not have been captured by the model using pretrained embeddings. Although classification using LIWC features underperforms RoBERTa, our findings motivate the future direction of incorporating auxiliary tasks in the classification of MI codes.</p>","PeriodicalId":74542,"journal":{"name":"Proceedings of the conference. Association for Computational Linguistics. North American Chapter. Meeting","volume":"2021 ","pages":"110-115"},"PeriodicalIF":0.0,"publicationDate":"2021-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8321779/pdf/nihms-1727153.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"39266882","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Denis Newman-Griffis, Venkatesh Sivaraman, Adam Perer, Eric Fosler-Lussier, Harry Hochheiser
{"title":"TextEssence: A Tool for Interactive Analysis of Semantic Shifts Between Corpora.","authors":"Denis Newman-Griffis, Venkatesh Sivaraman, Adam Perer, Eric Fosler-Lussier, Harry Hochheiser","doi":"","DOIUrl":"","url":null,"abstract":"<p><p>Embeddings of words and concepts capture syntactic and semantic regularities of language; however, they have seen limited use as tools to study characteristics of different corpora and how they relate to one another. We introduce TextEssence, an interactive system designed to enable comparative analysis of corpora using embeddings. TextEssence includes visual, neighbor-based, and similarity-based modes of embedding analysis in a lightweight, web-based interface. We further propose a new measure of embedding confidence based on nearest neighborhood overlap, to assist in identifying high-quality embeddings for corpus analysis. A case study on COVID-19 scientific literature illustrates the utility of the system. TextEssence can be found at https://textessence.github.io.</p>","PeriodicalId":74542,"journal":{"name":"Proceedings of the conference. Association for Computational Linguistics. North American Chapter. Meeting","volume":"2021 ","pages":"106-115"},"PeriodicalIF":0.0,"publicationDate":"2021-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8212692/pdf/nihms-1710045.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"39251210","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Denis Newman-Griffis, Jill Fain Lehman, Carolyn Rosé, Harry Hochheiser
{"title":"Translational NLP: A New Paradigm and General Principles for Natural Language Processing Research.","authors":"Denis Newman-Griffis, Jill Fain Lehman, Carolyn Rosé, Harry Hochheiser","doi":"","DOIUrl":"","url":null,"abstract":"<p><p>Natural language processing (NLP) research combines the study of universal principles, through basic science, with applied science targeting specific use cases and settings. However, the process of exchange between basic NLP and applications is often assumed to emerge naturally, resulting in many innovations going unapplied and many important questions left unstudied. We describe a new paradigm of <i>Translational NLP</i>, which aims to structure and facilitate the processes by which basic and applied NLP research inform one another. Translational NLP thus presents a third research paradigm, focused on understanding the challenges posed by application needs and how these challenges can drive innovation in basic science and technology design. We show that many significant advances in NLP research have emerged from the intersection of basic principles with application needs, and present a conceptual framework outlining the stakeholders and key questions in translational research. Our framework provides a roadmap for developing Translational NLP as a dedicated research area, and identifies general translational principles to facilitate exchange between basic and applied research.</p>","PeriodicalId":74542,"journal":{"name":"Proceedings of the conference. Association for Computational Linguistics. North American Chapter. Meeting","volume":"2021 ","pages":"4125-4138"},"PeriodicalIF":0.0,"publicationDate":"2021-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8223521/pdf/nihms-1710048.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"39115253","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Adithya V Ganesan, Matthew Matero, Aravind Reddy Ravula, Huy Vu, H Andrew Schwartz
{"title":"Empirical Evaluation of Pre-trained Transformers for Human-Level NLP: The Role of Sample Size and Dimensionality.","authors":"Adithya V Ganesan, Matthew Matero, Aravind Reddy Ravula, Huy Vu, H Andrew Schwartz","doi":"10.18653/v1/2021.naacl-main.357","DOIUrl":"https://doi.org/10.18653/v1/2021.naacl-main.357","url":null,"abstract":"<p><p>In human-level NLP tasks, such as predicting mental health, personality, or demographics, the number of observations is often smaller than the standard 768+ hidden state sizes of each layer within modern transformer-based language models, limiting the ability to effectively leverage transformers. Here, we provide a systematic study on the role of dimension reduction methods (principal components analysis, factorization techniques, or multi-layer auto-encoders) as well as the dimensionality of embedding vectors and sample sizes as a function of predictive performance. We first find that fine-tuning large models with a limited amount of data pose a significant difficulty which can be overcome with a pre-trained dimension reduction regime. RoBERTa consistently achieves top performance in human-level tasks, with PCA giving benefit over other reduction methods in better handling users that write longer texts. Finally, we observe that a majority of the tasks achieve results comparable to the best performance with just <math> <mrow><mfrac><mn>1</mn> <mrow><mn>12</mn></mrow> </mfrac> </mrow> </math> of the embedding dimensions.</p>","PeriodicalId":74542,"journal":{"name":"Proceedings of the conference. Association for Computational Linguistics. North American Chapter. Meeting","volume":"2021 ","pages":"4515-4532"},"PeriodicalIF":0.0,"publicationDate":"2021-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8294338/pdf/nihms-1716243.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"39215546","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Griffin Adams, Emily Alsentzer, Mert Ketenci, Jason Zucker, Noémie Elhadad
{"title":"What's in a Summary? Laying the Groundwork for Advances in Hospital-Course Summarization.","authors":"Griffin Adams, Emily Alsentzer, Mert Ketenci, Jason Zucker, Noémie Elhadad","doi":"10.18653/v1/2021.naacl-main.382","DOIUrl":"10.18653/v1/2021.naacl-main.382","url":null,"abstract":"<p><p>Summarization of clinical narratives is a long-standing research problem. Here, we introduce the task of hospital-course summarization. Given the documentation authored throughout a patient's hospitalization, generate a paragraph that tells the story of the patient admission. We construct an English, text-to-text dataset of 109,000 hospitalizations (2M source notes) and their corresponding summary proxy: the clinician-authored \"Brief Hospital Course\" paragraph written as part of a discharge note. Exploratory analyses reveal that the BHC paragraphs are highly abstractive with some long extracted fragments; are concise yet comprehensive; differ in style and content organization from the source notes; exhibit minimal lexical cohesion; and represent silver-standard references. Our analysis identifies multiple implications for modeling this complex, multi-document summarization task.</p>","PeriodicalId":74542,"journal":{"name":"Proceedings of the conference. Association for Computational Linguistics. North American Chapter. Meeting","volume":"2021 ","pages":"4794-4811"},"PeriodicalIF":0.0,"publicationDate":"2021-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8225248/pdf/nihms-1705151.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"39115254","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Word centrality constrained representation for keyphrase extraction.","authors":"Zelalem Gero, Joyce C Ho","doi":"10.18653/v1/2021.bionlp-1.17","DOIUrl":"https://doi.org/10.18653/v1/2021.bionlp-1.17","url":null,"abstract":"<p><p>To keep pace with the increased generation and digitization of documents, automated methods that can improve search, discovery and mining of the vast body of literature are essential. Keyphrases provide a concise representation by identifying salient concepts in a document. Various supervised approaches model keyphrase extraction using local context to predict the label for each token and perform much better than the unsupervised counterparts. Unfortunately, this method fails for short documents where the context is unclear. Moreover, keyphrases, which are usually the gist of a document, need to be the central theme. We propose a new extraction model that introduces a centrality constraint to enrich the word representation of a Bidirectional long short-term memory. Performance evaluation on two publicly available datasets demonstrate our model outperforms existing state-of-the art approaches. Our model is publicly available at https://github.com/ZHgero/keyphrases_centrality.git.</p>","PeriodicalId":74542,"journal":{"name":"Proceedings of the conference. Association for Computational Linguistics. North American Chapter. Meeting","volume":" ","pages":"155-161"},"PeriodicalIF":0.0,"publicationDate":"2021-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9208728/pdf/nihms-1815573.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"40396966","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Denis Newman-Griffis, J. Lehman, C. Ros'e, H. Hochheiser
{"title":"Translational NLP: A New Paradigm and General Principles for Natural Language Processing Research","authors":"Denis Newman-Griffis, J. Lehman, C. Ros'e, H. Hochheiser","doi":"10.18653/V1/2021.NAACL-MAIN.325","DOIUrl":"https://doi.org/10.18653/V1/2021.NAACL-MAIN.325","url":null,"abstract":"Natural language processing (NLP) research combines the study of universal principles, through basic science, with applied science targeting specific use cases and settings. However, the process of exchange between basic NLP and applications is often assumed to emerge naturally, resulting in many innovations going unapplied and many important questions left unstudied. We describe a new paradigm of Translational NLP, which aims to structure and facilitate the processes by which basic and applied NLP research inform one another. Translational NLP thus presents a third research paradigm, focused on understanding the challenges posed by application needs and how these challenges can drive innovation in basic science and technology design. We show that many significant advances in NLP research have emerged from the intersection of basic principles with application needs, and present a conceptual framework outlining the stakeholders and key questions in translational research. Our framework provides a roadmap for developing Translational NLP as a dedicated research area, and identifies general translational principles to facilitate exchange between basic and applied research.","PeriodicalId":74542,"journal":{"name":"Proceedings of the conference. Association for Computational Linguistics. North American Chapter. Meeting","volume":"21 1","pages":"4125-4138"},"PeriodicalIF":0.0,"publicationDate":"2021-04-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"86528175","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Ashwin Devaraj, I. Marshall, Byron C. Wallace, J. Li
{"title":"Paragraph-level Simplification of Medical Texts","authors":"Ashwin Devaraj, I. Marshall, Byron C. Wallace, J. Li","doi":"10.18653/V1/2021.NAACL-MAIN.395","DOIUrl":"https://doi.org/10.18653/V1/2021.NAACL-MAIN.395","url":null,"abstract":"We consider the problem of learning to simplify medical texts. This is important because most reliable, up-to-date information in biomedicine is dense with jargon and thus practically inaccessible to the lay audience. Furthermore, manual simplification does not scale to the rapidly growing body of biomedical literature, motivating the need for automated approaches. Unfortunately, there are no large-scale resources available for this task. In this work we introduce a new corpus of parallel texts in English comprising technical and lay summaries of all published evidence pertaining to different clinical topics. We then propose a new metric based on likelihood scores from a masked language model pretrained on scientific texts. We show that this automated measure better differentiates between technical and lay summaries than existing heuristics. We introduce and evaluate baseline encoder-decoder Transformer models for simplification and propose a novel augmentation to these in which we explicitly penalize the decoder for producing “jargon” terms; we find that this yields improvements over baselines in terms of readability.","PeriodicalId":74542,"journal":{"name":"Proceedings of the conference. Association for Computational Linguistics. North American Chapter. Meeting","volume":"7 1","pages":"4972-4984"},"PeriodicalIF":0.0,"publicationDate":"2021-04-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"78695199","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Denis Newman-Griffis, Venkatesh Sivaraman, Adam Perer, E. Fosler-Lussier, H. Hochheiser
{"title":"TextEssence: A Tool for Interactive Analysis of Semantic Shifts Between Corpora","authors":"Denis Newman-Griffis, Venkatesh Sivaraman, Adam Perer, E. Fosler-Lussier, H. Hochheiser","doi":"10.18653/v1/2021.naacl-demos.13","DOIUrl":"https://doi.org/10.18653/v1/2021.naacl-demos.13","url":null,"abstract":"Embeddings of words and concepts capture syntactic and semantic regularities of language; however, they have seen limited use as tools to study characteristics of different corpora and how they relate to one another. We introduce TextEssence, an interactive system designed to enable comparative analysis of corpora using embeddings. TextEssence includes visual, neighbor-based, and similarity-based modes of embedding analysis in a lightweight, web-based interface. We further propose a new measure of embedding confidence based on nearest neighborhood overlap, to assist in identifying high-quality embeddings for corpus analysis. A case study on COVID-19 scientific literature illustrates the utility of the system. TextEssence can be found at https://textessence.github.io.","PeriodicalId":74542,"journal":{"name":"Proceedings of the conference. Association for Computational Linguistics. North American Chapter. Meeting","volume":"48 1","pages":"106-115"},"PeriodicalIF":0.0,"publicationDate":"2021-03-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"87979469","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Benjamin E Nye, Ani Nenkova, Iain J Marshall, Byron C Wallace
{"title":"<i>Trialstreamer</i>: Mapping and Browsing Medical Evidence in Real-Time.","authors":"Benjamin E Nye, Ani Nenkova, Iain J Marshall, Byron C Wallace","doi":"10.18653/v1/2020.acl-demos.9","DOIUrl":"https://doi.org/10.18653/v1/2020.acl-demos.9","url":null,"abstract":"<p><p>We introduce <i>Trialstreamer</i>, a living database of clinical trial reports. Here we mainly describe the <i>evidence extraction</i> component; this extracts from biomedical abstracts key pieces of information that clinicians need when appraising the literature, and also the relations between these. Specifically, the system extracts descriptions of trial participants, the treatments compared in each arm (the <i>interventions</i>), and which outcomes were measured. The system then attempts to infer which interventions were reported to work best by determining their relationship with identified trial outcome measures. In addition to summarizing individual trials, these extracted data elements allow automatic synthesis of results across many trials on the same topic. We apply the system at scale to all reports of randomized controlled trials indexed in MEDLINE, powering the automatic generation of <i>evidence maps</i>, which provide a global view of the efficacy of different interventions combining data from all relevant clinical trials on a topic. We make all code and models freely available alongside a demonstration of the web interface.</p>","PeriodicalId":74542,"journal":{"name":"Proceedings of the conference. Association for Computational Linguistics. North American Chapter. Meeting","volume":"2020 ","pages":"63-69"},"PeriodicalIF":0.0,"publicationDate":"2020-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8204713/pdf/nihms-1593346.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"39239461","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}