Itziar Gonzalez-Dios, Iker Gutiérrez-Fandiño, Oscar M. Cumbicus-Pineda, Aitor Soroa Etxabe
{"title":"IrekiaLFes: a New Open Benchmark and Baseline Systems for Spanish Automatic Text Simplification","authors":"Itziar Gonzalez-Dios, Iker Gutiérrez-Fandiño, Oscar M. Cumbicus-Pineda, Aitor Soroa Etxabe","doi":"10.18653/v1/2022.tsar-1.8","DOIUrl":"https://doi.org/10.18653/v1/2022.tsar-1.8","url":null,"abstract":"Automatic Text simplification (ATS) seeks to reduce the complexity of a text for a general public or a target audience. In the last years, deep learning methods have become the most used systems in ATS research, but these systems need large and good quality datasets to be evaluated. Moreover, these data are available on a large scale only for English and in some cases with restrictive licenses. In this paper, we present IrekiaLF_es, an open-license benchmark for Spanish text simplification. It consists of a document-level corpus and a sentence-level test set that has been manually aligned. We also conduct a neurolinguistically-based evaluation of the corpus in order to reveal its suitability for text simplification. This evaluation follows the Lexicon-Unification-Linearity (LeULi) model of neurolinguistic complexity assessment. Finally, we present a set of experiments and baselines of ATS systems in a zero-shot scenario.","PeriodicalId":247582,"journal":{"name":"Proceedings of the Workshop on Text Simplification, Accessibility, and Readability (TSAR-2022)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129496224","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"RCML at TSAR-2022 Shared Task: Lexical Simplification With Modular Substitution Candidate Ranking","authors":"Desislava Aleksandrova, Olivier Brochu Dufour","doi":"10.18653/v1/2022.tsar-1.29","DOIUrl":"https://doi.org/10.18653/v1/2022.tsar-1.29","url":null,"abstract":"This paper describes the lexical simplification system RCML submitted to the English language track of the TSAR-2022 Shared Task. The system leverages a pre-trained language model to generate contextually plausible substitution candidates which are then ranked according to their simplicity as well as their grammatical and semantic similarity to the target complex word. Our submissions secure 6th and 7th places out of 33, improving over the SOTA baseline for 27 out of the 51 metrics.","PeriodicalId":247582,"journal":{"name":"Proceedings of the Workshop on Text Simplification, Accessibility, and Readability (TSAR-2022)","volume":"21 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130148753","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Improving Text Simplification with Factuality Error Detection","authors":"Yuan Ma, Sandaru Seneviratne, E. Daskalaki","doi":"10.18653/v1/2022.tsar-1.16","DOIUrl":"https://doi.org/10.18653/v1/2022.tsar-1.16","url":null,"abstract":"In the past few years, the field of text simplification has been dominated by supervised learning approaches thanks to the appearance of large parallel datasets such as Wikilarge and Newsela. However, these datasets suffer from sentence pairs with factuality errors which compromise the models’ performance. So, we proposed a model-independent factuality error detection mechanism, considering bad simplification and bad alignment, to refine the Wikilarge dataset through reducing the weight of these samples during training. We demonstrated that this approach improved the performance of the state-of-the-art text simplification model TST5 by an FKGL reduction of 0.33 and 0.29 on the TurkCorpus and ASSET testing datasets respectively. Our study illustrates the impact of erroneous samples in TS datasets and highlights the need for automatic methods to improve their quality.","PeriodicalId":247582,"journal":{"name":"Proceedings of the Workshop on Text Simplification, Accessibility, and Readability (TSAR-2022)","volume":"13 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130188820","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"CILS at TSAR-2022 Shared Task: Investigating the Applicability of Lexical Substitution Methods for Lexical Simplification","authors":"Sandaru Seneviratne, E. Daskalaki, H. Suominen","doi":"10.18653/v1/2022.tsar-1.21","DOIUrl":"https://doi.org/10.18653/v1/2022.tsar-1.21","url":null,"abstract":"Lexical simplification — which aims to simplify complex text through the replacement of difficult words using simpler alternatives while maintaining the meaning of the given text — is popular as a way of improving text accessibility for both people and computers. First, lexical simplification through substitution can improve the understandability of complex text for, for example, non-native speakers, second language learners, and people with low literacy. Second, its usefulness has been demonstrated in many natural language processing problems like data augmentation, paraphrase generation, or word sense induction. In this paper, we investigated the applicability of existing unsupervised lexical substitution methods based on pre-trained contextual embedding models and WordNet, which incorporate Context Information, for Lexical Simplification (CILS). Although the performance of this CILS approach has been outstanding in lexical substitution tasks, its usefulness was limited at the TSAR-2022 shared task on lexical simplification. Consequently, a minimally supervised approach with careful tuning to a given simplification task may work better than unsupervised methods. Our investigation also encouraged further work on evaluating the simplicity of potential candidates and incorporating them into the lexical simplification methods.","PeriodicalId":247582,"journal":{"name":"Proceedings of the Workshop on Text Simplification, Accessibility, and Readability (TSAR-2022)","volume":"19 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128585958","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Lexically Constrained Decoding with Edit Operation Prediction for Controllable Text Simplification","authors":"Tatsuya Zetsu, Tomoyuki Kajiwara, Yuki Arase","doi":"10.18653/v1/2022.tsar-1.13","DOIUrl":"https://doi.org/10.18653/v1/2022.tsar-1.13","url":null,"abstract":"Controllable text simplification assists language learners by automatically rewriting complex sentences into simpler forms of a target level. However, existing methods tend to perform conservative edits that keep complex words intact. To address this problem, we employ lexically constrained decoding to encourage rewriting. Specifically, the proposed method predicts edit operations conditioned to a target level and creates positive/negative constraints for words that should/should not appear in an output sentence. The experimental results confirm that our method significantly outperforms previous methods and demonstrates a new state-of-the-art performance.","PeriodicalId":247582,"journal":{"name":"Proceedings of the Workshop on Text Simplification, Accessibility, and Readability (TSAR-2022)","volume":"31 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129265239","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Kai North, Alphaeus Dmonte, Tharindu Ranasinghe, Marcos Zampieri
{"title":"GMU-WLV at TSAR-2022 Shared Task: Evaluating Lexical Simplification Models","authors":"Kai North, Alphaeus Dmonte, Tharindu Ranasinghe, Marcos Zampieri","doi":"10.18653/v1/2022.tsar-1.30","DOIUrl":"https://doi.org/10.18653/v1/2022.tsar-1.30","url":null,"abstract":"This paper describes team GMU-WLV submission to the TSAR shared-task on multilingual lexical simplification. The goal of the task is to automatically provide a set of candidate substitutions for complex words in context. The organizers provided participants with ALEXSIS a manually annotated dataset with instances split between a small trial set with a dozen instances in each of the three languages of the competition (English, Portuguese, Spanish) and a test set with over 300 instances in the three aforementioned languages. To cope with the lack of training data, participants had to either use alternative data sources or pre-trained language models. We experimented with monolingual models: BERTimbau, ELECTRA, and RoBERTA-largeBNE. Our best system achieved 1st place out of sixteen systems for Portuguese, 8th out of thirty-three systems for English, and 6th out of twelve systems for Spanish.","PeriodicalId":247582,"journal":{"name":"Proceedings of the Workshop on Text Simplification, Accessibility, and Readability (TSAR-2022)","volume":"63 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125120739","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Target-Level Sentence Simplification as Controlled Paraphrasing","authors":"Tannon Kew, Sarah Ebling","doi":"10.18653/v1/2022.tsar-1.4","DOIUrl":"https://doi.org/10.18653/v1/2022.tsar-1.4","url":null,"abstract":"Automatic text simplification aims to reduce the linguistic complexity of a text in order to make it easier to understand and more accessible. However, simplified texts are consumed by a diverse array of target audiences and what might be appropriately simplified for one group of readers may differ considerably for another. In this work we investigate a novel formulation of sentence simplification as paraphrasing with controlled decoding. This approach aims to alleviate the major burden of relying on large amounts of in-domain parallel training data, while at the same time allowing for modular and adaptive simplification. According to automatic metrics, our approach performs competitively against baselines that prove more difficult to adapt to the needs of different target audiences or require significant amounts of complex-simple parallel aligned data.","PeriodicalId":247582,"journal":{"name":"Proceedings of the Workshop on Text Simplification, Accessibility, and Readability (TSAR-2022)","volume":"159 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125920266","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Jan Trienes, Jörg Schlötterer, H. Schildhaus, C. Seifert
{"title":"Patient-friendly Clinical Notes: Towards a new Text Simplification Dataset","authors":"Jan Trienes, Jörg Schlötterer, H. Schildhaus, C. Seifert","doi":"10.18653/v1/2022.tsar-1.3","DOIUrl":"https://doi.org/10.18653/v1/2022.tsar-1.3","url":null,"abstract":"Automatic text simplification can help patients to better understand their own clinical notes. A major hurdle for the development of clinical text simplification methods is the lack of high quality resources. We report ongoing efforts in creating a parallel dataset of professionally simplified clinical notes. Currently, this corpus consists of 851 document-level simplifications of German pathology reports. We highlight characteristics of this dataset and establish first baselines for paragraph-level simplification.","PeriodicalId":247582,"journal":{"name":"Proceedings of the Workshop on Text Simplification, Accessibility, and Readability (TSAR-2022)","volume":"73 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133770101","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"JADES: New Text Simplification Dataset in Japanese Targeted at Non-Native Speakers","authors":"Akio Hayakawa, Tomoyuki Kajiwara, Hiroki Ouchi, Taro Watanabe","doi":"10.18653/v1/2022.tsar-1.17","DOIUrl":"https://doi.org/10.18653/v1/2022.tsar-1.17","url":null,"abstract":"The user-dependency of Text Simplification makes its evaluation obscure. A targeted evaluation dataset clarifies the purpose of simplification, though its specification is hard to define. We built JADES (JApanese Dataset for the Evaluation of Simplification), a text simplification dataset targeted at non-native Japanese speakers, according to public vocabulary and grammar profiles. JADES comprises 3,907 complex-simple sentence pairs annotated by an expert. Analysis of JADES shows that wide and multiple rewriting operations were applied through simplification. Furthermore, we analyzed outputs on JADES from several benchmark systems and automatic and manual scores of them. Results of these analyses highlight differences between English and Japanese in operations and evaluations.","PeriodicalId":247582,"journal":{"name":"Proceedings of the Workshop on Text Simplification, Accessibility, and Readability (TSAR-2022)","volume":"67 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133089397","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"teamPN at TSAR-2022 Shared Task: Lexical Simplification using Multi-Level and Modular Approach","authors":"Nikita Nikita, P. Rajpoot","doi":"10.18653/v1/2022.tsar-1.26","DOIUrl":"https://doi.org/10.18653/v1/2022.tsar-1.26","url":null,"abstract":"Lexical Simplification is the process of reducing the lexical complexity of a text by replacing difficult words with easier-to-read (or understand) expressions while preserving the original information and meaning. This paper explains the work done by our team “teamPN” for the English track of TSAR 2022 Shared Task of Lexical Simplification. We created a modular pipeline which combines transformers based models with traditional NLP methods like paraphrasing and verb sense disambiguation. We created a multi-level and modular pipeline where the target text is treated according to its semantics (Part of Speech Tag). The pipeline is multi-level as we utilize multiple source models to find potential candidates for replacement. It is modular as we can switch the source models and their weighting in the final re-ranking.","PeriodicalId":247582,"journal":{"name":"Proceedings of the Workshop on Text Simplification, Accessibility, and Readability (TSAR-2022)","volume":"35 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114547350","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}