{"title":"Killing me softly: Creative and cognitive aspects of implicitness in abusive language online","authors":"Simona Frenda, V. Patti, Paolo Rosso","doi":"10.1017/s1351324922000316","DOIUrl":"https://doi.org/10.1017/s1351324922000316","url":null,"abstract":"\u0000 Abusive language is becoming a problematic issue for our society. The spread of messages that reinforce social and cultural intolerance could have dangerous effects in victims’ life. State-of-the-art technologies are often effective on detecting explicit forms of abuse, leaving unidentified the utterances with very weak offensive language but a strong hurtful effect. Scholars have advanced theoretical and qualitative observations on specific indirect forms of abusive language that make it hard to be recognized automatically. In this work, we propose a battery of statistical and computational analyses able to support these considerations, with a focus on creative and cognitive aspects of the implicitness, in texts coming from different sources such as social media and news. We experiment with transformers, multi-task learning technique, and a set of linguistic features to reveal the elements involved in the implicit and explicit manifestations of abuses, providing a solid basis for computational applications.","PeriodicalId":49143,"journal":{"name":"Natural Language Engineering","volume":" ","pages":""},"PeriodicalIF":2.5,"publicationDate":"2022-08-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"45738669","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An empirical study of incorporating syntactic constraints into BERT-based location metonymy resolution","authors":"Hao Wang, Siyuan Du, X. Zheng, Li Meng","doi":"10.1017/S135132492200033X","DOIUrl":"https://doi.org/10.1017/S135132492200033X","url":null,"abstract":"Abstract Metonymy resolution (MR) is a challenging task in the field of natural language processing. The task of MR aims to identify the metonymic usage of a word that employs an entity name to refer to another target entity. Recent BERT-based methods yield state-of-the-art performances. However, they neither make full use of the entity information nor explicitly consider syntactic structure. In contrast, in this paper, we argue that the metonymic process should be completed in a collaborative manner, relying on both lexical semantics and syntactic structure (syntax). This paper proposes a novel approach to enhancing BERT-based MR models with hard and soft syntactic constraints by using different types of convolutional neural networks to model dependency parse trees. Experimental results on benchmark datasets (e.g., ReLocaR, SemEval 2007 and WiMCor) confirm that leveraging syntactic information into fine pre-trained language models benefits MR tasks.","PeriodicalId":49143,"journal":{"name":"Natural Language Engineering","volume":"29 1","pages":"669 - 692"},"PeriodicalIF":2.5,"publicationDate":"2022-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"47772219","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"From unified phrase representation to bilingual phrase alignment in an unsupervised manner","authors":"Jingshu Liu, E. Morin, Sebastian Peña Saldarriaga, Joseph Lark","doi":"10.1017/S1351324922000328","DOIUrl":"https://doi.org/10.1017/S1351324922000328","url":null,"abstract":"Abstract Significant advances have been achieved in bilingual word-level alignment, yet the challenge remains for phrase-level alignment. Moreover, the need for parallel data is a critical drawback for the alignment task. This work proposes a system that alleviates these two problems: a unified phrase representation model using cross-lingual word embeddings as input and an unsupervised training algorithm inspired by recent works on neural machine translation. The system consists of a sequence-to-sequence architecture where a short sequence encoder constructs cross-lingual representations of phrases of any length, then an LSTM network decodes them w.r.t their contexts. After training with comparable corpora and existing key phrase extraction, our encoder provides cross-lingual phrase representations that can be compared without further transformation. Experiments on five data sets show that our method obtains state-of-the-art results on the bilingual phrase alignment task and improves the results of different length phrase alignment by a mean of 8.8 points in MAP.","PeriodicalId":49143,"journal":{"name":"Natural Language Engineering","volume":"29 1","pages":"643 - 668"},"PeriodicalIF":2.5,"publicationDate":"2022-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"48513181","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Named-entity recognition in Turkish legal texts","authors":"Can Çetindağ, Berkay Yazıcıoğlu, Aykut Koç","doi":"10.1017/S1351324922000304","DOIUrl":"https://doi.org/10.1017/S1351324922000304","url":null,"abstract":"Abstract Natural language processing (NLP) technologies and applications in legal text processing are gaining momentum. Being one of the most prominent tasks in NLP, named-entity recognition (NER) can substantiate a great convenience for NLP in law due to the variety of named entities in the legal domain and their accentuated importance in legal documents. However, domain-specific NER models in the legal domain are not well studied. We present a NER model for Turkish legal texts with a custom-made corpus as well as several NER architectures based on conditional random fields and bidirectional long-short-term memories (BiLSTMs) to address the task. We also study several combinations of different word embeddings consisting of GloVe, Morph2Vec, and neural network-based character feature extraction techniques either with BiLSTM or convolutional neural networks. We report 92.27% F1 score with a hybrid word representation of GloVe and Morph2Vec with character-level features extracted with BiLSTM. Being an agglutinative language, the morphological structure of Turkish is also considered. To the best of our knowledge, our work is the first legal domain-specific NER study in Turkish and also the first study for an agglutinative language in the legal domain. Thus, our work can also have implications beyond the Turkish language.","PeriodicalId":49143,"journal":{"name":"Natural Language Engineering","volume":"29 1","pages":"615 - 642"},"PeriodicalIF":2.5,"publicationDate":"2022-07-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"46893283","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Neural automated writing evaluation for Korean L2 writing","authors":"Kyungtae Lim, Jayoung Song, Jungyeul Park","doi":"10.1017/S1351324922000298","DOIUrl":"https://doi.org/10.1017/S1351324922000298","url":null,"abstract":"Abstract Although Korean language education is experiencing rapid growth in recent years and several studies have investigated automated writing evaluation (AWE) systems, AWE for Korean L2 writing still remains unexplored. Therefore, this study aims to develop and validate a state-of-the-art neural model AWE system which can be widely used for Korean language teaching and learning. Based on a Korean learner corpus, the proposed AWE is developed using natural language processing techniques such as part-of-speech tagging, syntactic parsing, and statistical language modeling to engineer linguistic features and a pre-trained neural language model. This study attempted to determine how neural network models use different linguistic features to improve AWE performance. Experimental results of the proposed AWE tool showed that the neural AWE system achieves high reliability for unseen test data from the corpus, which implies metrics used in the AWE system can help differentiate different proficiency levels and predict holistic scores. Furthermore, the results confirmed that the proposed linguistic features–syntactic complexity, quantitative complexity, and fluency–offer benefits that complement neural automated writing evaluation.","PeriodicalId":49143,"journal":{"name":"Natural Language Engineering","volume":"29 1","pages":"1341 - 1363"},"PeriodicalIF":2.5,"publicationDate":"2022-07-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"42622491","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Construction Grammar Conceptual Network: Coordination-based graph method for semantic association analysis","authors":"Benedikt Perak, Tajana Ban Kirigin","doi":"10.1017/S1351324922000274","DOIUrl":"https://doi.org/10.1017/S1351324922000274","url":null,"abstract":"Abstract In this article, we present the Construction Grammar Conceptual Network method, developed for identifying lexical similarity and word sense discrimination in a syntactically tagged corpus, based on the cognitive linguistic assumption that coordination construction instantiates conceptual relatedness. This graph analysis method projects a semantic value onto a given coordinated syntactic dependency and constructs a second-order lexical network of lexical collocates with a high co-occurrence measure. The subsequent process of clustering and pruning the graph reveals lexical communities with high conceptual similarity, which are interpreted as associated senses of the source lexeme. We demonstrate the theory and its application to the task of identifying the conceptual structure and different meanings of nouns, adjectives and verbs using examples from different corpora, and explain the modulating effects of linguistic and graph parameters. This graph approach is based on syntactic dependency processing and can be used as a complementary method to other contemporary natural language processing resources to enrich semantic tasks such as word disambiguation, domain relatedness, sense structure, identification of synonymy, metonymy, and metaphoricity, as well as to automate comprehensive meta-reasoning about languages and identify cross/intra-cultural discourse variations of prototypical conceptualization patterns and knowledge representations. As a contribution, we provide a web-based app at http://emocnet.uniri.hr/.","PeriodicalId":49143,"journal":{"name":"Natural Language Engineering","volume":"29 1","pages":"584 - 614"},"PeriodicalIF":2.5,"publicationDate":"2022-07-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"49089089","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Artificial fine-tuning tasks for yes/no question answering","authors":"Dimitris Dimitriadis, Grigorios Tsoumakas","doi":"10.1017/s1351324922000286","DOIUrl":"https://doi.org/10.1017/s1351324922000286","url":null,"abstract":"\u0000 Current research in yes/no question answering (QA) focuses on transfer learning techniques and transformer-based models. Models trained on large corpora are fine-tuned on tasks similar to yes/no QA, and then the captured knowledge is transferred for solving the yes/no QA task. Most previous studies use existing similar tasks, such as natural language inference or extractive QA, for the fine-tuning step. This paper follows a different perspective, hypothesizing that an artificial yes/no task can transfer useful knowledge for improving the performance of yes/no QA. We introduce three such tasks for this purpose, by adapting three corresponding existing tasks: candidate answer validation, sentiment classification, and lexical simplification. Furthermore, we experimented with three different variations of the BERT model (BERT base, RoBERTa, and ALBERT). The results show that our hypothesis holds true for all artificial tasks, despite the small size of the corresponding datasets that are used for the fine-tuning process, the differences between these tasks, the decisions that we made to adapt the original ones, and the tasks’ simplicity. This gives an alternative perspective on how to deal with the yes/no QA problem, that is more creative, and at the same time more flexible, as it can exploit multiple other existing tasks and corresponding datasets to improve yes/no QA models.","PeriodicalId":49143,"journal":{"name":"Natural Language Engineering","volume":" ","pages":""},"PeriodicalIF":2.5,"publicationDate":"2022-06-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"42369049","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Automated hate speech detection and span extraction in underground hacking and extremist forums","authors":"Linda Zhou, Andrew Caines, Ildiko Pete, Alice Hutchings","doi":"10.1017/S1351324922000262","DOIUrl":"https://doi.org/10.1017/S1351324922000262","url":null,"abstract":"Abstract Hate speech is any kind of communication that attacks a person or a group based on their characteristics, such as gender, religion and race. Due to the availability of online platforms where people can express their (hateful) opinions, the amount of hate speech is steadily increasing that often leads to offline hate crimes. This paper focuses on understanding and detecting hate speech in underground hacking and extremist forums where cybercriminals and extremists, respectively, communicate with each other, and some of them are associated with criminal activity. Moreover, due to the lengthy posts, it would be beneficial to identify the specific span of text containing hateful content in order to assist site moderators with the removal of hate speech. This paper describes a hate speech dataset composed of posts extracted from HackForums, an online hacking forum, and Stormfront and Incels.co, two extremist forums. We combined our dataset with a Twitter hate speech dataset to train a multi-platform classifier. Our evaluation shows that a classifier trained on multiple sources of data does not always improve the performance compared to a mono-platform classifier. Finally, this is the first work on extracting hate speech spans from longer texts. The paper fine-tunes BERT (Bidirectional Encoder Representations from Transformers) and adopts two approaches – span prediction and sequence labelling. Both approaches successfully extract hateful spans and achieve an F1-score of at least 69%.","PeriodicalId":49143,"journal":{"name":"Natural Language Engineering","volume":"29 1","pages":"1247 - 1274"},"PeriodicalIF":2.5,"publicationDate":"2022-06-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"44662755","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"NLE volume 28 issue 4 Cover and Back matter","authors":"","doi":"10.1017/s1351324922000250","DOIUrl":"https://doi.org/10.1017/s1351324922000250","url":null,"abstract":"","PeriodicalId":49143,"journal":{"name":"Natural Language Engineering","volume":" ","pages":"b1 - b2"},"PeriodicalIF":2.5,"publicationDate":"2022-06-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"43592862","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"NLE volume 28 issue 4 Cover and Front matter","authors":"R. Mitkov, B. Boguraev","doi":"10.1017/s1351324922000249","DOIUrl":"https://doi.org/10.1017/s1351324922000249","url":null,"abstract":"","PeriodicalId":49143,"journal":{"name":"Natural Language Engineering","volume":" ","pages":"f1 - f2"},"PeriodicalIF":2.5,"publicationDate":"2022-06-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"43482558","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}