Creativity Embedding: A Vector to Characterise and Classify Plausible Triples in Deep Learning NLP Models
Isabeau Oliveri, Luca Ardito, Giuseppe Rizzo, M. Morisio. Proceedings of the Seventh Italian Conference on Computational Linguistics (CLiC-it 2020). DOI: https://doi.org/10.4000/books.aaccademia.8768
Abstract: In this paper we define the creativity embedding of a text based on four self-assessment creativity metrics (diversity, novelty, serendipity and magnitude), knowledge graphs, and neural networks. We use the notion of a triple (head, relation, tail) as the basic unit, and investigate whether additional information about creativity improves natural language processing tasks. In this work we focus on the triple plausibility task, exploiting the BERT model and a sample of the WordNet11 dataset. Contrary to our hypothesis, we do not detect an increase in performance.

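The triple plausibility task mentioned above is commonly cast as binary classification over serialised (head, relation, tail) triples, with negative examples obtained by corrupting gold triples. The sketch below shows only the data-preparation step under that common setup; the serialisation format, the tail-corruption scheme, and the toy data are illustrative assumptions, not the authors' exact pipeline.

```python
import random

def triple_to_sequence(head, relation, tail):
    """Serialise a (head, relation, tail) triple as one text sequence,
    ready to feed a BERT-style sequence classifier."""
    return f"{head} [SEP] {relation} [SEP] {tail}"

def corrupt_tail(triple, entities, rng):
    """Build an implausible negative by swapping the tail for a random
    different entity (standard knowledge-graph negative sampling)."""
    head, relation, tail = triple
    candidates = [e for e in entities if e != tail]
    return (head, relation, rng.choice(candidates))

def build_examples(triples, entities, seed=0):
    """Return (text, label) pairs: 1 for gold triples, 0 for corrupted ones."""
    rng = random.Random(seed)
    examples = []
    for t in triples:
        examples.append((triple_to_sequence(*t), 1))
        examples.append((triple_to_sequence(*corrupt_tail(t, entities, rng)), 0))
    return examples

# Toy WordNet-like triples, purely for illustration.
triples = [("dog", "hypernym", "animal"), ("oak", "hypernym", "tree")]
entities = ["animal", "tree", "stone"]
data = build_examples(triples, entities)
```

The resulting (text, label) pairs would then be tokenised and passed to a sequence classifier; any extra signal, such as a creativity embedding, could be concatenated to the encoder output before the classification layer.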
Hate Speech Detection with Machine-Translated Data: The Role of Annotation Scheme, Class Imbalance and Undersampling
Camilla Casula, Sara Tonelli. Proceedings of the Seventh Italian Conference on Computational Linguistics (CLiC-it 2020). DOI: https://doi.org/10.4000/books.aaccademia.8345
Abstract: While using machine-translated data for supervised training can alleviate data sparseness when dealing with less-resourced languages, it is important that the source data are not only correctly translated but also follow the same annotation scheme, and possibly the same class balance, as the smaller dataset in the target language. We therefore present an evaluation of hate speech detection in Italian using machine-translated data from English, comparing three settings in order to understand the impact of training size, class distribution and annotation scheme.

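Random undersampling, one of the factors in the title above, simply drops majority-class examples until the class distribution is balanced. A minimal sketch of that operation (the function name and toy data are illustrative, not taken from the paper):

```python
import random
from collections import Counter

def undersample(texts, labels, seed=0):
    """Randomly discard majority-class items until every class has as
    many examples as the rarest class (simple random undersampling)."""
    rng = random.Random(seed)
    by_class = {}
    for text, label in zip(texts, labels):
        by_class.setdefault(label, []).append(text)
    n_min = min(len(items) for items in by_class.values())
    balanced = []
    for label, items in by_class.items():
        for text in rng.sample(items, n_min):
            balanced.append((text, label))
    rng.shuffle(balanced)
    return balanced

# Toy imbalanced dataset: 4 "not" vs 2 "hate" examples.
texts = ["a", "b", "c", "d", "e", "f"]
labels = ["hate", "not", "not", "not", "not", "hate"]
balanced = undersample(texts, labels)
```

The trade-off the paper's comparison probes is exactly the one visible here: balancing classes this way discards training data, which matters when the target-language dataset is already small.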
Automatic Induction of FrameNet Lexical Units in Italian
Silvia Brambilla, D. Croce, F. Tamburini, R. Basili. Proceedings of the Seventh Italian Conference on Computational Linguistics (CLiC-it 2020). DOI: https://doi.org/10.4000/books.aaccademia.8300
Abstract: In this paper we investigate the applicability of automatic frame-induction methods to improve the coverage of IFrameNet, a novel lexical resource based on Frame Semantics in Italian. The experimental evaluations show that the adopted methods, based on neural word embeddings, pave the way for the assisted development of a large-scale lexical resource for …

UDante: First Steps Towards the Universal Dependencies Treebank of Dante's Latin Works
F. M. Cecchini, R. Sprugnoli, Giovanni Moretti, M. Passarotti. Proceedings of the Seventh Italian Conference on Computational Linguistics (CLiC-it 2020). DOI: https://doi.org/10.4000/books.aaccademia.8653
Abstract: This paper presents the early stages of the development of a new treebank containing all of Dante Alighieri's Latin works. In particular, it describes the conversion of the original TEI-XML files to CoNLL-U, the creation of a gold standard, the process of training four annotators, and the evaluation of the syntactic annotation in terms of inter-annotator agreement and the LA, UAS and LAS metrics. The aim is to release a new resource, in view of the celebrations for the 700th anniversary of Dante's death, which can support the development of the Vocabolario Dantesco.

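The UAS and LAS metrics used in the evaluation above compare, token by token, the predicted dependency head (UAS) and the predicted head plus relation label (LAS) against the gold annotation. A minimal sketch, assuming each token is represented as a (head_index, deprel) pair mirroring the CoNLL-U HEAD and DEPREL columns:

```python
def attachment_scores(gold, predicted):
    """Compute UAS (correct heads) and LAS (correct heads AND labels)
    over parallel per-token lists of (head_index, deprel) pairs."""
    assert len(gold) == len(predicted)
    n = len(gold)
    uas_hits = sum(g[0] == p[0] for g, p in zip(gold, predicted))
    las_hits = sum(g == p for g, p in zip(gold, predicted))
    return uas_hits / n, las_hits / n

# Toy 3-token sentence; head index 0 marks the artificial root, as in CoNLL-U.
gold = [(2, "nsubj"), (0, "root"), (2, "obj")]
pred = [(2, "nsubj"), (0, "root"), (2, "obl")]  # right head, wrong label
uas, las = attachment_scores(gold, pred)
```

Here every head is attached correctly (UAS = 1.0) but one relation label is wrong, so LAS drops to 2/3, which is why the two scores are always reported together.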
The AEREST Reading Database
Marcello Ferro, Sara Giulivi, Claudia Cappa. Proceedings of the Seventh Italian Conference on Computational Linguistics (CLiC-it 2020). DOI: https://doi.org/10.4000/books.aaccademia.8558
Abstract: AEREST is a reading assessment protocol for the concurrent evaluation of a child's decoding and comprehension skills. Reading data complying with the AEREST protocol were automatically collected and structured with the ReadLet web-based platform in a pilot study, forming the AEREST Reading Database. The content, structure and potential of the database are described here, together with the main directions of current and future development.

Multiword Expressions We Live by: A Validated Usage-based Dataset from Corpora of Written Italian
F. Masini, M. Micheli, Andrea Zaninello, S. Castagnoli, M. Nissim. Proceedings of the Seventh Italian Conference on Computational Linguistics (CLiC-it 2020). DOI: https://doi.org/10.4000/books.aaccademia.8710
Abstract: The paper describes the creation of a manually validated dataset of Italian multiword expressions, building on candidates automatically extracted from corpora of written Italian. The main features of the resource, such as POS-pattern and lemma distribution, are also discussed, together with possible applications.

Distributional Semantics: Yesterday, Today, and Tomorrow
Alessandro Lenci. Proceedings of the Seventh Italian Conference on Computational Linguistics (CLiC-it 2020). DOI: https://doi.org/10.4000/books.aaccademia.9030
Abstract: Distributional semantics is undoubtedly the mainstream approach to meaning representation in computational linguistics today. It has also become an important paradigm of semantic analysis in cognitive science, and even linguists have started looking at it with growing interest. The popularity of distributional semantics has boomed in the era of deep learning, when "word embeddings" became the basic ingredient to "cook" any NLP task. The era of BERT & co. has brought new types of contextualized representations that have often generated hasty claims of incredible breakthroughs in the natural language understanding capability of deep learning models. Unfortunately, these claims are not always supported by the improved semantic abilities of the latest generation of embeddings. Models like BERT are still rooted in the principles of distributional learning, but at the same time their goal is more ambitious than generating corpus-based representations of meaning. On the one hand, the embeddings they produce encode much more than lexical meaning; on the other hand, we are still largely uncertain about what semantic properties of natural language they actually capture. Distributional semantics has surely benefited from the successes of deep learning, but this might even jeopardize the very essence of distributional models of meaning by making their goals and foundations unclear.
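The "yesterday" of the title refers to classic count-based distributional models: word meaning represented by co-occurrence counts and compared with cosine similarity. A self-contained toy sketch of that idea (corpus, window size, and function names are illustrative):

```python
import math
from collections import Counter

def vectors_from_corpus(corpus, window=2):
    """Build count-based co-occurrence vectors: for each word, count the
    words appearing within +/-window positions across all sentences."""
    vectors = {}
    for sentence in corpus:
        for i, word in enumerate(sentence):
            ctx = vectors.setdefault(word, Counter())
            for j in range(max(0, i - window), min(len(sentence), i + window + 1)):
                if j != i:
                    ctx[sentence[j]] += 1
    return vectors

def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[w] * v[w] for w in u if w in v)
    norm = (math.sqrt(sum(c * c for c in u.values()))
            * math.sqrt(sum(c * c for c in v.values())))
    return dot / norm if norm else 0.0

corpus = [["the", "dog", "barks"], ["the", "cat", "meows"],
          ["the", "dog", "runs"], ["the", "cat", "runs"]]
vecs = vectors_from_corpus(corpus)
sim = cosine(vecs["dog"], vecs["cat"])  # high: shared contexts "the", "runs"
```

Word2vec-style embeddings and, later, contextualized models like BERT replace the explicit counts with learned dense vectors, but, as the abstract argues, they remain rooted in the same distributional principle: words occurring in similar contexts get similar representations.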