Proceedings of the Seventh Italian Conference on Computational Linguistics CLiC-it 2020: Latest Publications

A Case Study of Natural Gender Phenomena in Translation. A Comparison of Google Translate, Bing Microsoft Translator and DeepL for English to Italian, French and Spanish

Proceedings of the Seventh Italian Conference on Computational Linguistics CLiC-it 2020 Pub Date : 2020-10-01 DOI: 10.4000/books.aaccademia.8844

Argentina Anna Rescigno, Eva Vanmassenhove, J. Monti, Andy Way

Citations: 19

How Granularity of Orthography-Phonology Mappings Affect Reading Development: Evidence from a Computational Model of English Word Reading and Spelling

Proceedings of the Seventh Italian Conference on Computational Linguistics CLiC-it 2020 Pub Date : 2020-09-01 DOI: 10.4000/books.aaccademia.8628

A. Lim, B. O’Brien, Luca Onnis

Citations: 0

Domain Adaptation for Text Classification with Weird Embeddings

Proceedings of the Seventh Italian Conference on Computational Linguistics CLiC-it 2020 Pub Date : 1900-01-01 DOI: 10.4000/books.aaccademia.8250

Valerio Basile

Citations: 3

#andràtuttobene: Images, Texts, Emojis and Geodata in a Sentiment Analysis Pipeline

Proceedings of the Seventh Italian Conference on Computational Linguistics CLiC-it 2020 Pub Date : 1900-01-01 DOI: 10.4000/books.aaccademia.8954

Pierluigi Vitale, Serena Pelosi, M. Falco

Citations: 5

Becoming JILDA

Proceedings of the Seventh Italian Conference on Computational Linguistics CLiC-it 2020 Pub Date : 1900-01-01 DOI: 10.4000/books.aaccademia.8915

Irene Sucameli, Alessandro Lenci, B. Magnini, M. Simi, Manuela Speranza

Abstract: The difficulty of finding useful dialogic data to train a conversational agent remains an open issue, even now that chatbots and spoken dialogue systems are widely used. For this reason we decided to build JILDA, a novel collection of chat-based dialogues produced by Italian native speakers and related to the job-offer domain. JILDA is the first dialogue collection related to this domain for the Italian language. Because of its methodology and collection modalities, we believe that JILDA can be a useful resource not only for the Italian research community, but also for the international one.

Citations: 2

Surviving the Legal Jungle: Text Classification of Italian Laws in Extremely Noisy Conditions

Proceedings of the Seventh Italian Conference on Computational Linguistics CLiC-it 2020 Pub Date : 1900-01-01 DOI: 10.4000/books.aaccademia.8390

Riccardo Coltrinari, Alessandro Antinori, Fabio Celli

Abstract: In this paper, we present a method based on Linear Discriminant Analysis for legal text classification of extremely noisy data, such as duplicated documents classified in different classes. The results show that Linear Discriminant Analysis performs very well in both clean and noisy conditions when used as a classifier in ensemble learning and in multi-label text classification.

1 Motivation and Background

We address text categorization of business-oriented legal documents in Italian, with a custom and overlapping hierarchy of product categories. A typical approach to similar tasks is to exploit resources such as EUROVOC (Daudaravicius, 2012), a multilingual thesaurus consisting of over 6700 hierarchically organised class descriptors used by many organizations of the European Union (EU) for the classification and retrieval of official documents. Our editorial system has a hierarchy of 23 product categories and more than 20600 labels, manually annotated and customized for different clients over more than 15 years, hence it is not possible to exploit resources like EUROVOC to categorize documents. In this paper, we propose a fast and efficient method for document classification for noisy data based on Linear Discriminant Analysis, a dimensionality reduction technique that has been employed successfully in many domains, including neuroimaging and medicine. We believe that our contribution will be useful to the NLP community in the context of document categorization as well as automatic ontology population, in particular when dealing with very noisy data. The paper is structured as follows: in Section 1.1 we present the related work in the field of text classification and the potential of Linear Discriminant Analysis, in Section 2 we describe the datasets we used, in Section 3 we report and discuss the results of our classification experiments, and in Section 4 we draw our conclusions.

Copyright ©2020 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

Citations: 0

Italian Counter Narrative Generation to Fight Online Hate Speech

Proceedings of the Seventh Italian Conference on Computational Linguistics CLiC-it 2020 Pub Date : 1900-01-01 DOI: 10.4000/books.aaccademia.8378

Yi-Ling Chung, Serra Sinem Tekiroğlu, Marco Guerini

Abstract: Counter Narratives are textual responses meant to withstand online hatred and prevent it from spreading. The use of neural architectures for the generation of Counter Narratives (CNs) is beginning to be investigated by the NLP community. So far, however, these efforts have targeted only English. In this paper, we try to fill the gap for Italian, studying how to implement CN generation approaches effectively. We experiment with an existing dataset of CNs and a novel language model, recently released for Italian, under several configurations, including zero-shot and few-shot learning. Results show that even for under-resourced languages, data augmentation strategies paired with large unsupervised LMs can yield promising results.

Copyright ©2020 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

Citations: 11

Analyses of Character Emotions in Dramatic Works by Using EmoLex Unigrams

Proceedings of the Seventh Italian Conference on Computational Linguistics CLiC-it 2020 Pub Date : 1900-01-01 DOI: 10.4000/books.aaccademia.9004

Mehmet Can Yavuz

Citations: 6

The "Corpus Anchise 320" and the Analysis of Conversations between Healthcare Workers and People with Dementia

Proceedings of the Seventh Italian Conference on Computational Linguistics CLiC-it 2020 Pub Date : 1900-01-01 DOI: 10.4000/books.aaccademia.8260

Nicola Benvenuti, Andrea Bolioli, A. Mazzei, Pietro Vigorelli, A. Bosca

Citations: 0

(Stem and Word) Predictability in Italian Verb Paradigms: An Entropy-Based Study Exploiting the New Resource LeFFI

Proceedings of the Seventh Italian Conference on Computational Linguistics CLiC-it 2020 Pub Date : 1900-01-01 DOI: 10.4000/books.aaccademia.8830

Matteo Pellegrini, A. T. Cignarella

Citations: 2