Proceedings of the Seventh Italian Conference on Computational Linguistics CLiC-it 2020: Latest Articles

A Case Study of Natural Gender Phenomena in Translation. A Comparison of Google Translate, Bing Microsoft Translator and DeepL for English to Italian, French and Spanish
Proceedings of the Seventh Italian Conference on Computational Linguistics CLiC-it 2020 Pub Date : 2020-10-01 DOI: 10.4000/books.aaccademia.8844
Argentina Anna Rescigno, Eva Vanmassenhove, J. Monti, Andy Way
Abstract: This paper presents the results of an evaluation of Google Translate, DeepL and Bing Microsoft Translator with reference to natural gender translation. It provides statistics on the frequency of female, male and neutral forms in the translations of a list of personality adjectives, nouns referring to professions, and bigender nouns. The evaluation is carried out for English→Spanish, English→Italian and English→French.
Citations: 19
How Granularity of Orthography-Phonology Mappings Affect Reading Development: Evidence from a Computational Model of English Word Reading and Spelling
Proceedings of the Seventh Italian Conference on Computational Linguistics CLiC-it 2020 Pub Date : 2020-09-01 DOI: 10.4000/books.aaccademia.8628
A. Lim, B. O’Brien, Luca Onnis
Abstract: It is widely held that children implicitly learn the structure of their writing system through statistical learning of spelling-to-sound mappings. Yet an unresolved question is how to sequence reading experience so that children can ‘pick up’ the structure optimally. We tackle this question here using a computational model of encoding and decoding. The order of presentation of words was manipulated so that they exhibited two distinct progressions of granularity of spelling-to-sound mappings. We found that under a training regime that introduced written words progressively from small-to-large granularity, the network exhibited an early advantage in reading acquisition compared to a regime introducing written words from large-to-small granularity. Our results thus provide support for the grain size theory (Ziegler and Goswami, 2005) and demonstrate that the order of learning can influence learning trajectories of literacy skills.
Citations: 0
Domain Adaptation for Text Classification with Weird Embeddings
Proceedings of the Seventh Italian Conference on Computational Linguistics CLiC-it 2020 Pub Date : 1900-01-01 DOI: 10.4000/books.aaccademia.8250
Valerio Basile
Abstract: Pre-trained word embeddings are often used to initialize deep learning models for text classification, as a way to inject precomputed lexical knowledge and boost the learning process. However, such embeddings are usually trained on generic corpora, while text classification tasks are often domain-specific. We propose a fully automated method to adapt pre-trained word embeddings to any given classification task that needs no additional resource other than the original training set. The method is based on the concept of word weirdness, extended to score the words in the training set according to how characteristic they are with respect to the labels of a text classification dataset. The polarized weirdness scores are then used to update the word embeddings to reflect task-specific semantic shifts. Our experiments show that this method is beneficial to the performance of several text classification tasks in different languages.
Citations: 3
#andràtuttobene: Images, Texts, Emojis and Geodata in a Sentiment Analysis Pipeline
Proceedings of the Seventh Italian Conference on Computational Linguistics CLiC-it 2020 Pub Date : 1900-01-01 DOI: 10.4000/books.aaccademia.8954
Pierluigi Vitale, Serena Pelosi, M. Falco
Abstract: This research investigates the sentiment expressed by Instagram users during the lockdown period in Italy caused by the COVID-19 pandemic. The study is based on the analysis of all the posts published on Instagram under the hashtag #andratuttobene on May 4, May 18 and June 3, 2020. The analysis was carried out at national, regional and provincial scales. We analyzed all the different languages and forms (i.e. captions, hashtags, emojis and images) that constitute the posts. The aim of this research is to provide a set of procedures revealing the different polarity trends for each kind of expression and to propose a single comprehensive measure.
Citations: 5
Becoming JILDA
Proceedings of the Seventh Italian Conference on Computational Linguistics CLiC-it 2020 Pub Date : 1900-01-01 DOI: 10.4000/books.aaccademia.8915
Irene Sucameli, Alessandro Lenci, B. Magnini, M. Simi, Manuela Speranza
Abstract: The difficulty in finding useful dialogic data to train a conversational agent is still an open issue, even now that chatbots and spoken dialogue systems are widely used. For this reason we decided to build JILDA, a novel collection of chat-based dialogues produced by Italian native speakers and related to the job-offer domain. JILDA is the first dialogue collection related to this domain for the Italian language. Because of its methodology and the way the data were collected, we believe that JILDA can be a useful resource not only for the Italian research community, but also for the international one.
Citations: 2
Surviving the Legal Jungle: Text Classification of Italian Laws in Extremely Noisy Conditions
Proceedings of the Seventh Italian Conference on Computational Linguistics CLiC-it 2020 Pub Date : 1900-01-01 DOI: 10.4000/books.aaccademia.8390
Riccardo Coltrinari, Alessandro Antinori, Fabio Celli
Abstract: In this paper, we present a method based on Linear Discriminant Analysis for legal text classification of extremely noisy data, such as duplicated documents classified in different classes. The results show that Linear Discriminant Analysis obtains very good performance in both clean and noisy conditions when used as a classifier in ensemble learning and in multi-label text classification.
1 Motivation and Background
We address text categorization of business-oriented legal documents in Italian, but with a custom and overlapping hierarchy of product categories. A typical approach to similar tasks is to exploit resources such as EUROVOC (Daudaravicius, 2012), a multilingual thesaurus consisting of over 6700 hierarchically-organised class descriptors used by many organizations of the European Union (EU) for the classification and retrieval of official documents. Our editorial system has a hierarchy of 23 product categories and more than 20600 labels, manually annotated and customized for different clients over more than 15 years, hence it is not possible to exploit resources like EUROVOC to categorize documents. In this paper, we propose a fast and efficient method for document classification of noisy data based on Linear Discriminant Analysis, a dimensionality reduction technique that has been employed successfully in many domains, including neuroimaging and medicine. We believe that our contribution will be useful to the NLP community in the context of document categorization as well as automatic ontology population, in particular when dealing with very noisy data. The paper is structured as follows: in Section 1.1 we present the related work in the field of text classification and the potential of Linear Discriminant Analysis, in Section 2 we describe the datasets we used, in Section 3 we report and discuss the results of our classification experiments and in Section 4 we draw our conclusions.
Citations: 0
Italian Counter Narrative Generation to Fight Online Hate Speech
Proceedings of the Seventh Italian Conference on Computational Linguistics CLiC-it 2020 Pub Date : 1900-01-01 DOI: 10.4000/books.aaccademia.8378
Yi-Ling Chung, Serra Sinem Tekiroğlu, Marco Guerini
Abstract: Counter Narratives are textual responses meant to withstand online hatred and prevent its spreading. The use of neural architectures for the generation of Counter Narratives (CNs) is beginning to be investigated by the NLP community. So far, however, these efforts have targeted only English. In this paper, we try to fill the gap for Italian, studying how to implement CN generation approaches effectively. We experiment with an existing dataset of CNs and a novel language model, recently released for Italian, under several configurations, including zero and few shot learning. Results show that even for under-resourced languages, data augmentation strategies paired with large unsupervised LMs can yield promising results.
Citations: 11
Analyses of Character Emotions in Dramatic Works by Using EmoLex Unigrams
Proceedings of the Seventh Italian Conference on Computational Linguistics CLiC-it 2020 Pub Date : 1900-01-01 DOI: 10.4000/books.aaccademia.9004
Mehmet Can Yavuz
Abstract: In theatrical pieces, written language is the primary medium for establishing antagonisms. As one of the most important figures of the Renaissance, Shakespeare wrote characters who express themselves clearly. Thus, the emotional landscape of the plays can be revealed from the texts. It is important to analyze such landscapes to further demonstrate these structures. We use a word-emotion association lexicon with eight basic emotions (anger, fear, anticipation, trust, surprise, sadness, joy, and disgust) and two sentiments (negative and positive). By using this lexicon, the emotional state of each character is represented in a 10-dimensional space and mapped onto a plane. These principal-axes planes position the characters relative to one another. Additionally, the temporal-emotional evolution of each play is graphed. We conclude that the protagonist and the antagonist have different emotional states from the rest and that these two emotionally oppose each other. The temporal-emotional timelines of the plays offer better insight into the tragedies.
Citations: 6
The "Corpus Anchise 320" and the Analysis of Conversations between Healthcare Workers and People with Dementia
Proceedings of the Seventh Italian Conference on Computational Linguistics CLiC-it 2020 Pub Date : 1900-01-01 DOI: 10.4000/books.aaccademia.8260
Nicola Benvenuti, Andrea Bolioli, A. Mazzei, Pietro Vigorelli, A. Bosca
Abstract: The aim of this research was to create the first Italian corpus of free conversations between healthcare workers and people with dementia, in order to investigate specific linguistic phenomena from a computational point of view. Most previous research on speech disorders of people with dementia has been based on qualitative analysis, or on the study of a few dozen cases carried out in laboratory conditions rather than on spontaneous speech (in particular for the Italian language). The creation of the Corpus Anchise 320 aims to investigate dementia language by providing a larger number of dialogues collected in ecological conditions and obtained by transcribing spontaneous speech. Moreover, quantitative linguistic analysis can show some peculiarities of this language.
Citations: 0
(Stem and Word) Predictability in Italian Verb Paradigms: An Entropy-Based Study Exploiting the New Resource LeFFI
Proceedings of the Seventh Italian Conference on Computational Linguistics CLiC-it 2020 Pub Date : 1900-01-01 DOI: 10.4000/books.aaccademia.8830
Matteo Pellegrini, A. T. Cignarella
Abstract: In this paper we present LeFFI, an inflected lexicon of Italian listing all the available wordforms of 2,053 verbs. We then use this resource to perform an entropy-based analysis of the mutual predictability of wordforms within Italian verb paradigms, and compare our findings to those of previous work on stem predictability in Italian verb inflection.
Citations: 2