Language Resources and Evaluation: Latest Articles

Resources for Turkish natural language processing: A critical survey.
IF 2.7 | Zone 3 | Computer Science
Language Resources and Evaluation Pub Date: 2023-01-01 DOI: 10.1007/s10579-022-09605-4
Çağrı Çöltekin, A Seza Doğruöz, Özlem Çetinoğlu
Abstract: This paper presents a comprehensive survey of corpora and lexical resources available for Turkish. We review a broad range of resources, focusing on the ones that are publicly available. In addition to providing information about the available linguistic resources, we present a set of recommendations, and identify gaps in the data available for conducting research and building applications in Turkish Linguistics and Natural Language Processing.
Volume 57, issue 1, pp. 449-488. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9417072/pdf/
Citations: 1
Speech acts in the Dutch COVID-19 Press Conferences.
IF 2.7 | Zone 3 | Computer Science
Language Resources and Evaluation Pub Date: 2023-01-01 DOI: 10.1007/s10579-022-09602-7
Daan Schueler, Maarten Marx
Abstract: An open source corpus of all Dutch COVID-19 Press Conferences with sentences annotated on the basis of John Searle's Speech Act taxonomy was created. It contains all 58 press conferences held between March 6 2020 and April 20 2021 and has 9,441 manually annotated sentences. Speech acts were annotated in a consistent manner, with a Krippendorff's alpha of .71. The corpus is easy to use and rich in metadata, with lexical, syntactic, discourse (speaker, question or answer) features and information on the type of regulations being present. We analyse the press conferences in terms of speech act usage, giving insight into the use of speech acts over time, the relation of speech act usage to real world phenomena, the general structure of the press conferences and the division of roles between speakers. Relations were found between speech act usage and the type of press conference (i.e. easing, tightening or neutral) as well as the number of hospital admissions. Speech act classes showed preferred locations within the press conferences, indicating a general structure. Distinct roles between speakers were identified. We also investigate the use of our set of labelled sentences for training a speech act classifier and achieve a reasonable accuracy of .73 and a mean reciprocal rank of .74 with the state of the art transformer RoBERTa model.
Supplementary information: The online version of this article contains supplementary material available at 10.1007/s10579-022-09602-7.
Volume 57, issue 2, pp. 869-892. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9294846/pdf/
Citations: 3
EventDNA: a dataset for Dutch news event extraction as a basis for news diversification.
IF 2.7 | Zone 3 | Computer Science
Language Resources and Evaluation Pub Date: 2023-01-01 DOI: 10.1007/s10579-022-09623-2
Camiel Colruyt, Orphée De Clercq, Thierry Desot, Véronique Hoste
Abstract: News organizations increasingly tailor their news offering to the reader through personalized recommendation algorithms. However, automated recommendation algorithms reflect a commercial logic based on calculated relevance to the user, rather than aiming at a well-informed citizenry. In this paper, we introduce the EventDNA corpus, a dataset of 1773 Dutch-language news articles annotated with information on entities, news events and IPTC Media Topic codes, with the ultimate goal to outline a recommendation algorithm that uses news event diversity rather than previous reading behaviour as a key driver for personalized news recommendation. We describe the EventDNA annotation guidelines, which are inspired by the well-known ERE framework and conclude that it is not practical to apply a fixed event typology such as used in ERE to an unrestricted data context. The corpus and related source code is made available at https://github.com/NewsDNA-LT3/.github.
Volume 57, issue 1, pp. 189-221. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9672586/pdf/
Citations: 5
The ParlaMint corpora of parliamentary proceedings.
IF 2.7 | Zone 3 | Computer Science
Language Resources and Evaluation Pub Date: 2023-01-01 DOI: 10.1007/s10579-021-09574-0
Tomaž Erjavec, Maciej Ogrodniczuk, Petya Osenova, Nikola Ljubešić, Kiril Simov, Andrej Pančur, Michał Rudolf, Matyáš Kopp, Starkaður Barkarson, Steinþór Steingrímsson, Çağrı Çöltekin, Jesse de Does, Katrien Depuydt, Tommaso Agnoloni, Giulia Venturi, María Calzada Pérez, Luciana D de Macedo, Costanza Navarretta, Giancarlo Luxardo, Matthew Coole, Paul Rayson, Vaidas Morkevičius, Tomas Krilavičius, Roberts Darǵis, Orsolya Ring, Ruben van Heusden, Maarten Marx, Darja Fišer
Abstract: This paper presents the ParlaMint corpora containing transcriptions of the sessions of the 17 European national parliaments with half a billion words. The corpora are uniformly encoded, contain rich meta-data about 11 thousand speakers, and are linguistically annotated following the Universal Dependencies formalism and with named entities. Samples of the corpora and conversion scripts are available from the project's GitHub repository, and the complete corpora are openly available via the CLARIN.SI repository for download, as well as through the NoSketch Engine and KonText concordancers and the Parlameter interface for on-line exploration and analysis.
Volume 57, issue 1, pp. 415-448. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8807381/pdf/
Citations: 35
Between welcome culture and border fence: A dataset on the European refugee crisis in German newspaper reports.
IF 2.7 | Zone 3 | Computer Science
Language Resources and Evaluation Pub Date: 2023-01-01 DOI: 10.1007/s10579-023-09641-8
Nico Blokker, André Blessing, Erenay Dayanik, Jonas Kuhn, Sebastian Padó, Gabriella Lapesa
Abstract: Newspaper reports provide a rich source of information on the unfolding of public debates, which can serve as basis for inquiry in political science. Such debates are often triggered by critical events, which attract public attention and incite the reactions of political actors: crisis sparks the debate. However, due to the challenges of reliable annotation and modeling, few large-scale datasets with high-quality annotation are available. This paper introduces DebateNet2.0, which traces the political discourse on the 2015 European refugee crisis in the German quality newspaper taz. The core units of our annotation are political claims (requests for specific actions to be taken) and the actors who advance them (politicians, parties, etc.). Our contribution is twofold. First, we document and release DebateNet2.0 along with its companion R package, mardyR. Second, we outline and apply a Discourse Network Analysis (DNA) to DebateNet2.0, comparing two crucial moments of the policy debate on the "refugee crisis": the migration flux through the Mediterranean in April/May and the one along the Balkan route in September/October. We guide the reader through the methods involved in constructing a discourse network from a newspaper, demonstrating that there is not one single discourse network for the German migration debate, but multiple ones, depending on the research question through the associated choices regarding political actors, policy fields and time spans.
Volume 57, issue 1, pp. 121-153. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9924208/pdf/
Citations: 2