DirKorp
Petra Bago, Virna Karlić
Slovenscina 2.0, published 2023-09-12
DOI: 10.4312/slo2.0.2023.1.189-217 (https://doi.org/10.4312/slo2.0.2023.1.189-217)
Abstract
In this paper, we present a new version (v3.0) of DirKorp (Korpus direktivnih govornih činova hrvatskoga jezika), the first Croatian corpus of directive speech acts developed for pragmatic research. The corpus contains 800 elicited speech acts collected via an online questionnaire with role-playing tasks, a method of simulated communication carried out under pre-set conditions. This method suits speech act research because it yields a large number of acts with the same propositional content and illocutionary purpose, produced in the same controlled situations. The situations are classified into two categories according to the relationship between the participants in the communication act: (1) situations involving interlocutors who are not in a familiar relationship; (2) situations involving interlocutors in a familiar relationship. The tasks from the two categories are organized into four pairs, with the tasks in each pair asking respondents to produce a speech act of similar propositional content. The respondents were 100 Croatian speakers, all undergraduate (63%) or graduate (37%) students of the Faculty of Humanities and Social Sciences (University of Zagreb). The corpus has been manually annotated at the speech act level, with each speech act carrying up to 14 features: (1) respondent ID, (2) familiarity/unfamiliarity, (3) utterance type, (4) directive performative verb in 1st person, (5) illocutionary force, (6) propositional content, (7) T/V form, (8) exhortative, (9) lexical marker of request, (10) lexical marker of apology, (11) lexical marker of gratitude, (12) honorific title, (13) grammatical mood, and (14) modal verb in 2nd person. The corpus contains 12,676 tokens and 1,692 types. It is encoded according to the TEI P5: Guidelines for Electronic Text Encoding and Interchange, developed and maintained by the Text Encoding Initiative Consortium (TEI). DirKorp is available for download from GitHub in TEI format under the CC BY-SA 4.0 license. We describe the pragmatic annotation applied and the structure of the corpus.
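As an illustration of how the reported size figures relate to a TEI-encoded file, the following is a minimal sketch, not the authors' procedure. It assumes that each speech act is stored in a TEI `<u>` element with a `who` attribute; the abstract does not specify the actual DirKorp markup, so these names are hypothetical. The two sample utterances are invented for the example and are not taken from the corpus. The tokenization (lowercased runs of word characters) is also an assumption, so counts obtained this way on the real corpus may differ from the published 12,676 tokens and 1,692 types.

```python
import re
import xml.etree.ElementTree as ET

# Namespace used by TEI P5 documents.
TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Design size as described in the abstract: four task pairs, each pair
# containing one task per familiarity category, answered by 100 respondents.
RESPONDENTS = 100
TASK_PAIRS = 4
CATEGORIES = 2
EXPECTED_SPEECH_ACTS = RESPONDENTS * TASK_PAIRS * CATEGORIES  # 800

# Toy sample, NOT taken from DirKorp. The <u> element and the who
# attribute are assumed here; the real corpus markup may differ.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <text><body>
    <u who="#R001">Molim te, zatvori prozor.</u>
    <u who="#R002">Možete li, molim Vas, zatvoriti prozor?</u>
  </body></text>
</TEI>"""


def count_tokens_and_types(tei_xml: str) -> tuple[int, int]:
    """Return (token count, type count) over all <u> elements.

    Tokens are lowercased runs of word characters, which is a
    simplifying assumption about tokenization.
    """
    root = ET.fromstring(tei_xml)
    tokens = []
    for u in root.iterfind(".//tei:u", TEI_NS):
        text = "".join(u.itertext())
        tokens.extend(re.findall(r"\w+", text.lower()))
    return len(tokens), len(set(tokens))


n_tokens, n_types = count_tokens_and_types(SAMPLE)
print(EXPECTED_SPEECH_ACTS)  # 800
print(n_tokens, n_types)     # 10 8
```

To run the same count on the downloaded corpus, the file contents would be read and passed to `count_tokens_and_types` after adjusting the element name to whatever the DirKorp TEI files actually use.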