Slovenscina 2.0, Pub Date: 2023-09-12, DOI: 10.4312/slo2.0.2023.1.189-217
Petra Bago, Virna Karlić
{"title":"DirKorp","authors":"Petra Bago, Virna Karlić","doi":"10.4312/slo2.0.2023.1.189-217","DOIUrl":"https://doi.org/10.4312/slo2.0.2023.1.189-217","url":null,"abstract":"In this paper, we present recent developments on a new version (v3.0) of DirKorp (Korpus direktivnih govornih činova hrvatskoga jezika), the first Croatian corpus of directive speech acts developed for the purposes of pragmatic research. The corpus contains 800 elicited speech acts collected via an online questionnaire with role-playing tasks, a method of simulated communication that is implemented under pre-set conditions. This method is suitable for researching speech acts due to the ability to collect a great number of examples of such acts of equal propositional content and illocutionary purpose used in the same controlled situations. The presented situations are classified into two categories with regard to the relationship between the participants of the communication act: (1) situations involving interlocutors who are not in a familiar relationship; (2) situations involving interlocutors in a familiar relationship. Assignments of the two categories are organized into four pairs, asking respondents to share a speech act of similar propositional content. The respondents were 100 Croatian speakers, all undergraduate (63%) or graduate students (37%) of the Faculty of Humanities and Social Sciences (University of Zagreb). The corpus has been manually annotated on the speech act level, each speech act containing up to 14 features: (1) respondent ID, (2) familiarity/unfamiliarity, (3) utterance type, (4) directive performative verb in 1st person, (5) illocutionary force, (6) propositional content, (7) T/V form, (8) exhortative, (9) lexical marker of request, (10) lexical marker of apology, (11) lexical marker of gratitude, (12) honorific title, (13) grammatical mood, and (14) modal verb in 2nd person. It contains 12,676 tokens and 1,692 types. 
The corpus is encoded according to the TEI P5: Guidelines for Electronic Text Encoding and Interchange, developed and maintained by the Text Encoding Initiative Consortium (TEI). DirKorp is available for download under the CC BY-SA 4.0 license from GitHub in TEI format. We describe applied pragmatic annotation as well as the structure of the corpus.","PeriodicalId":36888,"journal":{"name":"Slovenscina 2.0","volume":"33 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-09-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135826730","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Slovenscina 2.0, Pub Date: 2023-09-12, DOI: 10.4312/slo2.0.2023.1.8-32
Špela Arhar Holdt, Iztok Kosem, Eva Pori, Vojko Gorjanc, Simon Krek, Polona Gantar
{"title":"Negativno zaznamovano besedišče v Slovarju sopomenk sodobne slovenščine 2.0","authors":"Špela Arhar Holdt, Iztok Kosem, Eva Pori, Vojko Gorjanc, Simon Krek, Polona Gantar","doi":"10.4312/slo2.0.2023.1.8-32","DOIUrl":"https://doi.org/10.4312/slo2.0.2023.1.8-32","url":null,"abstract":"In this paper, we present solutions for identifying and labelling marked vocabulary within the concept of the responsive Thesaurus of Modern Slovene (Slovar sopomenk sodobne slovenščine). As this is the first project of its kind, the solutions are largely innovative and are situated within the issues of automatic, machine-driven dictionary compilation, the dictionary's openness, and the involvement of the user community. The paper describes the procedure for identifying hateful and vulgar vocabulary and for assigning labels, warning icons, and longer explanations. We address both technical and content-related questions of labelling. In terms of content, the labels are based on communicative intent and effect, their essence being information about the possible consequences of use; in the technical solutions, we pay close attention to the digital medium and to how the solutions are visualized in it. Since responsiveness is one of the key concepts of the dictionary, we recognize the importance of cooperating with the user community in our labelling solutions, and we therefore also propose ways for the community to participate in adding labels. 
The original conference paper has been expanded in all sections, and an entirely new section has been added on the treatment of polysemous headwords, their sense division, and sense description, with examples of negatively marked senses.","PeriodicalId":36888,"journal":{"name":"Slovenscina 2.0","volume":"49 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-09-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135826749","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Slovenscina 2.0, Pub Date: 2023-09-12, DOI: 10.4312/slo2.0.2023.1.91-117
Špela Antloga
{"title":"Identifikacija metafore in metonimije v jezikovnih korpusih","authors":"Špela Antloga","doi":"10.4312/slo2.0.2023.1.91-117","DOIUrl":"https://doi.org/10.4312/slo2.0.2023.1.91-117","url":null,"abstract":"We are not always able to put everything we think directly into words, so we use various linguistic-cognitive processes, including metaphor and metonymy, to explain phenomena. Over the past twenty years, recognition of the value and prevalence of metaphorical and metonymic expressions in language has led to increased interest in the systematic identification and extraction of such figurative expressions from corpora of individual languages. Expressions that involve the conceptual mappings at work in metaphorical and metonymic processes are difficult to extract from corpora that have not been specifically annotated for research on figurative language. In this article, I define the understanding of conceptual metaphor and conceptual metonymy, present the most common methods for extracting metaphorical and metonymic expressions from language corpora, and, using the g-KOMET corpus, which is manually annotated for metaphorical expressions and metonymic transfers, illustrate an attempt to systematize some of the most prevalent metonymic transfers in spoken Slovene.","PeriodicalId":36888,"journal":{"name":"Slovenscina 2.0","volume":"36 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-09-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135878149","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Slovenscina 2.0, Pub Date: 2023-09-12, DOI: 10.4312/slo2.0.2023.1.1-6
Darja Fišer, Tomaž Erjavec
{"title":"Uvodnik v tematsko številko o Digitalnem jezikoslovju","authors":"Darja Fišer, Tomaž Erjavec","doi":"10.4312/slo2.0.2023.1.1-6","DOIUrl":"https://doi.org/10.4312/slo2.0.2023.1.1-6","url":null,"abstract":"This thematic issue of the journal Slovenščina 2.0 is devoted to digital linguistics, a rapidly growing interdisciplinary field of research at the intersection of traditional linguistics, information technology, and the social sciences. Digital linguistics research focuses on the preservation, analysis, and use of language data, that is, digital artefacts in which language is the vehicle of interpersonal communication. Both in Slovenia and worldwide, digital linguistics is becoming increasingly important not only in academic and educational circles but also in the public and private sectors, which increasingly need experts skilled in managing digital language data in order to operate successfully in contemporary society and the economy.","PeriodicalId":36888,"journal":{"name":"Slovenscina 2.0","volume":"28 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-09-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135878330","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Slovenscina 2.0, Pub Date: 2023-09-12, DOI: 10.4312/slo2.0.2023.1.218-246
Kaja Dobrovoljc, Luka Terčon, Nikola Ljubešić
{"title":"Universal Dependencies za slovenščino","authors":"Kaja Dobrovoljc, Luka Terčon, Nikola Ljubešić","doi":"10.4312/slo2.0.2023.1.218-246","DOIUrl":"https://doi.org/10.4312/slo2.0.2023.1.218-246","url":null,"abstract":"Universal Dependencies (UD) is an internationally harmonized annotation scheme for cross-linguistically comparable morphological and syntactic annotation of texts according to the principles of dependency grammar; alongside more than 130 other languages of the world, it has also been successfully applied to the annotation of Slovene texts. In this paper, we present the results of recent UD-related activities within the project Razvoj slovenščine v digitalnem okolju (Development of Slovene in a Digital Environment), in which we upgraded the existing infrastructure by revising and documenting in detail the UD annotation guidelines for Slovene, extending the SSJ-UD treebank of written Slovene with new sentences from the ssj500k and ELEXIS-WSD corpora, creating a test set from texts of the SentiCoref corpus for the SloBENCH web portal, and semi-automatically converting the morphological tags of the reference training corpora SUK and Janes-Tag. A new prediction model for syntactic parsing in the CLASSLA-Stanza tool was also trained on the extended SSJ-UD treebank; to support further linguistic applications, we evaluate it in more detail in terms of overall parsing accuracy and the most frequent types of errors.","PeriodicalId":36888,"journal":{"name":"Slovenscina 2.0","volume":"39 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-09-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135826547","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Slovenscina 2.0, Pub Date: 2023-09-12, DOI: 10.4312/slo2.0.2023.1.247-274
Uroš Šmajdek, Matjaž Zupanič, Maj Zirkelbach, Meta Jazbinšek
{"title":"Adapting an English Corpus and a Question Answering System for Slovene","authors":"Uroš Šmajdek, Matjaž Zupanič, Maj Zirkelbach, Meta Jazbinšek","doi":"10.4312/slo2.0.2023.1.247-274","DOIUrl":"https://doi.org/10.4312/slo2.0.2023.1.247-274","url":null,"abstract":"Developing effective question answering (QA) models for less-resourced languages like Slovene is challenging due to the lack of proper training data. Modern machine translation tools can address this issue, but this presents another challenge: the given answers must be found in their exact form within the given context since the model is trained to locate answers and not generate them. To address this challenge, we propose a method that embeds the answers within the context before translation and evaluate its effectiveness on the SQuAD 2.0 dataset translated using both eTranslation and Google Cloud translator. The results show that by employing our method we can reduce the rate at which answers were not found in the context from 56% to 7%. We then assess the translated datasets using various transformer-based QA models, examining the differences between the datasets and model configurations. To ensure that our models produce realistic results, we test them on a small subset of the original data that was human-translated. The results indicate that the primary advantages of using machine-translated data lie in refining smaller multilingual and monolingual models. For instance, the multilingual CroSloEngual BERT model fine-tuned and tested on Slovene data achieved nearly equivalent performance to one fine-tuned and tested on English data, with 70.2% and 73.3% questions answered, respectively. 
Larger models, such as RemBERT, achieved comparable results, correctly answering 77.9% of questions when fine-tuned and tested on Slovene compared to 81.1% on English; fine-tuning on English and testing on Slovene data also yielded similar performance.","PeriodicalId":36888,"journal":{"name":"Slovenscina 2.0","volume":"12 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-09-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135878314","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Slovenscina 2.0, Pub Date: 2023-09-12, DOI: 10.4312/slo2.0.2023.1.33-68
Jakob Lenardič, Kristina Pahor de Maiti
{"title":"Grammatical and Pragmatic Aspects of Slovenian Modality in Socially Unacceptable Facebook Comments","authors":"Jakob Lenardič, Kristina Pahor de Maiti","doi":"10.4312/slo2.0.2023.1.33-68","DOIUrl":"https://doi.org/10.4312/slo2.0.2023.1.33-68","url":null,"abstract":"This paper investigates the grammatical and pragmatic uses of epistemic and deontic modal expressions in a corpus of Slovenian socially acceptable and unacceptable Facebook comments. We propose a set of modals that do not interpretatively vary in their modality type in order to enable robust corpus searches and reliable quantification of the results. We show that deontic, but not epistemic, modals are significantly more frequent in socially unacceptable comments, and specifically that they favour violent discourse. We complement the quantitative findings with a qualitative analysis of the discursive roles played by the modals. We explore how pragmatic communicative strategies such as hedging, boosting, and face-saving arise from the underlying syntactic and semantic properties of the modal expressions, such as the modal force and clausal syntax.","PeriodicalId":36888,"journal":{"name":"Slovenscina 2.0","volume":"133 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-09-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135826554","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Slovenscina 2.0, Pub Date: 2023-09-12, DOI: 10.4312/slo2.0.2023.1.138-159
David Bordon
{"title":"Govoriš nevronsko?","authors":"David Bordon","doi":"10.4312/slo2.0.2023.1.138-159","DOIUrl":"https://doi.org/10.4312/slo2.0.2023.1.138-159","url":null,"abstract":"The aim of this paper is to present a study of the comprehensibility of unrevised machine-translated web texts. The primary participants in the study were general readers rather than trained translators or machine translation post-editors. It is the first study of its kind conducted for the Slovene language. The goal was to determine the extent to which unrevised machine translations are comprehensible to a general readership, and I also examined the influence of textual and visual context. I evaluated translations produced by Google Translate and eTranslation. The study was carried out as a survey in which participants answered questions that tested their comprehension of an accompanying text segment containing an error. The results offer insight into the current state of development of machine translation systems, not in terms of post-editing productivity but in terms of how well the target readership understands their output. At the end of the article, I provide a new evaluation of the source segments, which I retranslated at the beginning of 2023, this time also with DeepL.","PeriodicalId":36888,"journal":{"name":"Slovenscina 2.0","volume":"29 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-09-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135826728","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Slovenscina 2.0, Pub Date: 2023-09-12, DOI: 10.4312/slo2.0.2023.1.118-137
Andrejka Žejn, Mojca Šorli
{"title":"Named Entities in Modernist Literary Texts","authors":"Andrejka Žejn, Mojca Šorli","doi":"10.4312/slo2.0.2023.1.118-137","DOIUrl":"https://doi.org/10.4312/slo2.0.2023.1.118-137","url":null,"abstract":"This paper is a follow-up and elaboration of the paper published in the JTDH 2022 Conference Proceedings on manual semantic annotation of named entities based on a proposed set of annotations for a corpus of modernist literary texts. We first briefly describe the corpus and introduce the annotation scheme, then focus on the results of additional analyses, and conclude with further challenges and issues we identified with respect to established NER systems and practices of related projects. Overall, we identify several categories of proper names, foreign language elements, and bibliographic citations, but focus here on the challenges of annotating names of literary characters and place names, and provide examples of the results of preliminary analyses of these entities in the corpus.","PeriodicalId":36888,"journal":{"name":"Slovenscina 2.0","volume":"25 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-09-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135826731","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Slovenscina 2.0, Pub Date: 2023-09-12, DOI: 10.4312/slo2.0.2023.1.69-90
Darja Fišer, Tjaša Konovšek, Andrej Pančur
{"title":"Referencing the Public by Populist and Non-populist Parties in the Slovene Parliament","authors":"Darja Fišer, Tjaša Konovšek, Andrej Pančur","doi":"10.4312/slo2.0.2023.1.69-90","DOIUrl":"https://doi.org/10.4312/slo2.0.2023.1.69-90","url":null,"abstract":"The present moment raises many questions about the workings and resilience of parliamentary democracy in Western-type democracies, including the former socialist states of the East Central European region, where various forms of populism and illiberal democracy are taking shape. Among these, Slovenia is taken as a case study, since it is not only a former socialist state, but was also for a long time acknowledged as a post-socialist success story. Focusing on the central state institution in systems of parliamentary democracy, i.e. the parliament, and its members (MPs), this paper considers speech as performed during parliamentary sessions by MPs from populist and non-populist political parties between the years 1992 and 2018, the period of a fully democratic Slovene national parliament. It combines the methodological approaches of cultural history with corpus linguistics in order to map any possible differences in populist and non-populist discourse of MPs. 
Special attention is given to situations where MPs mentioned the public, thus testing the hypothesis that populist MPs engage more with the public as a part of their populist political style.","PeriodicalId":36888,"journal":{"name":"Slovenscina 2.0","volume":"27 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-09-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135826895","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}