{"title":"Errors in the Congruent Attribute Among Students Learning Slovak as a Foreign Language (Learner Corpus-Based)","authors":"Katarína Gajdošová, Michaela Mošaťová, Petra Švancarová","doi":"10.2478/jazcas-2023-0034","DOIUrl":"https://doi.org/10.2478/jazcas-2023-0034","url":null,"abstract":"Abstract This paper analyses error rates in the congruent attribute in texts written by students learning Slovak as a foreign language. The material, which was qualitatively analysed, comes from the pilot version of the learner corpus errkorp-pilot. The paper defines the most common types of errors in the congruent attribute and interprets the causes of their origin. The most common errors include the wrong congruence with the grammatical gender and with the case of the defining noun. Errors are usually caused by the following factors: transfer from the L1 language, excessive generalization of the rule in the target language, or insufficient knowledge of the grammatical rule.","PeriodicalId":262732,"journal":{"name":"Journal of Linguistics/Jazykovedný casopis","volume":"15 1","pages":"163 - 172"},"PeriodicalIF":0.0,"publicationDate":"2023-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139371533","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Expressing Measure in Czech (A Corpus-Based Study)","authors":"Marie Mikulová","doi":"10.2478/jazcas-2023-0029","DOIUrl":"https://doi.org/10.2478/jazcas-2023-0029","url":null,"abstract":"Abstract In the contribution, we provide a theory-based and corpus-verified description of expressions for measure in Czech. We demonstrate that the measure expressions may modify quantity of entities (approximately ten boys), internal characteristics of events (he works a lot), properties (very big) and relations (completely without sound). We distinguish between the measure expressions that are an answer to the question To what extent? (Extent-modifiers) and expressions that modify an answer to the question How many? (Quantity-modifiers). The Extent-modifiers are formally, structurally and semantically more diverse than the Quantity-modifiers. For the Quantity-modifiers a list of forms and functions is provided. Theoretical knowledge stemming from the analysis will subsequently be used to improve the annotation in the Prague Dependency Treebanks. It can be also useful for other semantically-oriented descriptions of language.","PeriodicalId":262732,"journal":{"name":"Journal of Linguistics/Jazykovedný casopis","volume":"11 1","pages":"108 - 118"},"PeriodicalIF":0.0,"publicationDate":"2023-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139371539","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Keywords in Religious Literature of 17th and 18th Centuries in Light of the Data from the Electronic Corpus of 17th- and 18th-Century Polish Texts","authors":"Magdalena Majdak","doi":"10.2478/jazcas-2023-0028","DOIUrl":"https://doi.org/10.2478/jazcas-2023-0028","url":null,"abstract":"Abstract This paper discusses the application of standard keyword extraction methods from corpus linguistics for the study of old Polish language. The unfolding analysis is based on writings included in the Electronic Corpus of 17th- and 18th-century Polish Texts. The aim of this analysis is to select keywords from over two million tokens derived from texts tagged as religion in the corpus and compare them with the reference corpus containing over nine million tokens, while verifying the applicability of the log-likelihood method for the analysis of old Polish language and developing a part of the research model.","PeriodicalId":262732,"journal":{"name":"Journal of Linguistics/Jazykovedný casopis","volume":"48 1","pages":"100 - 107"},"PeriodicalIF":0.0,"publicationDate":"2023-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139371714","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"When is a Crisis Really a Crisis? Using NLP and Corpus Linguistic Methods to Reveal Differences in Migration Discourse Across Czech Media","authors":"Ondřej Pekáček, Irene Elmerot","doi":"10.2478/jazcas-2023-0053","DOIUrl":"https://doi.org/10.2478/jazcas-2023-0053","url":null,"abstract":"Abstract This article presents an interdisciplinary analysis of discourses on refugees, asylum seekers, immigrants, and migrants (RASIM) in mainstream and alternative media in the Czech Republic. Using techniques from corpus linguistics (CL) and natural language processing (NLP) and drawing on insights from media sociology, we demonstrate the value of an interdisciplinary approach for conducting robust research that can inform policymakers and media practitioners. Our analysis of nearly one million documents from January 2015 to February 2023 reveals distinctive terms and phrases used by alternative media, highlighting the growing divergence between the mainstream and alternative media discourse and its intensity over different periods. These findings have implications for understanding the mobilization of anti-systemic groups, particularly those on the far right.","PeriodicalId":262732,"journal":{"name":"Journal of Linguistics/Jazykovedný casopis","volume":"2 1","pages":"369 - 380"},"PeriodicalIF":0.0,"publicationDate":"2023-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139371443","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Through Derivational Relations to Valency of Nonverbal Predicates in the Nomvallex Lexicon","authors":"V. Kolářová, Václava Kettnerová, Jiří Mírovský","doi":"10.2478/jazcas-2023-0036","DOIUrl":"https://doi.org/10.2478/jazcas-2023-0036","url":null,"abstract":"Abstract NomVallex is a manually annotated valency lexicon of Czech nouns and adjectives that enables a comparison of valency properties of derivationally related lexical units. We present new developments in how the lexicon facilitates research into changes in valency across part-of-speech categories and derivational types. In particular, it provides links from derived lexical units to their base lexical units and also allows to search and display a base lexical unit together with all lexical units directly derived from it. Using an automatic procedure, any difference in valency between two derivationally related lexical units is specified. As a case study, focusing on nouns and adjectives directly or indirectly motivated by verbs, the facilities provided by the lexicon are used to show differences in what ways the particular deverbal derivatives representing various derivational types express the valency complementation standing in the base verbal construction in the subject position.","PeriodicalId":262732,"journal":{"name":"Journal of Linguistics/Jazykovedný casopis","volume":"15 1","pages":"182 - 192"},"PeriodicalIF":0.0,"publicationDate":"2023-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139371492","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Is it Possible to Re-Educate Roberta? Expert-Driven Machine Learning for Punctuation Correction","authors":"J. Machura, Hana Zizková, Adam Frémund, Jan Svec","doi":"10.2478/jazcas-2023-0052","DOIUrl":"https://doi.org/10.2478/jazcas-2023-0052","url":null,"abstract":"Abstract Although Czech rule-based tools for automatic punctuation insertion rely on extensive grammar and achieve respectable precision, the pre-trained Transformers outperform rule-based systems in precision and recall (Machura et al. 2022). The Czech pre-trained RoBERTa model achieves excellent results, yet a certain level of phenomena is ignored, and the model partially makes errors. This paper aims to investigate whether it is possible to retrain the RoBERTa language model to increase the number of sentence commas the model correctly detects. We have chosen a very specific and narrow type of sentence comma, namely the sentence comma delimiting vocative phrases, which is clearly defined in the grammar and is very often omitted by writers. The chosen approaches were further tested and evaluated on different types of texts.","PeriodicalId":262732,"journal":{"name":"Journal of Linguistics/Jazykovedný casopis","volume":"111 1","pages":"357 - 368"},"PeriodicalIF":0.0,"publicationDate":"2023-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139371462","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Use of Computer and Corpus Tools in the Research of a 19th Century German-Language Manuscript Book of Notes and Extracts","authors":"Martin Braxatoris, Anita Braxatorisová","doi":"10.2478/jazcas-2023-0046","DOIUrl":"https://doi.org/10.2478/jazcas-2023-0046","url":null,"abstract":"Abstract The study explores the possibilities of using computer and corpus tools in the interpretation of texts of the genre of book of notes and extracts; these are documents consisting of extracts and modified excerpts from contemporary press and literature, records of the author’s own thoughts, etc. Samuel Ferjenčík’s manuscript is a German-language document by a Slovak author intended for private use; cited or adapted passages are usually given without any reference to the source. The paper introduces the problems of automatic identification of the source base, which relate to the application of OCR and content similarity detection tools. It discusses the results of text matching, which revealed several manipulations of source texts, especially substitutions, indicating attitudes and priority problems in the author’s thought-world. It further interprets the results of the use of the Sketch Engine corpus manager tools by which the frequency of occurrence of key terms and their collocability were investigated, paying special attention to substituted words. The paper is an example of the application of computer and corpus-linguistics methods to the interpretation of literary texts, which is represented by a number of current studies in the field of digital humanities. The proposed approaches are applicable to research on other books of notes and extracts, topical in the context of research trends related to egodocuments, as well as to textual research on monuments of other genres.","PeriodicalId":262732,"journal":{"name":"Journal of Linguistics/Jazykovedný casopis","volume":"67 1","pages":"287 - 300"},"PeriodicalIF":0.0,"publicationDate":"2023-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139371263","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"CapekDraCor: A New Contribution to the European Programmable Drama Corpora","authors":"Petr Porízka","doi":"10.2478/jazcas-2023-0042","DOIUrl":"https://doi.org/10.2478/jazcas-2023-0042","url":null,"abstract":"Abstract The aim of this paper is to present the new CapekDraCor corpus and the DraCor project with its research-oriented concept of a programmable corpora focused on quantitative analyses within the framework of computational literary studies. This digital platform extends the possibilities of large-scale drama analysis with a focus on the dramatic character(s). The basic operationalisation is the interaction within a dramatic configuration, i.e., the scenic co-presence of two speakers, from which network data are automatically extracted, both global networks of interactions of dramas and data characterising individual actors, i.e., literary characters. The paper demonstrates the CapekDraCor corpus, a new contribution to the extensive DraCor database, and presents the way the data are processed with respect to their specific multi-layered structure. The corpus contains all the plays written by Karel and Josef Čapek and the data are processed in a standardized format based on XML and general TEI guidelines for processing drama with a defined basic drama tagset. CapekDraCor also uses the newly created EZdrama format for data processing, which works as an intermediate step from .txt to .xml file as a lightweight YAML-like markup language. A file in this format can be automatically converted into a DraCor-ready XML file with a TEI header. The advantage of the programmable corpora concept is the possibility to use suitably structured data for drama research outside the DraCor platform and with other methods or tools for textual analysis. Simultaneously, this approach moves the researcher from the technical requirements of the analysis to operationalised computational analysis based on research questions and pre-prepared and flexible tools. DraCor is a unique open infrastructure (both in terms of data and tools) for the analysis of European drama, currently comprising 15 corpora in 10 different languages with a total of about 3,000 plays from a wide range of periods.","PeriodicalId":262732,"journal":{"name":"Journal of Linguistics/Jazykovedný casopis","volume":"7 1","pages":"244 - 253"},"PeriodicalIF":0.0,"publicationDate":"2023-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139371282","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Lexical Diversity and Language Impairment","authors":"Natalia Časnochová Zozuk","doi":"10.2478/jazcas-2023-0047","DOIUrl":"https://doi.org/10.2478/jazcas-2023-0047","url":null,"abstract":"Abstract The development of artificial intelligence tools has seen an enormous growth recently. Linguistic artificial intelligence tools are being successfully applied in the field of speech analysis and discourse. In our study, we used automatic NLP tools to detect differences in picture description in the discourse of people diagnosed with Alzheimer’s disease (AD), Mild Cognitive Impairment (MCI) and healthy people. A measure of lexical diversity was used to compare discourse complexity. Transcripts of recordings of the probands within the EWA project were used in the study. From the multiple comparisons, we found that there is a statistically significant difference between healthy people and people suffering from MCI and AD. Our results indicate that healthy people have more lexical diversity than people suffering from MCI and AD – a more diverse vocabulary in spontaneous speech, in our case, when describing a picture.","PeriodicalId":262732,"journal":{"name":"Journal of Linguistics/Jazykovedný casopis","volume":"16 1","pages":"301 - 309"},"PeriodicalIF":0.0,"publicationDate":"2023-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139371507","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Appellativization of Proper Names – In the Perspective of Corpus Analysis","authors":"Jaroslav David, Tereza Klemensová, Michal Místecký","doi":"10.2478/jazcas-2023-0021","DOIUrl":"https://doi.org/10.2478/jazcas-2023-0021","url":null,"abstract":"Abstract The study deals with appellativization of proper names, using as its base selected personal names (surnames). Looking at opinion journalism texts in the Czech National Corpus, corpus SYN, version 11, we investigate aspects of word-formation within appellativization of personal names Masaryk, Beneš, Hitler, Stalin – including frequencies of parts of speech and word-formation types (derivation, composition) with respect to their productivity and word-formation potential.","PeriodicalId":262732,"journal":{"name":"Journal of Linguistics/Jazykovedný casopis","volume":"114 1","pages":"32 - 42"},"PeriodicalIF":0.0,"publicationDate":"2023-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139371941","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}