Slovenščina 2.0: empirical, applied and interdisciplinary research (latest articles)

Converting raw transcripts into an annotated and turn-aligned TEI-XML corpus: the example of the Corpus of Serbian Forms of Address
Slovenščina 2.0: empirical, applied and interdisciplinary research Pub Date : 2021-07-06 DOI: 10.4312/SLO2.0.2021.1.123-144
Dolores Lemmenmeier-Batinić
Abstract: This paper describes the procedure of building a TEI-XML corpus of spoken Serbian starting from raw transcripts. The corpus consists of semi-structured interviews, which were gathered with the aim of investigating forms of address in Serbian. The interviews were thoroughly transcribed according to GAT transcribing conventions. However, the transcription was carried out without tools that would control the validity of the GAT syntax, or align the transcript with the audio records. In order to offer this resource to a broader audience, we resolved the inconsistencies in the original transcripts, normalised the semi-orthographic transcriptions and converted the corpus into a TEI format for transcriptions of speech. Further, we enriched the corpus by tagging and lemmatising the data. Lastly, we aligned the corpus turns to the corresponding audio segments by using a forced-alignment tool. In addition to presenting the main steps involved in converting the corpus to the XML format, this paper also discusses current challenges in the processing of spoken data, and the implications of data re-use regarding transcriptions of speech. This corpus can be used for studying Serbian from the perspective of interactional linguistics, for investigating morphosyntax, grammar, lexicon and phonetics of spoken Serbian, for studying disfluencies, as well as for testing models for automatic speech recognition and forced alignment. The corpus is freely available for research purposes.
Citations: 2
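A turn-aligned transcript of the kind this abstract describes can be pictured with a small sketch. The code below is not the author's conversion pipeline: it only assumes that each turn is already known as a (speaker, start, end, text) tuple and encodes it with the TEI elements `timeline`, `when` and `u`. The function name `build_aligned_turns`, the speaker ids and the sample utterances are hypothetical.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
XML_NS = "http://www.w3.org/XML/1998/namespace"
ET.register_namespace("", TEI_NS)


def build_aligned_turns(turns):
    """Encode (speaker, start_s, end_s, text) tuples as TEI utterances.

    Hypothetical helper: every distinct time point becomes a <when>
    anchor in a <timeline>, and each <u> points at its two anchors.
    """
    body = ET.Element(f"{{{TEI_NS}}}body")
    timeline = ET.SubElement(
        body, f"{{{TEI_NS}}}timeline", {"unit": "s", "origin": "#T0"}
    )
    # Collect all time points; 0.0 is always present as the origin.
    times = sorted({0.0} | {t for _, s, e, _ in turns for t in (s, e)})
    anchors = {}
    for index, seconds in enumerate(times):
        anchor_id = f"T{index}"
        anchors[seconds] = anchor_id
        attrs = {f"{{{XML_NS}}}id": anchor_id}
        if index > 0:
            # Offsets are expressed relative to the origin anchor T0.
            attrs["interval"] = str(seconds)
            attrs["since"] = "#T0"
        ET.SubElement(timeline, f"{{{TEI_NS}}}when", attrs)
    for speaker, start, end, text in turns:
        u = ET.SubElement(
            body,
            f"{{{TEI_NS}}}u",
            {
                "who": f"#{speaker}",
                "start": f"#{anchors[start]}",
                "end": f"#{anchors[end]}",
            },
        )
        u.text = text
    return body


# Made-up two-turn example, not taken from the corpus.
example = build_aligned_turns(
    [("S1", 0.0, 1.5, "dobar dan"), ("S2", 1.5, 3.0, "zdravo")]
)
print(ET.tostring(example, encoding="unicode"))
```

In a real workflow the start and end times would come from the forced aligner rather than being typed in by hand, and the actual corpus may use a different arrangement of elements and attributes.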
Avtomatsko razpoznavanja slovenskega govora za dnevnoinformativne oddaje (Automatic recognition of Slovenian speech for daily news broadcasts)
Slovenščina 2.0: empirical, applied and interdisciplinary research Pub Date : 2021-07-06 DOI: 10.4312/SLO2.0.2021.1.60-89
Lucija Gril, Mirjam Sepesy Maučec, Gregor Donaj, Andrej Žgank
Abstract: Automatic speech recognition is one of the key building blocks of speech and language technologies. This paper presents the development of an automatic speech recognizer for Slovenian in the domain of daily news broadcasts. The system architecture is based on deep neural networks. Taking the available speech resources into account, we carried out modelling with different activation functions. During development we also examined how lossy speech codecs affect recognition results. The recognizer was trained on the UMB BNSI Broadcast News and IETK-TV databases, which together comprise 66 hours of speech recordings. Alongside the deep neural networks, we enlarged the recognition vocabulary to 250,000 words, which lowered the out-of-vocabulary rate to 1.33%. On the test set, the best word error rate (WER) achieved was 15.17%. During evaluation we also carried out a more detailed analysis of recognition errors based on lemmas and F-classes, which indicates to some extent how demanding Slovenian is for such usage scenarios of the technology.
Citations: 0
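The word error rate and out-of-vocabulary figures quoted in this abstract follow standard definitions: WER is the word-level edit distance between reference and hypothesis divided by the number of reference words, and the OOV rate is the share of running words missing from the recognition vocabulary. A minimal sketch of both, assuming plain whitespace tokenisation; the function names and toy inputs are illustrative and do not come from the paper's evaluation setup.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] is the edit distance between the reference prefix handled
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(
                min(
                    prev[j] + 1,         # deletion of a reference word
                    curr[j - 1] + 1,     # insertion of a hypothesis word
                    prev[j - 1] + cost,  # substitution or exact match
                )
            )
        prev = curr
    return prev[-1] / len(ref)


def oov_rate(tokens: list[str], vocabulary: set[str]) -> float:
    """Share of running tokens that are missing from the vocabulary."""
    if not tokens:
        raise ValueError("tokens must not be empty")
    missing = sum(1 for token in tokens if token not in vocabulary)
    return missing / len(tokens)
```

For instance, `word_error_rate("a b c d", "a x c")` counts one substitution and one deletion against four reference words, giving 0.5. Published systems usually normalise case, punctuation and numerals before scoring, which this sketch leaves out.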
Učno E-okolje Slovenščina na dlani: izzivi in rešitve (The Slovenščina na dlani e-learning environment: challenges and solutions)
Slovenščina 2.0: empirical, applied and interdisciplinary research Pub Date : 2021-07-06 DOI: 10.4312/SLO2.0.2021.1.181-215
Darinka Verdonik, Simona Majhenič, Špela Antloga, Sandi Majninger, Marko Ferme, Kaja Dobrovoljc, Simona Pulko, Mira Krajnc Ivič, Natalija Ulčnik
Abstract: The paper starts from three challenges we observe in teaching Slovenian in the upper grades of primary school and in secondary school: how to eliminate errors against the standard-language norm that persist in pupils' written work; how to improve phraseological competence; and how to improve communicative language competence. These challenges are the focal point in developing the modern e-learning environment Slovenščina na dlani, which is based on language technologies and information and communication technologies, supports flexible forms of teaching and distance teaching, eases the teacher's work, and also makes it possible to motivate pupils through gamification elements. We present the design and implementation of each of the four content units of the e-environment: orthography, grammar, phraseology and texts.
Citations: 0
Nadgradnja Zgodovinarskega indeksa citiranosti (Upgrading the Historians' Citation Index)
Slovenščina 2.0: empirical, applied and interdisciplinary research Pub Date : 2021-07-06 DOI: 10.4312/SLO2.0.2021.1.216-235
Katja Meden, Ana Cvek
Abstract: The Historians' Citation Index (Zgodovinarski indeks citiranosti) dates back to 2003, when researchers at the Institute of Contemporary History (Inštitut za novejšo zgodovino) began tracking and systematically recording citations for project and programme applications to ARRS. The citation index has undergone several upgrades, attempts at data harmonisation and clean-ups of its relational databases, but in recent years it became clear that the system did not meet the needs of indexers and users. Before the upgrade we analysed the data to identify the biggest problems. The upgrade was carried out in two parts: first the administrative part, then the web application. During the upgrade the index was technically modernised and designed to be intuitive for indexers and users.
Citations: 0
Hedging modal adverbs in Slovenian academic discourse
Slovenščina 2.0: empirical, applied and interdisciplinary research Pub Date : 2021-07-06 DOI: 10.4312/SLO2.0.2021.1.145-180
Jakob Lenardic, Darja Fišer
Abstract: This paper first presents a comparative analysis of modal adverbs in doctoral theses in the humanities and social sciences on the one hand, and in natural and technical sciences on the other from the 1.7-billion-token corpus of Slovenian academic texts KAS (Erjavec et al., 2019a). Using a randomized concordance analysis, we observe the epistemic and non-epistemic usage of the modal adverbs and show that epistemic adverbs are more characteristic of the humanities and social sciences theses. We also show that the non-epistemic dispositional meaning of possibility, which is most commonly used in natural and technical sciences theses, is not used as a hedging device. In the second part of the paper we compare the usage of a selected set of modals in bachelor’s, master’s and doctoral theses in order to chart how researchers’ approach to stance-taking changes at different proficiency levels in academic writing, showing that the observed increase in hedging devices in doctoral theses seems to be less a function of an increased proficiency level in academic writing as such and more the result of conceptual differences between undergraduate and postgraduate theses, only the latter of which are original research contributions with extensive discussion of the results.
Citations: 2
Tri spletne aplikacije o slovenskih narečjih (Three web applications on Slovenian dialects)
Slovenščina 2.0: empirical, applied and interdisciplinary research Pub Date : 2021-07-06 DOI: 10.4312/SLO2.0.2021.1.236-261
Rok Mrvič, Špela Zupančič
Abstract: The need for a stronger online presence of dialect content and for its interactive multimedia presentation, above all of professionally designed dialectological resources and tools, prompted interdisciplinary cooperation among faculties of the University of Ljubljana, in particular the Faculty of Arts (FF) and the Faculty of Computer and Information Science (FRI). In 2017 and 2018 this cooperation produced three freely accessible, open-source web applications on Slovenian dialects: Slovenski narečni atlas (SNA, 2017), Interaktivna karta slovenskih narečnih besedil (IKNB, 2018) and Slovar starega orodja v govoru Loškega Potoka (SSOLP, 2018). The first part of the article gives a general overview of Slovenian online dialectological resources and tools; the second part presents in more detail the functionality of the three applications currently available to users. The discussion highlights some of the circumstances in which the applications were created and the resulting limitations, and outlines possible solutions worth considering to ensure their long-term development.
Citations: 1
Sign language lexicography: a case study of an online dictionary
Slovenščina 2.0: empirical, applied and interdisciplinary research Pub Date : 2021-07-06 DOI: 10.4312/SLO2.0.2021.1.90-122
Lucia Vlášková, Hana Strachoňová
Abstract: As a growing field of study within sign language linguistics, sign language lexicography faces many challenges that have already been answered for audio-oral language material. In this paper, we present some of these challenges and methods developed to help navigate the complex lexical classification field. The described methods and strategies are implemented in the first Czech sign language (ČZJ) online dictionary, a part of the platform Dictio, developed at Masaryk University in Brno. We cover the topic of lemmatisation and how to decide what constitutes a lexeme in sign language. We introduce four types of expressions that qualify for a dictionary entry: a simple lexeme, a compound, a derivative, and a set phrase. We address the question of the place of classifier constructions and shape and size specifiers in a dictionary, given their peculiar semantic status. We maintain the standard classification of classifiers (whole entity and holding classifiers) and size and shape specifiers (SASSes; static and tracing specifiers). We provide arguments for separating the category of specifiers from the category of classifiers. We discuss the proper treatment of mouthings and mouth gestures concerning citation forms, derivation and translation. We show why it is difficult in sign language to distinguish synonyms from variants and how our proposed phonological criteria can help. We explain how to construct a semantic definition in a sign language and how to handle multiple meanings of one form. We offer simple guidelines for forming proper examples of use in a sign language. Finally, we briefly comment on the process of translation between sign and spoken languages. We conclude the paper with a summary of roles that Dictio plays in the ČZJ-signing community.
Citations: 1
Slovenščina 2.0: Language Technologies and Digital Humanities
Slovenščina 2.0: empirical, applied and interdisciplinary research Pub Date : 2021-07-06 DOI: 10.4312/SLO2.0.2021.1.I-VI
Darja Fišer, Tomaž Erjavec, Ajda Pretnar
Citations: 0
Cross-lingual transfer of sentiment classifiers
Slovenščina 2.0: empirical, applied and interdisciplinary research Pub Date : 2021-07-06 DOI: 10.4312/SLO2.0.2021.1.1-25
M. Robnik-Sikonja, Kristjan Reba, I. Mozetič
Abstract: Word embeddings represent words in a numeric space so that semantic relations between words are represented as distances and directions in the vector space. Cross-lingual word embeddings transform vector spaces of different languages so that similar words are aligned. This is done by mapping one language’s vector space to the vector space of another language or by construction of a joint vector space for multiple languages. Cross-lingual embeddings can be used to transfer machine learning models between languages, thereby compensating for insufficient data in less-resourced languages. We use cross-lingual word embeddings to transfer machine learning prediction models for Twitter sentiment between 13 languages. We focus on two transfer mechanisms that have recently shown superior transfer performance. The first mechanism uses the trained models whose input is the joint numerical space for many languages as implemented in the LASER library. The second mechanism uses large pretrained multilingual BERT language models. Our experiments show that the transfer of models between similar languages is sensible, even with no target language data. The performance of cross-lingual models obtained with the multilingual BERT and LASER library is comparable, and the differences are language-dependent. The transfer with CroSloEngual BERT, pretrained on only three languages, is superior on these and some closely related languages.
Citations: 5
Okrogla miza »(Bližnja) srečanja oblikovalcev jezikovne politike« (Round table "(Close) encounters of language policy makers")
Slovenščina 2.0: empirical, applied and interdisciplinary research Pub Date : 2020-12-21 DOI: 10.4312/slo2.0.2020.1.92-112
I. Ferbežar, Igor Cetina, Alojz Ihan, Marko Stabej, Lana Zdravković, Tina Zupančič
Abstract: The 54th meeting and public conference of ALTE (Association of Language Testers in Europe) took place in Ljubljana from 6 to 8 November 2019. The meeting, on the theme "Monolingual testing in a multilingual reality: language ideologies and their influence on language testing", was organised by the Faculty of Arts of the University of Ljubljana and its Centre for Slovene as a Second and Foreign Language at the Department of Slovene Studies. As part of it, the round table "(Close) encounters of language policy makers" was held on 8 November 2019. We publish a transcript of the recording of the participants' conversation at the event.
Citations: 0