Slovenscina 2.0: Latest Articles

Praktični vidiki uporabe podbesednih enot v strojnem prevajanju slovenščina-angleščina (Practical aspects of using subword units in Slovenian-English machine translation)
Slovenscina 2.0 Pub Date : 2023-09-12 DOI: 10.4312/slo2.0.2023.1.275-301
Gregor Donaj, Mirjam Sepesy Maučec
Abstract: Most modern machine translation systems are based on neural network architectures. This holds for online machine translation providers, for research systems, and for tools that can assist professional translators in their work. Although neural network systems can run on the ordinary central processing units of personal computers and servers, operating at a reasonable speed requires graphics processing units. This limits the vocabulary size, which lowers translation quality. The size of a word-level vocabulary is a particularly pressing problem for highly inflected languages. We address it with subword units, which achieve greater coverage of the language. In this paper we present different methods of segmenting words into subword units with vocabularies of different sizes and compare their use in a machine translation system for the Slovenian-English language pair. The comparison also includes a translation system without word segmentation. We report translation performance measured with the BLEU metric, model training speed, translation speed, and model size. We also give an overview of the practical aspects of using subword units in a machine translation system that is used together with computer-assisted translation tools.
Citations: 0
Spremljevalni korpus Trendi in avtomatska kategorizacija (The Trendi monitor corpus and automatic categorization)
Slovenscina 2.0 Pub Date : 2023-09-12 DOI: 10.4312/slo2.0.2023.1.161-188
Iztok Kosem, Jaka Čibej, Kaja Dobrovoljc, Taja Kuzman, Nikola Ljubešić
Abstract: This paper presents the creation of Trendi, the first monitor corpus for Slovenian. The current version, Trendi 2023-02, covers texts from January 2019 to the end of February 2023 and already contains more than 700 million tokens, or more than 586 million words. The purpose of the corpus is to offer both expert and lay audiences data on current language use and to make it possible to track the emergence of new words and the decline or growth in the use of existing ones. In addition to the content itself, we present the methodology and principles behind the corpus. The second part of the paper describes the development of an algorithm for the automatic categorization of texts from news portals, prepared for the Trendi corpus and for other corpora containing such texts. For the algorithm, a set of 13 topic categories was devised that largely overlaps with international standards and with the categories in comparable corpora of other languages. We trained several different language models on the category-labelled texts and, with the most suitable one, achieved high reliability in assigning topics to texts.
Citations: 0
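Teoretska izhodišča in metodološki okvir pri izdelavi uporabnikom prijaznega spletišča: primer platforme SMeJse – Slovenščina kot manjšinski jezik (Theoretical foundations and methodological framework for building a user-friendly website: the case of the SMeJse platform, Slovenian as a Minority Language)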
Slovenscina 2.0 Pub Date : 2017-12-30 DOI: 10.4312/SLO2.0.2017.2.85-112
Matejka Grgič
Abstract: This paper presents some theoretical and methodological issues related to the online portal SLOVENSCINA KOT MANJSINSKI JEZIK – SMeJse / SLOVENIAN AS A MINORITY LANGUAGE – SMiLe, which collects existing tools, materials and information for developing linguistic skills and abilities in Slovenian. The platform was established by SLORI – Slovenski raziskovalni institut / Slovenian research institute of Trieste, Italy, and the Dijaski dom S. Kosovela / Slovenian student’s center of Trieste, Italy. The purpose of the portal is to stimulate different usages of the current Slovenian language in the Slovenian-Italian contact area, particularly in Italy, with the aim of assuring high communication proficiency in all kinds and varieties of the Slovenian language (so-called “equilingualism”), a balanced bilingualism and also the development of lects, still within the Slovenian linguistic continuum. Specific language policies are particularly successful in developing linguistic skills that enable proficiency in the minority language, as well as equilingualism and balanced bilingualism among the speakers of the minority group. Such policies are based on measures that increase exposure to different language uses and that create a need for language use in circles and situations where compensatory strategies are unsuitable. The portal is based on the newest linguistic, sociolinguistic and psycholinguistic studies concerning the Slovenian language in Italy, on Slovenian-Italian language contact and on the acquisition of the minority language. An analysis of the status of the Slovenian language in Italy, its perception and its phenomena, together with an overview of some language policies and methodological frames, has shown a gap between the existing tools and the needs of the community of speakers.
Citations: 0
Omogočanje dostopa do korpusov slovenskih spletnih besedil v luči pravnih omejitev (Providing access to corpora of Slovene web texts in light of legal constraints)
Slovenscina 2.0 Pub Date : 2016-09-27 DOI: 10.4312/SLO2.0.2016.2.189-219
T. Erjavec, Jaka Čibej, Darja Fišer
Abstract: Web texts are becoming increasingly relevant sources of information, and web corpora are useful for corpus linguistic studies and the development of language technologies. Even though web texts are directly accessible, which substantially simplifies the collection procedure, the compilation of web corpora is still complex, time-consuming and expensive. It is crucial that similar endeavours are not repeated, which is why the created corpora need to be made easily and widely accessible both to researchers and to a wider audience. While this is logistically and technically a straightforward procedure, legal constraints such as copyright, privacy and terms of use severely hinder the dissemination of web corpora. This paper discusses legal conditions and actual practice in this area, gives an overview of current practices and proposes a range of mitigation measures, using the example of the Janes corpus of Slovene user-generated content, in order to ensure free and open dissemination of Slovene web corpora.
Citations: 2
The value of the Janes corpus for Slovenian language standardization
Slovenscina 2.0 Pub Date : 2016-09-27 DOI: 10.4312/slo2.0.2016.2.1-37
Špela Arhar Holdt, K. Dobrovoljc
Abstract: The main objective of this article is to assess the value of the Janes corpus for research in the field of language standardization. Unlike the existing reference corpora of written Slovenian, the newly available Janes corpus of user-generated content mostly consists of texts that have not been modified by a proofreading expert; it therefore offers a more realistic insight into the trends of language use, as well as the intuitiveness of existing language rules, within a wider language community. We illustrate this methodological potential in a case study of nominal phrases with nonagreeing premodifiers, such as solo petje and RTV prispevek, by comparing their usage in Janes and the reference Kres corpus. The results reveal that this type of phrase is used more often in Janes and includes a longer list of candidates than in Kres; that both corpora include a large number of phrases with variant spelling as either one or two words, irrespective of the premodifier in question; and, somewhat surprisingly, that Janes displays a more consistent language use, suggesting that prescriptive regulation actually increases the level of inconsistency in language use. The article, a revised and enhanced extension of a prior conference paper, concludes with a discussion of possible future approaches to this linguistic issue and advocates for the inclusion of Janes in Slovenian language standardisation methodology.
Citations: 0
(Re)standardization in the Vice of National Identity: the Cases of Croatian, Serbian, Bosnian, and Montenegrin
Slovenscina 2.0 Pub Date : 2015-12-01 DOI: 10.4312/slo2.0.2015.2.67-94
Vesna Požgaj Hadži, Tatjana Balažic Bulc
Abstract: Among the different functions of linguistic standardization, the unifying, separatist, and prestige functions play a special role. In this paper, we focus on the separatist function, which calls for a redefinition of the status of standard languages. Politics also plays an important role in this process. In such cases we are often dealing with restandardization, in other words the reshaping of an already standardized language, albeit on different terms. We present the results of such processes for the four successor languages of the former Serbo-Croatian, i.e. Croatian, Serbian, Bosnian, and Montenegrin. All underwent numerous (necessary as well as unnecessary) changes following the separation, especially in lexis and phonetics, with the changes bearing significant symbolic meaning. The reasons for the changes are thus external (new sociopolitical order) as well as internal (change in the relation towards the neighboring standard languages, increased interest in linguistic matters, partisanship of individual linguists within institutions, etc.), and in both cases closely linked to political structures.
Citations: 1
Internet Slovene Research Summer Camp for Secondary School Pupils
Slovenscina 2.0 Pub Date : 2015-12-01 DOI: 10.4312/slo2.0.2015.1.59-61
Darja Fišer
Citations: 0
Collocations and examples of use: a lexical-semantic approach to terminology
Slovenscina 2.0 Pub Date : 2014-12-01 DOI: 10.4312/SLO2.0.2014.1.41-61
N. Logar, P. Gantar, Iztok Kosem
Abstract: The paper describes the compilation of an online terminological database that also includes a lexical-semantic framework of terms in the form of collocations and examples of use. Both types of information were extracted from a specialised corpus automatically, using Word Sketch and GDEX functions in the Sketch Engine corpus tool. Each entry contains links to two corpora: the LSP corpus of the public relations field KoRP and the Gigafida corpus, a reference corpus of Slovene. Preliminary results of the survey conducted among the target users of the terminological database indicate that the information on the term’s typical collocations is very useful for fully understanding the term, its meaning and role in the context.
Citations: 1
WHAT WOULD DR MURRAY HAVE MADE OF THE OED ONLINE TODAY
Slovenscina 2.0 Pub Date : 2014-12-01 DOI: 10.4312/SLO2.0.2014.2.15-36
J. Simpson
Abstract: During the final years of the twentieth century the text of the Oxford English Dictionary (OED) was transformed from a print resource to a digital one. Surprisingly, the way in which data was structured in the print version lent itself fairly easily to this transformation. This paper looks briefly at the publishing history of the OED, then at continuity and change in editorial policy across the two media, and finally at new options (such as data visualisation through graphs, charts, and animations, as well as linking through to other sources) that are open to users of the dictionary as a result of its availability as a digital resource. The paper concludes that although Dr Murray, the dictionary’s original editor, would have been pleased by the way his text has migrated from the print to the digital medium, the real significance of the development is that the modern user can now begin to analyse language change, and not just the history of individual words, through the functionality of the OED Online web site.
Citations: 3