Acta Linguistica Academica: Latest Articles

Neural text summarization for Hungarian
IF 0.5, Zone 3, Literature
Acta Linguistica Academica Pub Date : 2022-11-29 DOI: 10.1556/2062.2022.00577
Zijian Győző Yang
Abstract: One of the most important NLP tasks for the industry today is to produce an extract from longer text documents. This task is one of the hottest topics for the researchers and they have created some solutions for English. There are two types of the text summarization called extractive and abstractive. The goal of the first task is to find the relevant sentences from the text, while the second one should generate the extraction based on the original text. In this research I have built the first solutions for Hungarian text summarization systems both for extractive and abstractive subtasks. Different kinds of neural transformer-based methods were used and evaluated. I present in this publication the first Hungarian abstractive summarization tool based on mBART and mT5 models, which gained state-of-the-art results.
Citations: 0
Cross-lingual transfer of knowledge in distributional language models: Experiments in Hungarian
IF 0.5, Zone 3, Literature
Acta Linguistica Academica Pub Date : 2022-11-22 DOI: 10.1556/2062.2022.00580
Attila Novák, Borbála Novák
Abstract: In this paper, we argue that the very convincing performance of recent deep-neural-model-based NLP applications has demonstrated that the distributionalist approach to language description has proven to be more successful than the earlier subtle rule-based models created by the generative school. The now ubiquitous neural models can naturally handle ambiguity and achieve human-like linguistic performance with most of their training consisting only of noisy raw linguistic data without any multimodal grounding or external supervision refuting Chomsky's argument that some generic neural architecture cannot arrive at the linguistic performance exhibited by humans given the limited input available to children. In addition, we demonstrate in experiments with Hungarian as the target language that the shared internal representations in multilingually trained versions of these models make them able to transfer specific linguistic skills, including structured annotation skills, from one language to another remarkably efficiently.
Citations: 0
Winograd schemata and other datasets for anaphora resolution in Hungarian
IF 0.5, Zone 3, Literature
Acta Linguistica Academica Pub Date : 2022-11-22 DOI: 10.1556/2062.2022.00575
Noémi Vadász, Noémi Ligeti-Nagy
Abstract: The Winograd Schema Challenge (WSC, proposed by Levesque, Davis & Morgenstern 2012) is considered to be the novel Turing Test to examine machine intelligence. Winograd schema questions require the resolution of anaphora with the help of world knowledge and commonsense reasoning. Anaphora resolution is itself an important and difficult issue in natural language processing, therefore, many other datasets have been created to address this issue. In this paper we look into the Winograd schemata and other Winograd-like datasets and the translations of the schemata to other languages, such as Chinese, French and Portuguese. We present the Hungarian translation of the original Winograd schemata and a parallel corpus of all the translations of the schemata currently available. We also adapted some other anaphora resolution datasets to Hungarian. We aim to discuss the challenges we faced during the translation/adaption process.
Citations: 3
Principles of corpus querying: A discussion note
IF 0.5, Zone 3, Literature
Acta Linguistica Academica Pub Date : 2022-11-22 DOI: 10.1556/2062.2022.00581
Bálint Sass
Abstract: Nowadays, it is quite common in linguistics to base research on data instead of introspection. There are countless corpora – both raw and linguistically annotated – available to us which provide essential data needed. Corpora are large in most cases, ranging from several million words to some billion words in size, clearly not suitable to investigate word by word by close reading. Basically, there are two ways to retrieve data from them: (1) through a query interface or (2) directly by automatic text processing. Here we present principles on how to soundly and effectively collect linguistic data from corpora by querying i.e. without knowledge of programming to directly manipulate the data. What is worth thinking about, which tools to use, what to do by default and how to solve problematic cases. In sum, how to obtain correct and complete data from corpora to do linguistic research.
Citations: 1
PrevDistro: An open-access dataset of Hungarian preverb constructions
IF 0.5, Zone 3, Literature
Acta Linguistica Academica Pub Date : 2022-11-22 DOI: 10.1556/2062.2022.00578
Ágnes Kalivoda
Abstract: Hungarian has a prolific system of complex predicate formation combining a separable preverb and a verb. These combinations can enter a wide range of constructions, with the preverb preserving its separability to some extent, depending on the construction in question. The primary concern of this paper is to advance the investigation of these phenomena by presenting PrevDistro (Preverb Distributions), an open-access dataset containing more than 41.5 million corpus occurrences of 49 preverb construction types. The paper gives a detailed introduction to PrevDistro, including design considerations, methodology and the resulting dataset's main characteristics.
Citations: 0
Morphology aware data augmentation with neural language models for online hybrid ASR
IF 0.5, Zone 3, Literature
Acta Linguistica Academica Pub Date : 2022-11-21 DOI: 10.1556/2062.2022.00582
Balázs Tarján, T. Fegyó, P. Mihajlik
Abstract: Recognition of Hungarian conversational telephone speech is challenging due to the informal style and morphological richness of the language. Neural Network Language Models (NNLMs) can provide remedy for the high perplexity of the task; however, their high complexity makes them very difficult to apply in the first (single) pass of an online system. Recent studies showed that a considerable part of the knowledge of NNLMs can be transferred to traditional n-grams by using neural text generation based data augmentation. Data augmentation with NNLMs works well for isolating languages; however, we show that it causes a vocabulary explosion in a morphologically rich language. Therefore, we propose a new, morphology aware neural text augmentation method, where we retokenize the generated text into statistically derived subwords. We compare the performance of word-based and subword-based data augmentation techniques with recurrent and Transformer language models and show that subword-based methods can significantly improve the Word Error Rate (WER) while greatly reducing vocabulary size and memory requirements. Combining subword-based modeling and neural language model-based data augmentation, we were able to achieve 11% relative WER reduction and preserve real-time operation of our conversational telephone speech recognition system. Finally, we also demonstrate that subword-based neural text augmentation outperforms the word-based approach not only in terms of overall WER but also in recognition of Out-of-Vocabulary (OOV) words.
Citations: 0
The syntax of exceptive constructions in Arabic
IF 0.5, Zone 3, Literature
Acta Linguistica Academica Pub Date : 2022-11-17 DOI: 10.1556/2062.2022.00520
Sameerah T. Saeed
Abstract: This paper investigates the underlying structure of exceptive constructions with the Arabic exceptive marker ’illā and reveals the existence of two types of constructions: r(estrictive)-exceptives and s(ubtractive)-exceptives. The underlying factor that distinguishes these two constructions relates to the existence of a subtraction domain in s-exceptive constructions and its absence in r-exceptives. This distinction suggests that the exceptive marker ’illā ‘except’ has a different syntactic function in these two constructions. Furthermore, this difference in the functional status of ’illā suggests a different internal and external structure of the ’illā-XP in each of these constructions. I argue that while the ’illā-XP in r-exceptive constructions projects a R-ExP, involving a covert antecedent in the form of the NPIs ’aḥad ‘one’ or shay’ ‘thing’ and is a nominal adjunct, in s-exceptive constructions the ’illā-XP forms an S-ExP and can be classified into connected and free exceptives.
Citations: 0
Two grammatical categories for please in Mandarin imperative clauses
IF 0.5, Zone 3, Literature
Acta Linguistica Academica Pub Date : 2022-11-16 DOI: 10.1556/2062.2022.00507
Wei-Cherng Sam Jheng
Abstract: This paper develops a syntax-pragmatics interface analysis of imperative clauses overtly marked by two grammatical categories of qing ‘please’ in Mandarin and refines the division of labor among directive force, clause typing and deontic modality jointly computing the interpretative properties of qing imperatives. We present a cluster of properties to differentiate between the two categories of qing and observe that qing1 denotes obligation imposed on the addressee by the speaker, while qing2 denotes permission with which the addressee is allowed to perform an action or make true a state of affairs according to a set of norms. It is argued that qing1 is an imperative mood head, while qing2 is an imperative adverb, but both are endowed with a similar internal composition and extent of the phrasal hierarchies of the CP periphery, and their disparate imperative properties can be ascribed to the addressee-oriented and subject-oriented deontic modality (Tsai & Portner 2008). Following Haegeman & Hill's (2013) version of the Speech Act Phrase, we claim that a speech act layer externally merges to the topmost position of ForceP to drive the syntax-pragmatics interface computation of the speaker-addressee relation and to mediate the imperative mood and clause typing represented in the CP layer.
Citations: 0
The effects of time duration and bilingualism/trilingualism on second-language production
IF 0.5, Zone 3, Literature
Acta Linguistica Academica Pub Date : 2022-11-08 DOI: 10.1556/2062.2022.00569
Hsiu-ling Hsu
Abstract: This investigation explored the effects of time duration and bilingualism/trilingualism on speakers' language production. A word-naming task was conducted under three conditions—700 ms, 1,000 ms, and unlimited time. The results showed that the participants incurred fewer errors and successfully corrected errors at 1,000 ms and unlimited time; the bilingual/trilingual advantage was identified in error self-repairs at 1,000 ms; and trilinguals were more strategic in correcting errors than monolinguals and bilinguals. This suggests that unlimited time did not ensure higher accuracy in lexical production and efficient error correction, and that 1,000 ms was the optimal timeframe for processing single monosyllabic Chinese characters.
Citations: 0
Analysing phatic interaction through speech acts – A discussion note
IF 0.5, Zone 3, Literature
Acta Linguistica Academica Pub Date : 2022-08-24 DOI: 10.1556/2062.2022.00533
J. House, D. Kádár
Abstract: In this discussion note we explore why and how we need a pragmalinguistic and speech act-anchored approach to systematically study a key pragmatic phenomenon: phatic interaction. By so doing, we aim to draw attention to a special issue which we plan to publish in Acta Linguistica Academica. First, we present a general model through which phatic interaction can be replicably studied across different data types and linguacultures, by breaking it down to speech act types occurring in different slots of an interaction. Second, we provide a case study involving Chinese learners of English as a foreign language, in order to illustrate how the proposed framework can be put to actual use.
Citations: 3