Working with a Linguistic Corpus Using R: An Introductory Note with Indonesian Negating Construction

Linguistik Indonesia | Publication date: 2019-02-20 | DOI: 10.26499/LI.V36I1.71

Gede Primahadi Wijaya Rajeg, Karlina Denistia, I. M. Rajeg


Citations: 1

Abstract



This paper demonstrates the use of R for unified data science in corpus linguistics through a series of corpus-based analyses of the Indonesian Negating Construction. The data comprise c. 17 million word tokens from an online-news corpus that is part of the Indonesian Leipzig Corpora. First, we found that tidak is the most frequent form in our corpus. Next, we found that tak has a significantly higher type frequency than tidak for negated predicates with the [ter-X-kan] schema. This finding adds quantitative nuance to a description in an Indonesian reference grammar stating that (i) in present-day Indonesian, tidak is also commonly used to negate ter- predicates, while (ii) the obligatory use of tak to negate ter- predicates reflects past usage. Lastly, we refine our second finding by applying Distinctive Collexeme Analysis, which shows that, compared with tidak, tak strongly attracts specific verbs, predominantly those in the [ter-X-kan] schema; this finding offers a deeper characterisation of tidak and tak.
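The abstract names Distinctive Collexeme Analysis (DCA) without spelling out how it works. The paper carries out its analyses in R; purely as an illustration, the Python sketch below shows DCA in its commonly used formulation. For each verb, a 2×2 table (that verb vs. all other verbs, construction A vs. construction B, here tak vs. tidak) is scored with a one-tailed Fisher-Yates exact test, and collostruction strength is reported as -log10(p). The function names and the toy counts are hypothetical: they are not the authors' code and not the corpus data.

```python
from math import comb, log10


def hypergeom_tail_log10(a: int, b: int, c: int, d: int, upper: bool) -> float:
    """log10 of the one-tailed Fisher-Yates exact p-value for the 2x2 table
    [[a, b], [c, d]] (rows: verb / other verbs; columns: A / B).

    upper=True gives P(X >= a); upper=False gives P(X <= a), where X is the
    hypergeometric count of the top-left cell under fixed margins.
    """
    row1, col1 = a + b, a + c
    n = a + b + c + d
    lo_k = max(0, col1 - (n - row1))  # smallest feasible top-left count
    hi_k = min(row1, col1)            # largest feasible top-left count
    ks = range(a, hi_k + 1) if upper else range(lo_k, a + 1)
    # Exact integer arithmetic; math.log10 accepts arbitrarily large ints,
    # so very small p-values do not underflow to 0.
    tail = sum(comb(row1, k) * comb(n - row1, col1 - k) for k in ks)
    return log10(tail) - log10(comb(n, col1))


def distinctive_collexemes(
    counts: dict[str, tuple[int, int]],
    labels: tuple[str, str] = ("A", "B"),
) -> list[tuple[str, str, float]]:
    """DCA for two constructions. counts maps verb -> (freq in A, freq in B).

    Returns (verb, preferred construction, collostruction strength),
    sorted by strength, where strength = -log10(one-tailed p-value).
    """
    total_a = sum(fa for fa, _ in counts.values())
    total_b = sum(fb for _, fb in counts.values())
    n = total_a + total_b
    results = []
    for verb, (fa, fb) in counts.items():
        a, b = fa, fb                        # this verb in A and in B
        c, d = total_a - fa, total_b - fb    # all other verbs in A and in B
        expected = (a + b) * total_a / n     # expected count of verb in A
        attracted_to_a = a > expected
        strength = -hypergeom_tail_log10(a, b, c, d, upper=attracted_to_a)
        preferred = labels[0] if attracted_to_a else labels[1]
        results.append((verb, preferred, strength))
    return sorted(results, key=lambda r: r[2], reverse=True)


# Hypothetical toy counts, NOT the paper's data: verb -> (with tak, with tidak)
toy = {
    "terelakkan": (30, 2),
    "terbantahkan": (12, 3),
    "bisa": (5, 200),
    "ada": (10, 150),
}
for verb, pref, strength in distinctive_collexemes(toy, labels=("tak", "tidak")):
    print(f"{verb:14s} {pref:6s} {strength:.2f}")
```

Computing the p-value from exact integer binomial coefficients keeps the sketch dependency-free; with real corpus counts one would more likely call an existing Fisher exact test implementation (for example `fisher.test` in R).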


Source journal

Linguistik Indonesia

Self-citation rate

0.00%

Articles published