Genetic analysis of cabbages and related cultivated plants using the bag-of-words model

Hana Owsianková, Dan Faltýnek, O. Kucera
Journal: Linguistic Frontiers
Publication type: Journal Article
Published: 2018-12-01
DOI: 10.2478/lf-2018-0011 (https://doi.org/10.2478/lf-2018-0011)

Abstract

In this study, we introduce the bag-of-words method, which is mainly used for the analysis of natural languages (document classification, authorship attribution and so on; e.g. [1, 2]). Quantitative linguistic methods similar to bag-of-words (e.g. the Damerau–Levenshtein distance in the paper by Serva and Petroni [3]) have been used to map language evolution within the field of glottochronology. We attempt to apply this method in biological taxonomy, specifically to the Brassicaceae (Cruciferae) family. The subjects of our interest are well-known cultivated crops that at first sight are morphologically very different and are culturally perceived as serving different purposes (e.g. oil from oilseed rape, turnip as animal feed and cabbage as a side dish). Despite their phenotypic divergence, these crops are very closely related, which is not obvious from their morphology. For this reason, we think that Brassicaceae crops are appropriate illustrative examples for introducing the method. For the analysis, we use genetic markers (internal transcribed spacer [ITS] and maturase K [matK]). Until now, the bag-of-words model has not been used for biological taxonomy; therefore, the results of the bag-of-words analysis are compared with the existing, very well-developed Brassica taxonomy. Our goal is to present a method that is suitable for reconstructing language development and may also be usable for biological taxonomy.
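To make the idea of a bag-of-words over genetic markers concrete, here is a minimal sketch. The abstract does not say how "words" are derived from a sequence or which distance is used, so two choices below are assumptions made only for illustration: words are overlapping k-mers (`k = 3`), and bags are compared with cosine distance. The function names `kmer_bag` and `cosine_distance` are hypothetical, and the sequences are made-up toy strings, not real ITS or matK data.

```python
from collections import Counter
from math import sqrt


def kmer_bag(sequence: str, k: int = 3) -> Counter:
    """Count overlapping k-mers ("words") in a nucleotide sequence.

    Assumption: treating k-mers as words is an illustrative choice,
    not necessarily the tokenisation used in the paper.
    """
    seq = sequence.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def cosine_distance(a: Counter, b: Counter) -> float:
    """Return 1 - cosine similarity between two k-mer count vectors."""
    # Counter returns 0 for missing keys, so unseen words contribute nothing.
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = sqrt(sum(c * c for c in a.values()))
    norm_b = sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 1.0  # an empty bag is treated as maximally distant
    return 1.0 - dot / (norm_a * norm_b)


# Made-up toy sequences, NOT real ITS or matK data.
toy_sequences = {
    "taxon_A": "ACGTACGTACGGTTAC",
    "taxon_B": "ACGTACGTACGGTTAG",
    "taxon_C": "TTTTGGGGCCCCAAAA",
}

bags = {name: kmer_bag(seq) for name, seq in toy_sequences.items()}
names = sorted(bags)

# Print every pairwise distance; smaller values mean more similar bags.
for i, first in enumerate(names):
    for second in names[i + 1:]:
        d = cosine_distance(bags[first], bags[second])
        print(f"{first} vs {second}: {d:.3f}")
```

A pairwise distance matrix of this kind could then be passed to a clustering or tree-building step and the resulting grouping compared with an established taxonomy, which is the kind of comparison the abstract describes. Because word order is discarded, the bag representation is insensitive to where in the sequence a k-mer occurs.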