Mining of Microsatellites in Genomic Data Using a Hybrid Approach with Range Tree Applications
Umang, P. Bharti, Akhtar Husain
2022 7th International Conference on Computing, Communication and Security (ICCCS), 3 November 2022
DOI: 10.1109/ICCCS55188.2022.10079353
Microsatellites, or simple sequence repeats (SSRs), are molecular markers made of short tandem repeat patterns that occur throughout genome sequences. They are a primary resource for studying interspecies variation, gene discovery, disease identification, and hypervariability in genome sequences. However, in vivo analysis of sequence data is costly and time-consuming. In addition, many next-generation sequencing tools with new features and objectives have been developed for analyzing sequence data. Researchers therefore need a wide range of in silico tools that do more than extract microsatellites. This study aims to identify additional features of SSRs, such as their location in CDS, tRNA, and rRNA regions. It also marks each repeat as coding, non-coding, or coding-non-coding (crossing the upper or lower bound of a coding region), and reports flanking sequences, repeat statistics, and other genomic features, using pattern matching together with computational-geometry techniques based on range tree search. This study may help researchers retrieve and analyze these enhanced SSR features and fill a gap for future studies and applications.
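The abstract combines pattern matching (to find repeats) with range tree search (to place them relative to annotated regions). The following is a minimal sketch of how those two steps could fit together. It is not the authors' implementation. Several details are assumptions: only perfect repeats are detected, using a backreference regex; the per-motif copy thresholds in `MIN_COPIES` are hypothetical; and features are given as hypothetical `(name, start, end)` tuples already converted to 0-based, half-open coordinates. Each SSR `[s, e)` is treated as the 2-D point `(s, e)`, so "overlaps feature `[fs, fe)`" and "lies inside it" both become orthogonal range queries on a static 2-D range tree. When the features are CDS intervals, the labels `inside`, `boundary`, and `outside` correspond to the abstract's coding, coding-non-coding, and non-coding categories.

```python
import bisect
import re
from dataclasses import dataclass

# Hypothetical minimum copy numbers per motif length (not the paper's settings).
MIN_COPIES = {1: 10, 2: 6, 3: 5, 4: 5, 5: 5, 6: 5}


@dataclass(frozen=True)
class SSR:
    start: int  # 0-based, inclusive
    end: int    # 0-based, exclusive
    motif: str


def find_perfect_ssrs(seq):
    """Pattern matching: a backreference regex finds perfect tandem repeats of 1-6 bp motifs."""
    seq = seq.upper()
    hits = []
    for k, min_copies in MIN_COPIES.items():
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, min_copies - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            # Skip motifs that are repeats of a shorter unit (e.g. "ATAT" when k = 4).
            if any(k % d == 0 and motif == motif[:d] * (k // d) for d in range(1, k)):
                continue
            hits.append(SSR(m.start(), m.end(), motif))
    return sorted(hits, key=lambda s: s.start)


def flanks(seq, ssr, width=20):
    """Flanking sequences on each side of an SSR (e.g. for primer design)."""
    return seq[max(0, ssr.start - width):ssr.start], seq[ssr.end:ssr.end + width]


class RangeTree2D:
    """Static 2-D range tree over (x, y, payload) points (no fractional cascading).
    Each node stores its x-run's (y, index) pairs sorted by y; a query visits
    O(log n) canonical nodes and binary-searches y in each: O(log^2 n + k)."""

    def __init__(self, points):
        self.pts = sorted(points, key=lambda p: p[0])
        self.xs = [p[0] for p in self.pts]
        self.nodes = {}
        if self.pts:
            self._build(1, 0, len(self.pts) - 1)

    def _build(self, node, lo, hi):
        if lo == hi:
            self.nodes[node] = [(self.pts[lo][1], lo)]
            return
        mid = (lo + hi) // 2
        self._build(2 * node, lo, mid)
        self._build(2 * node + 1, mid + 1, hi)
        self.nodes[node] = sorted(self.nodes[2 * node] + self.nodes[2 * node + 1])

    def query(self, x1, x2, y1, y2):
        """Payloads of all points with x1 <= x <= x2 and y1 <= y <= y2."""
        lo = bisect.bisect_left(self.xs, x1)
        hi = bisect.bisect_right(self.xs, x2) - 1
        out = []
        if lo <= hi:
            self._collect(1, 0, len(self.pts) - 1, lo, hi, y1, y2, out)
        return out

    def _collect(self, node, nlo, nhi, lo, hi, y1, y2, out):
        if hi < nlo or nhi < lo:
            return
        if lo <= nlo and nhi <= hi:  # canonical node: its whole x-run is in range
            ys = self.nodes[node]
            i = bisect.bisect_left(ys, (y1, -1))
            while i < len(ys) and ys[i][0] <= y2:
                out.append(self.pts[ys[i][1]][2])
                i += 1
            return
        mid = (nlo + nhi) // 2
        self._collect(2 * node, nlo, mid, lo, hi, y1, y2, out)
        self._collect(2 * node + 1, mid + 1, nhi, lo, hi, y1, y2, out)


def classify(ssrs, features):
    """Label SSR [s, e) as the point (s, e): it overlaps a half-open feature [fs, fe)
    iff s <= fe - 1 and e >= fs + 1, and lies inside it iff s >= fs and e <= fe."""
    tree = RangeTree2D([(r.start, r.end, i) for i, r in enumerate(ssrs)])
    inf = float("inf")
    inside, touching = {}, {}
    for name, fs, fe in features:
        for i in tree.query(-inf, fe - 1, fs + 1, inf):
            touching.setdefault(i, []).append(name)
        for i in tree.query(fs, inf, -inf, fe):
            inside.setdefault(i, []).append(name)
    labels = []
    for i, r in enumerate(ssrs):
        where = "inside" if i in inside else "boundary" if i in touching else "outside"
        labels.append((r, where, inside.get(i) or touching.get(i, [])))
    return labels
```

For example, with `seq = "GGCC" + "AT" * 8 + "GCGC" + "CAG" * 6 + "TTGA"` and the hypothetical features `[("tRNA_1", 0, 10), ("CDS_1", 22, 46)]`, `find_perfect_ssrs` returns an (AT)8 repeat at `[4, 20)` and a (CAG)6 repeat at `[24, 42)`. `classify` labels the first `boundary` (it crosses the end of `tRNA_1`) and the second `inside` `CDS_1`. The tree here is kept simple. Adding fractional cascading would reduce queries to O(log n + k), and an interval tree is a common alternative when only overlap queries are needed.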