Top-k utility-based gene regulation sequential pattern discovery

2016 IEEE International Conference on Bioinformatics and Biomedicine (BIBM). Pub Date: 2016-12-01. DOI: 10.1109/BIBM.2016.7822529

Morteza Zihayat, Heydar Davoudi, Aijun An


Citations: 13

Abstract

Sequential pattern mining has been used in bioinformatics to discover frequent gene regulation sequential patterns from time-course microarray datasets. While mining frequent sequences is important in biological studies for disease treatment, most existing approaches do not consider the importance of genes with respect to the disease being studied when identifying gene regulation sequential patterns. In addition, they focus on the general up/down effects of genes in a microarray dataset and do not take the varying degrees of expression into account during mining. As a result, current techniques return too many sequences, many of which may not be informative enough for biologists to explore the relationships between the disease and the underlying causes encoded in gene regulation sequences. In this paper, we propose a utility model that considers both the importance of genes with respect to a disease and their degrees of expression under a biological investigation. We then design a new method, called TU-SEQ, for identifying the top-k high-utility gene regulation sequential patterns in a time-course microarray dataset. The evaluation results show that our approach can effectively and efficiently discover key patterns that represent meaningful gene regulation sequences in a time-course microarray dataset.
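To make the idea of a utility-based top-k search concrete, the following is a minimal brute-force sketch in Python. It is not the TU-SEQ algorithm, whose details the abstract does not give. It assumes that each sample is a time-ordered list of `(gene, expression_degree)` events, that a pattern's utility in one sample is the best-scoring occurrence of the pattern (the sum of `importance[gene] * expression_degree` over the matched events), and that a pattern's overall utility is the sum over all samples. The names `pattern_utility`, `top_k_patterns`, `importance`, and `max_len` are hypothetical and chosen for illustration only.

```python
import heapq
from itertools import product


def pattern_utility(pattern, sequence, importance):
    """Utility of the best occurrence of `pattern` in one sample.

    Assumption: utility of an occurrence is the sum of
    importance[gene] * expression_degree over its matched events.
    Returns 0.0 when the pattern does not occur in the sequence.
    """
    neg_inf = float("-inf")
    # best[j] = highest utility of matching the first j pattern genes so far
    best = [0.0] + [neg_inf] * len(pattern)
    for gene, degree in sequence:
        # Walk j downwards so one event is matched to at most one position.
        for j in range(len(pattern), 0, -1):
            if pattern[j - 1] == gene and best[j - 1] != neg_inf:
                best[j] = max(best[j], best[j - 1] + importance[gene] * degree)
    return best[-1] if best[-1] != neg_inf else 0.0


def top_k_patterns(dataset, importance, k, max_len):
    """Exhaustively score every gene sequence up to max_len; keep the k best.

    Exponential in max_len: a real miner would prune the search space.
    """
    genes = sorted(importance)
    scored = []
    for length in range(1, max_len + 1):
        for pattern in product(genes, repeat=length):
            total = sum(pattern_utility(pattern, s, importance) for s in dataset)
            scored.append((total, pattern))
    return heapq.nlargest(k, scored)


# Toy data (invented for illustration, not taken from the paper).
importance = {"A": 2.0, "B": 1.0, "C": 3.0}
dataset = [
    [("A", 1), ("B", 2), ("A", 3), ("C", 1)],
    [("B", 1), ("A", 2), ("C", 2)],
]
result = top_k_patterns(dataset, importance, k=2, max_len=2)
print(result)
```

On this toy data the highest-utility pattern of length at most two is `("A", "C")`, because gene `C` carries the largest importance weight and `A` appears with a high expression degree before it in both samples. The exhaustive enumeration is only meant to show the scoring model; it does not scale to real microarray data.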

