Protein Data Modelling for Concurrent Sequential Patterns

2014 25th International Workshop on Database and Expert Systems Applications. Pub Date: 2014-12-04. DOI: 10.1109/DEXA.2014.19

Jing Lu, M. Keech, Cuiqing Wang

Citations: 2

Abstract

Protein sequences from the same family typically share common patterns which imply their structural function and biological relationship. The challenge of identifying protein motifs is often addressed through mining frequent item sets and sequential patterns, where post-processing is a useful technique. Earlier work has shown that Concurrent Sequential Patterns mining can be applied in bioinformatics, e.g. to detect frequently occurring concurrent protein sub-sequences. This paper presents a companion approach to data modelling and visualisation, applying it to real-world protein datasets from the PROSITE and NCBI databases. The results show the potential for graph-based modelling in representing the integration of higher level patterns common to all or nearly all of the protein sequences.
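
As a rough illustration of the ideas the abstract refers to, the sketch below computes the support of a single sequential pattern and the fraction of sequences in which several patterns occur together. It is a minimal sketch under stated assumptions, not the authors' algorithm: the function names, the gap-tolerant subsequence matching, and the toy sequences are all hypothetical and chosen only for illustration.

```python
from typing import Iterable, List


def contains_subsequence(sequence: str, pattern: str) -> bool:
    """Return True if `pattern` occurs in `sequence` in order (gaps allowed)."""
    residues = iter(sequence)
    # `symbol in residues` advances the iterator, so order is enforced.
    return all(symbol in residues for symbol in pattern)


def support(sequences: List[str], pattern: str) -> float:
    """Fraction of sequences that contain `pattern` as a subsequence."""
    hits = sum(contains_subsequence(s, pattern) for s in sequences)
    return hits / len(sequences)


def concurrence(sequences: List[str], patterns: Iterable[str]) -> float:
    """Fraction of sequences that contain every pattern in `patterns`."""
    wanted = list(patterns)
    hits = sum(
        all(contains_subsequence(s, p) for p in wanted) for s in sequences
    )
    return hits / len(sequences)


# Assumption: made-up toy strings over the amino-acid alphabet,
# NOT real PROSITE or NCBI data.
toy_family = ["MKTAYIAK", "MKAYLAK", "MTAYIK", "GKTAYAK"]

if __name__ == "__main__":
    print(support(toy_family, "TAY"))
    print(concurrence(toy_family, ["TAY", "KAK"]))
```

Patterns whose joint occurrence exceeds a chosen threshold could then be treated as candidates for a higher level, graph-based representation of the kind the abstract describes; the threshold and the graph construction are left out here because the abstract does not specify them.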
