A Greedy Two-stage Gibbs Sampling Method for Motif Discovery in Biological Sequences
Authors: L. Lifang, Jiao Licheng, Huo Hong-wei
Published in: Proceedings of the ... International Conference on Biomedical Engineering and Informatics, volume 76 1, pages 13-17, 2008-05-27
DOI: 10.1109/BMEI.2008.111 (https://doi.org/10.1109/BMEI.2008.111)
For the motif discovery problem in DNA sequences, a greedy two-stage Gibbs sampling algorithm is presented; the accompanying software package is called Greedy MotifSAM. The method is based on the position weight matrix (PWM) motif model and uses a greedy strategy to choose the initial PWM parameters. Two sampling methods are used: a site sampler and a motif sampler. The site sampler finds one occurrence of the motif per sequence in the dataset. The motif sampler finds zero or more non-overlapping occurrences of the motif in each sequence. The algorithm can discover several different motifs, with differing numbers of occurrences, in a single dataset. We test our methods on the binding-site (motif) information for eukaryotic transcription factors stored in the TRANSFAC database. Prediction accuracy, scalability, and reliability are compared with those of several other methods.
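The site-sampler stage described above (one motif occurrence per sequence, scored with a PWM) can be illustrated with a minimal textbook-style Gibbs sampler. This is a sketch under stated assumptions, not the Greedy MotifSAM implementation: the function names (`build_pwm`, `background`, `site_weights`, `site_sampler`), the pseudocount, the zero-order background model, and the iteration count are all hypothetical choices. The abstract does not detail the greedy initialisation, so random start positions are used in its place, and the motif sampler (zero or more occurrences per sequence) is not covered.

```python
import random

ALPHABET = "ACGT"


def build_pwm(sites, pseudocount=1.0):
    """Estimate per-column base probabilities from equal-length aligned sites."""
    width = len(sites[0])
    pwm = []
    for j in range(width):
        counts = {b: pseudocount for b in ALPHABET}
        for site in sites:
            counts[site[j]] += 1
        total = sum(counts.values())
        pwm.append({b: counts[b] / total for b in ALPHABET})
    return pwm


def background(seqs, pseudocount=1.0):
    """Zero-order background base frequencies over all sequences (an assumption)."""
    counts = {b: pseudocount for b in ALPHABET}
    for seq in seqs:
        for base in seq:
            counts[base] += 1
    total = sum(counts.values())
    return {b: counts[b] / total for b in ALPHABET}


def site_weights(seq, pwm, bg):
    """Motif-versus-background likelihood ratio for every start position in seq."""
    width = len(pwm)
    weights = []
    for start in range(len(seq) - width + 1):
        ratio = 1.0
        for j in range(width):
            base = seq[start + j]
            ratio *= pwm[j][base] / bg[base]
        weights.append(ratio)
    return weights


def site_sampler(seqs, width, iterations=200, seed=0):
    """Return one sampled motif start position per sequence."""
    rng = random.Random(seed)
    bg = background(seqs)
    # Assumption: random initial positions stand in for the paper's greedy
    # initialisation, which the abstract does not specify.
    pos = [rng.randrange(len(s) - width + 1) for s in seqs]
    for _ in range(iterations):
        for i, seq in enumerate(seqs):
            # Rebuild the PWM from the current sites of all other sequences.
            others = [seqs[k][pos[k]:pos[k] + width]
                      for k in range(len(seqs)) if k != i]
            pwm = build_pwm(others)
            # Resample this sequence's site in proportion to its weight.
            weights = site_weights(seq, pwm, bg)
            pos[i] = rng.choices(range(len(weights)), weights=weights)[0]
    return pos
```

Calling `site_sampler(["ACGTACGTTTGACA", "TTGACAGGGCCCAA", "GGCATTGACATCGA"], width=6)` returns a list of three start positions, one per sequence. Because the sampler is stochastic, different seeds may converge to different sites; in practice several restarts are usually compared by a score such as the information content of the final PWM.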