Text Mining using PrefixSpan constrained by Item Interval and Item Attribute

Issei Sato, Yu Hirate, H. Yamana
22nd International Conference on Data Engineering Workshops (ICDEW'06), published 2006-04-03
DOI: 10.1109/ICDEW.2006.142 (https://doi.org/10.1109/ICDEW.2006.142)
Citations: 1

Abstract

Applying conventional sequential pattern mining methods to text data extracts many uninteresting patterns, which increases the time needed to interpret the extracted patterns. To solve this problem, we propose a new sequential pattern mining algorithm that adopts two constraints. One selects sequences with regard to item intervals (the number of items between any two adjacent items in a sequence), and the other selects sequences with regard to item attributes. Using Amazon customer reviews in the book category, we have confirmed that our method extracts patterns faster than the conventional method and is better able to exclude uninteresting patterns while retaining the patterns of interest.
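The two constraints described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not give the algorithm's details, so the function name `mine_constrained`, the parameters `max_gap` and `allowed_attrs`, the use of part-of-speech tags as the item attribute, and the sample reviews are all assumptions made for illustration. The sketch grows patterns prefix by prefix in the spirit of PrefixSpan, but it records every end position of a prefix instead of building projected databases, because keeping only the first occurrence can miss matches once an interval limit applies.

```python
from collections import defaultdict


def mine_constrained(sequences, min_support, max_gap, allowed_attrs, max_len=3):
    """Mine frequent word patterns under two (assumed) constraints.

    sequences     -- list of sequences, each a list of (word, attribute) pairs
    min_support   -- minimum number of sequences that must contain a pattern
    max_gap       -- item-interval limit: at most this many items may sit
                     between two adjacent pattern items in the source sequence
    allowed_attrs -- item-attribute filter: only items with one of these
                     attributes may become part of a pattern
    Returns a dict mapping pattern tuples to their support.
    """
    results = {}

    def grow(pattern, candidates):
        # candidates: word -> {sequence id -> set of positions where the
        # extended pattern ends}; support is the number of sequence ids.
        for word, occurrences in candidates.items():
            if len(occurrences) < min_support:
                continue
            new_pattern = pattern + (word,)
            results[new_pattern] = len(occurrences)
            if len(new_pattern) < max_len:
                grow(new_pattern, extensions(occurrences))

    def extensions(occurrences):
        # Collect items that may follow the current prefix within the gap limit.
        found = defaultdict(lambda: defaultdict(set))
        for sid, ends in occurrences.items():
            seq = sequences[sid]
            for end in ends:
                last = min(end + max_gap + 2, len(seq))
                for pos in range(end + 1, last):
                    word, attr = seq[pos]
                    if attr in allowed_attrs:
                        found[word][sid].add(pos)
        return found

    # Length-1 patterns: every allowed item at every position.
    initial = defaultdict(lambda: defaultdict(set))
    for sid, seq in enumerate(sequences):
        for pos, (word, attr) in enumerate(seq):
            if attr in allowed_attrs:
                initial[word][sid].add(pos)
    grow((), initial)
    return results


# Hypothetical tokenised reviews with part-of-speech tags as attributes.
reviews = [
    [("this", "DET"), ("book", "NOUN"), ("is", "VERB"), ("very", "ADV"), ("good", "ADJ")],
    [("the", "DET"), ("book", "NOUN"), ("was", "VERB"), ("good", "ADJ")],
    [("good", "ADJ"), ("old", "ADJ"), ("dusty", "ADJ"), ("heavy", "ADJ"), ("book", "NOUN")],
]

patterns = mine_constrained(reviews, min_support=2, max_gap=2,
                            allowed_attrs={"NOUN", "ADJ"})
for pattern, support in sorted(patterns.items()):
    print(pattern, support)
```

With `max_gap=2` the pattern `("book", "good")` is supported by the first two reviews. Tightening the limit to `max_gap=1` drops it, because two items separate "book" and "good" in the first review. Both constraints also shrink the candidate set at each step, which is one plausible reason for the speed-up the abstract reports.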