PREPROCESSING STRATEGY IN WEB-MINING: RECOMMENDED OR INEVITABLE?

Dror Ben-Ami
DOI: 10.33965/IS2019_201905R002 (https://doi.org/10.33965/IS2019_201905R002)
Published in: 12th IADIS International Conference Information Systems 2019
Publication date: 2019-04-11

Abstract

Web users' browsing behavior is a fascinating subject, especially from a socio-technological perspective. Companies and learning organizations invest substantial human resources, effort and capital in tracking their users' behavior in order to build user profiles. Knowing, understanding and predicting users' tangible and intangible behavior helps these organizations focus on users' needs and interests. In this way, companies and learning organizations can generate significant intellectual capital and later use this asset efficiently and effectively. This intellectual capital can be directed toward business-based and knowledge-based activities, such as targeted newsletters, targeted markets, refined pricing policies and more. User profiles are usually classified along several dimensions, such as social aspects, personality traits, purchase behavior and cognitive behavior. The raw data for this research was collected from web users: the research population comprised around one hundred thousand internet users from one of the OECD countries. The research examines users' online behavior by collecting a wide range of data elements over a period of a few months and then analyzing them with advanced data analysis tools and techniques. The case study is based on a real project conducted from July 2018 to February 2019. Its purpose was to recover and re-analyze the collected data after the original staff had applied improper and incorrect analysis procedures. The paper deals with the missteps that can occur during data preprocessing in data-intensive projects and tries to understand the consequences those early missteps can have on the end result. Specifically, the paper recounts those missteps based on real experience from a web mining project. The paper presents a guide to the procedures and decisions that should be taken to avoid, or at least minimize, critical mistakes during the data preprocessing step.