A user-centric approach towards learning noise in web data
Julius Onyancha, V. Plekhanova
2017 12th International Conference on Intelligent Systems and Knowledge Engineering (ISKE), 2017-11-01
DOI: https://doi.org/10.1109/ISKE.2017.8258833
Citations: 2
Abstract
The rate at which web data is collected, stored and accessed by web users has led to high levels of noise. As the amount of noise in web data increases, it becomes difficult to find useful information that matches a specific user's interests. Current research treats noise as any data that does not form part of the main web page content and proposes machine learning algorithms aimed at protecting that content from irrelevant data such as advertisements, banners and external links. Depending on what a user is interested in, however, data regarded as noise can be useful, and data regarded as useful can be noise. To learn noise data in a web user profile, this paper proposes a new machine learning algorithm/tool. An experimental design setup is presented to validate the performance of the proposed algorithms, and the results obtained are compared with currently available tools for reducing noise in web data. The experimental results show that the proposed algorithms not only eliminate noise from a web user profile but also learn it prior to elimination. Learning noise data prior to elimination contributes to the quality of the user profile, something the currently available tools do not address.
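The abstract does not describe the proposed algorithm, so the following is only a toy sketch of the underlying idea: whether an item is noise is decided per user, from that user's own behaviour, before anything is removed. Every name here (`Visit`, `learn_interest`, `eliminate_noise`), the engagement features and the thresholds are assumptions made for illustration and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class Visit:
    url: str
    category: str   # hypothetical page-type label, e.g. "advertisement"
    visits: int     # how often this user opened the page
    seconds: float  # total time this user spent on the page


def learn_interest(profile, min_visits=3, min_seconds=60.0):
    """Label each profile entry as 'useful' or 'noise' for this user.

    The thresholds are illustrative assumptions, not values from the paper.
    The page category is deliberately ignored: an advertisement the user
    keeps returning to counts as useful.
    """
    labels = {}
    for v in profile:
        engaged = v.visits >= min_visits and v.seconds >= min_seconds
        labels[v.url] = "useful" if engaged else "noise"
    return labels


def eliminate_noise(profile, labels):
    """Remove entries only after they have been labelled as noise."""
    return [v for v in profile if labels[v.url] == "useful"]


# Made-up profile for a single user.
profile = [
    Visit("shop.example/ad-sneakers", "advertisement", visits=7, seconds=420.0),
    Visit("news.example/politics", "article", visits=1, seconds=8.0),
    Visit("blog.example/python-tips", "article", visits=5, seconds=900.0),
]

labels = learn_interest(profile)
cleaned = eliminate_noise(profile, labels)
```

In this made-up profile the advertisement is kept and a main-content article is dropped, which is the reverse of what a tool that filters purely by page region would do.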