Feature Selection using Machine Learning Techniques Based on Search Engine Parameters

Willy K. Portier, Yujian Li, B. A. Kouassi
DOI: 10.1145/3432291.3432308 (https://doi.org/10.1145/3432291.3432308)
Published in: Proceedings of the 2020 3rd International Conference on Signal Processing and Machine Learning
Publication date: 2020-10-22
Citations: 2

Abstract

Over the last two decades, Internet visibility has become essential for any company seeking exposure and revenue. Among the many ways to be visible on the Internet, one of the most important is to appear at the top of search engine results for keywords related to the company's business. This is the goal of Search Engine Optimization (SEO), a collection of techniques for getting more traffic from a search engine. The better a website is optimized for SEO, the higher search engines rank it on their results pages, maximizing its exposure. Google, with a 90% market share worldwide, is the main search engine outside China (Baidu) and Russia (Yandex), and its algorithm is a black box that all marketers want to understand. Google claims to use more than 200 features in its algorithm to rank results for queries among billions of pages. This article applies several machine learning methods to determine the most important parameters, using a selection of 30 features in a dataset of around 28,000 observations. A binary classification approach was used to detect whether or not a keyword can be found in the top 10 search engine results. During the simulation, feature importance was computed to find the most important parameters used for building related search results. The results indicate that three kinds of parameters influence how Google ranks web pages: editorial features, notoriety features, and technical features. Moreover, a few features with minimal importance were found; for example, using the "https" protocol for a web resource has low importance.
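
The binary classification and feature-importance step described above can be illustrated with a minimal sketch. It is not the authors' method or data: it assumes a plain logistic regression trained by gradient descent, permutation importance as the importance measure, synthetic data, and three hypothetical feature names (`backlinks`, `keyword_in_title`, `uses_https`) standing in for the notoriety, editorial, and technical categories.

```python
import math
import random

random.seed(0)

# Hypothetical feature names (not taken from the paper's 30 features).
FEATURES = ["backlinks", "keyword_in_title", "uses_https"]

def make_row():
    """Synthetic observation: the label depends strongly on feature 0,
    weakly on feature 1, and not at all on feature 2 (an assumption)."""
    x = [random.gauss(0, 1) for _ in FEATURES]
    score = 3.0 * x[0] + 1.0 * x[1] + random.gauss(0, 0.5)
    return x, 1 if score > 0 else 0  # 1 = "keyword page is in the top 10"

data = [make_row() for _ in range(600)]
train, test = data[:400], data[400:]

def sigmoid(z):
    z = max(-30.0, min(30.0, z))  # clamp to avoid overflow in exp
    return 1.0 / (1.0 + math.exp(-z))

def linear(w, b, x):
    return b + sum(wi * xi for wi, xi in zip(w, x))

def fit_logistic(rows, epochs=200, lr=0.5):
    """Batch gradient descent on the logistic loss."""
    w, b, n = [0.0] * len(FEATURES), 0.0, len(rows)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for x, y in rows:
            err = sigmoid(linear(w, b, x)) - y
            for j, xj in enumerate(x):
                grad_w[j] += err * xj
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def accuracy(model, rows):
    w, b = model
    hits = sum((linear(w, b, x) > 0) == (y == 1) for x, y in rows)
    return hits / len(rows)

def permutation_importance(model, rows, repeats=10):
    """Mean drop in accuracy when one feature column is shuffled."""
    base = accuracy(model, rows)
    result = {}
    for j, name in enumerate(FEATURES):
        drops = []
        for _ in range(repeats):
            column = [x[j] for x, _ in rows]
            random.shuffle(column)
            shuffled = [(x[:j] + [v] + x[j + 1:], y)
                        for (x, y), v in zip(rows, column)]
            drops.append(base - accuracy(model, shuffled))
        result[name] = sum(drops) / repeats
    return result

model = fit_logistic(train)
importances = permutation_importance(model, test)
ranked = sorted(importances.items(), key=lambda kv: kv[1], reverse=True)
for name, drop in ranked:
    print(f"{name:18s} {drop:+.3f}")
```

Shuffling a column breaks its link to the label, so the resulting drop in held-out accuracy estimates how much the model relies on that feature; a feature the label does not depend on should score near zero.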