Joni O. Salminen, Juan Corporan, Roope Marttila, Tommi Salenius, B. Jansen
Proceedings of the 9th International Conference on Information Systems and Technologies. Published 2019-03-24. DOI: 10.1145/3361570.3361578 (https://doi.org/10.1145/3361570.3361578)
Using Machine Learning to Predict Ranking of Webpages in the Gift Industry: Factors for Search-Engine Optimization
We use machine learning to predict the search engine rank of webpages. We use a list of keywords for 30 content blogs of an e-commerce company in the gift industry to retrieve 733 content pages occupying the first-page Google rankings and predict their rank using 30 ranking factors. We test two models, Light Gradient Boosting Machine (LightGBM) and Extreme Gradient Boosted Decision Trees (XGBoost), finding that XGBoost performs better for predicting actual search rankings, with an average accuracy of 0.86. The feature analysis shows the most impactful features are (a) internal and external links, (b) security of the web domain, and (c) length of H3 headings, and the least impactful features are (a) keyword mentioned in domain address, (b) keyword mentioned in the H1 headings, and (c) overall number of keyword mentions in the text. The results highlight the persistent importance of links in search-engine optimization. We provide actionable insights for online marketers and content creators.
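The feature analysis described above rests on gradient-boosted trees and a per-feature importance score. The sketch below illustrates that general idea only. It is a toy pure-Python booster with depth-1 trees and squared loss, not LightGBM or XGBoost and not the authors' pipeline. The three feature names, the synthetic data, and the choice of summed split gain as the importance measure are all assumptions made for illustration; the abstract does not state which importance measure was used.

```python
import random

def fit_stump(X, residuals):
    """Return the one-feature threshold split with the largest squared-error reduction."""
    n = len(X)
    total = sum(residuals)
    best = None  # (gain, feature index, threshold, left mean, right mean)
    for j in range(len(X[0])):
        order = sorted(range(n), key=lambda i: X[i][j])
        left_sum = 0.0
        for k in range(1, n):
            left_sum += residuals[order[k - 1]]
            lo, hi = X[order[k - 1]][j], X[order[k]][j]
            if lo == hi:
                continue  # no valid split between equal values
            right_sum = total - left_sum
            # Reduction in sum of squared errors when the first k rows go left.
            gain = left_sum ** 2 / k + right_sum ** 2 / (n - k) - total ** 2 / n
            if best is None or gain > best[0]:
                best = (gain, j, lo, left_sum / k, right_sum / (n - k))
    return best

def fit_boosted_stumps(X, y, n_rounds=50, learning_rate=0.1):
    """Gradient boosting for squared loss: each stump fits the current residuals."""
    base = sum(y) / len(y)
    pred = [base] * len(y)
    stumps = []
    importance = [0.0] * len(X[0])  # summed split gain per feature
    for _ in range(n_rounds):
        residuals = [t - p for t, p in zip(y, pred)]  # negative gradient of squared loss
        gain, j, thr, left, right = fit_stump(X, residuals)
        stumps.append((j, thr, learning_rate * left, learning_rate * right))
        importance[j] += gain
        pred = [p + learning_rate * (left if row[j] <= thr else right)
                for p, row in zip(pred, X)]
    return base, stumps, importance

def predict(model, row):
    base, stumps, _ = model
    return base + sum(l if row[j] <= thr else r for j, thr, l, r in stumps)

# Synthetic data (hypothetical): the score depends on links and HTTPS only.
FEATURES = ["links", "https", "keyword_in_domain"]
random.seed(0)
X, y = [], []
for _ in range(200):
    links = random.uniform(0, 100)
    https = float(random.random() < 0.5)
    kw_in_domain = float(random.random() < 0.5)
    X.append([links, https, kw_in_domain])
    y.append(0.05 * links + 2.0 * https + random.gauss(0, 0.5))

model = fit_boosted_stumps(X, y)
for name, score in sorted(zip(FEATURES, model[2]), key=lambda p: -p[1]):
    print(f"{name}: {score:.1f}")
```

Because the synthetic score ignores `keyword_in_domain`, that feature should accumulate the least gain. This only demonstrates how an importance ranking is read off a boosted model; it says nothing about the actual ranking factors reported in the paper.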