Design and Visualization of Python Web Scraping Based on Third-Party Libraries and Selenium Tools

Journal: Academic Journal of Computing & Information Science
Publication date: 2023-01-01
Publication type: Journal Article
DOI: 10.25236/ajcis.2023.060904 (https://doi.org/10.25236/ajcis.2023.060904)

Abstract

This study analyzes data from Chinese movie websites to understand how movie genres and ratings are distributed and how they trend. Python third-party libraries and the Selenium tool were used to crawl data from several movie websites and platforms, of which Douban Films is one of the most prominent. To support the analysis of Douban movie data, the crawler was designed from multiple perspectives and includes two data-capture channels: searching for movies by keyword and browsing filtered rankings. A movie-details module retrieves each movie's rating, stars, online viewing addresses, cloud-disk search links, and film and television download resources. The results were visualized with Matplotlib, a third-party Python plotting library. They show that a film's rating and its total number of ratings are important factors that ordinary users consult when choosing what to watch. Drama is the genre most favored by producers and film companies, while adventure films are the genre most easily overlooked by viewers. These analyses can reflect consumer-led viewing trends among the public.
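The analysis and visualization step described in the abstract can be illustrated with a minimal sketch. It assumes the crawler has already produced a list of movie records; the field names (`title`, `genres`, `rating`, `votes`), the sample rows, and the function names are all hypothetical and are not taken from the paper. The crawling itself (Selenium) is not shown.

```python
from collections import Counter, defaultdict

# Hypothetical records standing in for scraped rows; NOT data from the study.
SAMPLE_MOVIES = [
    {"title": "Movie A", "genres": ["Drama"], "rating": 8.5, "votes": 1200},
    {"title": "Movie B", "genres": ["Drama", "Adventure"], "rating": 7.0, "votes": 300},
    {"title": "Movie C", "genres": ["Comedy"], "rating": 6.5, "votes": 800},
]


def genre_counts(movies):
    """Count how many movies list each genre."""
    counts = Counter()
    for movie in movies:
        counts.update(movie["genres"])
    return counts


def mean_rating_by_genre(movies):
    """Return the average rating per genre, rounded to two decimals."""
    ratings = defaultdict(list)
    for movie in movies:
        for genre in movie["genres"]:
            ratings[genre].append(movie["rating"])
    return {genre: round(sum(vals) / len(vals), 2) for genre, vals in ratings.items()}


def plot_genre_counts(counts, path="genre_counts.png"):
    """Save a bar chart of genre counts to a file; requires Matplotlib."""
    import matplotlib
    matplotlib.use("Agg")  # render to a file without needing a display
    import matplotlib.pyplot as plt

    # Assumes counts is non-empty; most_common() sorts genres by frequency.
    genres, values = zip(*counts.most_common())
    fig, ax = plt.subplots()
    ax.bar(genres, values)
    ax.set_xlabel("Genre")
    ax.set_ylabel("Number of movies")
    fig.savefig(path)
    plt.close(fig)


if __name__ == "__main__":
    counts = genre_counts(SAMPLE_MOVIES)
    print(counts)
    print(mean_rating_by_genre(SAMPLE_MOVIES))
```

Calling `plot_genre_counts(genre_counts(SAMPLE_MOVIES))` would write the chart to `genre_counts.png`. Keeping the aggregation separate from the plotting is only one possible design; it lets the counts be checked without Matplotlib installed.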