Social Media Data Extraction Method Benchmarking Comparison

Zhenhua Sui
International Journal on Data Science and Technology. Published 2019-08-28. DOI: 10.11648/J.IJDST.20190502.12 (https://doi.org/10.11648/J.IJDST.20190502.12)

Abstract

Social media is now more widely used than ever. As the most popular medium, Twitter carries a great deal of information, especially since U.S. President Trump has used it as his main official, free news outlet. Social media platforms like Twitter have therefore become important sources of information, which can then be analyzed with text analytics models to support decision-making. In this paper, we first investigate several text analytics methods and then examine four tweet retrieval methods and software tools: Twitter Analytics, Application for Twitter, Python plus Tweepy, and Next Analytics. Seven feature-related criteria are applied to compare the methods on ease of use, extraction timing, and capability to accommodate big data. Our results may be approximate, because we might not have been able to observe all the capabilities and features of the software. With that caveat, they show that Python plus Tweepy is the most suitable method for big data projects (millions of tweets or more) and for real-time text data extraction. Next Analytics retrieves historical text messages more conveniently, through Excel, and can trace further back in time, which could give much better capabilities in social media analysis.
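As an illustration of the Python plus Tweepy approach named in the abstract, the following is a minimal sketch and not the paper's own code. It assumes the Tweepy v4 `Client` interface, which may differ from the interface available when the paper was written, and it assumes the caller holds a valid bearer token with access to the recent-search endpoint. The helper names `tweet_to_row`, `write_tweets_csv`, and `fetch_recent` are hypothetical and introduced only for this example.

```python
import csv
import io
from datetime import datetime, timezone
from types import SimpleNamespace


def tweet_to_row(tweet):
    """Flatten one tweet-like object into [id, ISO timestamp, single-line text]."""
    created = tweet.created_at.isoformat() if tweet.created_at else ""
    text = " ".join(tweet.text.split())  # collapse newlines and repeated spaces
    return [str(tweet.id), created, text]


def write_tweets_csv(tweets, handle):
    """Write a header plus one row per tweet; return the number of tweets written."""
    writer = csv.writer(handle)
    writer.writerow(["id", "created_at", "text"])
    count = 0
    for tweet in tweets:
        writer.writerow(tweet_to_row(tweet))
        count += 1
    return count


def fetch_recent(query, bearer_token, limit=1000):
    """Yield up to `limit` recent tweets matching `query` (needs network access)."""
    import tweepy  # imported lazily so the offline demo below runs without Tweepy

    # Assumption: Tweepy v4 Client; wait_on_rate_limit sleeps when the quota is hit.
    client = tweepy.Client(bearer_token=bearer_token, wait_on_rate_limit=True)
    pages = tweepy.Paginator(
        client.search_recent_tweets,
        query=query,
        tweet_fields=["created_at"],
        max_results=100,  # per-page size; the endpoint accepts 10 to 100
    )
    yield from pages.flatten(limit=limit)


# Offline demo with a stand-in object, so no credentials are required.
sample = SimpleNamespace(
    id=1,
    created_at=datetime(2019, 8, 28, tzinfo=timezone.utc),
    text="hello\nworld",
)
buffer = io.StringIO()
written = write_tweets_csv([sample], buffer)
print(written, buffer.getvalue().splitlines()[1])
```

With real credentials, the generator returned by `fetch_recent` could be passed directly to `write_tweets_csv` together with a file opened with `newline=""` and `encoding="utf-8"`. Because tweets are written one at a time as they arrive, memory use stays flat as the collection grows, which is one reason a scripted approach may suit larger extractions.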