Topic Analysis of Microblog About “Didi Taxi” Based on K-means Algorithm

Yonghe Lu, Xin Xiong
Journal: Journal of the American Society for Information Science and Technology
Publication type: Journal Article
Publication date: 2019-09-02
DOI: https://doi.org/10.11648/J.AJIST.20190303.13
Citations: 2

Abstract

In the age of information and digitization, most users publish and obtain real-time information through microblogs on social networks. With effective methods, the valuable information hidden in the massive volume of short social-network texts can be accurately discovered, organized, and used. Hot topics on microblogs can then be identified, which supports public opinion monitoring and marketing development. Didi Taxi has become an essential travel option for many users. This paper applies the K-means clustering algorithm to topic analysis of Sina microblog short texts about Didi Taxi. We crawled 17,226 microblog search results related to Didi Taxi, covering April 2019 to June 2019. After a series of data cleaning and preprocessing steps, 15,054 texts remained, which we represented with the TF-IDF method. Based on an evaluation of the silhouette coefficient, we set the text vector dimension to 300 and the number of K-means clusters to 34. We then extracted 8 topic clusters from the 34 clusters, covering the advantages and disadvantages of Didi Taxi and its development status. Finally, we discussed the results through manual checking from a semantic perspective. Topic analysis of microblogs reveals the public's attitude toward Didi Taxi and can provide a basis for future management decisions by the government or the company.
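The pipeline described in the abstract (TF-IDF representation, K-means clustering, and choosing the number of clusters by silhouette coefficient) can be sketched roughly as follows. This is a minimal illustration on toy data, not the authors' implementation. The whitespace tokenizer, the smoothed IDF formula, the farthest-first initialization, and capping the vocabulary at `max_features` terms by document frequency are all assumptions; the abstract does not say how the 300 dimensions were obtained, and real Chinese microblog text would first need a word segmenter.

```python
import math
from collections import Counter


def tfidf_vectors(docs, max_features=300):
    """L2-normalised TF-IDF vectors over the top terms by document frequency."""
    tokenized = [doc.split() for doc in docs]  # Chinese text would need a segmenter
    df = Counter(term for tokens in tokenized for term in set(tokens))
    ranked = sorted(df, key=lambda t: (-df[t], t))
    vocab = ranked[:max_features]  # assumed way of capping the dimension
    idf = {t: math.log(len(docs) / df[t]) + 1.0 for t in vocab}  # assumed IDF variant
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vec = [tf[t] * idf[t] for t in vocab]
        norm = math.sqrt(sum(x * x for x in vec)) or 1.0
        vectors.append([x / norm for x in vec])
    return vectors


def dist(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def kmeans(points, k, max_iter=100):
    """Plain K-means; returns one cluster label per point."""
    # Farthest-first initialisation: an assumption that keeps the demo deterministic.
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(dist(p, c) for c in centers)))
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda c: dist(p, centers[c])) for p in points]
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # keep the old center if a cluster became empty
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def silhouette(points, labels):
    """Mean silhouette coefficient; singleton clusters score 0."""
    clusters = {}
    for i, label in enumerate(labels):
        clusters.setdefault(label, []).append(i)
    if len(clusters) < 2:
        return 0.0
    scores = []
    for i, label in enumerate(labels):
        own = clusters[label]
        if len(own) == 1:
            scores.append(0.0)
            continue
        a = sum(dist(points[i], points[j]) for j in own if j != i) / (len(own) - 1)
        b = min(
            sum(dist(points[i], points[j]) for j in other) / len(other)
            for other_label, other in clusters.items()
            if other_label != label
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy stand-ins for preprocessed microblog posts (hypothetical data).
docs = [
    "driver late cancel order",
    "driver cancel order refund",
    "late driver refund complaint",
    "coupon discount cheap ride",
    "cheap ride coupon promotion",
    "discount promotion coupon ride",
]
vectors = tfidf_vectors(docs)
scores = {k: silhouette(vectors, kmeans(vectors, k)) for k in (2, 3)}
chosen_k = max(scores, key=scores.get)  # highest mean silhouette wins
print(scores, chosen_k, kmeans(vectors, chosen_k))
```

On a corpus the size of the one in the paper, a library implementation would be far faster than this pure-Python version; the sketch is only meant to make the selection logic concrete: cluster for each candidate k, score each clustering by mean silhouette, and keep the best-scoring k.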