Web opinions analysis with scalable distance-based clustering
Christopher C. Yang, T. D. Ng
2009 IEEE International Conference on Intelligence and Security Informatics
Published: 2009-06-08
DOI: https://doi.org/10.1109/ISI.2009.5137273
Citations: 13
Abstract
With the advance of Web 2.0 technologies, a large volume of web opinions is available on computer-mediated communication sites such as forums and blogs. Many of these web opinions involve terrorism- and crime-related issues. For instance, some terrorist groups may use web forums to propagandize their ideology, some may post threatening messages, and some criminals may recruit members or identify victims through web social networks. Analyzing and clustering web opinions is extremely challenging. Unlike regular documents, web opinions usually appear as short and sparse text messages, so applying typical document clustering techniques to them produces unsatisfactory results. In this work, we propose a scalable distance-based clustering technique for web opinion clustering. We have conducted experiments and benchmarked the technique against a density-based algorithm. The results show that the proposed technique obtains higher micro and macro accuracy. This web opinion clustering technique is useful for identifying the themes of discussions in web social networks and for studying their development as well as the interactions of active participants.
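The abstract reports micro and macro accuracy without defining them. The sketch below shows one common reading, under the assumption that each cluster is mapped to its majority ground-truth label: micro accuracy pools correct assignments over all messages, while macro accuracy averages the per-cluster accuracy, so small clusters weigh as much as large ones. The function name `micro_macro_accuracy` and the toy data are hypothetical, and the paper's exact definitions may differ.

```python
from collections import Counter


def micro_macro_accuracy(cluster_ids, true_labels):
    """Return (micro, macro) accuracy under a majority-label mapping.

    Assumption: a message counts as correct when its true label equals
    the most common true label in the cluster it was assigned to.
    """
    # Group the true labels of the messages by assigned cluster.
    clusters = {}
    for cid, label in zip(cluster_ids, true_labels):
        clusters.setdefault(cid, []).append(label)

    correct_total = 0
    per_cluster = []
    for labels in clusters.values():
        # Size of the majority label group within this cluster.
        correct = Counter(labels).most_common(1)[0][1]
        correct_total += correct
        per_cluster.append(correct / len(labels))

    micro = correct_total / len(true_labels)     # pooled over all messages
    macro = sum(per_cluster) / len(per_cluster)  # unweighted mean over clusters
    return micro, macro


# Toy data: six messages in two clusters, with hypothetical topic labels.
cluster_ids = [0, 0, 0, 0, 1, 1]
true_labels = ["a", "a", "a", "b", "b", "c"]
micro, macro = micro_macro_accuracy(cluster_ids, true_labels)
print(f"micro={micro:.3f} macro={macro:.3f}")
```

In this toy case the larger cluster is purer than the smaller one, so the micro figure comes out above the macro figure; reporting both shows whether a method does well only on large clusters.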