Co-Clustering with Side Information for Text mining
Ramya Elizabeth Thomas, S. Khan
2016 International Conference on Data Mining and Advanced Computing (SAPIENCE), published 2016-03-01
DOI: 10.1109/SAPIENCE.2016.7684152
https://doi.org/10.1109/SAPIENCE.2016.7684152
Citations: 4
Abstract
Many text mining applications work with large amounts of information held in documents as text. This text is useful for text clustering, but documents also carry other kinds of information, known as side information or metadata. Examples include links to other web pages, the document title, the author name, and the date of publication. Such metadata can carry a lot of information that is useful for clustering, but it can also be noisy. Using side information to produce clusters without filtering it can result in poor-quality clusters. We therefore use an efficient feature selection method to select the side information that is useful for clustering, so as to maximize the benefit of using it. The proposed system, CCSI (Co-Clustering with Side Information), uses co-clustering, also called two-mode clustering, a data mining technique that clusters the rows and columns of a matrix concurrently.
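To make the idea of clustering rows and columns concurrently more concrete, here is a minimal sketch in plain Python. It is not the CCSI system from the paper: it assumes a generic alternating block co-clustering that minimizes squared error against block means, and the function names (`block_means`, `co_cluster`), the toy document-term matrix, and the term list are all hypothetical, chosen only for illustration.

```python
def block_means(X, rows, cols, k, l):
    """Mean of X inside each (row cluster, column cluster) block; 0.0 if a block is empty."""
    total = [[0.0] * l for _ in range(k)]
    count = [[0] * l for _ in range(k)]
    for i, row in enumerate(X):
        for j, value in enumerate(row):
            total[rows[i]][cols[j]] += value
            count[rows[i]][cols[j]] += 1
    return [[total[a][b] / count[a][b] if count[a][b] else 0.0
             for b in range(l)] for a in range(k)]


def co_cluster(X, k, l, max_iter=50):
    """Alternately reassign rows and columns to the cluster whose block means fit best.

    Assumption: squared error to block means is the objective. The paper's
    abstract does not state which objective CCSI uses.
    """
    n, m = len(X), len(X[0])
    # Deterministic start: contiguous chunks of rows and of columns.
    rows = [i * k // n for i in range(n)]
    cols = [j * l // m for j in range(m)]
    for _ in range(max_iter):
        changed = False
        # Row step: column labels and block means are held fixed.
        mu = block_means(X, rows, cols, k, l)
        for i in range(n):
            best = min(range(k), key=lambda a: sum(
                (X[i][j] - mu[a][cols[j]]) ** 2 for j in range(m)))
            if best != rows[i]:
                rows[i], changed = best, True
        # Column step: row labels and refreshed block means are held fixed.
        mu = block_means(X, rows, cols, k, l)
        for j in range(m):
            best = min(range(l), key=lambda b: sum(
                (X[i][j] - mu[rows[i]][b]) ** 2 for i in range(n)))
            if best != cols[j]:
                cols[j], changed = best, True
        if not changed:
            break
    return rows, cols


# Hypothetical toy data: 5 documents x 5 terms, 1 if the term occurs in the document.
terms = ["goal", "vote", "match", "party", "team"]
X = [
    [1, 0, 1, 0, 1],  # sports document
    [1, 0, 1, 0, 1],  # sports document
    [0, 1, 0, 1, 0],  # politics document
    [1, 0, 1, 0, 1],  # sports document
    [0, 1, 0, 1, 0],  # politics document
]

row_labels, col_labels = co_cluster(X, k=2, l=2)
print("document clusters:", row_labels)  # [0, 0, 1, 0, 1]
print("term clusters:", dict(zip(terms, col_labels)))
```

On this toy matrix the documents and the terms are grouped in the same pass: the sports documents end up with "goal", "match", and "team", and the politics documents with "vote" and "party". One way side information could enter such a scheme, again as an assumption rather than the authors' method, is as extra columns of the matrix that are appended only after a feature selection step has discarded the noisy attributes.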