Rafał Woźniak, Piotr Ożdżyński, Danuata Zakrzewska
Information System in Management, Journal Article, published 2018-09-30. DOI: 10.22630/ISIM.2018.7.3.19 (https://doi.org/10.22630/ISIM.2018.7.3.19)
CLUSTER ANALYSIS OF MEDICAL TEXT DOCUMENTS BY USING SEMI-CLUSTERING APPROACH BASED ON GRAPH REPRESENTATION
The development of the Internet has resulted in an increasing number of online text repositories. In many cases, documents are assigned to more than one class, and automatic multi-label classification needs to be used. When the number of labels exceeds the number of documents, effective reduction of the label space dimension may significantly improve classification accuracy, which is a major priority in the medical field. In this paper, we propose document clustering for label selection. We use a semi-clustering method based on a graph representation, in which documents are represented by vertices and edge weights are calculated according to the documents' mutual similarity. Assigning documents to semi-clusters helps to reduce the number of labels subsequently used in the multi-label classification process. The performance of the method is examined in experiments conducted on real medical datasets.
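The graph representation described in the abstract can be illustrated with a minimal sketch. The abstract does not say which similarity measure, scoring function, or clustering procedure the authors use, so everything below is an assumption: cosine similarity over raw term counts as the edge weight, a semi-cluster score of the form (I - f_B * B) / (|V| (|V| - 1) / 2) as used in Pregel-style semi-clustering, and a simple greedy growth step. The function names, the `threshold`, `boundary_factor`, and `max_size` parameters, and the toy documents are all hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_similarity_graph(docs, threshold=0.1):
    """Documents are vertices (their list indices); an edge (i, j) is kept
    when the similarity of the two documents exceeds the threshold."""
    vectors = [Counter(text.lower().split()) for text in docs]
    edges = {}
    for i, j in combinations(range(len(docs)), 2):
        weight = cosine_similarity(vectors[i], vectors[j])
        if weight > threshold:
            edges[(i, j)] = weight
    return edges


def semi_cluster_score(cluster, edges, boundary_factor=0.5):
    """Assumed score: internal edge weight minus penalised boundary edge
    weight, normalised by the number of vertex pairs in the cluster.
    Singletons are given a score of 0 in this sketch."""
    n = len(cluster)
    if n < 2:
        return 0.0
    internal = sum(w for (i, j), w in edges.items()
                   if i in cluster and j in cluster)
    boundary = sum(w for (i, j), w in edges.items()
                   if (i in cluster) != (j in cluster))
    return (internal - boundary_factor * boundary) / (n * (n - 1) / 2)


def grow_semi_cluster(seed, n_docs, edges, max_size=3, boundary_factor=0.5):
    """Greedily add the vertex that most improves the score. Each seed grows
    its own cluster, so one document may end up in several semi-clusters."""
    cluster = {seed}
    while len(cluster) < max_size:
        best = None
        best_score = semi_cluster_score(cluster, edges, boundary_factor)
        for v in range(n_docs):
            if v in cluster:
                continue
            score = semi_cluster_score(cluster | {v}, edges, boundary_factor)
            if score > best_score:
                best, best_score = v, score
        if best is None:
            break
        cluster.add(best)
    return frozenset(cluster)


# Toy documents (hypothetical, not from the paper's datasets).
docs = [
    "heart attack chest pain",
    "chest pain heart disease",
    "skin rash allergy",
    "allergy skin itching",
]
edges = build_similarity_graph(docs)
semi_clusters = {grow_semi_cluster(s, len(docs), edges) for s in range(len(docs))}
print(sorted(sorted(c) for c in semi_clusters))
```

In a label-selection setting, one plausible next step would be to keep only the labels that occur within a document's semi-clusters, but the abstract does not specify how the authors map semi-clusters to the reduced label set.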