An Evaluation of the Formal Concept Analysis-Based Document Vector on Document Clustering

2011 International Conference on Computational Science and Its Applications. Pub Date: 2011-06-20. DOI: 10.1109/ICCSA.2011.57

Jihn-Chang J. Jehng, Shihchieh Chou, Chin-Yi Cheng, J. Heh


Citations: 1

Abstract



In conventional approaches, documents are represented by vectors whose dimensions correspond to the terms extracted from a document set. These approaches, called bag-of-term approaches, ignore conceptual relationships between terms, such as synonymy, hypernymy, and hyponymy. In the past, researchers have applied thesauri such as WordNet to solve this problem. However, thesauri such as WordNet are developed for general purposes and are limited in specific domains. Therefore, an automatically built ontology of terms is desirable. In our previous study, we proposed a method that applies formal concept analysis (FCA), an automatic ontology-building method, to extract term relationships from a document set and then uses the extracted information as a term ontology to represent documents as concept vectors. To evaluate the usability and effectiveness of the proposed method for information-retrieval applications, we applied the concept vectors generated for the documents to document clustering. In this study, we use bisecting k-means clustering and hierarchical agglomerative clustering as the platforms on which to evaluate our method.
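To make the idea of FCA-derived concept vectors more concrete, here is a minimal sketch in Python. It is not the authors' method: the abstract does not say how concept vectors are built, so the toy documents and terms, the helper names (`common_terms`, `docs_with`, `formal_concepts`, `concept_vector`), and the binary "document belongs to the concept's extent" encoding are all assumptions made for illustration. The sketch treats documents as objects and terms as attributes, enumerates every formal concept by brute force, and then encodes each document over the resulting concepts.

```python
from itertools import combinations

# Toy formal context (invented for illustration): each document (object)
# maps to the set of terms (attributes) it contains.
context = {
    "d1": {"car", "engine"},
    "d2": {"car", "wheel"},
    "d3": {"engine", "piston"},
    "d4": {"car", "engine", "piston"},
}

def common_terms(docs):
    """Terms shared by every document in docs (all terms if docs is empty)."""
    result = set().union(*context.values())
    for d in docs:
        result &= context[d]
    return result

def docs_with(terms):
    """Documents that contain every term in terms."""
    return {d for d, doc_terms in context.items() if terms <= doc_terms}

def formal_concepts():
    """Enumerate all (extent, intent) pairs by closing every subset of documents.

    This is exponential in the number of documents, so it only suits toy
    contexts; practical FCA tools use dedicated algorithms instead.
    """
    found = set()
    docs = sorted(context)
    for size in range(len(docs) + 1):
        for subset in combinations(docs, size):
            intent = common_terms(subset)
            extent = docs_with(intent)
            found.add((frozenset(extent), frozenset(intent)))
    return found

# Fix a stable concept order: smaller intents (more general concepts) first.
concepts = sorted(formal_concepts(), key=lambda c: (len(c[1]), sorted(c[1])))

def concept_vector(doc):
    """Assumed encoding: 1 if the document lies in the concept's extent, else 0."""
    return [1 if doc in extent else 0 for extent, _ in concepts]

for doc in sorted(context):
    print(doc, concept_vector(doc))
```

In this encoding, documents that share no identical term set can still overlap on general concepts such as the one whose intent is just `car`, which is the kind of term relationship a plain bag-of-term vector does not expose. Vectors of this form could then be passed to any clustering routine, including bisecting k-means or hierarchical agglomerative clustering.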
