基于形式概念分析的文档向量在文档聚类中的评价

Jihn-Chang J. Jehng, Shihchieh Chou, Chin-Yi Cheng, J. Heh
{"title":"基于形式概念分析的文档向量在文档聚类中的评价","authors":"Jihn-Chang J. Jehng, Shihchieh Chou, Chin-Yi Cheng, J. Heh","doi":"10.1109/ICCSA.2011.57","DOIUrl":null,"url":null,"abstract":"In conventional approaches, documents are represented by the vector whose dimensionalities are equivalent to the terms extracted from a document set. These approaches, called bag-of-term approaches, ignore the conceptual relationships between terms such as synonyms, hypernyms and hyponyms. In the past, researches have applied thesauri such as Word Net to solve this problem. However, thesauri such as Word Net are developed more for general purposes and are limited in specific domain. Therefore, an automatically built ontology for terms is desired. In our previous study, we proposed a method which applies formal concept analysis (FCA), an automatic ontology building method, to extract the term relationships from a document set, and then apply the extracted information as the ontology of terms to represent the documents as concept vectors. In order to evaluate the usability and effectiveness of the proposed method for information retrieval related applications, we employed the concept vectors generated for the documents to the document clustering. In this study, we apply bisecting k-means clustering and hierarchical agglomerative clustering as the platforms with which to evaluate our method.","PeriodicalId":428638,"journal":{"name":"2011 International Conference on Computational Science and Its Applications","volume":"16 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2011-06-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":"{\"title\":\"An Evaluation of the Formal Concept Analysis-Based Document Vector on Document Clustering\",\"authors\":\"Jihn-Chang J. Jehng, Shihchieh Chou, Chin-Yi Cheng, J. Heh\",\"doi\":\"10.1109/ICCSA.2011.57\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"In conventional approaches, documents are represented by the vector whose dimensionalities are equivalent to the terms extracted from a document set. These approaches, called bag-of-term approaches, ignore the conceptual relationships between terms such as synonyms, hypernyms and hyponyms. In the past, researches have applied thesauri such as Word Net to solve this problem. However, thesauri such as Word Net are developed more for general purposes and are limited in specific domain. Therefore, an automatically built ontology for terms is desired. In our previous study, we proposed a method which applies formal concept analysis (FCA), an automatic ontology building method, to extract the term relationships from a document set, and then apply the extracted information as the ontology of terms to represent the documents as concept vectors. In order to evaluate the usability and effectiveness of the proposed method for information retrieval related applications, we employed the concept vectors generated for the documents to the document clustering. In this study, we apply bisecting k-means clustering and hierarchical agglomerative clustering as the platforms with which to evaluate our method.\",\"PeriodicalId\":428638,\"journal\":{\"name\":\"2011 International Conference on Computational Science and Its Applications\",\"volume\":\"16 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2011-06-20\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"1\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2011 International Conference on Computational Science and Its Applications\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ICCSA.2011.57\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2011 International Conference on Computational Science and Its Applications","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICCSA.2011.57","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 1

摘要

在传统方法中,文档由向量表示,其维数相当于从文档集中提取的术语。这些方法被称为术语袋方法,忽略了术语之间的概念关系,如同义词、上义词和下义词。在过去的研究中,已经使用了像Word Net这样的词库来解决这个问题。然而,像Word Net这样的词典是为通用目的而开发的,在特定领域受到限制。因此,需要一个自动构建的术语本体。在之前的研究中,我们提出了一种利用形式概念分析(FCA)这一自动本体构建方法,从文档集中提取术语关系,然后将提取的信息作为术语本体,将文档表示为概念向量的方法。为了评估该方法在信息检索相关应用中的可用性和有效性,我们将为文档生成的概念向量用于文档聚类。在本研究中,我们采用了等分k均值聚类和分层聚类作为平台来评估我们的方法。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
An Evaluation of the Formal Concept Analysis-Based Document Vector on Document Clustering
In conventional approaches, documents are represented by the vector whose dimensionalities are equivalent to the terms extracted from a document set. These approaches, called bag-of-term approaches, ignore the conceptual relationships between terms such as synonyms, hypernyms and hyponyms. In the past, researches have applied thesauri such as Word Net to solve this problem. However, thesauri such as Word Net are developed more for general purposes and are limited in specific domain. Therefore, an automatically built ontology for terms is desired. In our previous study, we proposed a method which applies formal concept analysis (FCA), an automatic ontology building method, to extract the term relationships from a document set, and then apply the extracted information as the ontology of terms to represent the documents as concept vectors. In order to evaluate the usability and effectiveness of the proposed method for information retrieval related applications, we employed the concept vectors generated for the documents to the document clustering. In this study, we apply bisecting k-means clustering and hierarchical agglomerative clustering as the platforms with which to evaluate our method.
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:604180095
Book学术官方微信