Semantic Oriented Text Clustering Based on RDF
Soukaina Fatimi, Chama El Saili, L. Alaoui
2020 International Conference on Intelligent Systems and Computer Vision (ISCV), published 2020-06-01
DOI: 10.1109/ISCV49265.2020.9204133 (https://doi.org/10.1109/ISCV49265.2020.9204133)
Citations: 0
Abstract
Text clustering is the discipline that aims to find groups of related documents in a collection. Clustering makes a document collection easier to use effectively. Researchers have implemented text clustering with various methods, including agglomerative, divisive, and itemset-based clustering. Most of these approaches do not take the semantic relationships between words into account, so documents are treated only as bags of unrelated words. Our work aims to consider the semantics of text phrases in the clustering task and to make fuller use of the documents. The semantic web offers many valuable techniques for making meaningful use of documents, and our goal is to take full advantage of them. We use the Resource Description Framework (RDF) to represent textual data as triples. The triples provide a semantic representation of the data on which the clustering process is based, with the aim of a more efficient clustering system. In addition, building on the clustering process, we incorporate other techniques, such as ontology representation using RDF, RDF Schema (RDFS), and the Web Ontology Language (OWL), to manipulate and extract meaningful information. In this paper, we propose a framework for semantic-oriented text clustering based on RDF by means of a semantic similarity measure, and we highlight the benefits of using semantic web techniques in clustering, topic modeling, and information extraction based on questioning, reasoning, and inference processes.
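The idea of clustering documents by their RDF triples rather than by bags of words can be illustrated with a minimal sketch. The abstract does not specify the similarity measure or the clustering algorithm, so everything below is an assumption made for illustration: the triples are hand-written placeholders, `triple_similarity` uses plain Jaccard overlap as a stand-in for the paper's semantic similarity measure, and `cluster` is a simple single-link agglomerative grouping with a hypothetical threshold.

```python
from itertools import combinations

# A triple is (subject, predicate, object), as in RDF.
Triple = tuple[str, str, str]

# Hypothetical, hand-written triples; a real pipeline would extract
# them from the text of each document.
docs: dict[str, set[Triple]] = {
    "d1": {("cat", "is_a", "animal"), ("cat", "eats", "fish")},
    "d2": {("cat", "is_a", "animal"), ("dog", "is_a", "animal")},
    "d3": {("rdf", "is_a", "data_model"), ("owl", "extends", "rdfs")},
}


def triple_similarity(a: set[Triple], b: set[Triple]) -> float:
    """Jaccard overlap of two triple sets (assumed stand-in measure)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def cluster(docs: dict[str, set[Triple]], threshold: float) -> list[set[str]]:
    """Single-link agglomerative grouping.

    Starts with one cluster per document and keeps merging two clusters
    while some cross-cluster pair of documents has similarity >= threshold.
    """
    clusters = [{name} for name in docs]
    merged = True
    while merged:
        merged = False
        for c1, c2 in combinations(clusters, 2):
            linked = any(
                triple_similarity(docs[x], docs[y]) >= threshold
                for x in c1
                for y in c2
            )
            if linked:
                clusters.remove(c1)
                clusters.remove(c2)
                clusters.append(c1 | c2)
                merged = True
                break  # the cluster list changed, so restart the scan
    return clusters


groups = cluster(docs, threshold=0.3)
print(groups)
```

With these placeholder triples, d1 and d2 share one of three distinct triples (similarity 1/3) and are grouped together, while d3 shares nothing with either and stays alone. A measure that also credits related but non-identical triples, for example through RDFS or OWL class hierarchies, would be closer to the semantic similarity the abstract refers to.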