基于概率分类器的文本挖掘知识提取

2013 International Conference on Pattern Recognition, Informatics and Mobile Engineering Pub Date : 2013-04-15 DOI:10.1109/ICPRIME.2013.6496517

S. Subbaiah

{"title":"基于概率分类器的文本挖掘知识提取","authors":"S. Subbaiah","doi":"10.1109/ICPRIME.2013.6496517","DOIUrl":null,"url":null,"abstract":"Text mining is a process of extracting knowledge from large text documents. A new probabilistic classifier for text mining is proposed in this paper. It uses ODP taxonomy and domain ontology and datasets to cluster and identify the category of the given text document. The proposed work has three steps, namely, preprocessing, rule generation and probability calculation. At the stage of preprocessing the input document is split into paragraphs and statements. In rule generation, the documents from the training set are read. In probability calculation, positive and negative weight factor is calculated. The proposed algorithm calculates the positive probability value and negative probability value for each term set or pattern identified from the document. Based on the calculated probability value the probabilistic classifier indexes the document to the concern group of the cluster.","PeriodicalId":123210,"journal":{"name":"2013 International Conference on Pattern Recognition, Informatics and Mobile Engineering","volume":"41 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2013-04-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"9","resultStr":"{\"title\":\"Extracting knowledge using probabilistic classifier for text mining\",\"authors\":\"S. Subbaiah\",\"doi\":\"10.1109/ICPRIME.2013.6496517\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Text mining is a process of extracting knowledge from large text documents. A new probabilistic classifier for text mining is proposed in this paper. It uses ODP taxonomy and domain ontology and datasets to cluster and identify the category of the given text document. The proposed work has three steps, namely, preprocessing, rule generation and probability calculation. At the stage of preprocessing the input document is split into paragraphs and statements. In rule generation, the documents from the training set are read. In probability calculation, positive and negative weight factor is calculated. The proposed algorithm calculates the positive probability value and negative probability value for each term set or pattern identified from the document. Based on the calculated probability value the probabilistic classifier indexes the document to the concern group of the cluster.\",\"PeriodicalId\":123210,\"journal\":{\"name\":\"2013 International Conference on Pattern Recognition, Informatics and Mobile Engineering\",\"volume\":\"41 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2013-04-15\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"9\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2013 International Conference on Pattern Recognition, Informatics and Mobile Engineering\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ICPRIME.2013.6496517\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2013 International Conference on Pattern Recognition, Informatics and Mobile Engineering","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICPRIME.2013.6496517","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 9

摘要

文本挖掘是一种从大型文本文档中提取知识的过程。提出了一种新的用于文本挖掘的概率分类器。它使用ODP分类法、领域本体和数据集对给定文本文档进行聚类和分类。本文的工作分为预处理、规则生成和概率计算三个步骤。在预处理阶段，输入文档被分成段落和语句。在规则生成中，从训练集中读取文档。在概率计算中，计算正负权因子。所提出的算法计算从文档中识别的每个术语集或模式的正概率值和负概率值。根据计算出的概率值，概率分类器将文档索引到聚类的关注组中。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

Extracting knowledge using probabilistic classifier for text mining

Text mining is a process of extracting knowledge from large text documents. A new probabilistic classifier for text mining is proposed in this paper. It uses ODP taxonomy and domain ontology and datasets to cluster and identify the category of the given text document. The proposed work has three steps, namely, preprocessing, rule generation and probability calculation. At the stage of preprocessing the input document is split into paragraphs and statements. In rule generation, the documents from the training set are read. In probability calculation, positive and negative weight factor is calculated. The proposed algorithm calculates the positive probability value and negative probability value for each term set or pattern identified from the document. Based on the calculated probability value the probabilistic classifier indexes the document to the concern group of the cluster.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

2013 International Conference on Pattern Recognition, Informatics and Mobile Engineering

自引率

0.00%

发文量