Hybrid categorical expert system for the use in content aggregation

Программные системы и вычислительные методы (Software Systems and Computational Methods). Pub. date: 2021-04-01. DOI: 10.7256/2454-0714.2021.4.37019

Denis Aleksandrovich Kiryanov

Citations: 2

Abstract

The subject of this research is the development of an expert system architecture for a distributed content aggregation system, whose main purpose is to categorize the aggregated data. The author examines the advantages and disadvantages of expert systems, the tools for developing them, their classification, and their application to data categorization. Special attention is given to the architecture of the proposed expert system, which consists of a spam filter, a component that determines the main category for each type of processed content, and two components that determine subcategories: one is based on domain rules, while the other uses machine learning methods and complements the first. The author concludes that expert systems can be applied effectively to data categorization problems in content aggregation systems, and establishes that hybrid solutions, which combine a knowledge-base-and-rules approach with neural networks, reduce the cost of the expert system. The novelty of this research lies in the proposed system architecture, which is easily extensible and adapts to workloads by scaling existing modules or adding new ones. The proposed spam detection module adapts a behavioral algorithm for detecting email spam. The proposed module for determining the main content categories uses two types of algorithms: fuzzy fingerprints and Twitter Topic Fuzzy Fingerprints, the latter originally applied to categorizing messages on the social network Twitter. The keyword-based subcategory module works together with a thesaurus database. The machine-learning subcategory classifier uses the support vector machine (SVM) algorithm for the final determination of subcategories.
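The categorization chain described above (fuzzy fingerprints pick the main category, a keyword/thesaurus rule stage picks the subcategory, and a machine-learning stage covers what the rules miss) can be illustrated with a minimal Python sketch. This is not the author's implementation. The tf-itf word ranking, the linear rank membership 1 - i/k and the normalized match score are a simplified reading of the fuzzy fingerprint technique and may differ from it in detail. All function names, the thesaurus and rule dictionaries, and the toy data are hypothetical. The spam filter is omitted, and the SVM stage is replaced by an arbitrary callable, `ml_fallback`.

```python
# Minimal sketch; names, data and parameters are hypothetical, not the author's code.
import math
import re
from collections import Counter


def tokenize(text):
    """Naive lower-case word tokenizer (illustration only)."""
    return re.findall(r"[a-z]+", text.lower())


def build_fingerprints(corpus, k=5):
    """Build one fuzzy fingerprint per main category.

    corpus maps category -> list of training texts. Each word is scored by
    tf * itf, where itf = log(N / n_w) is the inverse topic frequency (N
    categories, n_w of which contain the word). The k best-scoring words are
    ranked and rank i receives the linear membership 1 - i / k (assumed form).
    """
    counts = {cat: Counter(w for t in texts for w in tokenize(t))
              for cat, texts in corpus.items()}
    n_topics = len(counts)
    # Number of categories in which each word occurs at least once.
    topic_freq = Counter(w for c in counts.values() for w in c)
    fingerprints = {}
    for cat, c in counts.items():
        scores = {w: tf * math.log(n_topics / topic_freq[w]) for w, tf in c.items()}
        # Words found in every category score 0 and are dropped as non-discriminative.
        ranked = sorted((w for w in scores if scores[w] > 0),
                        key=lambda w: (-scores[w], w))[:k]
        fingerprints[cat] = {w: 1 - i / k for i, w in enumerate(ranked)}
    return fingerprints


def similarity(fingerprint, text):
    """Membership mass of the text's distinct words, normalized by the
    largest mass that the same number of words could reach."""
    words = set(tokenize(text))
    hits = sum(mu for w, mu in fingerprint.items() if w in words)
    best = sum(sorted(fingerprint.values(), reverse=True)[:len(words)])
    return hits / best if best else 0.0


def rule_subcategory(text, thesaurus, rules):
    """Rule-based stage: map words to canonical keywords via the thesaurus,
    then return the subcategory of the first keyword that has a rule."""
    for w in tokenize(text):
        keyword = thesaurus.get(w, w)
        if keyword in rules:
            return rules[keyword]
    return None


def categorize(text, fingerprints, thesaurus, rules, ml_fallback):
    """Main category by fuzzy fingerprints, then rules, then the ML stand-in."""
    main = max(fingerprints, key=lambda c: similarity(fingerprints[c], text))
    sub = rule_subcategory(text, thesaurus, rules.get(main, {}))
    if sub is None:
        # In the described system an SVM-based classifier fills this role.
        sub = ml_fallback(main, text)
    return main, sub


# Toy data (hypothetical).
CORPUS = {
    "sport": ["football match goal team", "team wins the football cup",
              "tennis match final"],
    "tech": ["new smartphone chip released", "software update for smartphone",
             "chip maker software"],
}
THESAURUS = {"soccer": "football", "phone": "smartphone"}
RULES = {"sport": {"football": "football", "tennis": "tennis"},
         "tech": {"smartphone": "mobile"}}

if __name__ == "__main__":
    fps = build_fingerprints(CORPUS)
    fallback = lambda main, text: "general"  # stand-in for a trained model
    for doc in ["soccer team match goal", "match final", "new phone software"]:
        print(doc, "->", categorize(doc, fps, THESAURUS, RULES, fallback))
```

With the toy data, `soccer team match goal` is assigned to `sport`, and the thesaurus maps `soccer` to the `football` rule. `match final` matches no rule and falls through to the fallback. Running the rule stage before the learned one keeps most decisions traceable to explicit domain rules, with the model only complementing them, as the abstract describes.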



