A corpus-based real-time text classification and tagging approach for social data

IF 4.6 Q2 MATERIALS SCIENCE, BIOMATERIALS

ACS Applied Bio Materials Pub Date : 2024-03-13 DOI:10.3389/fcomp.2024.1294985

A. Memon, Dileep Kumar Sootahar, K. K. Luhana, Kyrill Meyer


Citations: 0

Abstract

With the rapid accumulation of user-generated content on social media, the reuse and integration of social data have recently gained increasing attention. This has made it almost obsolete for software applications to collect, store, and work with their own data on local servers. Although the Application Programming Interfaces (APIs) provided by the leading social networking sites have made data acquisition and integration possible, meaningful use of such unstructured, non-uniform, and incoherent data collections requires special procedures for data summarization, understanding, and visualization. One aspect that needs particular attention is the categorization and concept tagging of data (text snippets in the form of social media posts), so that the most relevant and suitable data can be selected for a particular audience and purpose. To this end, we propose a corpus-based approach for searching social data and then categorizing and tagging it with relevant concepts in real time. The approach accounts for semantic and morphological similarities, as well as the domain-specific vocabularies, of query strings and tagged concepts. We demonstrate its feasibility and application in a web-based tool that searches Facebook posts and presents the results together with a concept map for further navigation, filtering, and refinement. The tool was evaluated by running multiple search queries and analyzing the resulting concept maps and annotated texts in terms of their precision. The approach was found to be effective in achieving its stated goal of classifying text snippets in real time.
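The abstract does not spell out the algorithm, so the following is only a minimal sketch of what corpus-based concept tagging with morphological normalization and a precision check could look like. The concept corpus, the crude suffix stripper, and the function names `normalize`, `tag_post`, and `precision` are all illustrative assumptions, not the authors' implementation.

```python
import re
from collections import Counter

# Hypothetical concept corpus (assumption): concept -> domain vocabulary.
CONCEPT_CORPUS = {
    "health": {"doctor", "hospital", "vaccine", "clinic"},
    "sports": {"match", "goal", "team", "player"},
}


def normalize(token: str) -> str:
    """Lowercase and strip a few suffixes.

    A crude stand-in for real morphological analysis (assumption);
    it only handles simple cases such as plural "s" and "ing".
    """
    token = token.lower()
    for suffix in ("ing", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def tag_post(text: str, corpus=CONCEPT_CORPUS) -> list[str]:
    """Return the concepts whose vocabulary overlaps the post, best match first."""
    tokens = {normalize(t) for t in re.findall(r"[A-Za-z]+", text)}
    scores = Counter()
    for concept, vocabulary in corpus.items():
        scores[concept] = len(tokens & {normalize(v) for v in vocabulary})
    return [concept for concept, hits in scores.most_common() if hits > 0]


def precision(predicted: list[str], relevant: list[str]) -> float:
    """Fraction of predicted tags that are actually relevant."""
    predicted_set = set(predicted)
    if not predicted_set:
        return 0.0
    return len(predicted_set & set(relevant)) / len(predicted_set)


tags = tag_post("The players scored two goals in the match")
print(tags, precision(tags, ["sports"]))
```

In this sketch a post is tagged by set overlap between its normalized tokens and each concept's vocabulary, which is cheap enough to run per query. Semantic similarity, which the abstract also mentions, is not modeled here and would need an additional resource such as a thesaurus or embeddings.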


Source journal

ACS Applied Bio Materials Chemistry-Chemistry (all)

CiteScore

9.40

Self-citation rate

2.10%

Articles published

464

Journal description: ACS Applied Bio Materials is an interdisciplinary journal publishing original research covering all aspects of biomaterials and biointerfaces including and beyond the traditional biosensing, biomedical and therapeutic applications. The journal is devoted to reports of new and original experimental and theoretical research of an applied nature that integrates knowledge in the areas of materials, engineering, physics, bioscience, and chemistry into important bio applications. The journal is specifically interested in work that addresses the relationship between structure and function and assesses the stability and degradation of materials under relevant environmental and biological conditions.