A Machine Learning-Based Tool for Exploring COVID-19 Scientific Literature

M. Allaoui, Nour El-Houda Sayah Ben Aissa, Abdellah Ben Belghith, M. L. Kherfi
DOI: 10.1109/ICRAMI52622.2021.9585958 (https://doi.org/10.1109/ICRAMI52622.2021.9585958)
Published in: 2021 International Conference on Recent Advances in Mathematics and Informatics (ICRAMI)
Publication date: 2021-09-21
Citations: 1

Abstract

The COVID-19 pandemic, caused by the SARS-CoV-2 virus, has done serious damage in many areas. It has prompted thousands of researchers from different disciplines (biology, medicine, artificial intelligence, economics, etc.) to publish a very large number of scientific articles in a very short period in order to answer questions related to the pandemic. This abundance of literature, however, has raised another problem: it has become extremely difficult for a researcher or a decision-maker to stay up to date with the latest scientific advances or to locate articles on a specific aspect of the pandemic. In this paper, we present an intelligent tool based on machine learning that automatically organizes a large dataset of COVID-19-related scientific literature and visualizes it in a way that helps these users navigate the dataset and locate the documents they are looking for. The documents are first pre-processed and transformed into numerical features. These features are then passed through a deep denoising autoencoder followed by Uniform Manifold Approximation and Projection (UMAP) to reduce their dimensionality to a 2D space. The projected data are then clustered with an agglomerative clustering algorithm. A topic modeling step using Latent Dirichlet Allocation (LDA) follows, in order to assign a label to each cluster. Finally, the documents are presented to the user in an interactive interface that we developed. The experiments we conducted showed that our tool is efficient and useful.
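The clustering step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state the linkage criterion, the distance metric, or the number of clusters, so average linkage, Euclidean distance, and `n_clusters=2` are assumptions here. The function `agglomerative_clustering` and the `projected` coordinates are hypothetical stand-ins for the UMAP output, and the autoencoder, UMAP, and LDA steps are not shown.

```python
import math
from itertools import combinations


def agglomerative_clustering(points, n_clusters):
    """Average-linkage agglomerative clustering on a list of 2D points.

    Returns one integer cluster label per input point.
    (Linkage and metric are assumptions; the abstract does not specify them.)
    """
    # Start with every point in its own cluster, stored as lists of indices.
    clusters = [[i] for i in range(len(points))]

    def average_distance(a, b):
        # Mean Euclidean distance over all cross-cluster point pairs.
        total = sum(math.dist(points[i], points[j]) for i in a for j in b)
        return total / (len(a) * len(b))

    # Repeatedly merge the two closest clusters until n_clusters remain.
    while len(clusters) > n_clusters:
        a, b = min(
            combinations(range(len(clusters)), 2),
            key=lambda pair: average_distance(clusters[pair[0]], clusters[pair[1]]),
        )
        # combinations yields a < b, so deleting b keeps index a valid.
        clusters[a].extend(clusters[b])
        del clusters[b]

    # Convert the list of index groups into a per-point label list.
    labels = [0] * len(points)
    for label, members in enumerate(clusters):
        for i in members:
            labels[i] = label
    return labels


# Hypothetical 2D coordinates standing in for the UMAP projection of six documents.
projected = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.3), (5.0, 5.1), (5.2, 4.9), (4.8, 5.0)]
labels = agglomerative_clustering(projected, n_clusters=2)
print(labels)
```

With these made-up coordinates, the first three points and the last three points end up in separate clusters. The brute-force merge loop is only meant to make the bottom-up procedure visible; for a dataset of real size, a library implementation would be the practical choice.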