A Framework Model of Mining Potential Public Opinion Events Pertaining to Suspected Research Integrity Issues with the Text Convolutional Neural Network Model and a Mixed Event Extractor

Information Pub Date : 2024-05-24 DOI:10.3390/info15060303
Zongfeng Zou, Xiaochen Ji, Yingying Li
{"title":"A Framework Model of Mining Potential Public Opinion Events Pertaining to Suspected Research Integrity Issues with the Text Convolutional Neural Network Model and a Mixed Event Extractor","authors":"Zongfeng Zou, Xiaochen Ji, Yingying Li","doi":"10.3390/info15060303","DOIUrl":null,"url":null,"abstract":"With the development of the Internet, the oversight of research integrity issues has extended beyond the scientific community to encompass the whole of society. If these issues are not addressed promptly, they can significantly impact the research credibility of both institutions and scholars. This article proposes a text convolutional neural network based on SMOTE to identify short texts of potential public opinion events related to suspected scientific integrity issues from common short texts. The SMOTE comprehensive sampling technique is employed to handle imbalanced datasets. To mitigate the impact of short text length on text representation quality, the Doc2vec embedding model is utilized to represent short text, yielding a one-dimensional dense vector. Additionally, the dimensions of the input layer and convolution kernel of TextCNN are adjusted. Subsequently, a short text event extraction model based on TF-IDF and TextRank is proposed to extract crucial information, for instance, names and research-related institutions, from events and facilitate the identification of potential public opinion events related to suspected scientific integrity issues. Results of experiments have demonstrated that utilizing SMOTE to balance the dataset is able to improve the classification results of TextCNN classifiers. Compared to traditional classifiers, TextCNN exhibits greater robustness in addressing the problems of imbalanced datasets. However, challenges such as low information content, non-standard writing, and polysemy in short texts may impact the accuracy of event extraction. The framework can be further optimized to address these issues in the future.","PeriodicalId":510156,"journal":{"name":"Information","volume":"2 5","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2024-05-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Information","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.3390/info15060303","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

Abstract

With the development of the Internet, the oversight of research integrity issues has extended beyond the scientific community to encompass the whole of society. If these issues are not addressed promptly, they can significantly impact the research credibility of both institutions and scholars. This article proposes a text convolutional neural network based on SMOTE to identify short texts of potential public opinion events related to suspected scientific integrity issues from common short texts. The SMOTE comprehensive sampling technique is employed to handle imbalanced datasets. To mitigate the impact of short text length on text representation quality, the Doc2vec embedding model is utilized to represent short text, yielding a one-dimensional dense vector. Additionally, the dimensions of the input layer and convolution kernel of TextCNN are adjusted. Subsequently, a short text event extraction model based on TF-IDF and TextRank is proposed to extract crucial information, for instance, names and research-related institutions, from events and facilitate the identification of potential public opinion events related to suspected scientific integrity issues. Results of experiments have demonstrated that utilizing SMOTE to balance the dataset is able to improve the classification results of TextCNN classifiers. Compared to traditional classifiers, TextCNN exhibits greater robustness in addressing the problems of imbalanced datasets. However, challenges such as low information content, non-standard writing, and polysemy in short texts may impact the accuracy of event extraction. The framework can be further optimized to address these issues in the future.
利用文本卷积神经网络模型和混合事件提取器挖掘涉嫌科研诚信问题潜在舆情事件的框架模型
随着互联网的发展,对研究诚信问题的监督已经超出了科学界的范围,涵盖了整个社会。如果不及时处理这些问题,会严重影响科研机构和学者的科研信誉。本文提出了一种基于SMOTE的文本卷积神经网络,从普通短文中识别出与疑似科研诚信问题相关的潜在舆情事件短文。本文采用 SMOTE 综合采样技术来处理不平衡数据集。为减轻短文长度对文本表示质量的影响,利用 Doc2vec 嵌入模型表示短文,产生一维密集向量。此外,还调整了 TextCNN 输入层和卷积核的维度。随后,提出了一种基于 TF-IDF 和 TextRank 的短文本事件提取模型,以从事件中提取关键信息,如姓名和研究相关机构,并帮助识别与疑似科学诚信问题相关的潜在舆情事件。实验结果表明,利用 SMOTE 平衡数据集能够改善 TextCNN 分类器的分类结果。与传统分类器相比,TextCNN 在解决不平衡数据集问题方面表现出更强的鲁棒性。然而,短文中的低信息含量、非标准写作和多义词等挑战可能会影响事件提取的准确性。未来可以进一步优化该框架,以解决这些问题。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信