Data Technologies and Applications — Latest Articles

Modular framework for similarity-based dataset discovery using external knowledge
IF 1.6 | Quartile 4 (Computer Science)
Data Technologies and Applications | Pub Date: 2022-02-15 | DOI: 10.1108/dta-09-2021-0261
M. Nečaský, P. Škoda, D. Bernhauer, Jakub Klímek, T. Skopal
Purpose – Semantic retrieval and discovery of datasets published as open data remains a challenging task. The datasets originate in the globally distributed web, without centralized database administration, database schemas, shared attributes, vocabulary, structure or semantics. Existing dataset catalogs provide basic keyword search over brief, incomplete or misleading textual metadata attached to the datasets, so search results are often insufficient. However, dataset discovery can be improved in many ways, for example by employing content-based retrieval, machine learning tools, third-party (external) knowledge bases, and a wide range of feature extraction methods and description models.
Design/methodology/approach – The authors propose a modular framework for rapid experimentation with methods for similarity-based dataset discovery. The framework consists of an extensible catalog of components that can be combined into custom pipelines for dataset representation and discovery.
Findings – The study presents several proof-of-concept pipelines, including experimental evaluation, which showcase the usage of the framework.
Originality/value – To the best of the authors' knowledge, there is no comparable formal framework for experimenting with various similarity methods in the context of dataset discovery. The framework aims to establish a platform for reproducible and comparable research in this area. A prototype implementation of the framework is available on GitHub.
Pages: 506-535
Citations: 1
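The abstract describes the framework only at the architectural level. The following is a minimal sketch, not the authors' implementation, of how an extensible catalog of components might be composed into a similarity pipeline; the `Dataset`, `TfidfRepresentation` and `SimilarityPipeline` names and the toy catalog are hypothetical.

```python
# Hypothetical sketch of a component catalog composed into a dataset-similarity
# pipeline; names and interfaces are illustrative, not the paper's actual API.
from dataclasses import dataclass
from typing import Callable, List

import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity


@dataclass
class Dataset:
    identifier: str
    metadata: str  # e.g. concatenated title + description from an open-data catalog


class TfidfRepresentation:
    """Component that turns dataset metadata into vector representations."""
    def __init__(self):
        self.vectorizer = TfidfVectorizer(stop_words="english")

    def fit_transform(self, datasets: List[Dataset]):
        return self.vectorizer.fit_transform([d.metadata for d in datasets])


class SimilarityPipeline:
    """Composes a representation component with a similarity component."""
    def __init__(self, representation, similarity: Callable):
        self.representation = representation
        self.similarity = similarity

    def top_k(self, datasets: List[Dataset], query_idx: int, k: int = 5) -> List[str]:
        vectors = self.representation.fit_transform(datasets)
        scores = self.similarity(vectors[query_idx], vectors).ravel()
        order = np.argsort(-scores)
        return [datasets[i].identifier for i in order if i != query_idx][:k]


if __name__ == "__main__":
    catalog = [
        Dataset("ds1", "air quality measurements in Prague 2020"),
        Dataset("ds2", "public transport timetables Prague"),
        Dataset("ds3", "air pollution sensor readings Brno"),
    ]
    pipeline = SimilarityPipeline(TfidfRepresentation(), cosine_similarity)
    print(pipeline.top_k(catalog, query_idx=0, k=2))  # datasets most similar to ds1
```

Swapping in another representation component (e.g. embeddings from an external knowledge base) would only require a class with the same `fit_transform` interface, which is the modularity the abstract emphasizes.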
Social recruiting: an application of social network analysis for preselection of candidates
IF 1.6 | Quartile 4 (Computer Science)
Data Technologies and Applications | Pub Date: 2022-02-14 | DOI: 10.1108/dta-01-2021-0021
Stevan Milovanović, Z. Bogdanović, A. Labus, M. Despotović-Zrakić, Svetlana Mitrovic
Purpose – The paper studies social recruiting for finding suitable candidates on social networks. The main goal is to develop a methodological approach that enables preselection of candidates using social network analysis (SNA). The research focuses on automated data collection using web scraping. Based on the information collected from users' profiles, three clusters of skills and interests are created: technical, empirical and education-based. The identified clusters enable the recruiter to search effectively for suitable candidates.
Design/methodology/approach – The paper proposes a new methodological approach for the preselection of candidates based on SNA, comprising the following phases: (1) selection of a social network according to the defined preselection goals; (2) automatic data collection from the selected social network using web scraping; (3) filtering, processing and statistical analysis of the data; (4) data analysis to identify information relevant for preselection, using attribute clustering and SNA; and (5) preselection of candidates based on the information obtained.
Findings – Identifying key categories of candidates' skills and interests can contribute to candidate preselection in the recruiting process. The defined approach allows recruiters to identify candidates who possess the skills and interests specified by the search, and it automates checking whether a particular category of skills or interests is present on a potential candidate's profile. The primary intention is the screening and filtering of candidates' skills and interests, which contributes to a more effective preselection process.
Research limitations/implications – The preliminary evaluation uses a small sample of participants, and the collected skills and interests were revised manually. Recruiters need basic knowledge of the SNA methodology to understand its application in the described method. The reliability of the collected data must be assessed, because users provide the data themselves when filling out their social network profiles.
Practical implications – The method can be applied to different social networks, such as GitHub or AngelList, for clustering profile skills; only the web scraping instructions would change. The method consists of mutually independent steps, so each step can be implemented differently without changing the whole process. The results of a pilot project evaluation indicate that HR experts are interested in the proposed method and would be willing to include it in their practice.
Social implications – The social implication is the determination of relevant skills and interests during the preselection phase of candidates in the process of social recruiting.
Pages: 536-557
Citations: 1
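As a small illustration of the attribute-clustering phase described above (the scraping phase is only outlined in the abstract), the sketch below groups already-scraped skill strings into three clusters; the example skills and the use of TF-IDF plus k-means are assumptions, not the authors' data or code.

```python
# Illustrative sketch of the attribute-clustering step: group scraped profile
# skills/interests into clusters a recruiter can search over.
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer

scraped_skills = [  # would normally come from the web-scraping phase
    "python machine learning sql",
    "java spring microservices",
    "project management agile leadership",
    "statistics data analysis r",
    "msc computer science phd research",
    "bsc economics mba",
]

vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(scraped_skills)

# Three clusters, mirroring the technical / empirical / education-based grouping.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=42).fit(X)

for cluster_id, skills in zip(kmeans.labels_, scraped_skills):
    print(cluster_id, skills)
```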
Exploring the effectiveness of word embedding based deep learning model for improving email classification
IF 1.6 | Quartile 4 (Computer Science)
Data Technologies and Applications | Pub Date: 2022-02-02 | DOI: 10.1108/dta-07-2021-0191
D. Asudani, N. K. Nagwani, Pradeep Singh
Purpose – Classifying emails as ham or spam based on their content is essential. The most difficult challenge in email categorization is determining the semantic and syntactic meaning of words and representing them as high-dimensional feature vectors for processing. The purpose of this paper is to examine the effectiveness of pre-trained embedding models for email classification using deep learning classifiers such as the long short-term memory (LSTM) model and the convolutional neural network (CNN) model.
Design/methodology/approach – Global Vectors (GloVe) and Bidirectional Encoder Representations from Transformers (BERT) pre-trained word embeddings are used to identify relationships between words, which helps classify emails into their relevant categories using machine learning and deep learning models. Two benchmark datasets, SpamAssassin and Enron, are used in the experiments.
Findings – In the first set of experiments, among the machine learning classifiers, the support vector machine (SVM) performs better than the other machine learning methods. The second set of experiments compares deep learning model performance without embedding, with GloVe embedding and with BERT embedding. The experiments show that GloVe embedding enables faster execution with better performance on large datasets.
Originality/value – The experiments reveal that the CNN model with GloVe embedding gives slightly better accuracy than the model with BERT embedding and than traditional machine learning algorithms for classifying an email as ham or spam. It is concluded that word embedding models improve email classifier accuracy.
Pages: 483-505
Citations: 3
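A minimal sketch of the kind of setup the paper evaluates: a CNN ham/spam classifier with a frozen pre-trained embedding layer in Keras. The vocabulary size, the randomly initialized placeholder embedding matrix (in practice filled from a GloVe file keyed by the tokenizer's word index) and all hyperparameters are assumptions, not the paper's configuration.

```python
# Sketch of a CNN ham/spam classifier with a frozen pre-trained-style embedding layer.
import numpy as np
import tensorflow as tf

VOCAB_SIZE, EMB_DIM, MAX_LEN = 20000, 100, 200

# Placeholder embedding matrix; in practice rows would be loaded from a
# pre-trained GloVe file, indexed by the tokenizer's vocabulary.
embedding_matrix = np.random.normal(size=(VOCAB_SIZE, EMB_DIM)).astype("float32")

model = tf.keras.Sequential([
    tf.keras.Input(shape=(MAX_LEN,)),              # padded integer word-id sequences
    tf.keras.layers.Embedding(
        VOCAB_SIZE, EMB_DIM,
        embeddings_initializer=tf.keras.initializers.Constant(embedding_matrix),
        trainable=False,                           # keep pre-trained vectors frozen
    ),
    tf.keras.layers.Conv1D(128, kernel_size=5, activation="relu"),
    tf.keras.layers.GlobalMaxPooling1D(),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),  # ham vs spam
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.summary()

# Training would use padded integer sequences, e.g.:
# model.fit(X_train, y_train, validation_split=0.1, epochs=5, batch_size=64)
```

Replacing the `Conv1D`/`GlobalMaxPooling1D` pair with an `LSTM` layer gives the other deep model variant compared in the paper.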
Feature distillation and accumulated selection for automated fraudulent publisher classification from user click data of online advertising
IF 1.6 | Quartile 4 (Computer Science)
Data Technologies and Applications | Pub Date: 2022-01-06 | DOI: 10.1108/dta-09-2021-0233
D. Sisodia, Dilip Singh Sisodia
Purpose – Choosing the most useful features from hundreds of features in time-series user click data is a key issue in classifying fraudulent publishers in online advertising. In practice, filter approaches are common, but they neglect correlations among features; wrapper approaches, in turn, are often impractical because of their complexity. Moreover, existing feature selection methods struggle with such data, which is a major cause of instability in feature selection.
Design/methodology/approach – To overcome these issues, a majority voting-based hybrid feature selection method, feature distillation and accumulated selection (FDAS), is proposed to identify the optimal subset of features relevant to analyzing publishers' fraudulent conduct. FDAS works in two phases: (1) feature distillation, where significant features from standard filter and wrapper feature selection methods are obtained by majority voting; (2) accumulated selection, where an accumulated evaluation of relevant feature subsets is used to search for an optimal feature subset with effective machine learning (ML) models.
Findings – Empirical results show enhanced classification performance with the proposed features, in terms of average precision, recall, F1-score and AUC, for publisher identification and classification.
Originality/value – FDAS is evaluated on the FDMA2012 user-click data and nine other benchmark datasets to gauge its generalization: first with the original features, second with the relevant feature subsets selected by feature selection (FS) methods, and third with the optimal feature subset obtained by the proposed approach. An ANOVA significance test is conducted to demonstrate significant differences between independent features.
Pages: 602-625
Citations: 5
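The two FDAS phases are described only in outline, so the sketch below is one plausible reading rather than the authors' algorithm: three standard selectors vote for their top features (the "distillation" step), and the voted features are then added one at a time, keeping the prefix with the best cross-validated score (the "accumulated" step). The specific selectors, voting threshold and synthetic data are assumptions.

```python
# Hedged sketch of majority-voting feature distillation followed by an
# accumulated evaluation of the voted features.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE, SelectKBest, f_classif, mutual_info_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=500, n_features=40, n_informative=8, random_state=0)
k = 10
votes = np.zeros(X.shape[1], dtype=int)

# Filter selectors vote for their top-k features.
for score_fn in (f_classif, mutual_info_classif):
    votes += SelectKBest(score_fn, k=k).fit(X, y).get_support().astype(int)

# One wrapper selector (RFE) also votes.
rfe = RFE(LogisticRegression(max_iter=1000), n_features_to_select=k).fit(X, y)
votes += rfe.get_support().astype(int)

# Keep features selected by a majority of the three selectors.
distilled = np.where(votes >= 2)[0]

# Accumulated selection: grow the subset in vote order, keeping the best prefix.
order = distilled[np.argsort(-votes[distilled])]
best_score, best_subset = -np.inf, order[:1]
for i in range(1, len(order) + 1):
    subset = order[:i]
    score = cross_val_score(RandomForestClassifier(random_state=0),
                            X[:, subset], y, cv=3).mean()
    if score > best_score:
        best_score, best_subset = score, subset

print("distilled features:", distilled.tolist())
print("accumulated optimal subset:", best_subset.tolist(), "score:", round(best_score, 3))
```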
Techniques to detect terrorists/extremists on the dark web: a review
IF 1.6 | Quartile 4 (Computer Science)
Data Technologies and Applications | Pub Date: 2022-01-06 | DOI: 10.1108/dta-07-2021-0177
H. Alghamdi, A. Selamat
Purpose – With the proliferation of terrorist/extremist websites on the World Wide Web, detecting and analyzing the content of these websites has become progressively more crucial. Accordingly, the body of research focused on identifying the techniques and activities of terrorist/extremist groups, as revealed by their sites on the so-called dark web, has also grown.
Design/methodology/approach – This study reviews the techniques used to detect and process the content of terrorist/extremist sites on the dark web. Forty of the most relevant data sources were examined, and the techniques used in them were identified.
Findings – Based on this review, it was found that feature selection and feature extraction methods can be combined with topic modeling, content analysis and text clustering.
Originality/value – The review concludes by presenting the current state of the art and several open issues associated with Arabic dark web content analysis.
Pages: 461-482
Citations: 2
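Since this is a survey, the techniques it names (topic modeling and text clustering over extracted text features) are generic; the sketch below only illustrates that family of methods on harmless placeholder documents and is not drawn from any of the reviewed systems.

```python
# Small illustration of topic modeling and text clustering of the kind the review surveys.
from sklearn.cluster import KMeans
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

documents = [
    "forum post about topic a",
    "another post discussing topic a in detail",
    "unrelated marketplace listing",
    "second marketplace listing with prices",
]

# Text clustering over TF-IDF features.
tfidf = TfidfVectorizer(stop_words="english")
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(tfidf.fit_transform(documents))
print("cluster labels:", labels.tolist())

# Topic modeling with LDA over raw term counts.
counts_vec = CountVectorizer(stop_words="english")
counts = counts_vec.fit_transform(documents)
lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(counts)

terms = counts_vec.get_feature_names_out()
for topic_idx, weights in enumerate(lda.components_):
    top_terms = [terms[i] for i in weights.argsort()[-3:][::-1]]
    print(f"topic {topic_idx}: {top_terms}")
```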
Artificial intelligence technologies for more flexible recommendation in uniforms
IF 1.6 | Quartile 4 (Computer Science)
Data Technologies and Applications | Pub Date: 2022-01-04 | DOI: 10.1108/dta-09-2021-0230
Chih-Hao Wen, Chih-Chan Cheng, Y. Shih
Purpose – This research collects human body variables from 2D images captured by digital cameras and, based on those variables, forecasts and recommends sizes of the Digital Camouflage Uniform (DCU) for Taiwan's military personnel.
Design/methodology/approach – A total of 375 subjects were recruited (253 male, 122 female). OpenPose converts the photographed 2D images into four body variables, which are compared simultaneously with tape-measure and 3D-scanning measurements. A decision tree then builds the DCU recommendation model, and the Euclidean distance to each DCU size in the manufacturing specification is calculated to produce the best three recommendations.
Findings – The fitting score of the single size recommended by the decision tree is only 0.62 and 0.63. However, for the best three recommendations, the DCU fitting score can be as high as 0.8 or more. OpenPose and 3D scanning show the highest correlation coefficient even though they measure body size differently, which confirms that OpenPose has significant measurement validity; that is, inexpensive equipment can be used to obtain reasonable results.
Originality/value – The proposed method is suitable for long-distance, non-contact and non-pre-labeled applications in e-commerce and the apparel industry while the world is facing COVID-19. In particular, it can reduce the measurement difficulties ordinary users face when purchasing clothing online.
Pages: 626-643
Citations: 2
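The sketch below covers only the recommendation step described above (the OpenPose keypoint extraction is out of scope): a decision tree predicts a single size from body variables, and Euclidean distance to each size in a specification table produces a best-three list. The size chart, measurement values and size labels are invented placeholders, not the actual manufacturing specification.

```python
# Sketch of the size-recommendation step: decision-tree prediction plus
# Euclidean-distance ranking against a (hypothetical) size specification.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Hypothetical size chart: size label -> (shoulder width, chest, waist, height) in cm.
size_spec = {
    "S":  np.array([42.0,  92.0,  78.0, 165.0]),
    "M":  np.array([44.0,  98.0,  84.0, 172.0]),
    "L":  np.array([46.0, 104.0,  90.0, 178.0]),
    "XL": np.array([48.0, 110.0,  96.0, 184.0]),
}

# Real training data would come from measured subjects; here the chart itself is reused.
X_train = np.stack(list(size_spec.values()))
y_train = list(size_spec.keys())
tree = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)

def recommend(body_vars: np.ndarray, top_k: int = 3):
    """Return the tree's single prediction and the k nearest sizes by Euclidean distance."""
    single = tree.predict(body_vars.reshape(1, -1))[0]
    distances = {size: float(np.linalg.norm(body_vars - spec)) for size, spec in size_spec.items()}
    best_k = sorted(distances, key=distances.get)[:top_k]
    return single, best_k

subject = np.array([45.1, 101.0, 87.5, 175.0])  # body variables, e.g. derived from OpenPose keypoints
print(recommend(subject))
```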
Credit default swap prediction based on generative adversarial networks
IF 1.6 | Quartile 4 (Computer Science)
Data Technologies and Applications | Pub Date: 2022-01-01 | DOI: 10.1108/DTA-09-2021-0260
Shu-Ying Lin, Duen-Ren Liu, Hsien-Pin Huang
Pages: 720-740
Citations: 2
Dynamic Distributed and Parallel Machine Learning algorithms for big data mining processing
IF 1.6 | Quartile 4 (Computer Science)
Data Technologies and Applications | Pub Date: 2021-12-21 | DOI: 10.1108/dta-06-2021-0153
Laouni Djafri
Purpose – This work can be used as a building block in other settings such as GPU, MapReduce, Spark or any other. DDPML can also be deployed on other distributed systems such as P2P networks, clusters, cloud computing or other technologies.
Design/methodology/approach – In the age of Big Data, all companies want to benefit from large amounts of data. These data can help them understand their internal and external environment and anticipate associated phenomena, as the data turn into knowledge that can later be used for prediction; this knowledge becomes a great asset in companies' hands, which is precisely the objective of data mining. With data and knowledge produced at an ever faster pace, the field is now Big Data mining. The proposed work therefore mainly aims at addressing volume, veracity, validity and velocity when classifying Big Data using distributed and parallel processing techniques. The problem raised in this work is how to make machine learning algorithms work in a distributed and parallel way at the same time without losing classification accuracy. To solve this problem, the authors propose a system called Dynamic Distributed and Parallel Machine Learning (DDPML). The work is divided into two parts. In the first, the authors propose a distributed architecture controlled by a MapReduce algorithm that in turn depends on a random sampling technique. The architecture is designed to handle big data processing in a coherent and efficient manner together with the sampling strategy proposed in this work, and it also helps verify the classification results obtained using the representative learning base (RLB). In the second part, the authors extract the representative learning base by sampling at two levels using stratified random sampling. The same sampling method is applied to extract the shared learning base (SLB) and the partial learning bases for the first level (PLBL1) and the second level (PLBL2). The experimental results show the efficiency of the proposed solution without a significant loss in classification results. In practical terms, the DDPML system is generally dedicated to big data mining processing and works effectively in distributed systems with a simple structure, such as client-server networks.
Findings – The authors obtained very satisfactory classification results.
Originality/value – The DDPML system is specially designed to handle big data mining classification smoothly.
Pages: 558-601
Citations: 1
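The abstract outlines the idea (two-level stratified sampling to build partial learning bases, parallel training, aggregation) without implementation detail, so the following is a single-machine sketch of that general idea using Python multiprocessing, not the authors' MapReduce implementation; the sampling fractions, number of partitions and base model are assumptions.

```python
# Hedged sketch: stratified two-level sampling into partial learning bases,
# parallel per-partition training, and majority-vote aggregation.
from multiprocessing import Pool

import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, train_test_split


def train_on_partition(args):
    X_part, y_part = args
    return RandomForestClassifier(n_estimators=50, random_state=0).fit(X_part, y_part)


if __name__ == "__main__":
    X, y = make_classification(n_samples=5000, n_features=20, weights=[0.7, 0.3], random_state=0)
    X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=0)

    # Level 1: stratified sample a representative learning base from the full data.
    X_rlb, _, y_rlb, _ = train_test_split(X_train, y_train, train_size=0.5,
                                          stratify=y_train, random_state=0)

    # Level 2: split the representative base into stratified partial learning bases.
    skf = StratifiedKFold(n_splits=3, shuffle=True, random_state=0)
    partitions = [(X_rlb[idx], y_rlb[idx]) for _, idx in skf.split(X_rlb, y_rlb)]

    with Pool(processes=len(partitions)) as pool:
        models = pool.map(train_on_partition, partitions)

    # Majority vote across the partition models.
    votes = np.stack([m.predict(X_test) for m in models])
    y_pred = (votes.mean(axis=0) >= 0.5).astype(int)
    print("vote accuracy:", (y_pred == y_test).mean())
```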
A robust framework for shoulder implant X-ray image classification
IF 1.6 | Quartile 4 (Computer Science)
Data Technologies and Applications | Pub Date: 2021-11-30 | DOI: 10.1108/dta-08-2021-0210
M. Vo, Anh H. Vo, Tuong Le
Purpose – Medical images are increasingly common; therefore, deep learning-based analysis of these images is becoming more and more essential for diagnosing diseases. Recently, the shoulder implant X-ray image classification (SIXIC) dataset, which includes X-ray images of implanted shoulder prostheses produced by four manufacturers, was released. Detecting the implant's model helps select the correct equipment and procedures for the upcoming surgery.
Design/methodology/approach – This study proposes a robust model named X-Net to improve predictive performance for shoulder implant X-ray image classification on the SIXIC dataset. X-Net integrates Squeeze-and-Excitation (SE) blocks into a Residual Network (ResNet) module. The SE module weights each feature map extracted by ResNet, which helps improve performance. Feature extraction in X-Net is performed by both the ResNet and SE modules, and the final feature is obtained by combining the features extracted in these steps, capturing more of the important characteristics of the input X-ray images. X-Net then uses this fine-grained feature to classify the input images into the four classes (Cofield, Depuy, Zimmer and Tornier) of the SIXIC dataset.
Findings – Experiments are conducted to show the effectiveness of the proposed approach compared with other state-of-the-art methods for SIXIC. The experimental results indicate that the approach outperforms the other methods and provides new state-of-the-art results in all performance metrics, namely accuracy, precision, recall, F1-score and area under the curve (AUC), for the experimental dataset.
Originality/value – The proposed method, with its high predictive performance, can be used to assist in the treatment of injured shoulder joints.
Pages: 447-460
Citations: 4
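To make the SE-within-ResNet idea concrete, here is a sketch of a squeeze-and-excitation block attached to a simple residual branch in Keras. It illustrates the channel-reweighting mechanism only; it is not the X-Net architecture, and the reduction ratio, layer sizes and input resolution are assumptions.

```python
# Sketch of a squeeze-and-excitation (SE) block applied to a basic residual branch.
import tensorflow as tf


def se_block(x, reduction=16):
    """Reweight channels: squeeze (global pooling) then excite (two dense layers)."""
    channels = x.shape[-1]
    s = tf.keras.layers.GlobalAveragePooling2D()(x)
    s = tf.keras.layers.Dense(channels // reduction, activation="relu")(s)
    s = tf.keras.layers.Dense(channels, activation="sigmoid")(s)
    s = tf.keras.layers.Reshape((1, 1, channels))(s)
    return tf.keras.layers.Multiply()([x, s])


def se_residual_block(x, filters):
    """A basic residual block whose output feature maps are rescaled by an SE block."""
    shortcut = x
    y = tf.keras.layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
    y = tf.keras.layers.Conv2D(filters, 3, padding="same")(y)
    y = se_block(y)
    if shortcut.shape[-1] != filters:
        shortcut = tf.keras.layers.Conv2D(filters, 1, padding="same")(shortcut)
    return tf.keras.layers.Activation("relu")(tf.keras.layers.Add()([y, shortcut]))


inputs = tf.keras.Input(shape=(224, 224, 1))          # grayscale X-ray input (assumed size)
x = tf.keras.layers.Conv2D(32, 7, strides=2, padding="same", activation="relu")(inputs)
x = se_residual_block(x, 64)
x = tf.keras.layers.GlobalAveragePooling2D()(x)
outputs = tf.keras.layers.Dense(4, activation="softmax")(x)  # Cofield / Depuy / Zimmer / Tornier
model = tf.keras.Model(inputs, outputs)
model.summary()
```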
A novel semi-supervised self-training method based on resampling for Twitter fake account identification
IF 1.6 | Quartile 4 (Computer Science)
Data Technologies and Applications | Pub Date: 2021-11-29 | DOI: 10.1108/dta-07-2021-0196
Ziming Zeng, Tingting Li, Shouqiang Sun, Jingjing Sun, Jie Yin
Purpose – Twitter fake accounts are bot accounts created by third-party organizations to influence public opinion, spread commercial propaganda or impersonate others. Effective identification of bot accounts helps the public judge disseminated information accurately. However, in practice, manually labeling Twitter accounts is expensive and inefficient, and the labeled data are usually class-imbalanced. To this end, the authors propose a novel framework to solve these problems.
Design/methodology/approach – The proposed framework applies semi-supervised self-training learning to a real Twitter account dataset from Kaggle. The authors first train a classifier on an initial small amount of labeled account data, then use the trained classifier to automatically label large-scale unlabeled account data. High-confidence instances are iteratively selected from the unlabeled data to expand the labeled data, yielding an expanded Twitter account training set. Notably, a resampling technique is integrated into the self-training process, and the class distribution is balanced at the initial stage of each self-training iteration.
Findings – The proposed framework effectively improves labeling efficiency and reduces the influence of class imbalance. It shows excellent identification results with six different base classifiers, especially for the initial small-scale set of labeled Twitter accounts.
Originality/value – This paper provides novel insights into identifying Twitter fake accounts. First, the authors take the lead in introducing a self-training method to automatically label Twitter accounts in a semi-supervised setting. Second, the resampling technique is integrated into the self-training process to effectively reduce the influence of class imbalance on identification performance.
Pages: 409-428
Citations: 3
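A minimal sketch of the self-training-with-resampling loop described above: oversample the minority class in the small labeled set, train a base classifier, pseudo-label high-confidence unlabeled instances and repeat. The synthetic data, the 0.95 confidence threshold and the random forest base classifier are assumptions, not the authors' setup.

```python
# Minimal sketch of semi-supervised self-training with per-iteration oversampling.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.utils import resample

X, y = make_classification(n_samples=3000, n_features=15, weights=[0.85, 0.15], random_state=1)
labeled_idx = np.random.RandomState(1).choice(len(y), size=150, replace=False)
unlabeled_mask = np.ones(len(y), dtype=bool)
unlabeled_mask[labeled_idx] = False

X_lab, y_lab = X[labeled_idx], y[labeled_idx]
X_unl = X[unlabeled_mask]

for iteration in range(5):
    # Resampling step: oversample the minority class before training.
    minority = np.bincount(y_lab).argmin()
    X_min, y_min = X_lab[y_lab == minority], y_lab[y_lab == minority]
    n_needed = int((y_lab != minority).sum() - len(y_min))
    if n_needed > 0:
        X_extra, y_extra = resample(X_min, y_min, n_samples=n_needed, random_state=iteration)
        X_bal, y_bal = np.vstack([X_lab, X_extra]), np.hstack([y_lab, y_extra])
    else:
        X_bal, y_bal = X_lab, y_lab

    clf = RandomForestClassifier(random_state=0).fit(X_bal, y_bal)

    if len(X_unl) == 0:
        break
    proba = clf.predict_proba(X_unl)
    confident = proba.max(axis=1) >= 0.95          # high-confidence pseudo-labels only
    if not confident.any():
        break
    pseudo_labels = clf.classes_[proba.argmax(axis=1)[confident]]
    X_lab = np.vstack([X_lab, X_unl[confident]])
    y_lab = np.hstack([y_lab, pseudo_labels])
    X_unl = X_unl[~confident]
    print(f"iteration {iteration}: labeled={len(y_lab)}, unlabeled left={len(X_unl)}")
```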