Machine learning with applications: latest articles, page 10

Spam detection for Youtube video comments using machine learning approaches

Machine learning with applications Pub Date : 2024-04-16 DOI: 10.1016/j.mlwa.2024.100550

Andrew S. Xiao, Qilian Liang

Machine learning models can streamline the process of filtering YouTube video comments into legitimate comments (ham) and spam. To integrate machine learning models into regular use on media-sharing platforms, recent approaches have aimed to develop models trained on YouTube comments; these models have emerged as valuable classification tools, enabling the identification of spam content and enhancing the user experience. In this paper, eight machine learning approaches are applied to spam detection for YouTube comments: Gaussian Naive Bayes, logistic regression, K-nearest neighbors (KNN) classifier, multi-layer perceptron (MLP), support vector machine (SVM) classifier, random forest classifier, decision tree classifier, and voting classifier. All eight models perform very well; in particular, the random forest approach achieves almost perfect performance, with an average precision of 100% and an AUC-ROC of 0.9841. The computational complexity of the eight machine learning approaches is also compared.

Volume 16, Article 100550

Citations: 0
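The spam-detection paper above lists a voting classifier among its eight models. A minimal sketch of hard (majority) voting, assuming each base model has already produced 0/1 labels (1 = spam, 0 = ham). The abstract does not say whether the paper used hard or soft voting, the function name `hard_vote` is hypothetical, and the base-model labels below are made-up placeholders, not results from the paper.

```python
from collections import Counter
from typing import Sequence


def hard_vote(predictions: Sequence[Sequence[int]]) -> list[int]:
    """Combine per-model label lists by majority vote.

    predictions[m][i] is model m's label for comment i (1 = spam, 0 = ham).
    Ties are broken toward spam (1), an arbitrary choice for this sketch.
    """
    n_items = len(predictions[0])
    if any(len(p) != n_items for p in predictions):
        raise ValueError("all models must label the same comments")
    combined = []
    for i in range(n_items):
        counts = Counter(p[i] for p in predictions)  # missing labels count as 0
        combined.append(1 if counts[1] >= counts[0] else 0)
    return combined


# Hypothetical labels from three base models on four comments.
nb = [1, 0, 1, 0]
lr = [1, 0, 0, 0]
rf = [1, 1, 1, 0]
print(hard_vote([nb, lr, rf]))  # [1, 0, 1, 0]
```

Soft voting would instead average each model's predicted spam probability and threshold the mean; it needs calibrated probabilities, which hard voting does not.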

Evaluation of streamflow predictions from LSTM models in water- and energy-limited regions in the United States

Machine learning with applications Pub Date : 2024-04-16 DOI: 10.1016/j.mlwa.2024.100551

Kul Khand, Gabriel B. Senay

The application of Long Short-Term Memory (LSTM) models to streamflow prediction has been an area of rapid development, supported by advances in computing technology, the increasing availability of spatiotemporal data, and the availability of historical data for training data-driven LSTM models. Several studies have focused on improving the performance of LSTM models; however, few have assessed the applicability of these models across different hydroclimate regions. This study investigated single-basin trained local models (one model for each basin), multi-basin trained regional models (one model per region), and grand models (one model for several regions) for predicting daily streamflow in the water-limited Great Basin (18 basins) and the energy-limited New England (27 basins) regions of the United States, using the CAMELS (Catchment Attributes and Meteorology for Large-sample Studies) data set. The results show a general pattern of higher accuracy in daily streamflow predictions from the regional model compared with the local or grand models for most basins in the New England region. For the Great Basin region, local models produced smaller errors for most basins, and substantially smaller errors for basins where the regional and grand models had relatively large errors. The evaluation of one-layer and three-layer LSTM architectures trained with 1-day lag information indicates that adding complexity by increasing the number of layers may not necessarily improve model skill in streamflow prediction. Our findings highlight the strengths and limitations of LSTM models across contrasting hydroclimate regions in the United States, which could inform local and regional scale decisions using data-driven LSTM models either on their own or integrated with physics-based hydrological models.

Volume 16, Article 100551

Citations: 0
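The streamflow paper above trains LSTMs "with 1-day lag information." One plausible reading, assumed here and not confirmed by the abstract, is that each sample pairs the previous day's meteorological forcing and observed streamflow with the current day's streamflow as the target. A minimal sketch of building such lagged samples; the function name `make_lagged_samples` and the toy numbers are hypothetical.

```python
from typing import Sequence


def make_lagged_samples(
    forcing: Sequence[float], flow: Sequence[float], lag: int = 1
) -> tuple[list[list[float]], list[float]]:
    """Build (inputs, targets) for one-step-ahead streamflow prediction.

    Input for day t: [forcing[t - lag], flow[t - lag]]; target: flow[t].
    This lag scheme is an assumption, not the paper's documented setup.
    """
    if len(forcing) != len(flow):
        raise ValueError("forcing and flow must be aligned day by day")
    inputs, targets = [], []
    for t in range(lag, len(flow)):
        inputs.append([forcing[t - lag], flow[t - lag]])
        targets.append(flow[t])
    return inputs, targets


precip = [0.0, 5.2, 1.1, 0.0]  # toy daily precipitation (mm)
q = [2.0, 2.1, 3.4, 2.8]       # toy daily streamflow
X, y = make_lagged_samples(precip, q)
print(X)  # [[0.0, 2.0], [5.2, 2.1], [1.1, 3.4]]
print(y)  # [2.1, 3.4, 2.8]
```

In an actual LSTM setup these rows would typically be stacked into input sequences over a longer look-back window; the sketch only shows how the lag shifts inputs relative to targets.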

Deep learning-based spatial-temporal graph neural networks for price movement classification in crude oil and precious metal markets

Machine learning with applications Pub Date : 2024-04-15 DOI: 10.1016/j.mlwa.2024.100552

Parisa Foroutan, Salim Lahmiri

In this study, we adapt three spatial-temporal graph neural network models to the unique characteristics of the crude oil, gold, and silver markets for forecasting purposes. The study aims to be the first to (i) explore the potential of the spatial-temporal graph neural network family for price forecasting in these markets, (ii) examine the role of the attention mechanism in improving forecasting accuracy, and (iii) integrate various sources of predictors for better performance. Specifically, we present three distinct models: Multivariate Time Series Graph Neural Networks with Temporal Attention and Learnable Adjacency matrix (MTGNN-TAttLA), Spatial Attention Graph with Temporal Convolutional Networks (SAG-TCN), and Attention-based Spatial-Temporal Graph Convolutional Networks (ASTGCN), to capture the intricate interplay of spatial and temporal dependencies within the crude oil and precious metals markets. Moreover, we show the effectiveness of the attention mechanism in improving the models' accuracy. Our empirical results reveal remarkable prediction accuracy, with all three models outperforming conventional deep learning methods such as Temporal Convolutional Networks (TCN), long short-term memory networks (LSTM), and convolutional neural networks (CNN). The MTGNN-TAttLA model, enriched with a temporal attention mechanism, exhibits exceptional performance in predicting the direction of price movement in the WTI, Brent, and silver markets, while ASTGCN is the best-performing model for the gold market. Additionally, we observed that incorporating technical indicators from the crude oil and precious metal markets into the graph structure improved the classification accuracy of the spatial-temporal graph neural networks.

Volume 16, Article 100552

Citations: 0
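The graph models above classify the direction of price movement and use technical indicators as extra predictors. A minimal sketch of how such inputs are often prepared, assuming an up/down label based on the next day's close and a trailing simple moving average (SMA) as the example indicator. The abstract does not give the paper's labeling rule or indicator set, and the prices below are toy values, not market data.

```python
from typing import Optional, Sequence


def direction_labels(close: Sequence[float]) -> list[int]:
    """Label day t as 1 if close[t + 1] > close[t], else 0 (the last day has no label)."""
    return [1 if close[t + 1] > close[t] else 0 for t in range(len(close) - 1)]


def simple_moving_average(close: Sequence[float], window: int) -> list[Optional[float]]:
    """Trailing SMA; None until `window` observations are available."""
    sma: list[Optional[float]] = []
    running = 0.0
    for t, price in enumerate(close):
        running += price
        if t >= window:
            running -= close[t - window]  # drop the price leaving the window
        sma.append(running / window if t >= window - 1 else None)
    return sma


wti = [80.0, 81.5, 81.0, 82.0, 83.0]  # toy closing prices, not real data
print(direction_labels(wti))           # [1, 0, 1, 1]
print(simple_moving_average(wti, 3))
```

Using the next day's close for the label while building indicators only from past prices keeps future information out of the inputs.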

INSTRAS: INfrared Spectroscopic imaging-based TRAnsformers for medical image Segmentation

Machine learning with applications Pub Date : 2024-04-04 DOI: 10.1016/j.mlwa.2024.100549

Hangzheng Lin, Kianoush Falahkheirkhah, Volodymyr Kindratenko, Rohit Bhargava

Infrared (IR) spectroscopic imaging is potentially of wide use in medical imaging applications because it captures both chemical and spatial information. The complexity of the data both necessitates machine intelligence and presents an opportunity to harness a high-dimensional data set that offers far more information than today's manually interpreted images. While convolutional neural networks (CNNs), including the well-known U-Net model, have demonstrated impressive performance in image segmentation, the inherent locality of convolution limits the effectiveness of these models for encoding IR data, resulting in suboptimal performance. In this work, we propose INfrared Spectroscopic imaging-based TRAnsformers for medical image Segmentation (INSTRAS). This novel model leverages the strength of transformer encoders to segment IR breast images effectively. By incorporating skip connections and transformer encoders, INSTRAS overcomes limitations of pure convolution models, such as the difficulty of capturing long-range dependencies. To evaluate the performance of our model and existing convolutional models, we trained various encoder–decoder models on a breast dataset of IR images. INSTRAS, using 9 spectral bands for segmentation, achieved a remarkable AUC score of 0.9788, underscoring its superior capabilities compared with purely convolutional models. These experimental results attest to INSTRAS's advanced and improved segmentation abilities for IR imaging.

Volume 16, Article 100549

Citations: 0
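The INSTRAS abstract attributes its advantage to transformer encoders capturing long-range dependencies that local convolutions miss. The core operation of a transformer encoder is scaled dot-product attention, softmax(QK^T / sqrt(d)) V, in which every token (for example an image patch) attends to every other token regardless of spatial distance. A minimal, dependency-free sketch with toy vectors; it deliberately omits the learned query/key/value projections, multiple heads, and skip connections that a real encoder such as INSTRAS would have.

```python
import math
from typing import Sequence

Matrix = list[list[float]]


def softmax(row: Sequence[float]) -> list[float]:
    m = max(row)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(q: Matrix, k: Matrix, v: Matrix) -> Matrix:
    """Return softmax(q k^T / sqrt(d)) v for row-major lists of vectors."""
    d = len(q[0])
    out = []
    for qi in q:
        # Similarity of this query to every key: a global view, not a local window.
        scores = [sum(a * b for a, b in zip(qi, kj)) / math.sqrt(d) for kj in k]
        weights = softmax(scores)
        # Output is a weighted average of all value vectors.
        out.append([sum(w * vj[c] for w, vj in zip(weights, v)) for c in range(len(v[0]))])
    return out


# Three toy "patch" embeddings; self-attention uses them as queries, keys, and values.
patches = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
result = scaled_dot_product_attention(patches, patches, patches)
print([[round(x, 3) for x in row] for row in result])
```

A 3x3 convolution, by contrast, mixes each pixel only with its immediate neighbours per layer, so relating distant regions requires stacking many layers.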

A survey of malware detection using deep learning

Machine learning with applications Pub Date : 2024-03-20 DOI: 10.1016/j.mlwa.2024.100546

Ahmed Bensaoud, Jugal Kalita, Mahmoud Bensaoud

The problem of malicious software (malware) detection and classification is a complex task, and there is no perfect approach; a lot of work remains to be done. Unlike most other research areas, malware detection lacks readily available standard benchmarks. This paper investigates recent advances in malware detection on macOS, Windows, iOS, Android, and Linux using deep learning (DL), examining DL in text and image classification, the use of pre-trained and multi-task learning models for malware detection to obtain high accuracy, and which approach would be best given a standard benchmark dataset. We discuss the issues and challenges in malware detection using DL classifiers by reviewing the effectiveness of these classifiers and their inability to explain their decisions and actions to DL developers, which motivates the use of Explainable Machine Learning (XAI) or Interpretable Machine Learning (IML) programs. Additionally, we discuss the impact of adversarial attacks on deep learning models, which negatively affect their generalization capabilities and result in poor performance on unseen data. We believe there is a need to train and test the effectiveness and efficiency of current state-of-the-art deep learning models on different malware datasets. We examine eight popular DL approaches on various datasets. This survey will help researchers develop a general understanding of malware recognition using deep learning.

Volume 16, Article 100546

Citations: 0

A machine learning approach feature to forecast the future performance of the universities in Canada

Machine learning with applications Pub Date : 2024-03-19 DOI: 10.1016/j.mlwa.2024.100548

Leslie J. Wardley, Enayat Rajabi, Saman Hassanzadeh Amin, Monisha Ramesh

University ranking is a technique for measuring the performance of Higher Education Institutions (HEIs) by evaluating them on criteria such as student satisfaction, expenditure, research and teaching quality, citation count, grants, and enrolment. Ranking has been identified as a vital factor that helps students decide which institution to attend. Hence, universities seek to increase their overall rank, use these measures of success in their marketing communications, and prominently display their ranked status on their websites. Despite decades of research on ranking methods, only a limited number of studies have leveraged predictive analytics and machine learning to rank universities. In this article, we collected data on 49 Canadian universities for 2017–2021 and divided them, based on Maclean's categories, into Primarily Undergraduate, Comprehensive, and Medical/Doctoral universities. After identifying the input and output components, we leveraged various feature engineering and machine learning techniques to predict the universities' ranks. We used Pearson correlation, feature importance, and chi-square as the feature engineering methods, and the results show that “student to faculty ratio,” “total number of citations,” and “total number of grants” are the most important factors in ranking Canadian universities. The Random Forest model performed best for the “primarily undergraduate” category, the Voting classifier for the “comprehensive” category, and the Gradient Boosting model for the “medical/doctoral” category. The selected machine learning models were evaluated based on accuracy, precision, recall, and F1 score.

Volume 16, Article 100548

Citations: 0
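Pearson correlation is one of the three feature-engineering methods named in the university-ranking abstract above. A minimal sketch of ranking candidate features by the absolute Pearson correlation r with the rank target. The feature names loosely echo the abstract, but every value below is made up, and `campus_area_ha` is a hypothetical feature added only to show a weakly correlated input.

```python
import math
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient r between two equal-length series."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length series with at least 2 points")
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("r is undefined for a constant series")
    return cov / (sx * sy)


def rank_features(features: dict[str, Sequence[float]],
                  target: Sequence[float]) -> list[tuple[str, float]]:
    """Sort features by |r| with the target, strongest first."""
    scores = {name: pearson(vals, target) for name, vals in features.items()}
    return sorted(scores.items(), key=lambda kv: abs(kv[1]), reverse=True)


# Hypothetical data for five universities (rank 1 = best).
rank = [1, 2, 3, 4, 5]
features = {
    "student_to_faculty_ratio": [12.0, 14.0, 15.0, 18.0, 21.0],
    "total_citations": [9000.0, 7000.0, 7500.0, 4000.0, 2000.0],
    "campus_area_ha": [50.0, 30.0, 80.0, 20.0, 60.0],
}
for name, r in rank_features(features, rank):
    print(f"{name}: r = {r:+.3f}")
```

Ranking by |r| treats strong negative correlations (more citations, better i.e. lower rank number) as just as informative as positive ones; r only captures linear association, which is one reason to combine it with other methods such as chi-square or model-based feature importance.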

Augmenting roadway safety with machine learning and deep learning: Pothole detection and dimension estimation using in-vehicle technologies

Machine learning with applications Pub Date : 2024-03-13 DOI: 10.1016/j.mlwa.2024.100547

Cuthbert Ruseruka, Judith Mwakalonge, Gurcan Comert, Saidi Siuhi, Frank Ngeni, Quincy Anderson

Detecting potholes and estimating their dimensions is an essential step in road maintenance. Aging, heavy rainfall, traffic, and weak underlying layers may cause pavement potholes. Potholes can cause accidents when drivers lose control after hitting or swerving to avoid them, which may lead to injuries or fatal crashes. Potholes may also cause property damage, such as flat tires, scrapes, dents, and leaks. Additionally, potholes are costly; in the United States, for example, they cost drivers about $3 billion annually. Traditionally, potholes are handled through field surveys carried out by skilled personnel, who determine their sizes for quantity and cost estimates. This process is expensive, error-prone, subjective, unsafe, and time-consuming. Some authorities use sensor vehicles to carry out the surveys, a method that is more accurate, safer, and faster than the traditional approach but much more expensive, so not all authorities can afford it. To avoid these challenges, a modern, real-time, cost-effective approach is proposed to make pothole maintenance efficient and fast. This paper presents a deep learning model trained using the You Only Look Once (YOLO) algorithm to detect potholes and estimate their dimensions and locations using only built-in vehicle technologies. The model attained 93.0% precision, 91.6% recall, 87.0% F1-score, and 96.3% mAP. A statistical analysis of the on-site test results indicates that the results are significant at the 5% level, with a p-value of 0.037. This approach provides an economical and faster way of monitoring road surface conditions.

Volume 16, Article 100547

Citations: 0
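The F1-score is the harmonic mean of precision P and recall R, F1 = 2PR / (P + R). Plugging in the precision (93.0%) and recall (91.6%) reported in the pothole abstract above gives about 92.3%, not the 87.0% listed there. The abstract does not explain the difference; it might come from averaging F1 per class or per image, or from evaluating at a different confidence threshold, but that is an unverified possibility. A quick check:

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0 when both are 0)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


print(round(f1_score(0.930, 0.916), 3))  # 0.923
```

Because the harmonic mean always lies between P and R when both are positive, any single-threshold F1 built from these two figures would fall between 91.6% and 93.0%.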

Efficient surrogate models for materials science simulations: Machine learning-based prediction of microstructure properties

Machine learning with applications Pub Date : 2024-03-11 DOI: 10.1016/j.mlwa.2024.100544

Binh Duong Nguyen, Pavlo Potapenko, Aytekin Demirci, Kishan Govind, Sébastien Bompas, Stefan Sandfeld

Determining, understanding, and predicting the so-called structure–property relation is an important task in many scientific disciplines, such as chemistry, biology, meteorology, physics, engineering, and materials science. Structure refers to the spatial distribution of, e.g., substances, material, or matter in general, while property is a resulting characteristic that usually depends in a non-trivial way on spatial details of the structure. Traditionally, forward simulation models have been used for such tasks. Recently, several machine learning algorithms have been applied in these fields to enhance and accelerate simulation models or to serve as surrogate models. In this work, we develop and investigate six machine learning techniques on two different datasets from materials science: data from a two-dimensional Ising model for predicting the formation of magnetic domains, and data representing the evolution of dual-phase microstructures from the Cahn–Hilliard model. We analyze the accuracy and robustness of all models and elucidate the reasons for the differences in their performance. We also study the impact of including domain knowledge through tailored features, and derive general recommendations based on the availability and quality of training data.

Volume 16, Article 100544

Citations: 0
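One of the two datasets in the surrogate-model paper above comes from a two-dimensional Ising model. The abstract does not say how the configurations were generated; a common approach is Metropolis Monte Carlo sampling, sketched here on a small periodic lattice with coupling J = 1, no external field, and k_B = 1. The lattice size, temperature, and number of sweeps are arbitrary illustrative choices. The per-site magnetization is an example of a "property" computed from a "structure" (the spin configuration).

```python
import math
import random


def metropolis_ising(n: int, temperature: float, sweeps: int, seed: int = 0) -> list[list[int]]:
    """Sample an n x n Ising configuration (spins +/-1, J = 1, periodic boundaries)."""
    rng = random.Random(seed)
    spins = [[rng.choice((-1, 1)) for _ in range(n)] for _ in range(n)]
    for _ in range(sweeps * n * n):
        i, j = rng.randrange(n), rng.randrange(n)
        neighbours = (
            spins[(i + 1) % n][j] + spins[(i - 1) % n][j]
            + spins[i][(j + 1) % n] + spins[i][(j - 1) % n]
        )
        delta_e = 2 * spins[i][j] * neighbours  # energy change if spin (i, j) flips
        # Metropolis rule: always accept downhill moves, uphill ones with prob exp(-dE/T).
        if delta_e <= 0 or rng.random() < math.exp(-delta_e / temperature):
            spins[i][j] = -spins[i][j]
    return spins


def magnetization(spins: list[list[int]]) -> float:
    """Mean spin per site, in [-1, 1]; values near +/-1 indicate large aligned domains."""
    n = len(spins)
    return sum(map(sum, spins)) / (n * n)


config = metropolis_ising(n=16, temperature=1.5, sweeps=200, seed=42)
print(f"|m| = {abs(magnetization(config)):.3f}")
```

A surrogate model in this setting would learn to map an input structure (or simulation parameters) directly to such a property, skipping the many Monte Carlo sweeps.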

The Dark Side of Language Models: Exploring the Potential of LLMs in Multimedia Disinformation Generation and Dissemination

Machine learning with applications Pub Date : 2024-03-11 DOI: 10.1016/j.mlwa.2024.100545

Dipto Barman, Ziyi Guo, Owen Conlan

Disinformation, the deliberate spread of false or misleading information, poses a significant threat to our society by undermining trust, exacerbating polarization, and manipulating public opinion. With the rapid advancement of artificial intelligence and the growing prominence of large language models (LLMs) such as ChatGPT, new avenues for the dissemination of disinformation are emerging. This review paper explores the potential of LLMs to initiate the generation of multimedia disinformation, encompassing text, images, audio, and video. We begin by examining the capabilities of LLMs, highlighting their potential to create compelling, context-aware content that can be weaponized for malicious purposes. Subsequently, we examine the nature of disinformation and the various mechanisms through which it spreads in the digital landscape. Using these advanced models, malicious actors can automate and scale up disinformation effectively. We describe a theoretical pipeline for creating and disseminating disinformation on social media. Existing interventions to combat disinformation are also reviewed. While these efforts have shown success, we argue that they need to be strengthened to effectively counter the escalating threat posed by LLMs. Digital platforms have, unfortunately, enabled malicious actors to extend the reach of disinformation. The advent of LLMs poses an additional concern, as they can be harnessed to significantly amplify the velocity, variety, and volume of disinformation. Thus, this review proposes augmenting current interventions with AI tools such as LLMs, which can assess information more swiftly and comprehensively than human fact-checkers. This paper illuminates the dark side of LLMs and highlights their potential to be exploited as disinformation dissemination tools.

Volume 16, Article 100545

Citations: 0

An automated machine learning approach for detecting anomalous peak patterns in time series data from a research watershed in the northeastern United States critical zone

Machine learning with applications Pub Date : 2024-03-07 DOI: 10.1016/j.mlwa.2024.100543

Ijaz Ul Haq, Byung Suk Lee, Donna M. Rizzo, Julia N. Perdrial

This paper presents an automated machine learning framework designed to assist hydrologists in detecting anomalies in time series data generated by sensors in a research watershed in the northeastern United States critical zone. The framework specifically focuses on identifying peak-pattern anomalies, which may arise from sensor malfunctions or natural phenomena. However, using classification methods for anomaly detection poses challenges, such as the need for labeled data as ground truth and the selection of the most suitable deep learning model for the given task and dataset. To address these challenges, our framework generates labeled datasets by injecting synthetic peak patterns into synthetically generated time series data, and incorporates an automated hyperparameter optimization mechanism. This mechanism produces an optimized model instance with the best architectural and training parameters from a pool of five selected models: Temporal Convolutional Network (TCN), InceptionTime, MiniRocket, Residual Networks (ResNet), and Long Short-Term Memory (LSTM). The selection is based on the user's preferences regarding anomaly detection accuracy and computational cost. The framework employs Time-series Generative Adversarial Networks (TimeGAN) as the synthetic dataset generator. The generated model instances are evaluated using a combination of accuracy and computational cost metrics, including training time and memory, during the anomaly detection process. Performance evaluation of the framework, conducted on a dataset from a watershed, demonstrated consistent selection of the most fitting model instance that satisfies the user's preferences.

Volume 16, Article 100543

Citations: 0
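The watershed framework above builds labeled training data by injecting synthetic peaks into synthetic time series. The paper uses TimeGAN as the generator; the sketch below swaps in a plain sine-plus-noise base series purely for illustration. The triangular peak shape, peak height, width, and the helper names `base_series` and `inject_peaks` are hypothetical choices, not the paper's.

```python
import math
import random


def base_series(length: int, rng: random.Random) -> list[float]:
    """Stand-in for a TimeGAN sample: a 24-step sine cycle plus Gaussian noise."""
    return [math.sin(2 * math.pi * t / 24) + rng.gauss(0.0, 0.1) for t in range(length)]


def inject_peaks(series: list[float], n_peaks: int, height: float, half_width: int,
                 rng: random.Random) -> tuple[list[float], list[int]]:
    """Add triangular spikes at random positions; label affected points 1, others 0."""
    values = list(series)
    labels = [0] * len(values)
    for _ in range(n_peaks):
        centre = rng.randrange(half_width, len(values) - half_width)
        for offset in range(-half_width, half_width + 1):
            # Linear ramp up to `height` at the centre and back down (always > 0).
            weight = 1 - abs(offset) / (half_width + 1)
            values[centre + offset] += height * weight
            labels[centre + offset] = 1
    return values, labels


rng = random.Random(7)
clean = base_series(240, rng)
noisy, y = inject_peaks(clean, n_peaks=3, height=4.0, half_width=2, rng=rng)
print(sum(y), "points labelled as peak anomalies")
```

Because the injection step records exactly which points it modified, the labels come for free, which is the point of generating ground truth synthetically rather than annotating sensor data by hand.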