Statistical Analysis and Data Mining: Latest Articles (Page 7)

A study of the impact of COVID-19 on the Chinese stock market based on a new textual multiple ARMA model.

IF 1.3, Zone 4 (Mathematics)

Statistical Analysis and Data Mining. Pub Date: 2022-04-04. DOI: 10.1002/sam.11582

Weijun Xu, Zhineng Fu, Hongyi Li, Jinglong Huang, Weidong Xu, Yiyang Luo

Abstract: Coronavirus disease 2019 (COVID-19) has caused violent fluctuations in stock markets and led to heated discussion in stock forums. The rise and fall of any specific stock is influenced by many other stocks and by the emotions expressed in forum discussions. Considering the transmission effect of emotions, we propose a new Textual Multiple Auto Regressive Moving Average (TM-ARMA) model to study the impact of COVID-19 on the Chinese stock market. The TM-ARMA model contains a new cross-textual term and a new cross-autoregressive (AR) term, which measure the cross impacts of textual emotions and of price fluctuations, respectively, and the adjacency matrix that measures the relationships among stocks is updated dynamically. We compute textual sentiment scores with an emotion dictionary-based method and estimate the parameter matrices by maximum likelihood. Our dataset includes textual posts from the Eastmoney Stock Forum and price data for the constituent stocks of the FTSE China A50 Index. We use a sliding-window online forecasting approach to simulate real trading situations. The results show that TM-ARMA performs very well even after the onset of COVID-19.

Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9111149/pdf/
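The sliding-window online forecasting protocol mentioned in the abstract can be sketched as follows: at each step the model is refit on only the most recent observations, a one-step-ahead forecast is made, and the window slides forward. The TM-ARMA equations themselves are not given in this listing, so a plain AR(1) model fitted by least squares stands in for it here; `fit_ar1`, `sliding_window_forecast`, and the window length are hypothetical illustrations, not the authors' code.

```python
# Sliding-window online forecasting, sketched with a plain AR(1) model as a
# stand-in for TM-ARMA (the actual TM-ARMA equations are not reproduced here).

def fit_ar1(window):
    """Least-squares fit of y_t = c + phi * y_{t-1} on one window of data."""
    x, y = window[:-1], window[1:]
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    phi = sxy / sxx
    c = my - phi * mx
    return c, phi


def sliding_window_forecast(series, window):
    """Refit on the latest `window` points, predict one step ahead, then slide."""
    forecasts = []
    for t in range(window, len(series)):
        c, phi = fit_ar1(series[t - window:t])  # only data up to time t-1 is used
        forecasts.append(c + phi * series[t - 1])
    return forecasts


# Usage: a noiseless AR(1) path y_t = 2 + 0.5 * y_{t-1}, starting at 0.
path = [0.0]
for _ in range(7):
    path.append(2 + 0.5 * path[-1])
preds = sliding_window_forecast(path, window=4)
print(preds)
```

Because each forecast uses only data available before time t, the procedure mimics real trading, where tomorrow's prices are unknown when the model is refit.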

Citations: 0

Sample Selection Bias in Evaluation of Prediction Performance of Causal Models.

IF 1.3, Zone 4 (Mathematics)

Statistical Analysis and Data Mining. Pub Date: 2022-02-01. DOI: 10.1002/sam.11559

James P Long, Min Jin Ha

Abstract: Causal models are notoriously difficult to validate because they make untestable assumptions regarding confounding. New scientific experiments offer the possibility of evaluating causal models using prediction performance. Prediction performance measures are typically robust to violations of causal assumptions. However, prediction performance does depend on the selection of training and test sets. In particular, biased training sets can lead to optimistic assessments of model performance. In this work, we revisit the prediction performance of several recently proposed causal models tested on the genetic perturbation data set of Kemmeren [5]. We find that sample selection bias is likely a key driver of model performance. We propose using a less biased evaluation set for assessing prediction performance and compare models on this new set. In this setting, the causal models perform similarly to or worse than standard association-based estimators such as the Lasso. Finally, we compare the performance of causal estimators in simulation studies that reproduce the Kemmeren structure of genetic knockout experiments but without any sample selection bias. These results provide an improved understanding of the performance of several causal models and offer guidance on how future studies should use the Kemmeren data.

Volume 15, Issue 1, pp. 5-14. Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9053600/pdf/nihms-1746637.pdf

Citations: 2

A framework for stability-based module detection in correlation graphs.

IF 1.3, Zone 4 (Mathematics)

Statistical Analysis and Data Mining. Pub Date: 2021-04-01. Epub Date: 2021-01-08. DOI: 10.1002/sam.11495

Mingmei Tian, Rachael Hageman Blair, Lina Mu, Matthew Bonner, Richard Browne, Han Yu

Abstract: Graphs can be used to represent the direct and indirect relationships between variables and to elucidate complex relationships and interdependencies. Detecting structure within a graph is a challenging problem. This problem is studied across a range of fields and is variously termed community detection, module detection, or graph partitioning. A popular class of module detection algorithms identifies the structure by optimizing a modularity function. In practice, graphs are often learned from data and are thus prone to uncertainty. In these settings, uncertainty in the network structure can be amplified in the estimated module structure, making it unreliable. In this work, we begin to address this challenge by using a nonparametric bootstrap approach to assess the stability of module detection in a graph. Estimates of stability are presented at the level of the individual node, the inferred modules, and as an overall measure of performance for module detection in a given graph. Furthermore, bootstrap stability estimates are derived for selecting the complexity parameter that ultimately defines a graph from data, in a way that optimizes stability. This approach is used with correlation graphs but generalizes to other graphs defined through dissimilarity measures. We demonstrate our approach using a broad range of simulations and a metabolomics dataset from the Beijing Olympics Air Pollution study. These approaches are implemented in the bootcluster package for the R programming language.

Volume 14, Issue 2, pp. 129-143. Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7986843/pdf/
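The node-level bootstrap stability idea described in the abstract can be sketched as follows: resample observations with replacement, rebuild the correlation graph, re-detect modules, and score each node by how well its bootstrap module overlaps its original module. This is a simplified sketch under stated assumptions, not the bootcluster implementation: modules here are connected components of a thresholded correlation graph (swapped in for modularity optimization), node stability is assumed to be the mean Jaccard overlap, and all function names are hypothetical.

```python
import random


def pearson(u, v):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    num = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    den = (sum((a - mu) ** 2 for a in u) * sum((b - mv) ** 2 for b in v)) ** 0.5
    return num / den


def modules_from_data(data, threshold):
    """Build a correlation graph (edge if |r| >= threshold) over the columns of
    `data` and return one module label per node (connected components)."""
    p = len(data[0])
    cols = [[row[k] for row in data] for k in range(p)]
    parent = list(range(p))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(p):
        for j in range(i + 1, p):
            if abs(pearson(cols[i], cols[j])) >= threshold:
                parent[find(i)] = find(j)
    return [find(i) for i in range(p)]


def node_stability(data, threshold, n_boot=50, seed=0):
    """Nonparametric bootstrap over observations (rows): for each node, average
    the Jaccard overlap between its reference module and its bootstrap module."""
    rng = random.Random(seed)
    ref = modules_from_data(data, threshold)
    p = len(ref)
    ref_sets = [{k for k in range(p) if ref[k] == ref[i]} for i in range(p)]
    totals = [0.0] * p
    for _ in range(n_boot):
        sample = [rng.choice(data) for _ in data]  # resample rows with replacement
        lab = modules_from_data(sample, threshold)
        for i in range(p):
            boot_set = {k for k in range(p) if lab[k] == lab[i]}
            totals[i] += len(ref_sets[i] & boot_set) / len(ref_sets[i] | boot_set)
    return [t / n_boot for t in totals]


# Usage: nodes 0-2 share one latent signal, nodes 3-4 share another.
gen = random.Random(1)
data = []
for _ in range(200):
    z1, z2 = gen.gauss(0, 1), gen.gauss(0, 1)
    data.append([z1 + 0.1 * gen.gauss(0, 1) for _ in range(3)]
                + [z2 + 0.1 * gen.gauss(0, 1) for _ in range(2)])
print(node_stability(data, threshold=0.5))
```

Here the correlation threshold plays the role of the complexity parameter; repeating the computation over a grid of thresholds and comparing mean stability is one hypothetical way to mirror the stability-driven selection the abstract describes.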

Citations: 0

Unsupervised random forests.

IF 1.3, Zone 4 (Mathematics)

Statistical Analysis and Data Mining. Pub Date: 2021-04-01. Epub Date: 2021-02-05. DOI: 10.1002/sam.11498

Alejandro Mantero, Hemant Ishwaran

Citations: 14

A clustering method for graphical handwriting components and statistical writership analysis.

IF 1.3, Zone 4 (Mathematics)

Statistical Analysis and Data Mining. Pub Date: 2021-02-01. Epub Date: 2020-11-24. DOI: 10.1002/sam.11488

Amy M Crawford, Nicholas S Berry, Alicia L Carriquiry

Abstract: Handwritten documents can be characterized by their content or by the shape of the written characters. We focus on the problem of comparing a person's handwriting to a document of unknown provenance using the shape of the writing, as is done in forensic applications. To do so, we first propose a method for processing scanned handwritten documents that decomposes the writing into small graphical structures, often corresponding to letters. We then introduce a measure of distance between two such structures, inspired by the graph edit distance, and a measure of center for a collection of the graphs. These measures are the basis for an outlier-tolerant K-means algorithm that clusters the graphs based on structural attributes, thus creating a template for sorting new documents. Finally, we present a Bayesian hierarchical model to capture a writer's propensity for producing graphs that are assigned to certain clusters. We illustrate the methods using documents from the Computer Vision Lab dataset. We show results of the identification task under the cluster assignments and compare them with the same modeling under a less flexible grouping method that is not tolerant of incidental strokes or outliers.

Volume 14, Issue 1, pp. 41-60. Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7894190/pdf/
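The outlier-tolerant K-means step mentioned in the abstract can be illustrated with a minimal sketch: items farther than a cutoff from every center are assigned to an outlier group (label -1) and excluded from the center updates, so stray strokes cannot drag the cluster centers. The paper clusters graphs with a graph-edit-inspired distance and its own measure of center; this sketch assumes Euclidean points and arithmetic means instead, and `outlier_tolerant_kmeans`, `cutoff`, and the sample points are hypothetical.

```python
import math


def _assign(points, centers, cutoff):
    """Nearest-center label for each point, or -1 if no center is within `cutoff`."""
    labels = []
    for p in points:
        d = [math.dist(p, c) for c in centers]
        k = min(range(len(centers)), key=d.__getitem__)
        labels.append(k if d[k] <= cutoff else -1)
    return labels


def outlier_tolerant_kmeans(points, init_centers, cutoff, max_iter=50):
    """K-means variant in which outliers (label -1) do not move the centers."""
    centers = [tuple(c) for c in init_centers]
    for _ in range(max_iter):
        labels = _assign(points, centers, cutoff)
        new_centers = []
        for k, c in enumerate(centers):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                new_centers.append(tuple(sum(xs) / len(members) for xs in zip(*members)))
            else:
                new_centers.append(c)  # keep an empty cluster's previous center
        if new_centers == centers:
            break
        centers = new_centers
    return centers, _assign(points, centers, cutoff)


# Usage: two tight groups plus one far-away point.
pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (50, 50)]
centers, labels = outlier_tolerant_kmeans(pts, init_centers=[(0, 0), (10, 10)], cutoff=5.0)
print(labels)  # → [0, 0, 0, 1, 1, 1, -1]
```

Without the cutoff, the point (50, 50) would join the second cluster and pull its center far from the other members, which is the failure mode an outlier-tolerant variant is meant to avoid.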

Citations: 0

Scalable network estimation with L₀ penalty.

IF 1.3, Zone 4 (Mathematics)

Statistical Analysis and Data Mining. Pub Date: 2021-02-01. Epub Date: 2020-10-21. DOI: 10.1002/sam.11483

Junghi Kim, Hongtu Zhu, Xiao Wang, Kim-Anh Do

Citations: 1

Knot selection in sparse Gaussian processes with a variational objective function.

IF 1.3, Zone 4 (Mathematics)

Statistical Analysis and Data Mining. Pub Date: 2020-08-01. Epub Date: 2020-04-20. DOI: 10.1002/sam.11459

Nathaniel Garton, Jarad Niemi, Alicia Carriquiry

Citations: 0

An algorithm to compare two-dimensional footwear outsole images using maximum cliques and speeded-up robust feature.

IF 1.3, Zone 4 (Mathematics)

Statistical Analysis and Data Mining. Pub Date: 2020-04-01. Epub Date: 2020-02-21. DOI: 10.1002/sam.11449

Soyoung Park, Alicia Carriquiry

Abstract: Footwear examiners are tasked with comparing an outsole impression (Q) left at a crime scene with an impression (K) from a database or from the suspect's shoe. We propose a method for comparing two shoe outsole impressions that relies on speeded-up robust features (SURF) detected on each impression and aligns them using a maximum clique (MC). After alignment, an algorithm we denote MC-COMP is used to extract additional features, which are then combined into a univariate similarity score using a random forest (RF). We use a database of shoe outsole impressions that includes images from two models of athletic shoes that were purchased new and then worn by study participants for about 6 months. The shoes share class characteristics such as outsole pattern and size, which makes the comparison challenging. We find that the RF implemented on SURF outperforms other methods recently proposed in the literature in terms of classification precision. In more realistic scenarios, where crime scene impressions may be degraded and smudged, the algorithm we propose (denoted MC-COMP-SURF) shows the best classification performance by detecting unique features better than other methods. The algorithm can be implemented with the R package shoeprintr.

Volume 13, Issue 2, pp. 188-199. Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7079556/pdf/
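The maximum-clique alignment idea in the abstract rests on a geometric fact: a rigid motion preserves distances, so correct keypoint matches between Q and K agree with each other on pairwise distances, while wrong matches mostly do not. A minimal sketch, assuming candidate matches have already been proposed (SURF detection and descriptor matching are not shown), builds a compatibility graph over candidate matches and keeps its largest clique via Bron-Kerbosch with pivoting. The function names, tolerance, and points are hypothetical, and the paper's exact MC construction and the MC-COMP feature extraction may differ.

```python
import math


def bron_kerbosch(r, p, x, adj, best):
    """Bron-Kerbosch with pivoting; keeps the largest clique found in best[0]."""
    if not p and not x:
        if len(r) > len(best[0]):
            best[0] = r
        return
    pivot = max(p | x, key=lambda u: len(adj[u] & p))
    for v in list(p - adj[pivot]):
        bron_kerbosch(r | {v}, p & adj[v], x & adj[v], adj, best)
        p = p - {v}
        x = x | {v}


def max_clique_matches(q_pts, k_pts, candidates, tol):
    """Largest set of candidate matches (i, j) that agree on pairwise distances."""
    n = len(candidates)
    adj = {a: set() for a in range(n)}
    for a in range(n):
        for b in range(a + 1, n):
            (i1, j1), (i2, j2) = candidates[a], candidates[b]
            if i1 == i2 or j1 == j2:
                continue  # enforce one-to-one matching
            dq = math.dist(q_pts[i1], q_pts[i2])
            dk = math.dist(k_pts[j1], k_pts[j2])
            if abs(dq - dk) <= tol:  # rigid motions preserve distances
                adj[a].add(b)
                adj[b].add(a)
    best = [set()]
    bron_kerbosch(set(), set(range(n)), set(), adj, best)
    return sorted(candidates[a] for a in best[0])


# Usage: the first four K points are the Q points shifted by (10, 20);
# candidates (0, 4) and (1, 2) are deliberately wrong matches.
q = [(0, 0), (4, 0), (0, 3), (5, 5)]
k = [(10, 20), (14, 20), (10, 23), (15, 25), (100, 100)]
cands = [(0, 0), (1, 1), (2, 2), (3, 3), (0, 4), (1, 2)]
print(max_clique_matches(q, k, cands, tol=0.5))  # → [(0, 0), (1, 1), (2, 2), (3, 3)]
```

Exact maximum clique search is exponential in the worst case, but the compatibility graph only has as many nodes as candidate matches, which keeps this practical for modest keypoint counts.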

Citations: 0

Practical Bayesian Modeling and Inference for Massive Spatial Datasets On Modest Computing Environments.

IF 1.3, Zone 4 (Mathematics)

Statistical Analysis and Data Mining. Pub Date: 2019-06-01. DOI: 10.1002/sam.11413

Lu Zhang, Abhirup Datta, Sudipto Banerjee

Abstract: With continued advances in Geographic Information Systems and related computational technologies, statisticians are often required to analyze very large spatial datasets. This has generated substantial interest over the last decade, in a literature already too vast to summarize here, in scalable methodologies for analyzing large spatial datasets. Scalable spatial process models have been found especially attractive due to their richness and flexibility and, particularly in the Bayesian paradigm, due to their presence in hierarchical model settings. However, the vast majority of research articles in this domain have been geared toward innovative theory or more complex model development. Very limited attention has been given to easily implementable scalable hierarchical models for the practicing scientist or spatial analyst. This article devises massively scalable Bayesian approaches that can rapidly deliver inference on spatial processes that is practically indistinguishable from inference obtained using more expensive alternatives. A key emphasis is on implementation within very standard (modest) computing environments (e.g., a standard desktop or laptop) using easily available statistical software packages. Key insights are offered regarding the assumptions and approximations that affect practical efficiency.

Volume 12, Issue 3, pp. 197-209.
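The abstract does not name the scalable spatial process models it uses. One widely used family, nearest-neighbor Gaussian processes and related Vecchia-type approximations, is assumed here purely for illustration: locations are put in an order, and each location conditions only on its m nearest predecessors, so the joint density is approximated by a product of conditionals p(w_i | w_N(i)), each involving at most m + 1 locations. The minimal sketch below builds those neighbor sets; `nearest_neighbor_sets` and `m` are hypothetical names, and the brute-force search is for clarity only.

```python
import math


def nearest_neighbor_sets(coords, m):
    """For each location i (in the given order), return the indices of up to m
    nearest locations among those ordered before i (ties broken by index)."""
    neighbors = []
    for i, c in enumerate(coords):
        earlier = sorted(range(i), key=lambda j: (math.dist(c, coords[j]), j))
        neighbors.append(earlier[:m])
    return neighbors


# Usage: four locations on a line, conditioning on at most two neighbors.
coords = [(0, 0), (1, 0), (5, 0), (2, 0)]
print(nearest_neighbor_sets(coords, m=2))  # → [[], [0], [1, 0], [1, 0]]
```

This brute-force search costs O(n² log n); implementations aimed at very large datasets typically rely on spatial search structures instead, which matters on the modest hardware the article targets.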

Citations: 28

Fused Lasso Regression for Identifying Differential Correlations in Brain Connectome Graphs.

IF 1.3, Zone 4 (Mathematics)

Statistical Analysis and Data Mining. Pub Date: 2018-10-01. Epub Date: 2018-07-11. DOI: 10.1002/sam.11382

Donghyeon Yu, Sang Han Lee, Johan Lim, Guanghua Xiao, R Cameron Craddock, Bharat B Biswal

Citations: 3