Statistical Analysis and Data Mining: Latest Articles

A study of the impact of COVID-19 on the Chinese stock market based on a new textual multiple ARMA model.
IF 1.3, Zone 4, Mathematics
Statistical Analysis and Data Mining Pub Date : 2022-04-04 DOI: 10.1002/sam.11582
Weijun Xu, Zhineng Fu, Hongyi Li, Jinglong Huang, Weidong Xu, Yiyang Luo
Abstract: Coronavirus 2019 (COVID-19) has caused violent fluctuations in stock markets and led to heated discussion in stock forums. The rise and fall of any specific stock is influenced by many other stocks and by emotions expressed in forum discussions. Considering the transmission effect of emotions, we propose a new Textual Multiple Auto Regressive Moving Average (TM-ARMA) model to study the impact of COVID-19 on the Chinese stock market. The TM-ARMA model contains a new cross-textual term and a new cross-auto regressive (AR) term that measure the cross impacts of textual emotions and price fluctuations, respectively, and the adjacency matrix, which measures the relationships among stocks, is updated dynamically. We compute the textual sentiment scores by an emotion dictionary-based method, and estimate the parameter matrices by a maximum likelihood method. Our dataset includes the textual posts from the Eastmoney Stock Forum and the price data for the constituent stocks of the FTSE China A50 Index. We use a sliding-window online forecasting approach to simulate real trading conditions. The results show that TM-ARMA performs very well even after the outbreak of COVID-19.
Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9111149/pdf/
Citations: 0
Sample Selection Bias in Evaluation of Prediction Performance of Causal Models.
IF 1.3, Zone 4, Mathematics
Statistical Analysis and Data Mining Pub Date : 2022-02-01 DOI: 10.1002/sam.11559
James P Long, Min Jin Ha
Abstract: Causal models are notoriously difficult to validate because they make untestable assumptions regarding confounding. New scientific experiments offer the possibility of evaluating causal models using prediction performance. Prediction performance measures are typically robust to violations in causal assumptions. However, prediction performance does depend on the selection of training and test sets. In particular, biased training sets can lead to optimistic assessments of model performance. In this work, we revisit the prediction performance of several recently proposed causal models tested on a genetic perturbation data set of Kemmeren [5]. We find that sample selection bias is likely a key driver of model performance. We propose using a less-biased evaluation set for assessing prediction performance and compare models on this new set. In this setting, the causal models have similar or worse performance compared to standard association-based estimators such as Lasso. Finally, we compare the performance of causal estimators in simulation studies which reproduce the Kemmeren structure of genetic knockout experiments but without any sample selection bias. These results provide an improved understanding of the performance of several causal models and offer guidance on how future studies should use Kemmeren.
Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9053600/pdf/nihms-1746637.pdf
Citations: 2
A framework for stability-based module detection in correlation graphs.
IF 1.3, Zone 4, Mathematics
Statistical Analysis and Data Mining Pub Date : 2021-04-01 Epub Date: 2021-01-08 DOI: 10.1002/sam.11495
Mingmei Tian, Rachael Hageman Blair, Lina Mu, Matthew Bonner, Richard Browne, Han Yu
Abstract: Graphs can be used to represent the direct and indirect relationships between variables, and elucidate complex relationships and interdependencies. Detecting structure within a graph is a challenging problem. This problem is studied over a range of fields and is sometimes termed community detection, module detection, or graph partitioning. A popular class of algorithms for module detection relies on optimizing a function of modularity to identify the structure. In practice, graphs are often learned from the data, and thus prone to uncertainty. In these settings, the uncertainty of the network structure can become exaggerated by giving unreliable estimates of the module structure. In this work, we begin to address this challenge through the use of a nonparametric bootstrap approach to assessing the stability of module detection in a graph. Estimates of stability are presented at the level of the individual node, the inferred modules, and as an overall measure of performance for module detection in a given graph. Furthermore, bootstrap stability estimates are derived for complexity parameter selection that ultimately defines a graph from data in a way that optimizes stability. This approach is utilized in connection with correlation graphs but is generalizable to other graphs that are defined through the use of dissimilarity measures. We demonstrate our approach using a broad range of simulations and on a metabolomics dataset from the Beijing Olympics Air Pollution study. These approaches are implemented using the bootcluster package, which is available in the R programming language.
Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7986843/pdf/
Citations: 0
Unsupervised random forests.
IF 1.3, Zone 4, Mathematics
Statistical Analysis and Data Mining Pub Date : 2021-04-01 Epub Date: 2021-02-05 DOI: 10.1002/sam.11498
Alejandro Mantero, Hemant Ishwaran
Abstract: sidClustering is a new random forests unsupervised machine learning algorithm. The first step in sidClustering involves what is called sidification of the features: staggering the features to have mutually exclusive ranges (called the staggered interaction data [SID] main features) and then forming all pairwise interactions (called the SID interaction features). Then a multivariate random forest (able to handle both continuous and categorical variables) is used to predict the SID main features. We establish uniqueness of sidification and show how multivariate impurity splitting is able to identify clusters. The proposed sidClustering method is adept at finding clusters arising from categorical and continuous variables and retains all the important advantages of random forests. The method is illustrated using simulated and real data as well as two in-depth case studies, one from a large multi-institutional study of esophageal cancer, and the other involving hospital charges for cardiovascular patients.
Citations: 14
A clustering method for graphical handwriting components and statistical writership analysis.
IF 1.3, Zone 4, Mathematics
Statistical Analysis and Data Mining Pub Date : 2021-02-01 Epub Date: 2020-11-24 DOI: 10.1002/sam.11488
Amy M Crawford, Nicholas S Berry, Alicia L Carriquiry
Abstract: Handwritten documents can be characterized by their content or by the shape of the written characters. We focus on the problem of comparing a person's handwriting to a document of unknown provenance using the shape of the writing, as is done in forensic applications. To do so, we first propose a method for processing scanned handwritten documents to decompose the writing into small graphical structures, often corresponding to letters. We then introduce a measure of distance between two such structures that is inspired by the graph edit distance, and a measure of center for a collection of the graphs. These measurements are the basis for an outlier-tolerant K-means algorithm to cluster the graphs based on structural attributes, thus creating a template for sorting new documents. Finally, we present a Bayesian hierarchical model to capture the propensity of a writer for producing graphs that are assigned to certain clusters. We illustrate the methods using documents from the Computer Vision Lab dataset. We show results of the identification task under the cluster assignments and compare to the same modeling, but with a less flexible grouping method that is not tolerant of incidental strokes or outliers.
Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7894190/pdf/
Citations: 0
Scalable network estimation with L0 penalty.
IF 1.3, Zone 4, Mathematics
Statistical Analysis and Data Mining Pub Date : 2021-02-01 Epub Date: 2020-10-21 DOI: 10.1002/sam.11483
Junghi Kim, Hongtu Zhu, Xiao Wang, Kim-Anh Do
Abstract: With the advent of high-throughput sequencing, an efficient computing strategy is required to deal with large genomic data sets. The challenge of estimating a large precision matrix has garnered substantial research attention for its direct application to discriminant analyses and graphical models. Most existing methods either use a lasso-type penalty that may lead to biased estimators or are computationally intensive, which prevents their applications to very large graphs. We propose using an L0 penalty to estimate an ultra-large precision matrix (scalnetL0). We apply scalnetL0 to RNA-seq data from breast cancer patients represented in The Cancer Genome Atlas and find improved accuracy of classifications for survival times. The estimated precision matrix provides information about a large-scale co-expression network in breast cancer. Simulation studies demonstrate that scalnetL0 provides more accurate and efficient estimators, yielding shorter CPU time and less Frobenius loss on sparse learning for large-scale precision matrix estimation.
Citations: 1
Knot selection in sparse Gaussian processes with a variational objective function.
IF 1.3, Zone 4, Mathematics
Statistical Analysis and Data Mining Pub Date : 2020-08-01 Epub Date: 2020-04-20 DOI: 10.1002/sam.11459
Nathaniel Garton, Jarad Niemi, Alicia Carriquiry
Abstract: Sparse, knot-based Gaussian processes have enjoyed considerable success as scalable approximations of full Gaussian processes. Certain sparse models can be derived through specific variational approximations to the true posterior, and knots can be selected to minimize the Kullback-Leibler divergence between the approximate and true posterior. While this has been a successful approach, simultaneous optimization of knots can be slow due to the number of parameters being optimized. Furthermore, there have been few proposed methods for selecting the number of knots, and no experimental results exist in the literature. We propose using a one-at-a-time knot selection algorithm based on Bayesian optimization to select the number and locations of knots. We showcase the competitive performance of this method relative to optimization of knots simultaneously on three benchmark datasets, but at a fraction of the computational cost.
Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7386924/pdf/
Citations: 0
An algorithm to compare two-dimensional footwear outsole images using maximum cliques and speeded-up robust feature.
IF 1.3, Zone 4, Mathematics
Statistical Analysis and Data Mining Pub Date : 2020-04-01 Epub Date: 2020-02-21 DOI: 10.1002/sam.11449
Soyoung Park, Alicia Carriquiry
Abstract: Footwear examiners are tasked with comparing an outsole impression (Q) left at a crime scene with an impression (K) from a database or from the suspect's shoe. We propose a method for comparing two shoe outsole impressions that relies on robust features (speeded-up robust feature; SURF) on each impression and aligns them using a maximum clique (MC). After alignment, an algorithm we denote MC-COMP is used to extract additional features that are then combined into a univariate similarity score using a random forest (RF). We use a database of shoe outsole impressions that includes images from two models of athletic shoes that were purchased new and then worn by study participants for about 6 months. The shoes share class characteristics such as outsole pattern and size, and thus the comparison is challenging. We find that the RF implemented on SURF outperforms other methods recently proposed in the literature in terms of classification precision. In more realistic scenarios where crime scene impressions may be degraded and smudged, the algorithm we propose, denoted MC-COMP-SURF, shows the best classification performance by detecting unique features better than other methods. The algorithm can be implemented with the R package shoeprintr.
Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7079556/pdf/
Citations: 0
Practical Bayesian Modeling and Inference for Massive Spatial Datasets On Modest Computing Environments.
IF 1.3, Zone 4, Mathematics
Statistical Analysis and Data Mining Pub Date : 2019-06-01 DOI: 10.1002/sam.11413
Lu Zhang, Abhirup Datta, Sudipto Banerjee
Abstract: With continued advances in Geographic Information Systems and related computational technologies, statisticians are often required to analyze very large spatial datasets. This has generated substantial interest over the last decade, already too vast to be summarized here, in scalable methodologies for analyzing large spatial datasets. Scalable spatial process models have been found especially attractive due to their richness and flexibility and, particularly so in the Bayesian paradigm, due to their presence in hierarchical model settings. However, the vast majority of research articles present in this domain have been geared toward innovative theory or more complex model development. Very limited attention has been accorded to approaches for easily implementable scalable hierarchical models for the practicing scientist or spatial analyst. This article devises massively scalable Bayesian approaches that can rapidly deliver inference on spatial processes that is practically indistinguishable from inference obtained using more expensive alternatives. A key emphasis is on implementation within very standard (modest) computing environments (e.g., a standard desktop or laptop) using easily available statistical software packages. Key insights are offered regarding assumptions and approximations concerning practical efficiency.
Citations: 28
Fused Lasso Regression for Identifying Differential Correlations in Brain Connectome Graphs.
IF 1.3, Zone 4, Mathematics
Statistical Analysis and Data Mining Pub Date : 2018-10-01 Epub Date: 2018-07-11 DOI: 10.1002/sam.11382
Donghyeon Yu, Sang Han Lee, Johan Lim, Guanghua Xiao, R Cameron Craddock, Bharat B Biswal
Abstract: In this paper, we propose a procedure to find differential edges between two graphs from high-dimensional data. We estimate two matrices of partial correlations and their differences by solving a penalized regression problem. We assume sparsity only on differences between two graphs, not graphs themselves. Thus, we impose an ℓ2 penalty on partial correlations and an ℓ1 penalty on their differences in the penalized regression problem. We apply the proposed procedure to finding differential functional connectivity between healthy individuals and Alzheimer's disease patients.
Citations: 3