{"title":"Generic Feature Selection with Short Fat Data.","authors":"B Clarke, J-H Chu","doi":"","DOIUrl":"","url":null,"abstract":"<p><p>Consider a regression problem in which there are many more explanatory variables than data points, <i>i.e</i>., <i>p</i> ≫ <i>n</i>. Essentially, without reducing the number of variables inference is impossible. So, we group the <i>p</i> explanatory variables into blocks by clustering, evaluate statistics on the blocks and then regress the response on these statistics under a penalized error criterion to obtain estimates of the regression coefficients. We examine the performance of this approach for a variety of choices of <i>n</i>, <i>p</i>, classes of statistics, clustering algorithms, penalty terms, and data types. When <i>n</i> is not large, the discrimination over number of statistics is weak, but computations suggest regressing on approximately [<i>n</i>/<i>K</i>] statistics where <i>K</i> is the number of blocks formed by a clustering algorithm. Small deviations from this are observed when the blocks of variables are of very different sizes. Larger deviations are observed when the penalty term is an <i>L<sup>q</sup></i> norm with high enough <i>q</i>.</p>","PeriodicalId":89431,"journal":{"name":"Journal of the Indian Society of Agricultural Statistics. Indian Society of Agricultural Statistics","volume":"68 2","pages":"145-162"},"PeriodicalIF":0.0,"publicationDate":"2014-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4208697/pdf/nihms619926.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"32773117","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Influence of GSTT1 Genetic Polymorphisms on Arsenic Metabolism.","authors":"Molly L Kile, E Andres Houseman, Quazi Quamruzzaman, Mahmuder Rahman, Golam Mahiuddin, Golam Mostofa, Yu-Mei Hsueh, David C Christiani","doi":"","DOIUrl":"","url":null,"abstract":"<p><p>A repeated measures study was conducted in Pabna, Bangladesh to investigate factors that influence biomarkers of arsenic exposure. Drinking water arsenic concentrations were measured by inductively-coupled plasma mass spectrometry (ICP-MS) and urinary arsenic species [arsenite (As<sub>3</sub>), arsenate (As<sub>5</sub>), monomethylarsonic acid (MMA) and dimethylarsinic acid (DMA)] were detected using High Performance Liquid Chromatography (HPLC) and Hydride Generated Atomic Absorption Spectrometry (HGAAS). Linear mixed effects models with random intercepts were used to evaluate the effects of arsenic contaminated drinking water, genetic polymorphisms in glutathione-S-transferase (GSTT1 and GSTM1) on total urinary arsenic, primary methylation index [MMA/(As<sub>3</sub>+As<sub>5</sub>)], secondary methylation index (DMA/MMA), and total methylation index [(MMA+DMA)/(As<sub>3</sub>+As<sub>5</sub>)]. Drinking water arsenic concentrations were positively associated with total urinary arsenic concentrations and total methylation index. A significant gene-environment interaction was observed between urinary arsenic exposure in drinking water and GSTT1 but not GSTM1 where GSTT1 <i>null</i> individuals had a slightly higher excretion rate of arsenic compared to GSTT1 <i>wildtypes</i> after adjusting for other factors. Additionally, individuals with GSTT1 <i>null</i> genotypes had a higher primary methylation index and lower secondary methylation index compared to GSTT1 <i>wildtype</i> after adjusting for other factors. These data suggest that GSTT1 contributes to the observed variability in arsenic metabolism. Since individuals with a higher primary methylation index and lower secondary methylation index are more susceptible to arsenic related disease, these results suggest that GSTT1 <i>null</i> individuals may be more susceptible to arsenic-related toxicity. No significant associations were observed between GSTM1 and any of the arsenic methylation indices.</p>","PeriodicalId":89431,"journal":{"name":"Journal of the Indian Society of Agricultural Statistics. Indian Society of Agricultural Statistics","volume":"67 2","pages":"197-207"},"PeriodicalIF":0.0,"publicationDate":"2013-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3916182/pdf/nihms549066.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"32104200","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Robust nonlinear regression in applications.","authors":"Changwon Lim, Pranab K Sen, Shyamal D Peddada","doi":"","DOIUrl":"","url":null,"abstract":"<p><p>Robust statistical methods, such as M-estimators, are needed for nonlinear regression models because of the presence of outliers/influential observations and heteroscedasticity. Outliers and influential observations are commonly observed in many applications, especially in toxicology and agricultural experiments. For example, dose response studies, which are routinely conducted in toxicology and agriculture, sometimes result in potential outliers, especially in the high dose groups. This is because response to high doses often varies among experimental units (e.g., animals). Consequently, this may result in outliers (i.e., very low values) in that group. Unlike the linear models, in nonlinear models the outliers not only impact the point estimates of the model parameters but can also severely impact the estimate of the information matrix. Note that, the information matrix in a nonlinear model is a function of the model parameters. This is not the case in linear models. In addition to outliers, heteroscedasticity is a major concern when dealing with nonlinear models. Ignoring heteroscedasticity may lead to inaccurate coverage probabilities and Type I error rates. Robustness to outliers/influential observations and to heteroscedasticity is even more important when dealing with thousands of nonlinear regression models in quantitative high throughput screening assays. Recently, these issues have been studied very extensively in the literature (references are provided in this paper), where the proposed estimator is robust to outliers/influential observations as well as to heteroscedasticity. The focus of this paper is to provide the theoretical underpinnings of robust procedures developed recently.</p>","PeriodicalId":89431,"journal":{"name":"Journal of the Indian Society of Agricultural Statistics. Indian Society of Agricultural Statistics","volume":"67 2","pages":"215-234"},"PeriodicalIF":0.0,"publicationDate":"2013-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4286339/pdf/nihms610595.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"32967741","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Inferences on Small Area Proportions.","authors":"Shijie Chen, P Lahiri","doi":"","DOIUrl":"","url":null,"abstract":"<p><p>Design-based methods are generally inefficient for making inferences about small area proportions for rare events. In this paper, we discuss an alternative hierarchical model and the associated hierarchical Bayes methodology. Sufficient conditions for propriety of the posterior distributions of relevant parameters are presented.</p>","PeriodicalId":89431,"journal":{"name":"Journal of the Indian Society of Agricultural Statistics. Indian Society of Agricultural Statistics","volume":"66 1","pages":"121-124"},"PeriodicalIF":0.0,"publicationDate":"2012-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3896051/pdf/nihms535133.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"32054276","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Analysis of Correlated Gene Expression Data on Ordered Categories.","authors":"Shyamal D Peddada, Shawn F Harris, Ori Davidov","doi":"","DOIUrl":"","url":null,"abstract":"<p><p>A bootstrap based methodology is introduced for analyzing repeated measures/longitudinal microarray gene expression data over ordered categories. The proposed non-parametric procedure uses order-restricted inference to compare gene expressions among ordered experimental conditions. The null distribution for determining significance is derived by suitably bootstrapping the residuals. The procedure addresses two potential sources of correlation in the data, namely, (a) correlations among genes within a chip (\"intra-chip\" correlation), and (b) correlation within subject due to repeated/longitudinal measurements (\"temporal\" correlation). To make the procedure computationally efficient, the adaptive bootstrap methodology of Guo and Peddada (2008) is implemented such that the resulting procedure controls the false discovery rate (FDR) at the desired nominal level.</p>","PeriodicalId":89431,"journal":{"name":"Journal of the Indian Society of Agricultural Statistics. Indian Society of Agricultural Statistics","volume":"64 1","pages":"45-60"},"PeriodicalIF":0.0,"publicationDate":"2010-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3190572/pdf/nihms250300.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"30206889","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}