Question: How to check whether experimental covariates are confounded with our group of interest?
english.server20 wrote (7 weeks ago):

Hi

In the context of microarray analysis, how can I check whether experimental covariates (e.g. age, gender) are confounded with our grouping of interest (i.e. diseased vs normal)?

goi <- sample(letters[1:2], 20, TRUE)  # group of interest

cov <- list()  # 3 categorical covariates
cov$c1 <- sample(letters[1:4], 20, TRUE)
cov$c2 <- sample(letters[1:5], 20, TRUE)
cov$c3 <- sample(letters[1:2], 20, TRUE)
numeric_covar <- sample(25:60, 20, TRUE)  # a numeric covariate

Approach 1: chisq.test

sapply(names(cov), function(x) chisq.test(cov[[x]], goi)$p.value)  # not applicable to numeric_covar
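As a minimal sketch (the fixed seed is an assumption added only for reproducibility), the cross-tabulations that chisq.test() summarizes, and a per-group summary of numeric_covar, can be printed directly instead of being reduced to p-values. A factor is completely confounded with goi when every one of its levels occurs in only one group, which a table shows at a glance but a p-value does not.

```r
# Simulated data as in the question; set.seed() is an added assumption for reproducibility
set.seed(1)
goi <- sample(letters[1:2], 20, TRUE)  # group of interest
cov <- list(c1 = sample(letters[1:4], 20, TRUE),
            c2 = sample(letters[1:5], 20, TRUE),
            c3 = sample(letters[1:2], 20, TRUE))
numeric_covar <- sample(25:60, 20, TRUE)

# Cross-tabulate each categorical variable against the group of interest.
# If every level of a factor appears in only one group, that factor is completely
# confounded with goi; the chi-square p-value only summarises association.
tabs <- lapply(cov, function(x) table(variable = x, group = goi))
print(tabs)

# Numeric variable: compare its distribution within each group
by_group <- tapply(numeric_covar, goi, summary)
print(by_group)
```

A plot such as boxplot(numeric_covar ~ goi) shows the same per-group comparison visually.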


Approach 2: ANOVA/t.test

sapply(names(cov), function(x)  # suitable for numeric_covar as well
  anova(lm(as.numeric(as.factor(cov[[x]])) ~ as.numeric(as.factor(goi))))$"Pr(>F)"[1])

I think numeric_covar can only be handled with the second approach.

Answer (Gordon Smyth, Walter and Eliza Hall Institute of Medical Research, Melbourne, Australia), 7 weeks ago:

Well, I tried to help you previously (https://support.bioconductor.org/p/124670/), and I think I might even have been the one who introduced you to the term "confounded" for experimental designs. I advised you against trying to use tests like the ones you give here.

The tests you propose are simply testing for correlation rather than confounding, and neither seems informative to me.

I wonder what real problem you are trying to solve. I wonder whether you are perhaps trying to come up with a series of tests so that you can analyse microarray datasets automatically without having to look at plots or think about the variables. That would be an unrealistic idea. If you are trying to decide which variables to include in a microarray analysis, it is better to look at your data and think about the meaning of the variables, perhaps making a table or a plot along the way to help you.

There are mathematical methods for examining collinearity in linear models using an eigenvalue decomposition of the design matrix, but this is strictly for mathematicians, and I do not think it would be helpful anyway for any real microarray dataset.

BTW, the term "covariate" always refers to a numeric variable. Categorical variables are instead called factors.

Comment from the questioner:

Thank you, Gordon Smyth. Regarding my previous post, I thought the concepts asked there were different from what is asked here! Thanks for the clarification.
You are absolutely right that I want to "analyse microarray datasets automatically without having to look at plots or think about the variables." That is partly because my knowledge of statistics is quite shallow, so I am trying to simplify things, i.e. to find a number (a threshold) for deciding about samples. I was happy to find a tutorial on GitHub that uses something like the second approach I wrote above, but it seems that I have been overgeneralizing. The link to the GitHub tutorial: https://github.com/icnn/Microarray-Tutorials/wiki/Affymetrix#7
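Gordon's answer mentions that collinearity can be examined through an eigenvalue decomposition of the design matrix, while cautioning that this is rarely helpful for a real microarray dataset. As a minimal sketch of what that means, with hypothetical data and a hypothetical helper check_design(): when a factor (here a processing batch) is completely confounded with the group, the design matrix loses rank and its smallest eigenvalue is zero. That is a property of the design itself, not a p-value. Scaling the columns and reporting a condition index is one common convention, assumed here for illustration.

```r
# Hypothetical example: 8 arrays, with 'batch' coinciding exactly with disease status
group <- factor(rep(c("normal", "disease"), each = 4))
batch <- factor(rep(c("b1", "b2"), each = 4))   # completely confounded with group
age   <- c(30, 42, 51, 38, 45, 29, 60, 33)      # hypothetical ages

# Hypothetical helper: rank and condition index of a design matrix X.
# Columns are scaled to unit length before the eigenvalue decomposition of X'X;
# a (near-)zero eigenvalue means some column is (nearly) a combination of others.
check_design <- function(X) {
  Xs <- sweep(X, 2, sqrt(colSums(X^2)), "/")
  ev <- eigen(crossprod(Xs), symmetric = TRUE, only.values = TRUE)$values
  c(ncol = ncol(X),
    rank = qr(X)$rank,
    condition_index = sqrt(max(ev) / max(min(ev), .Machine$double.eps)))
}

print(table(batch, group))   # each batch level occurs in a single group
res_ok  <- check_design(model.matrix(~ group + age))
res_bad <- check_design(model.matrix(~ group + batch + age))
print(res_ok)    # full rank
print(res_bad)   # rank 3 < 4 columns: batch and group cannot be separated
```

In a limma workflow, nonEstimable(design) returns the names of coefficients that cannot be estimated, which flags the same exact aliasing without any eigenvalue arithmetic.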

Gordon Smyth replied (7 weeks ago):

If goi has two groups, then the code you've written is a very complicated way of doing a two-sample t-test. If goi has more than two groups, then the code will give nonsense results.
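A small sketch with simulated data (the seed and variable names are illustrative assumptions) checks both statements. With two groups, the lm()/anova() p-value equals the pooled-variance t.test() p-value. With three groups, the integer codes from as.numeric() turn the test into a linear trend across an arbitrary level order, so reordering the levels can change the p-value, while a one-way ANOVA with the group as a factor does not depend on the order. The same arbitrariness applies to the response side when a categorical covariate with more than two levels is coded this way. The last lines show a related pitfall: as.numeric(as.factor(x)) replaces a numeric variable by the index of its sorted distinct values, so a numeric covariate such as age should be used as it is.

```r
set.seed(2)  # arbitrary seed, added for reproducibility
age  <- sample(25:60, 20, TRUE)               # a numeric variable
goi2 <- factor(rep(c("a", "b"), each = 10))   # two groups

# The questioner's construction with two groups: regression on 1/2 group codes
p_lm <- anova(lm(age ~ as.numeric(goi2)))$"Pr(>F)"[1]
# Pooled-variance two-sample t-test
p_t  <- t.test(age ~ goi2, var.equal = TRUE)$p.value
print(all.equal(p_lm, p_t))                   # the two p-values agree

# Three groups: as.numeric() imposes the arbitrary order a = 1, b = 2, c = 3,
# so the "test" becomes a linear trend across that order
goi3  <- factor(rep(c("a", "b", "c"), length.out = 20))
goi3r <- factor(goi3, levels = c("b", "a", "c"))   # same grouping, different level order
p_trend  <- anova(lm(age ~ as.numeric(goi3)))$"Pr(>F)"[1]
p_trendr <- anova(lm(age ~ as.numeric(goi3r)))$"Pr(>F)"[1]
# A one-way ANOVA treats the group as a factor and ignores the level order
p_anova  <- anova(lm(age ~ goi3))$"Pr(>F)"[1]
p_anovar <- anova(lm(age ~ goi3r))$"Pr(>F)"[1]
print(c(trend = p_trend, trend_reordered = p_trendr,
        anova = p_anova, anova_reordered = p_anovar))

# as.numeric(as.factor(x)) replaces numeric values by the index of their sorted distinct values
codes <- as.numeric(as.factor(c(60, 25, 45, 25)))
print(codes)                                  # 3 1 2 1
```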

Thank you for your response. I think I have to study a little more to digest the concept.