Question: sva: how to incorporate adjusting variables
6.6 years ago by
Guest User • 12k
Dear Jeffrey,

I am using sva to estimate potential surrogate variables in a microarray-derived expression dataset, as a step prior to differential gene expression analysis. The aim of my work is to study how one categorical variable (inversion genotype, three categories: STD, HET, INV) affects the gene expression profile of a set of human individuals. However, some other variables (population, gender) have a partial effect, that is, they account for variation in the expression of only a subset of genes. I don't know how to deal with these variables. Which of the following options is the most appropriate one (if any)?

A) "Protect" them by including them in both the null and the full model:

    mod0 = model.matrix(~as.factor(Gender)+as.factor(Population), data=pheno)
    mod = model.matrix(~as.factor(inversion_genotype)+as.factor(Gender)+as.factor(Population), data=pheno)
    svobj = sva(edata,mod,mod0)

B) Include them only in the full model:

    mod0 = model.matrix(~1, data=pheno)
    mod = model.matrix(~as.factor(inversion_genotype)+as.factor(Gender)+as.factor(Population), data=pheno)
    svobj = sva(edata,mod,mod0)

C) Do not include them at all (and expect to get some surrogate variables strongly correlated with these variables, in case they really affect gene expression):

    mod0 = model.matrix(~1, data=pheno)
    mod = model.matrix(~as.factor(inversion_genotype), data=pheno)
    svobj = sva(edata,mod,mod0)

To summarize: how should adjustment variables with a global effect be treated? And how should adjustment variables with a partial effect (only in a subset of genes) be treated?

I would really appreciate any piece of advice. Thanks a lot!
Meri

-- output of sessionInfo():

R version 2.15.2 (2012-10-26)
Platform: x86_64-redhat-linux-gnu (64-bit)

locale:
LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
LC_PAPER=C LC_NAME=C
LC_ADDRESS=C LC_TELEPHONE=C
LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
stats graphics grDevices utils datasets methods base

-- Sent via the guest posting facility at bioconductor.org.
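
Whichever option is chosen, it may help to check the two design matrices before calling sva. The sketch below uses base R only and a made-up phenotype table (the column names follow the post, the values are invented). It builds the option A matrices and checks that the null model is nested in the full model and that the full model is not rank-deficient. The helper `is_nested()` is hypothetical, written for this illustration; it is not part of sva.

```r
# Hypothetical phenotype table: column names as in the post, values invented.
pheno <- data.frame(
  inversion_genotype = factor(rep(c("STD", "HET", "INV"), each = 6)),
  Gender             = factor(rep(c("F", "M"), times = 9)),
  Population         = factor(rep(c("P1", "P2", "P3"), times = 6))
)

# Option A: adjustment variables in both the null and the full model.
mod0 <- model.matrix(~ Gender + Population, data = pheno)
mod  <- model.matrix(~ inversion_genotype + Gender + Population, data = pheno)

# Hypothetical helper: TRUE when every column of `small` lies in the
# column space of `big`, i.e. appending `small` does not raise the rank.
is_nested <- function(small, big) {
  qr(cbind(big, small))$rank == qr(big)$rank
}

nested     <- is_nested(mod0, mod)        # null model nested in full model?
full_rank  <- qr(mod)$rank == ncol(mod)   # no confounded columns in the full model?
n_interest <- ncol(mod) - ncol(mod0)      # coefficients for the variable of interest

# With the real data, the call from the post would follow:
# svobj <- sva(edata, mod, mod0)   # requires the sva package and the real edata
```

The same check can be run on the matrices from options B and C by swapping in the corresponding formulas. If `full_rank` is FALSE on the real data, the variable of interest is confounded with one of the adjustment variables in that design.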