Guest User
@guest-user-4897
Dear Jeffrey,
I am using sva to estimate potential surrogate variables in a
microarray-derived expression dataset, as a preliminary step before
differential gene expression analysis. The aim of my work is to study
how one multi-level categorical variable (inversion genotype, three
categories: STD, HET, INV) affects the gene expression profile of a
set of human individuals. However, there are some other variables
(population, gender) with a partial effect; that is, they account for
variation in the expression of only a subset of genes. I don't know
how to deal with these variables. Which of the following options is
the most appropriate (if any)?
A) "Protect" them by including them in both the null and the full
model
mod0 = model.matrix(~as.factor(Gender)+as.factor(Population), data=pheno)
mod = model.matrix(~as.factor(inversion_genotype)+as.factor(Gender)+as.factor(Population), data=pheno)
svobj = sva(edata,mod,mod0)
B) Include them only in the full model
mod0 = model.matrix(~1, data=pheno)
mod = model.matrix(~as.factor(inversion_genotype)+as.factor(Gender)+as.factor(Population), data=pheno)
svobj = sva(edata,mod,mod0)
C) Do not include them at all (and expect to get some surrogate
variables strongly correlated with these variables, if they really
affect gene expression)
mod0 = model.matrix(~1, data=pheno)
mod = model.matrix(~as.factor(inversion_genotype), data=pheno)
svobj = sva(edata,mod,mod0)
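For option C, the expectation that some surrogate variables track the known covariates could be checked along the following lines. This is only a rough sketch in base R: the matrix sv is simulated here as a stand-in for svobj$sv, and pheno_demo and sv_covariate_r2 are hypothetical names that are not part of sva.

```r
# Simulated stand-ins (hypothetical): in a real analysis, sv would be
# svobj$sv and pheno_demo would be the relevant columns of pheno.
set.seed(1)
n <- 60
pheno_demo <- data.frame(
  Gender     = factor(rep(c("F", "M"), times = n / 2)),
  Population = factor(rep(c("P1", "P2", "P3"), each = n / 3))
)
# First simulated surrogate variable is driven by Gender, second is noise.
sv <- cbind(
  as.numeric(pheno_demo$Gender == "M") + rnorm(n, sd = 0.1),
  rnorm(n)
)

# R^2 from regressing each surrogate variable on each known covariate.
sv_covariate_r2 <- function(sv, covariates) {
  sv <- as.matrix(sv)
  out <- matrix(
    NA_real_, nrow = ncol(sv), ncol = ncol(covariates),
    dimnames = list(paste0("SV", seq_len(ncol(sv))), names(covariates))
  )
  for (i in seq_len(ncol(sv))) {
    for (j in seq_len(ncol(covariates))) {
      out[i, j] <- summary(lm(sv[, i] ~ covariates[[j]]))$r.squared
    }
  }
  out
}

r2 <- sv_covariate_r2(sv, pheno_demo)
print(round(r2, 2))
```

A high R^2 in a given cell would suggest that the surrogate variable is capturing that covariate; low values across a column would suggest the covariate is not being picked up.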
To summarize: how should adjustment variables with a global effect be
treated? How should adjustment variables with a partial effect (only
in a subset of genes) be treated?
I would really appreciate any piece of advice.
Thanks a lot!
Meri
-- output of sessionInfo():
R version 2.15.2 (2012-10-26)
Platform: x86_64-redhat-linux-gnu (64-bit)
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=C LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
--
Sent via the guest posting facility at bioconductor.org.