Entering edit mode
Meritxell Oliva
▴
120
@meritxell-oliva-6129
Last seen 10.2 years ago
Dear Bioconductor list,
Dear Jeff Leek & Gordon Smith,
I want to use sva() to estimate potential surrogate variables of a
RNASeq derived expression dataset, as a previous step to perform
differential gene expression analysis with limma(), previously using
voom() to transform RNASeq to microarray-like expression data.
As far as I know, SVA was originally designed to deal with ( normally
distibuted ) microarray expression data, but can also be used to work
with RNSeq data. Please, correct me if I am wrong here!
So, I first transform the raw counts into cpm-log2 values, by using
edgeR function calcNormFactors() and voom(). I apply sva() on the
transformed dataset to compute the surrogate variables. Then, I build
the design matrix to create a linear model with my primary variable of
interest (InvGeno, a quantitative discrete variable: 0,1,2 ) and the
set of surrogate variables, and I finally apply
voom()+lmFit()+eBayes() to obtain DE candidates:
###
y <- calcNormFactors(rawCounts_epression_dataset_matrix);
mod1 <- model.matrix(~InvGeno);
mod0 <- model.matrix(~1,data=InvGeno);
v <- voom(counts=y, design = mod1)
sva.obj <- sva(v$E, mod1, mod0,method="irw",n.sv=10);
mod1 <- model.matrix(~InvGeno+sva.obj$sv);
v <- voom(counts=y, design = mod1);
fit.obj <- lmFit(v, design);
fit.obj <- eBayes(fit,trend=TRUE);
###
As voom needs to be fed by raw counts and performs the cpm+log2 steps
internally, I am not sure about the properness of including in the
linear model the sva-computed surrogate variables from cpm-log2
values, and the implications that this step may produce in the DE
analysis.
Could you suggest an appropriate strategy so as to achieve my
purposes?
Thanks a lot!!!
Meritxell Oliva
PhD student
IBB (Biotechnology and Biomedicine Institute)
Comparative and Functional Genomics group
Campus Universitari - 08193 Bellaterra Cerdanyola del Vallès -
Barcelona
[[alternative HTML version deleted]]