Hello,
I am not completely sure I am doing the analysis correctly (see below). I have
RNA seq data for 20 cell lines for the same tissue, each cell line in
triplicate. The pdata file thus basically looks like this:
celltype A1 A A2 A A3 A B1 B B2 B B3 B ....
I would like to contrast the expression in one cell line with the baseline
expression in all other cell lines. I set up the counts like this:
dge <- DGEList(counts=thedata) dge <- calcNormFactors(dge)
and the contrast like this:
myfac <- relevel(myfac, ref="A") design <- model.matrix(~myfac) colnames(design) <- as.character(levels(myfac)) v <- voom(dge,design,plot=TRUE) fit <- lmFit(v,design) fit2 <- fit[,as.character(levels(myfac))[-1]] fit2 <- eBayes(fit2, trend=TRUE)
Is this a (the) correct approach? Session info is at the bottom, although I
assume a bit tangential to the question.
I have plotted the median expression for genes called upregulated and
downregulated, plotting reference expression on one axis and regulation across
the other cell lines on the other axis. The plots for genes upregulated in the
reference look convincing (everything on one side of the x=y diagonal), but
the plots for genes downregulated in the reference less so, leading me to
worry about whether my code is correct.
Any help/criticism/pointers would be greatly appreciated.
regards,
Stijn
Don't forget to remove lowly expressed genes from your
DGE
object before you shoot it throughvoom
!