Question

voom/limma contrast for comparing one level against base-line

0

Entering edit mode

Stijn van Dongen ▴ 80

@stijn-van-dongen-3896

Last seen 10.9 years ago

United Kingdom

Hello,

I am not completely sure I am doing the analysis correctly (see below). I have
RNA seq data for 20 cell lines for the same tissue, each cell line in
triplicate. The pdata file thus basically looks like this:

   celltype
A1 A
A2 A
A3 A
B1 B
B2 B
B3 B
....

I would like to contrast the expression in one cell line with the baseline
expression in all other cell lines. I set up the counts like this:

dge <- DGEList(counts=thedata)
dge <- calcNormFactors(dge)

and the contrast like this:

myfac <- relevel(myfac, ref="A")

design <- model.matrix(~myfac)
colnames(design) <- as.character(levels(myfac))

v <- voom(dge,design,plot=TRUE)
fit <- lmFit(v,design)
fit2 <- fit[,as.character(levels(myfac))[-1]]
fit2 <- eBayes(fit2, trend=TRUE)

Is this a (the) correct approach? Session info is at the bottom, although I
assume a bit tangential to the question.

I have plotted the median expression for genes called upregulated and
downregulated, plotting reference expression on one axis and regulation across
the other cell lines on the other axis. The plots for genes upregulated in the
reference look convincing (everything on one side of the x=y diagonal), but
the plots for genes downregulated in the reference less so, leading me to
worry about whether my code is correct.

Any help/criticism/pointers would be greatly appreciated.

regards,
Stijn

limma voom contrast • 4.1k views

ADD COMMENT • link updated 10.9 years ago by Steve Lianoglou ★ 13k • written 10.9 years ago by Stijn van Dongen ▴ 80

0

Entering edit mode

Don't forget to remove lowly expressed genes from your DGE object before you shoot it through voom!

ADD REPLY • link 10.9 years ago Steve Lianoglou ★ 13k

score 1 · Answer 1 · 2015-03-25

1

Entering edit mode

Ryan C. Thompson ★ 7.9k

@ryan-c-thompson-5618

Last seen 16 months ago

Icahn School of Medicine at Mount Sinai…

You generally shouldn't be subsetting the fit object like that. I believe the correct way to perform that operation is using contrasts.fit. But in your case that step is not required because the effect of interest is already a coefficient in the model. So your code should look more like this:

v <- voom(dge,design,plot=TRUE)
fit <- lmFit(v,design)
fit2 <- eBayes(fit, trend=TRUE)
topTable(fit2, coef=2)

Add additional options to the topTable call as needed, or use some other testing function such as treat.

ADD COMMENT • link 10.9 years ago Ryan C. Thompson ★ 7.9k

0

Entering edit mode

Thanks very much for the feedback. I tested Ryan's code, and it gives the same result as my original code. The code by Aaron seems to give different results. As Aaron's description "compare cell line B with the average expression of all other cell lines" is in fact what I aim for, I now wonder what the correct interpretation of Ryan's code (and mine) is. Would be very grateful for elucidation.

ADD REPLY • link 10.9 years ago Stijn van Dongen ▴ 80

0

Entering edit mode

Whoops, I didn't read the question carefully and thought that you had only A and B cell lines, and you wanted to compare A vs B. That's why my code would do. (Usually when on talks about "base line", they are talking about a single group being used as a reference, not one group versus everything else.)

ADD REPLY • link 10.9 years ago Ryan C. Thompson ★ 7.9k

score 1 · Answer 2 · 2015-03-25

Following up on Ryan's answer, the contrast to be used depends on what you mean by comparing "the expression in one cell line with the baseline expression in all other cell lines". If you intend to compare cell line B with the average expression of all other cell lines, you can do:

contrast <- c(0, 1, rep(-1/(ncol(design)-1), ncol(design)-2))
fit3 <- contrasts.fit(fit, contrasts=contrast)
fit3 <- eBayes(fit3, trend=TRUE)
topTable(fit3)

This specifies a null hypothesis of B* - (A* + C* + D* + ...)/(n-1) = 0, for n different cell lines, where X* is the average expression of cell line X. Alternatively, you can do an ANOVA-style analysis looking for differences between cell line B and any other cell line:

contrast <- matrix(0, ncol(design), ncol(design) - 1)
contrast[2,] <- 1
contrast[3:nrow(contrast), 2:ncol(contrast)] <- -diag(ncol(contrast)-1)
fit4 <- contrasts.fit(fit, contrasts=contrast)
fit4 <- eBayes(fit4, trend=TRUE)
topTable(fit4)

You could also do each contrast separately (set coef in the topTable call for fit4, to get the comparison corresponding to contrast[,coef]) and intersect them, to find genes that are DE between B and each of the other cell lines.