voom/limma contrast for comparing one level against base-line
2
0
Entering edit mode
@stijn-van-dongen-3896
Last seen 9.1 years ago
United Kingdom

 

Hello,

I am not completely sure I am doing the analysis correctly (see below).  I have
RNA seq data for 20 cell lines for the same tissue, each cell line in
triplicate. The pdata file thus basically looks like this:

   celltype
A1 A
A2 A
A3 A
B1 B
B2 B
B3 B
....


I would like to contrast the expression in one cell line with the baseline
expression in all other cell lines.  I set up the counts like this:

dge <- DGEList(counts=thedata)
dge <- calcNormFactors(dge)

and the contrast like this:

myfac <- relevel(myfac, ref="A")

design <- model.matrix(~myfac)
colnames(design) <- as.character(levels(myfac))

v <- voom(dge,design,plot=TRUE)
fit <- lmFit(v,design)
fit2 <- fit[,as.character(levels(myfac))[-1]]
fit2 <- eBayes(fit2, trend=TRUE)

Is this a (the) correct approach? Session info is at the bottom, although I
assume a bit tangential to the question.

I have plotted the median expression for genes called upregulated and
downregulated, plotting reference expression on one axis and regulation across
the other cell lines on the other axis.  The plots for genes upregulated in the
reference look convincing (everything on one side of the x=y diagonal), but
the plots for genes downregulated in the reference less so, leading me to
worry about whether my code is correct.

Any help/criticism/pointers would be greatly appreciated.

regards,
Stijn

 

limma voom contrast • 3.3k views
ADD COMMENT
0
Entering edit mode

Don't forget to remove lowly expressed genes from your DGE object before you shoot it through voom!

ADD REPLY
1
Entering edit mode
@ryan-c-thompson-5618
Last seen 8 months ago
Scripps Research, La Jolla, CA

You generally shouldn't be subsetting the fit object like that. I believe the correct way to perform that operation is using contrasts.fit. But in your case that step is not required because the effect of interest is already a coefficient in the model. So your code should look more like this:

v <- voom(dge,design,plot=TRUE)
fit <- lmFit(v,design)
fit2 <- eBayes(fit, trend=TRUE)
topTable(fit2, coef=2)

Add additional options to the topTable call as needed, or use some other testing function such as treat.

ADD COMMENT
0
Entering edit mode

Thanks very much for the feedback. I tested Ryan's code, and it gives the same result as my original code. The code by Aaron seems to give different results.  As Aaron's description "compare cell line B with the average expression of all other cell lines" is in fact what I aim for, I now wonder what the correct interpretation of Ryan's code (and mine) is. Would be very grateful for elucidation.

ADD REPLY
0
Entering edit mode

Whoops, I didn't read the question carefully and thought that you had only A and B cell lines, and you wanted to compare A vs B. That's why my code would do. (Usually when on talks about "base line", they are talking about a single group being used as a reference, not one group versus everything else.)

ADD REPLY
1
Entering edit mode
Aaron Lun ★ 28k
@alun
Last seen 2 hours ago
The city by the bay

Following up on Ryan's answer, the contrast to be used depends on what you mean by comparing "the expression in one cell line with the baseline expression in all other cell lines". If you intend to compare cell line B with the average expression of all other cell lines, you can do:

contrast <- c(0, 1, rep(-1/(ncol(design)-1), ncol(design)-2))
fit3 <- contrasts.fit(fit, contrasts=contrast)
fit3 <- eBayes(fit3, trend=TRUE)
topTable(fit3)

This specifies a null hypothesis of B* - (A* + C* + D* + ...)/(n-1) = 0, for n different cell lines, where X* is the average expression of cell line X. Alternatively, you can do an ANOVA-style analysis looking for differences between cell line B and any other cell line:

contrast <- matrix(0, ncol(design), ncol(design) - 1)
contrast[2,] <- 1
contrast[3:nrow(contrast), 2:ncol(contrast)] <- -diag(ncol(contrast)-1)
fit4 <- contrasts.fit(fit, contrasts=contrast)
fit4 <- eBayes(fit4, trend=TRUE)
topTable(fit4)

You could also do each contrast separately (set coef in the topTable call for fit4, to get the comparison corresponding to contrast[,coef]) and intersect them, to find genes that are DE between B and each of the other cell lines.

ADD COMMENT
0
Entering edit mode

Hi Aaron, I converted your comment into an answer, since the OP indicated that it answered his question. (I was hoping the replies to your comment would come along for the ride, but I guess not. I'm fairly new to the moderator tools.)

ADD REPLY

Login before adding your answer.

Traffic: 771 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6