Question: edgeR: experiments with multiple factors comparisons
xluong0 wrote, 2.8 years ago:

Hello,

I am trying to perform DE gene analysis on RNA-Seq data. My groups consist of animals from two different populations (SD v. XE) and two different treatments (CON v. E2). I'm following section 3.3.1 in the edgeR user guide.

   samples treat pop  group
1   THXL1A    E2  SD  E2.SD
2   THXL1B   CON  SD CON.SD
3   THXL1C   CON  XE CON.XE
4   THXL1D    E2  XE  E2.XE
5   THXL1E   CON  SD CON.SD
6   THXL1F    E2  SD  E2.SD
7    THXL7   CON  XE CON.XE
8    THXL8    E2  XE  E2.XE
9    THXL9   CON  SD CON.SD
10  THXL10   CON  SD CON.SD
11  THXL11    E2  SD  E2.SD
12  THXL12    E2  SD  E2.SD

I am trying to make the following contrasts:

colnames(design) <- levels(group)
fit <- glmFit(dispData, design)
contrasts <- makeContrasts(
    XE.E2vsCON = XE.E2-XE.CON,
    SD.E2vsCON = SD.E2-SD.CON,
    SDvsXE.CON = SD.CON-XE.CON,
    SDvsXE.E2 = (SD.E2-SD.CON)-(XE.E2-XE.CON),
    levels=design)
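For reference, here is a minimal base-R sketch of how a design matrix with these column names could be built from the sample table. It assumes the group labels are written as population.treatment (SD.E2, XE.CON, and so on) so that they match the names used inside makeContrasts(); the table above prints them the other way round (E2.SD), so this naming is an assumption rather than something shown in the post.

```r
# Sample annotation copied from the table in the question
treat <- c("E2", "CON", "CON", "E2", "CON", "E2",
           "CON", "E2", "CON", "CON", "E2", "E2")
pop   <- c("SD", "SD", "XE", "XE", "SD", "SD",
           "XE", "XE", "SD", "SD", "SD", "SD")

# Assumption: label each group as population.treatment so the levels
# match the names used in the contrasts (SD.E2, XE.CON, ...)
group <- factor(paste(pop, treat, sep = "."))

# One column per group, no intercept (group-means parameterization)
design <- model.matrix(~ 0 + group)
colnames(design) <- levels(group)

# Number of samples in each group
print(colSums(design))
```

With this parameterization each coefficient is the mean log-expression of one group, so every contrast is simply a difference between group means.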

I'm having trouble interpreting this specific contrast:

lrt_treat <- glmLRT(fit, contrast=contrasts[,"SDvsXE.E2"])

According to the guide, this contrast should find genes that respond differently to the E2 treatment in the SD and XE populations. Does this mean I'm comparing

A) SD-E2 v. XE-E2 (baseline)

or am I looking at

B) SD-E2 and XE-E2 combined v. SD-CON and XE-CON combined (baseline)?

Thanks!

Susan

modified 2.8 years ago by Gordon Smyth • written 2.8 years ago by xluong0
Answer: edgeR: experiments with multiple factors comparisons
James W. MacDonald (United States) wrote, 2.8 years ago:

It's actually easier to interpret this algebraically. The first term (SD.E2 - SD.CON) is the difference between the E2 and control treatments for the SD population type. And the second term (XE.E2 - XE.CON) is the difference between the E2 and control for the XE population type. So that part is pretty straightforward, right?

Now the whole term is computing the difference between those two differences. So as an example, let's say E2 vs CON is about a 2-fold difference for both population types. In that case, the contrast would compute to zero. This is true any time the difference in the first term is pretty similar to the difference in the second term, regardless of how big those differences might be. But if the difference between E2 and CON for one population is different from the other, then you end up with a number that is far from zero. As an example, let's use some fake numbers:

SD.E2 = 5
SD.CON = 3
XE.E2 = 2
XE.CON = 8

So (5-3) - (2 - 8) = 8 and we are testing to see if that value is equal to zero or not. In this case it is obviously not, so we get a really small p-value. And the biological interpretation is that the E2 treatment up-regulates this gene in the SD population, but down-regulates the gene in the XE population. Which might be a really cool thing to know.
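The arithmetic above can be sketched in a few lines of base R. The group means are the fake numbers from this answer, treated here as log2-scale values (an assumption); the weight vector w and the second set of means, mu_equal, are illustrative names and values that are not part of the original post.

```r
# Fake group means from the example, assumed to be on the log2 scale
mu <- c(SD.CON = 3, SD.E2 = 5, XE.CON = 8, XE.E2 = 2)

# Weights of the contrast (SD.E2 - SD.CON) - (XE.E2 - XE.CON)
w <- c(SD.CON = -1, SD.E2 = 1, XE.CON = 1, XE.E2 = -1)

# Difference of the two differences
interaction_est <- sum(w * mu[names(w)])
print(interaction_est)   # (5 - 3) - (2 - 8) = 8

# Hypothetical case: E2 raises expression by 1 log2 unit (2-fold)
# in both populations, so the two differences cancel
mu_equal <- c(SD.CON = 3, SD.E2 = 4, XE.CON = 8, XE.E2 = 9)
interaction_equal <- sum(w * mu_equal[names(w)])
print(interaction_equal) # (4 - 3) - (9 - 8) = 0
```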

The tough thing about an interaction term is that the estimate (in this case 8) isn't interpretable like it would be for a simple contrast, say (SD.E2 - SD.CON). Just knowing that the parameter estimate is 8 doesn't tell you what the biological interpretation is - instead you have to either inspect the individual contrasts or plot the gene expression values to see what is going on.
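To make that point concrete, here is a small base-R sketch with three hypothetical genes (names and values invented for illustration). All three have the same interaction estimate, yet the per-population log-fold-changes tell quite different stories, which is why the simple contrasts need to be inspected alongside the interaction.

```r
# Hypothetical per-population log fold changes (E2 vs CON)
genes <- data.frame(
  gene     = c("geneA", "geneB", "geneC"),
  logFC_SD = c(2, 8, 10),   # SD.E2 - SD.CON
  logFC_XE = c(-6, 0, 2)    # XE.E2 - XE.CON
)

# Interaction = difference between the two log fold changes
genes$interaction <- genes$logFC_SD - genes$logFC_XE
print(genes)
```

Here geneA goes up in SD and down in XE, geneB responds only in SD, and geneC goes up in both populations but more strongly in SD.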

Answer: edgeR: experiments with multiple factors comparisons
Gordon Smyth (Walter and Eliza Hall Institute of Medical Research, Melbourne, Australia) wrote, 2.8 years ago:

Dear Susan,

I don't really understand your question. I don't see what you mean by (A) or (B) or why you think that either of these might be the same as SDvsXE.E2.

The contrast is doing exactly what you defined it to be, which is (SD.E2 - SD.CON) - (XE.E2 - XE.CON). This formula shows you exactly what it is testing! First it computes the log-fold-change for treatment E2 vs the control in the SD population (SD.E2-SD.CON). Then it computes the corresponding logFC in the XE population (XE.E2 - XE.CON). Then it compares the two logFCs to see if they are significantly different.

In other words, the contrast is testing whether the treatment effect depends on the population. Is the treatment only effective in one of the populations? Or does it act differently in the two populations? Surely this is something you would want to know!
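One way to see that the contrast differs from both (A) and (B) is to write all three as weight vectors over the four group means. The sketch below uses base R only; the group means in mu are hypothetical log2 values chosen for illustration, and the row names optionA and optionB are just one reading of the two alternatives in the question.

```r
# Hypothetical group means on the log2 scale
mu <- c(SD.CON = 3, SD.E2 = 5, XE.CON = 8, XE.E2 = 2)

# Each row is one comparison, written as weights on the group means
contrasts_tab <- rbind(
  optionA     = c( 0.0, 1.0,  0.0, -1.0),  # SD.E2 - XE.E2
  optionB     = c(-0.5, 0.5, -0.5,  0.5),  # mean of E2 groups - mean of CON groups
  interaction = c(-1.0, 1.0,  1.0, -1.0)   # (SD.E2 - SD.CON) - (XE.E2 - XE.CON)
)
colnames(contrasts_tab) <- names(mu)

# Value of each comparison for these group means
estimates <- drop(contrasts_tab %*% mu)
print(estimates)
```

The three rows have different weights and give different values for the same group means, so they answer three different questions.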