Getting way too many DEGs with LRT for time-series RNA-seq!
1
0
Entering edit mode
Denise • 0
@860ad694
Last seen 6 days ago
Spain

Hi,

I have a time-series RNA-seq experiment where Drosophila samples are exposed to a certain chemical, call it compound, and subsequently, RNA samples were collected and sequenced at several time points post-exposure, call them time points A, B, C, D, and E. I want to use the LR test from the DESeq2 package to find genes that are significantly up-/down-regulated in treated versus control samples at any of the five time points.

Here are my full and reduced models:

mod ~ extraction_batch + ~SV1 + SV2 + ~timepoint_code + ~compound + ~timepoint_code:compound
mod0 ~ extraction_batch + SV1 + SV2 + timepoint_code

SV1 and SV2 are surrogate variables estimated with SVA, but including them or not doesn't make any difference to the results. Before running DESeq, I have pre-filtered the genes down to 10,911 using the following criteria.

smallestGroupSize <- 2
keep <- rowSums(counts(dds) >= 10) >= smallestGroupSize
dds <- dds[keep,]

The problem is that when I look at how many genes have an LRT padj < 0.05, I get that 7374 genes out of 10911 total are differentially expressed in response to treatment at one time point or another! This number is way too high and not something I can work with in downstream analyses! It also makes no biological sense that the compound triggers so many genes!

What am I doing wrong, and how can I fix it?

Thank you very much in advance, Denise

TimeCourse DESeq2 • 386 views
ADD COMMENT
0
Entering edit mode
@james-w-macdonald-5106
Last seen 2 days ago
United States

There's two problems. First, this formula

~ extraction_batch + ~SV1 + SV2 + ~timepoint_code + ~compound + ~timepoint_code:compound

should not have a tilde (~) except for at the beginning, preceding extraction_batch. I did a quick check, and model.matrix seems to be able to handle a formula with multiple tildes, but it's not the expectation for module formulae in R.

Second, your reduced model is too reduced. You should only drop the interaction term, so mod0 should be

~ extraction_batch + SV1 + SV2 + timepoint_code + compound

Which will test the model fit with and without the interaction term, which I believe is what you want.

0
Entering edit mode

Hi,

Indeed, I am not sure why the ~ are being added! I just wrote

timepoint_code * compound

and let R expand it, but something goes awry!

The reduced model I want is actually

~ extraction_batch + SV1 + SV2 + timepoint_code

since I want to find genes responding to the compound at ANY time point, including time point A!

If I keep compound in the reduced model, LRT will test if there is a compound-specific effect at any of the time points B-E, but will say nothing about the compound effect at time point A! However, in my case, all of the A-E time points are measured after exposure, so I want to treat them in the same manner.

With the reduced model you propose, I simply cannot get the genes activated at time point A, except by doing an additional Wald test, which seems messy and inelegant.

Denise

ADD REPLY
0
Entering edit mode

I think you misunderstand the interpretation of the interaction coefficients. They estimate the interaction of a given time and time A, between the compound and control. So every interaction term includes time A. For example, the timepoint E interaction term is (time_E_compound - time_E_control) - (time_A_compound - time_A_control). Heuristically you can think of the interaction term allowing the log fold change to vary, depending on the time point. If you remove the interaction term, you no longer allow the fold change between compound and control to vary by timepoint, and instead estimate an overall value for the difference between compound and control.

The LRT between the model with and without the interaction term is asking if the model fits the data better if you allow timepoint-specific differences for the treatment, or if the model that doesn't allow the timepoint-specific differences fits the data just as well. Which is another way of asking if there is a compound-specific effect at any timepoint.

ADD REPLY
0
Entering edit mode

In section 9 Time course experiments of the RNA-seq workflow tutorial, it says

library("fission")
data("fission")
ddsTC <- DESeqDataSet(fission, ~ strain + minute + strain:minute)
ddsTC <- DESeq(ddsTC, test="LRT", reduced = ~ strain + minute)

that the full design "models the strain difference at time 0, the difference over time, and any strain-specific differences over time", whereas, with this reduced design, "genes with small p values from this LT test are those which at one or more time points after time 0 showed a strain-specific effect". Therefore, the effect at time 0 is nested within the strain term and not tested by the LR test!

In my case, I have compound instead of strain, so similar logic applies. A gene can be activated at any of the B-E time points while being either active or inactive at time A. LRT says nothing about its status at time A, so if I only look at the p-values from the LRT you suggest I will miss genes which are only active at time A.

ADD REPLY
0
Entering edit mode

Okay, I used the LRT you suggested, and I am now down to 3,026 DEGs, which is a much more reasonable number. However, now I am running into my second stumbling block regarding the use of LRT!

If I need per-timepoint LFCs for each DE gene, while removing weekly/uncertain DE genes as much as possible, it seems to me there are two options:

(1) Retrieve shrunken LFCs from the LRT results

resLRT <- results(dds, alpha=alpha)
resA <- lfcShrink(dds, coef = name_main_effect, type = "apeglm")
resB <- lfcShrink(dds, coef = name_B_effect, type = "apeglm")
resC <- lfcShrink(dds, coef = name_C_effect, type = "apeglm")
resD <- lfcShrink(dds, coef = name_D_effect, type = "apeglm")
resE <- lfcShrink(dds, coef = name_E_effect, type = "apeglm")

and keep genes having LRT padj < 0.05 and max |LFC| across A-E >= 1.

(2) Run Wald tests for each time point separately, and here even use an lfcThreshold = 1.

resA <- results(dds, name = name_main_effect, test = "Wald", lfcThreshold = 1, alpha = 0.05)
resB <- results(dds, name = name_B_effect, test = "Wald", lfcThreshold = 1, alpha = 0.05)
resC <- results(dds, name = name_C_effect, test = "Wald", lfcThreshold = 1, alpha = 0.05)
resD <- results(dds, name = name_D_effect, test = "Wald", lfcThreshold = 1, alpha = 0.05)
resE <- results(dds, name = name_E_effect, test = "Wald", lfcThreshold = 1, alpha = 0.05)

and keep genes having LRT padj < 0.05 and Wald padj < 0.05 at at least one time point.

However, the second option seems to be statistically incorrect as it tests the same gene multiple times, producing multiple time point-specific adjusted p-values for each gene.

What is the statistically solid way to obtain a robust set of DEGs and associated LFCs following the LRT?

Thank you.

PS One last question: with the updated LRT, when going from 0 to one SVA SV (surrogate variable), the number of significant LRT genes drops from 3,026 to 2,586 and then rises again to 2,789 with two SVs. Does that mean there is significant hidden technical structure beyond the extraction batch, and that one SV is a useful addition to the model?

ADD REPLY
0
Entering edit mode

I don't use DESeq2 for RNA-Seq analyses, but it's my understanding that lfcShrink is meant only for plotting and other visualizations rather than statistical analyses.

The closest thing to the LRT when using either edgeR or limma-voom is to do an F-test for all the interaction coefficients, and then go back and do what you have done, which is to identify genes that are significant in each contrast. Which has never made sense to me. Why would I do the former if I have to do the latter anyway? In addition, the F-test is less powerful than individual contrasts, so I have never followed that paradigm. Instead, what I normally do is your choice #2 without having done the F-test (LRT in your case) first.

I believe that the lfcThreshold argument for results works similarly to treat in edgeR and limma, specifying a discontinuous null and then testing against a null of H_0 < C, where C is the logFC you care about. If so, a logFC is pretty big. Remember, you are testing for evidence that the difference in populations exceeds the logFC, not that the observed samples exceed the logFC. In practice this means that the observed logFC has to be much larger than the threshold, so two-fold threshold is quite large.

Your choice #2 isn't statistically incorrect. While you are 'testing the same gene multiple times', it's the same gene in a different set of flies. Those are independent measurements. You could argue that the number of comparisons is actually five times what you are adjusting for (e.g., you should adjust all the p-values simultaneously). Both edgeR and limma have the ability to do a 'global' p-value adjustment, and maybe DESeq2 does as well, but in practice it seems like adjusting the individual contrasts is the way to go. I had a recent analysis where only a subset of the contrasts I made had significant results, but if we used the global adjustment, we gained significant results for each of the contrasts, so the global adjustment isn't necessarily more stringent than the contrast-wise adjustment.

ADD REPLY
0
Entering edit mode

It doesn't make sense that global adjustment of p-values would produce more, not fewer DEGs! How do you treat multiple copies of each gene in the adjustment?

ADD REPLY
0
Entering edit mode

The BH p-adjustment algorithm is as follows:

BH = {
        i <- lp:1L
        o <- order(p, decreasing = TRUE)
        ro <- order(o)
        pmin(1, cummin(n/i * p[o]))[ro]
}

n/i is the total number of tests divided by the index of which test we are looking at. For the smallest p-value, this will adjust using Bonferroni (n * p), and for each successive p-value, the penalty is reduced, so for the second smallest p-value, it's half the Bonferroni cutoff, then a third, then a quarter, etc.

If you have four contrasts, one of which contains a p-value that will be Bonferroni significant if you combine all the p-values, and the other three containing p-values that get really close to the Bonferroni cutoff (for that contrast), then when you combine all the p-values, all those 'close' p-values will get relaxed penalties and will get under the cutoff. As an example, consider this.

## for reproducibility
> set.seed(0xabeef)
## we expect no significant p-values from this
> z <- runif(1000)
> head(sort(p.adjust(z, "BH")))
[1] 0.2806372 0.2806372 0.2806372
[4] 0.2806372 0.2806372 0.3232420

## so no significant p-values. Let's add one that is right at Bonferonni

> z <- c(z, 0.05/1001)
> head(sort(p.adjust(z, "BH")))
[1] 0.0500000 0.2340982 0.2340982
[4] 0.2340982 0.2340982 0.2340982

## now we have one significant gene, and the FDR for some of the others is smaller
## let's add more genes, none of which meet Bonferroni

> z <- c(z[1:1000], 0.05/seq(1000, 990, -1))
## none are significant at Bonferroni cutoff
> head(sort(p.adjust(z, "bonferroni")))
[1] 0.05055000 0.05060060 0.05065130
[4] 0.05070211 0.05075301 0.05080402

## but now all ten p-values we added are significant using BH, and the ones that were 
## originally at a BH of 0.28 are now at 0.082!
> head(sort(p.adjust(z, "BH")), 16)
 [1] 0.004641873 0.004641873
 [3] 0.004641873 0.004641873
 [5] 0.004641873 0.004641873
 [7] 0.004641873 0.004641873
 [9] 0.004641873 0.004641873
[11] 0.004641873 0.053937065
[13] 0.081557728 0.081557728
[15] 0.081557728 0.088663830

And to reiterate. There aren't multiple copies of the same gene. There are multiple p-values that were generated using data from the same gene, using different sets of flies. Those are not the same thing, and there is no need to adjust, other than for the fact that you have made multiple comparisons. It's no different from a situation where one might have three groups. One control, and two treated with two different compounds. Would you expect that you have to make some adjustment for the fact that the comparison of compound A and control includes all the same genes that are compared in compound B and control? If so, why?

ADD REPLY
0
Entering edit mode

Sequential post-exposure measurements of the same set of genes is not the same thing as having multiple treated groups! And I suppose there is a reason why DESeq2 provides an LR test, and edgeR/limma provides an F-test. I just wish the authors of DESeq2 had explained when and why the LRT is preferable to the Wald test. It seems the LRT and time course experiments question keeps recurring on this forum with no clear resolution.

ADD REPLY
0
Entering edit mode

Sequential post-exposure measurements of the same set of genes is not the same thing as having multiple treated groups!

They are not the same thing? Can you explain why you think that?

ADD REPLY
0
Entering edit mode

I guess I am still not convinced that the reduced LRT model should be

~ extraction_batch + SV1 + SV2 + timepoint_code + compound

instead of

~ extraction_batch + SV1 + SV2 + timepoint_code

in my case, where the reference level time point, A, is also a treatment point, just like the others.

ADD REPLY

Login before adding your answer.

Traffic: 592 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6