Question

FDR adjustment in Time-Course DATA DESEQ2

2

Entering edit mode

jakob.petereit ▴ 20

@jakobpetereit-13576

Last seen 7.0 years ago

Australia,Perth, ARC PEB

Hi,

I am analyzing time course RNA seq data.

My design looks like this:

ID	Genotype	SA.treatment
1	jc66	0h
2	jc66	0h
3	jc66	0h
4	dsr	0h
5	dsr	0h
6	dsr	0h
7	comp	0h
8	comp	0h
9	comp	0h
10	jc66	0.5h
11	jc66	0.5h
12	jc66	0.5h
13	dsr	0.5h
14	dsr	0.5h
15	dsr	0.5h
16	comp	0.5h
17	comp	0.5h
18	comp	0.5h
20	jc66	3h
21	jc66	3h
22	dsr	3h
23	dsr	3h
24	dsr	3h
25	comp	3h
26	comp	3h
27	comp	3h
28	jc66	6h
29	jc66	6h
30	jc66	6h
31	dsr	6h
32	dsr	6h
33	dsr	6h
34	comp	6h
35	comp	6h
36	comp	6h

To sum up: 3 Genotypes, 4 time points.

I mapped the bam files with star aligner, and extracted raw counts with genomicalignments

I run deseq2 on it with mixed success. My design formula looks like this:

ddstc <- DESeqDataSet(se,~Genotype + SA.treatment + Genotype:SA.treatment)
ddstc <- DESeq(ddstc_universal, test="LRT", reduced = ~Genotype + SA.treatment)

If I look at the pvalue histograms of the individual comparisons, I see, that things are not going all right, here an example pvalue histogramm:

(I switched the filters off, cause there is a heave expression induction upon treatment, and the filter filter all of these)

res <- results(ddstc, name='Genotype_dsr_vs_jc66', test='Wald', independentFiltering = F, cooksCutoff = F) 

hist(res$pvalue, col = "lavender", main = n, xlab = "p-values")

I found that this could be cause by the Wald test and I can run a false discovery rate correction, like explained here:

http://www-huber.embl.de/users/klaus/Teaching/DESeq2Predoc2014.html#inspection-and-correction-of-pvalues

and it indeed improves my pvalue histogram and i can add a corrected adjusted pvalue (not perfect, but already a lot better):

res <- res[ !is.na(res$padj), ]
res <- res[ !is.na(res$pvalue), ]
res <- res[, -which(names(res) == "padj")]
FDR.res <- fdrtool(res$stat, statistic= "normal", plot = T)
FDR.res$param[1, "sd"]
res[,"padj"]  <- p.adjust(FDR.res$pval, method = "BH")

hist(FDR.res$pval, col = "royalblue4",main = paste0(n,' - corrected'), xlab = "CORRECTED p-values")

This is all fine and good, but can I apply the same technique for the whole time course experiment?

I took the time course workflow from the RNAseq workflow:

https://www.bioconductor.org/help/workflows/rnaseqGene/#time-course-experiments

resTC <- results(ddsTC)
betas <- coef(ddstc) %>% na.omit()
colnames(betas)
topGenes <- head(order(resTC$padj),40)
mat <- betas[topGenes, -c(1)]
thr <- 3
mat[mat < -thr] <- -thr
mat[mat > thr] <- thr
pheatmap(mat, breaks=seq(from=-thr, to=thr, length=101),
         cluster_col=FALSE)

If I understand it correctly, it takes the results of the last resultsName, and identifies the most significant genes, by ordering them with their adjusted pvalue (which is the same in all results, as it's globally Hochberg corrected). Then extracting the coefficients for these genes and plotting it into a heatmap.

But can I run the FDR tool on this result and will it correct the pvalues for all result contrasts? My assumption is that the FDR will correct the pvalue only for the last called results (resTC) and adjust the pvalue based on that and will not take any other contrast into account. Which then in turn would mean, that I have different adjusted pvalues in different contrasts, which then in turn means, that the topgenes I want to extract, are not the real topgenes.

So, can I do a FDR correction on the whole experiment ? Does it make sense to do it?

And I have a second question about the interacting terms:

If I understand correctly the results of our example here:

results(ddsTC , name='Genotypedsr.SA.treatment6h')

this shows the results for my Genotype dsr at 6h compared to the Genotype dsr at 0h, minus the effect on the control at the same time points ?

Is there a way to call the results without reducing it by the control levels ?

at least that's my interpretation of:

# the condition effect for genotype II
# this is, by definition, the main effect *plus* the interaction term
# (the extra condition effect in genotype II compared to genotype I).
results(dds, list( c("condition_B_vs_A","genotypeII.conditionB") ))

Cheer

Jakob

deseq2 fdrtool genomicalignments • 2.6k views

ADD COMMENT • link updated 7.5 years ago by Michael Love 43k • written 7.5 years ago by jakob.petereit ▴ 20

0

Entering edit mode

hi Jakob,

Can you confirm that in the original pvalue histogram the spike at p=1 is not simply genes with very low counts.

Can you try using the filter from the beginning, so before running DESeq():

keep <- rowSums(counts(dds, normalized=TRUE) >= 10) >= 3

You don't show the pvalue distribution for the original LRT. Can you please show this? E.g. hist(results(dds)$pvalue, ...)

ADD REPLY • link 7.5 years ago Michael Love 43k

0

Entering edit mode

Hi Michael,

the histogram of the original LRT looks like this:

hist(results(ddstc)$pvalue, col = "lavender", main = 'LRT - original', xlab = "p-values")

part of the spike at p=1 will be from very low count genes.

Do you mean that i should run the filter before using results() ? (I assume sow, cause I cannot get normalized counts before running DESeq())

I tried your filter suggestion with some adjustments:

keep <- rowSums(counts(ddstc, normalized=TRUE) >= 10) >= 3
keep <- which(keep==T) %>% as.data.frame() %>% rownames_to_column() %>% select(rowname)

res <- results(ddstc,independentFiltering = F, cooksCutoff = F, tidy = T) %>% filter(row %in% keep$rowname)
hist(res$pvalue, col = "lavender", main = 'LRT - original - manual filter', xlab = "p-values")

It reduces the spike at p=1, cause it will remove all genes, which would have an adjusted pvalue of NA from filtering.

The problematic part is, that a result of it is, that the FDRtool has nothing to correct anymore

res <- res[ !is.na(res$padj), ]
res <- res[ !is.na(res$pvalue), ]
res <- res[, -which(names(res) == "padj")]
FDR.res <- fdrtool(res$stat, statistic= "normal", plot = T)

And that it again removes induced genes, genes you can normally not find, but in treatment they show up.

I wrote a custom filter that is not removing high variance genes, but still filters for rowsums:

##high pvalue genes
high_pvalue_genes <- results(ddstc,independentFiltering = F, cooksCutoff = F, tidy = T) %>% filter(pvalue >0.9) %>% select(row)
## normalized counts
norm_high_pvalue_counts <- counts(ddstc, normalized=T) %>%
  as.data.frame() %>%
  rownames_to_column() %>%
  filter(rowname %in% high_pvalue_genes$row) %>%
  column_to_rownames() %>% 
  rowSums() %>% as.data.frame() %>%
  rownames_to_column() %>% filter(`.` < 100)
`%notin%` = function(x,y) !(x %in% y)
res <- results(ddstc,independentFiltering = F, cooksCutoff = F, tidy = T) %>% filter(row %notin% norm_high_pvalue_counts$.)
hist(res$pvalue, col = "lavender", main = 'LRT - original - my filter', xlab = "p-values")

and FDR works again

res <- res[ !is.na(res$padj), ]
res <- res[ !is.na(res$pvalue), ]
res <- res[, -which(names(res) == "padj")]
FDR.res <- fdrtool(res$stat, statistic= "normal", plot = T)
hist(FDR.res$pval, col = "royalblue4",main = 'LRT - original -  my filter - corrected', xlab = "CORRECTED p-values")

But I don't know if that's a reasonable approach or not.

And back to my original question, is it statistically right, to run FDR on the LRT ?

And question two: can I extract fold changes without having them reduced by the main effect ?

Cheers

Jakob

ADD REPLY • link 7.5 years ago jakob.petereit ▴ 20

score 0 · Answer 1 · 2017-10-10

0

Entering edit mode

Michael Love 43k

@mikelove

Last seen 4 days ago

United States

hi,

So my answer to this would be, to not use fdrtool without filtering, because this downstream method requires certain properties of the distribution of p-values.

The very small count genes do not produce smooth, uniform p-values (this was noted in the DESeq2 paper, Supp Fig 26).

You should filter first, if you want to achieve a smooth p-value distribution to then operate on. You can do this like so:

dds <- estimateSizeFactors(dds)
keep <- ...
dds[keep,]
dds <- DESeq(dds, ...)
res <- results(dds)

ADD COMMENT • link 7.5 years ago Michael Love 43k

0

Entering edit mode

I corrected my answer just now.

ADD REPLY • link 7.5 years ago Michael Love 43k

0

Entering edit mode

Hi,

cheers for the help with the filter. Sadly the FDR doesn't work anymore and my significant pvalues are low in number.

ddstc <- DESeqDataSet(se_overrep_plus_universal, ~Genotype + SA.treatment + Genotype:SA.treatment)
ddstc <- estimateSizeFactors(ddstc)
keep <- rowSums(counts(ddstc, normalized=TRUE) >= 10) >= 3
ddstc <- ddstc[keep,]
ddstc <- DESeq(ddstc,  test="LRT", reduced = ~ ~Genotype + SA.treatment)
res <- results(ddstc,independentFiltering = F, cooksCutoff = F, tidy = T) 
hist(res$pvalue, col = "lavender", main = 'LRT - Michael filter', xlab = "p-values")

res <- res[ !is.na(res$padj), ]
res <- res[ !is.na(res$pvalue), ]
res <- res[, -which(names(res) == "padj")]
FDR.res <- fdrtool(res$stat, statistic= "normal", plot = T)
hist(FDR.res$pval, col = "royalblue4",main = 'LRT - Michael filter - corrected', xlab = "CORRECTED p-values")

But I guess I have to live with that.

I am also still struggling a little bit with the resultsNames of the dds, can you confirm, that I interpret them correctly ?:

(I reduces the output for visibility)

> resultsNames(ddstc)[c(3,4,8)]

[1] "Genotype_dsr_vs_jc66" - overall differences between Genotype comp and jc66 (the reference genotype)
[2] "SA.treatment_0.5h_vs_0h" - differences between timepoint 0.5h and 0h in the reference Genotype (jc66)
[3] "Genotypedsr.SA.treatment0.5h" -differences between timepoint 0.5h and 0h in Genotype dsr, minus the response of reference Genotype at the same timepoint

Is there a way to get the output without the reduction of the differences by the reference response? (it would be pretty useful for comparisons excluding the reference genotype) Or do I have to change my reference genotype and run it all again ?

Cheers

J

ADD REPLY • link 7.5 years ago jakob.petereit ▴ 20

0

Entering edit mode

This code above skips the important step of applying the filter ‘keep’

dds <- dds[keep,]

See my code suggestion above.

ADD REPLY • link 7.5 years ago Michael Love 43k

0

Entering edit mode

You are totally right, I corrected the code and pvalue histograms, sadly the result is pretty underachieving.

Do you have any advice for my second question about resultnames and reference genotype ?

Cheers

J

ADD REPLY • link 7.5 years ago jakob.petereit ▴ 20

0

Entering edit mode

Try ~genotype + genotype:treatment

ADD REPLY • link 7.5 years ago Michael Love 43k

0

Entering edit mode

Cheers, that works !!!

Thanks for the advice

ADD REPLY • link 7.5 years ago jakob.petereit ▴ 20