Question about testing main factors and doing FDR correction in a multiple-factor design in DESeq2.

marieke • 0

@marieke-12331

Germany

Hi all,

I have a question about testing main effects in a multiple-factor design with an interaction term, and another about how to use fdrtool. I have used it in the following way and would like to know whether this is correct or should be done differently.

In my case, I have RNAseq data from an experiment in which I reciprocally infected hosts with parasites from two sources. The design is full rank and I have three replicates per experimental host-parasite combination.

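library(DESeq2)
library(fdrtool)

# Full model: both main effects plus their interaction
dds <- DESeqDataSetFromMatrix(countData = countData,
                              colData = colData,
                              design = ~ Host_origin + Parasite_origin + Host_origin:Parasite_origin)
dds <- DESeq(dds)

# Without 'name' or 'contrast', results() reports the last coefficient in resultsNames(dds)
res <- results(dds, pAdjustMethod = "BH")
res
summary(res)
resultsNames(dds)

# Host origin, level "1" vs level "2"
Host <- results(dds, contrast = c("Host_origin", "1", "2"))

# Drop genes without an adjusted p-value (filtered out or flagged as outliers),
# then discard the original padj column
Host2 <- Host[!is.na(Host$padj), ]
Host2 <- Host2[, -which(names(Host2) == "padj")]

# Re-estimate the null distribution from the Wald statistics, then apply BH
FDR.Host2 <- fdrtool(Host2$stat, statistic = "normal", plot = TRUE)
FDR.Host2$param[1, "sd"]
Host2[, "padj"] <- p.adjust(FDR.Host2$pval, method = "BH")

# Compare p-value histograms before and after the null-model correction
par(mfrow = c(1, 2))
hist(Host$pvalue, col = "lavender")
hist(FDR.Host2$pval, col = "royalblue4",
     main = "Host origin 1 vs 2, corrected null model", xlab = "CORRECTED p-values")

res_Host_sig <- subset(Host2, padj < 0.05 & abs(log2FoldChange) >= 1.5)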

I repeated the FDR correction steps for the main factor Parasite_origin and for the interaction term Host_origin:Parasite_origin. In each of the cases, the histogram with corrected p-values looks better.

I am interested in which contigs are up/down regulated for the main factors, host origin and parasite origin. In an ANOVA, if the interaction term is not significant, one would leave out the interaction term and test the new model without interaction. Does the results function in DESeq2 automatically do that? If not, is it possible to say anything about significantly up/down-regulated contigs depending on the main factors in a multiple factor design, and how?

My second question is about the FDR correction that I have done. Is it fine to do an FDR correction on each of the result contrasts or should an FDR correction be done only once for the overall dataset, after which the results should be extracted? If the FDR correction should only be done once (overall), how can I get the wanted contrasts for the different main effects and interaction?

Best regards,
Marieke

deseq2 multiple factor design FDR fdrtool • 1.5k views

Updated 7.2 years ago by Michael Love • written 7.2 years ago by marieke

score 1 · Accepted Answer · 2017-02-10

I can answer some of the DESeq2 questions:

"Does the results function in DESeq2 automatically do that?"

No, DESeq() only performs the Wald test or LRT you specify. It doesn't do sequential nested LRTs like the anova() function.

"If not, is it possible to say anything about significantly up/down-regulated contigs depending on the main factors in a multiple factor design, and how?"

You can look at the Wald test for the main effects. If the interaction is not significantly different from 0, but the main effects show evidence for the alternative, that suggests the main-factor effect holds regardless of the level of the other factor. You can simply use the 'name' argument of results().
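To make the two points above concrete, here is a minimal sketch rather than a definitive recipe. It assumes simulated counts from makeExampleDESeqDataSet() as a stand-in for the real data, with the factor names and levels ("1", "2") taken from the question. It pulls each coefficient by name with results(), then runs the interaction LRT explicitly, since DESeq() will not do that on its own. Note that with the interaction in the design, the host coefficient is the host effect at the reference level of Parasite_origin.

```r
library(DESeq2)

# Simulated stand-in for the real data (an assumption for illustration):
# 12 samples in a 2 x 2 design with 3 replicates per combination
set.seed(1)
dds <- makeExampleDESeqDataSet(n = 1000, m = 12)
dds$Host_origin     <- factor(rep(c("1", "2"), each = 6))
dds$Parasite_origin <- factor(rep(rep(c("1", "2"), each = 3), times = 2))
design(dds) <- ~ Host_origin + Parasite_origin + Host_origin:Parasite_origin

dds <- DESeq(dds)
coefs <- resultsNames(dds)
print(coefs)

# Wald tests, one coefficient at a time, selected with the 'name' argument.
# With the interaction in the model, the host coefficient is the host effect
# at the reference level of Parasite_origin (and vice versa).
res_host     <- results(dds, name = "Host_origin_2_vs_1")
res_parasite <- results(dds, name = "Parasite_origin_2_vs_1")
res_inter    <- results(dds, name = "Host_origin2.Parasite_origin2")

# DESeq() does not drop the interaction for you; a likelihood ratio test
# against the additive model has to be requested explicitly.
dds_lrt <- DESeq(dds, test = "LRT", reduced = ~ Host_origin + Parasite_origin)
res_lrt <- results(dds_lrt)
summary(res_lrt)
```

With real data, check resultsNames(dds) first, because the coefficient names depend on the factor levels and on which level is the reference.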

My answer regarding whether to do multiple test correction for each contrast or over all tests from all contrasts is in the devel version of the vignette:

https://www.bioconductor.org/packages/devel/bioc/vignettes/DESeq2/inst/doc/DESeq2.html#can-i-run-deseq2-to-contrast-the-levels-of-100-groups

To expand, I think if you are going to build many results tables, but only present results from a subset of those contrasts (e.g. the ones that had significant gene lists) then you certainly need to do multiple test correction over all tests from all results tables. However, for a small number of contrasts, I think doing multiple test correction only within each contrast makes sense, as long as it's clear that the FDR sets are defined among tests within each contrast, and that you report all the contrasts that were tested. For me, the important thing is correct interpretation of the FDR sets you report, and then reporting when a test was performed.
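As a rough illustration of the two options, the base-R sketch below applies BH within each contrast and then once over the pooled tests. The p-values are simulated and the contrast names are hypothetical; in practice you would take the pvalue column from each results table instead.

```r
set.seed(42)
n_genes <- 1000

# Hypothetical raw p-values for three contrasts: mostly uniform (null),
# with a few very small values standing in for true effects
sim_p <- function(n, n_signal) c(runif(n_signal, 0, 1e-4), runif(n - n_signal))
pvals <- list(
  host        = sim_p(n_genes, 50),
  parasite    = sim_p(n_genes, 30),
  interaction = sim_p(n_genes, 0)
)

# Option 1: BH within each contrast; the FDR set is defined per contrast
padj_within <- lapply(pvals, p.adjust, method = "BH")

# Option 2: BH once over all tests from all contrasts, then split back
pooled <- p.adjust(unlist(pvals, use.names = FALSE), method = "BH")
padj_pooled <- split(pooled, rep(names(pvals), lengths(pvals)))
padj_pooled <- padj_pooled[names(pvals)]  # restore the original contrast order

# Number of tests called at a 5% threshold under each scheme
print(sapply(padj_within, function(p) sum(p < 0.05)))
print(sapply(padj_pooled, function(p) sum(p < 0.05)))
```

Either way, the contrasts themselves are extracted as usual; only the adjustment step changes, so the pooled version does not require refitting the model.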