Question

DESEQ2 'results' filter by value for multiple conditions (contrast off?)

0

Entering edit mode

courtney.stairs • 0

@courtneystairs-13043

Last seen 5.1 years ago

EDIT:

I think this post might answer my question?
https://www.biostars.org/p/115685/

Hi there,

I was hoping for some clarification regarding the 'results' function of DESEQ2. I see from the manual it says that if 'contrast' is not specified, it will return 'results' from the last comparison. Is it then fair to say if your dataset is Control, ConditionA, ConditionB and you do not specify the contrast the results will be ConditionB vs Control? Therefore when you filter 'results' based on padj value < 0.05 you will only have the genes with significant padj values from that last comparison (i.e. not the whole dataset)? (Warning, I am really bad at R)

With an example:

I have the following dataset: two treatment conditions ('1h oxygen' and 'NAO') and one control ('Control') each with 4 bioreps.

dds <- DESeqDataSetFromMatrix(countData = countData,colData = colData, design=~treatment)

#relevel against control dds$treatment <- relevel(dds$treatment, ref="Control") dds <- DESeq(dds)

# Make three results, one of each comparison + default 'uncontrast')

res.1hvsC <- results(dds,contrast=c("treatment", "1h_Oxygen", "Control")) res.naovsC <- results(dds,contrast=c("treatment","NAO", "Control")) res.uncontrast <- results(dds)

# Grab the significant genes

res.1hvC.Sig <- res.1hvsC[ which(res.1hvsC$padj < 0.05), ] res.naovC.Sig <- res.naovsC[ which(res.naovsC$padj < 0.05), ] res.uncontrast.sig <- res.uncontrast[ which(res.uncontrast$padj < 0.05), ]

Then I examine these dataframes and find:

> res.1hvC.Sig log2 fold change (MAP): treatment 1h_Oxygen vs Control Wald test p-value: treatment 1h_Oxygen vs Control DataFrame with 3838 rows and 6 columns

> res.naovC.Sig log2 fold change (MAP): treatment NAO vs Control Wald test p-value: treatment NAO vs Control DataFrame with 3017 rows and 6 columns

> res.uncontrast.Sig log2 fold change (MAP): treatment NAO vs Control Wald test p-value: treatment NAO vs Control DataFrame with 3017 rows and 6 columns

As you can see, the DataFrame for the last (default contrast) and specified contrast (NAO v Control) have the same size (and values.. not shown). From this toy example, it makes me think that if I want to look at all genes across both samples that have a significant padj I need to either do the intersection of these two gene lists (res.1hvC.Sig and res.nao.Sig) to have all genes that have padj < 0.05 in each condition or a combined list (with only unique identifiers) for each.

Do you have any recommendations or resource suggestions on how to proceed with this conundrum? Specifically with multiple conditions?

Thank you very much for your time (+ Deseq2!!),
/Courtney

deseq2 results contrast padj • 5.2k views

ADD COMMENT • link updated 6.9 years ago by Michael Love 41k • written 6.9 years ago by courtney.stairs • 0

0

Entering edit mode

To extend on this observation, I redid the analysis with the LRT and found that now each comparison has the same number of values where each gene has the same padj but different fold-change (depending on the condition). So would you recommend when doing a 'two (or more) treatments vs control' experiment that we should use the LRT?

Again, apologies for my ignorance in stats (a crutch wielded shamefully by many biologists) could you clarify some wording for me from the vignette - by 'testing 3 or more levels' does this mean in essence, 3 or more conditions (e.g., control, conditionA, conditionB)?

"The LRT is therefore useful for testing multiple terms at once, for example testing 3 or more levels of a factor at once, or all interactions between two variables. The LRT for count data is conceptually similar to an analysis of variance (ANOVA) calculation in linear regression, except that in the case of the Negative Binomial GLM, we use an analysis of deviance (ANODEV), where the deviance captures the difference in likelihood between a full and a reduced model."

If so, then I think LRT is the way to go for such experimental designs?

Thanks very much for helping me with this!

Update on code:

dds <- DESeqDataSetFromMatrix(countData = countData,colData = colData, design=~treatment) dds$treatment <- relevel(dds$treatment, ref="Control")

dds_lrt <- DESeq(dds, test="LRT", reduced=~1)

# get results for each contrast res.1hvsC<-results(dds_lrt,contrast=c("treatment", "1h_Oxygen", "Control")) res.naovsC<-results(dds_lrt,contrast=c("treatment","NAO", "Control")) res.uncontrast <- results(dds_lrt)

# get significant results for each contrast res.1hvC.Sig <- res.1hvsC[ which(res.1hvsC$padj < 0.05), ] res.naovC.Sig <- res.naovsC[ which(res.naovsC$padj < 0.05), ] res.uncontrast.Sig <- res.uncontrast[ which(res.uncontrast$padj < 0.05), ]

> res.1hvC.Sig log2 fold change (MLE): treatment 1h_Oxygen vs Control LRT p-value: '~ treatment' vs '~ 1' DataFrame with 4872 rows and 6 columns baseMean log2FoldChange lfcSE stat pvalue padj <numeric> <numeric> <numeric> <numeric> <numeric> <numeric> SS50377_r0010 89915.71117 -1.361059 0.4515020 8.875474 0.0118226633 0.0215168112

> res.naovC.Sig log2 fold change (MLE): treatment NAO vs Control LRT p-value: '~ treatment' vs '~ 1' DataFrame with 4872 rows and 6 columns baseMean log2FoldChange lfcSE stat pvalue padj <numeric> <numeric> <numeric> <numeric> <numeric> <numeric> SS50377_r0010 89915.71117 -0.6538221 0.4515003 8.875474 0.0118226633 0.0215168112

> res.uncontrast.Sig log2 fold change (MLE): treatment NAO vs Control LRT p-value: '~ treatment' vs '~ 1' DataFrame with 4872 rows and 6 columns baseMean log2FoldChange lfcSE stat pvalue padj <numeric> <numeric> <numeric> <numeric> <numeric> <numeric> SS50377_r0010 89915.71117 -0.6538221 0.4515003 8.875474 0.0118226633 0.0215168112

ADD REPLY • link 6.9 years ago courtney.stairs • 0

score 0 · Answer 1 · 2017-06-13

0

Entering edit mode

Michael Love 41k

@mikelove

Last seen 3 hours ago

United States

hi Courtney,

Let's start out with what you want to test. What set of genes would you like to report?

ADD COMMENT • link 6.9 years ago Michael Love 41k

0

Entering edit mode

Hi Michael,

Thanks for your reply. I'd like to use this as a method to see genes that are differentially expressed in each condition versus the control condition. I expect some genes to respond similarly to each treatment while others to either be inversely regulated or unchanged. My ultimate goal is to generate a counts-based heatmap of the genes with padj < 0.05 to identify clusters of genes unique and shared between conditions.

I tried reading up a bit about levels and factors and I think I might have a better handle on it now ( http://stattrek.com/statistics/dictionary.aspx?definition=Factor ) Is it right to say that one would use an LRT if you were testing multiple states (levels) of the same treatment (factor)? For example cells treated with 2 concentrations of the same drug and untreated. However, if you are treating cells with 2 different drugs (2 factors), in addition to untreated, it is better to do a Wald test contrasting each drug to untreated.

If my interpretations are correct, the best way to proceed is to do the Wald test for each comparison and get a list of genes that have padj < 0.05 in both of my treatment conditions (i.e. and intersect of the oxygen and NAO-treated). Using this gene list, I can generate the counts-based heatmap (or other heatmaps).

I think I should not use the LRT since between my three samples (control, oxygen and NAO-treated) there are two 'factors' (oxygen and NAO) each with one level (the concentration)?

I really wish I took second-year stats now....

If it is not too much trouble, can I ask for a similar clarification with another experiment? We have data from an organism we performed surgery on and isolated RNA from the head and tail from three time points after surgery (4 bio-replicates each). From this, we have 2 factors (heads and tails) with 3 levels (3 time points after surgery) to compare with control cells (no surgery). For this study, I think it is better to use the LRT for each factor to see time-related DE differences. But can you do a LRT on the whole dataset instead? That is, dds_lrt <- DESeq(dds, test="LRT", reduced=~1) on all head and tail cells across all time points and the control cells?

Thank you very much for your help!

/C

ADD REPLY • link 6.9 years ago courtney.stairs • 0

2

Entering edit mode

hi,

To find genes which change in either condition, you would use a likelihood ratio test. You just need to use ~1 as the reduced formula. so

dds <- DESeq(dds, test="LRT", reduced=~1)
res <- results(dds)

You can then build a heatmap for these genes, to show how the transformed counts look for the significant genes.

I'd recommend using:

vsd <- vst(dds)

Then define the set of genes passing an FDR cutoff:

select <- which(res$padj < .05)

And then use the code for the variance stabilized data here:

https://bioconductor.org/packages/release/bioc/vignettes/DESeq2/inst/doc/DESeq2.html#heatmap-of-the-count-matrix

For the other study, it's important to know that listing the types of samples in the experiment in not sufficient to decide the design to use in R. You need to specify exactly what you want to test.

Do you just want to compare each group with the control group? In this case you can use the example code at the beginning of the Interactions section:

https://www.bioconductor.org/packages/release/bioc/vignettes/DESeq2/inst/doc/DESeq2.html#interactions

There are often multiple possible designs for the same set of samples, depending on what you are looking for and how you want to model the samples.

ADD REPLY • link 6.9 years ago Michael Love 41k

0

Entering edit mode

Thank you Michael!

I will think about this more and come back if I have any questions.

/Courtney

ADD REPLY • link 6.9 years ago courtney.stairs • 0