Question: DESeq2 - controlling for block vs analysing blocks separately and lfcShrink changes the DE genes drastically
0
4 weeks ago by
rmash0
rmash0 wrote:

I have an experiment which I did in blocks (week 1 vs week 2) and 4 groups (A, B, C and D).

My understanding is that I can do the analysis controlling for block by doing this.

dds_results <- DESeqDataSetFromMatrix(countData = genes.matrix,
colData = sample_info,
design = ~ block + group)


I can then extract results of the contrast of interest:

results.AB <- results(dds_results, contrast=c("group", "A", "B"),
alpha=0.05,
lfcThreshold = lfc)


Which gives me some number of up- and down-regulated DE genes (lets say 50)

Problem 1 I used lfcShrink to get shrunken log2foldChanges but when I do the summary() it seems to indicate far fewer genes are significant now.

 results.AB  <- lfcShrink(dds_results, contrast=c("group", "A", "B"),
alpha=0.05,
lfcThreshold = lfc)


The p-values between results() and lfcShrink() are not the same -- shouldn't the p-values stay the same?

Does this mean that I need to adjust type=? but I get an error that type can only be adjusted for coeff.

Problem 2 I wanted to see how consistent blocks were but I reran the above splitting the data by block and essentially do the same as above and the 2 blocks yields drastically different numbers, though there is a high degree of gene overlap.

block 1: 60 DE genes block 2: 25 DE genes

I did some descriptives on the data and both look the same, maybe overall levels of expression are lower in block 2.

Why would this occur? And what is the best way to control for block?

deseq2 • 62 views
modified 4 weeks ago by Michael Love25k • written 4 weeks ago by rmash0
Answer: DESeq2 - controlling for block vs analysing blocks separately and lfcShrink chan
0
4 weeks ago by
Michael Love25k
United States
Michael Love25k wrote:

When you use the threshold on the shrunken LFC it computes different pvalues — because the LFC are not the same. Note that we recommend using “apeglm” or “ashr” now, see the vignette section on LFC shrinkage.

It sounds like you ran an analysis on subsets of data which means reducing the power. I’m not surprised you don’t get the same list of genes across subsets of the data. I would focus on ranks of shrunken LFC to examine stability across subsets.

1) So which p-values should I be going by? This tutorial had me confused.

"NOTE: Shrinking the log2 fold changes will not change the total number of genes that are identified as significantly differentially expressed. The shrinkage of fold change is to help with downstream assessment of results. For example, if you wanted to subset your significant genes based on fold change for further evaluation, you may want to use shruken values. "

2) how do you suggest looking at ranks?

I would recommend either the p-values produced by results(), or using s-values associated with type="apeglm" or "ashr". These are different analyses, with different statistical objectives though. One is focused on null hypothesis testing and one is focused on bounding the probability of wrong sign estimation. Most users use the p-values and FDR bounds from results(), probably because the community is most familiar with this type of error control.

That tutorial above isn't using LFC thresholds. That's the key difference.

You don't have to look at ranks, I was just saying it would give you a better sense to think about whether the top genes in terms of LFC are preserved. But I'm not saying this is an analysis you have to do, and I don't have time to provide code for you to perform such an analysis.

Ok that makes more sense re thresholds.

Re: rankings I was just curious what your inclination would be -- as one could do it in a simple way (literally ranking and looking at if there is overlap) or doing something more heavy duty (rank-rank hypogeometric mean testing) across the 2 blocks.

Thanks again.