Question

DESeq2 - controlling for block vs analysing blocks separately and lfcShrink changes the DE genes drastically

0

Entering edit mode

rmash ▴ 20

@rmash-21927

Last seen 3.9 years ago

I have an experiment which I did in blocks (week 1 vs week 2) and 4 groups (A, B, C and D).

My understanding is that I can do the analysis controlling for block by doing this.

dds_results <- DESeqDataSetFromMatrix(countData = genes.matrix,
                                    colData = sample_info,
                                    design = ~ block + group)

I can then extract results of the contrast of interest:

results.AB <- results(dds_results, contrast=c("group", "A", "B"), 
                     alpha=0.05, 
                     pAdjustMethod = "BH",
                     lfcThreshold = lfc)

Which gives me some number of up- and down-regulated DE genes (lets say 50)

Problem 1 I used lfcShrink to get shrunken log2foldChanges but when I do the summary() it seems to indicate far fewer genes are significant now.

 results.AB  <- lfcShrink(dds_results, contrast=c("group", "A", "B"), 
                         alpha=0.05, 
                         pAdjustMethod = "BH",
                         lfcThreshold = lfc)

The p-values between results() and lfcShrink() are not the same -- shouldn't the p-values stay the same?

Does this mean that I need to adjust type=? but I get an error that type can only be adjusted for coeff.

Problem 2 I wanted to see how consistent blocks were but I reran the above splitting the data by block and essentially do the same as above and the 2 blocks yields drastically different numbers, though there is a high degree of gene overlap.

block 1: 60 DE genes block 2: 25 DE genes

I did some descriptives on the data and both look the same, maybe overall levels of expression are lower in block 2.

Why would this occur? And what is the best way to control for block?

deseq2 • 897 views

ADD COMMENT • link updated 4.6 years ago by Michael Love 41k • written 4.6 years ago by rmash ▴ 20

score 0 · Answer 1 · 2019-09-18

0

Entering edit mode

Michael Love 41k

@mikelove

Last seen 5 hours ago

United States

When you use the threshold on the shrunken LFC it computes different pvalues — because the LFC are not the same. Note that we recommend using “apeglm” or “ashr” now, see the vignette section on LFC shrinkage.

It sounds like you ran an analysis on subsets of data which means reducing the power. I’m not surprised you don’t get the same list of genes across subsets of the data. I would focus on ranks of shrunken LFC to examine stability across subsets.

ADD COMMENT • link 4.6 years ago Michael Love 41k

0

Entering edit mode

Thanks for the advice!

1) So which p-values should I be going by? This tutorial had me confused.

"NOTE: Shrinking the log2 fold changes will not change the total number of genes that are identified as significantly differentially expressed. The shrinkage of fold change is to help with downstream assessment of results. For example, if you wanted to subset your significant genes based on fold change for further evaluation, you may want to use shruken values. "

2) how do you suggest looking at ranks?

ADD REPLY • link 4.6 years ago rmash ▴ 20

0

Entering edit mode

I would recommend either the p-values produced by results(), or using s-values associated with type="apeglm" or "ashr". These are different analyses, with different statistical objectives though. One is focused on null hypothesis testing and one is focused on bounding the probability of wrong sign estimation. Most users use the p-values and FDR bounds from results(), probably because the community is most familiar with this type of error control.

That tutorial above isn't using LFC thresholds. That's the key difference.

You don't have to look at ranks, I was just saying it would give you a better sense to think about whether the top genes in terms of LFC are preserved. But I'm not saying this is an analysis you have to do, and I don't have time to provide code for you to perform such an analysis.

ADD REPLY • link 4.6 years ago Michael Love 41k

0

Entering edit mode

Ok that makes more sense re thresholds.

Re: rankings I was just curious what your inclination would be -- as one could do it in a simple way (literally ranking and looking at if there is overlap) or doing something more heavy duty (rank-rank hypogeometric mean testing) across the 2 blocks.

Thanks again.

ADD REPLY • link 4.6 years ago rmash ▴ 20