Question

DESeq: large difference in number of replicates per condition

krc3004 ▴ 10

@krc3004-12978

Hi all,

I have a general question about the model estimation used by DESeq. There have been many posts on whether DESeq works well with few or no replicates, but I'm wondering whether it's appropriate to use DESeq for differential analysis across two conditions where one condition has a small number of replicates (say, 5) and the other has a very large number (in the hundreds). The particular phenotype that we are looking at in our (clinical) data is quite rare, but we'd still like to test for differential expression. Normally I would provide a reproducible example, but these data are sensitive.

In this case, does it make sense to use DESeq? Would it make sense to, say, randomly sample some of the replicates from the condition with 100s of replicates and run multiple tests? I'm reading the original DESeq2 paper to try to understand how the model is built but any tips would be much appreciated. Thank you!

deseq2 rnaseq differential expression • 1.4k views

Updated 7.4 years ago by Michael Love 43k • written 7.4 years ago by krc3004 ▴ 10

score 1 · Answer 1 · 2017-10-04

Michael Love 43k

@mikelove

You shouldn't downsample the condition with hundreds of replicates. There is nothing special you need to do, but note that the dispersion estimate will be influenced mostly by the group with more replicates. Afterward, you can use plotCounts to check that the top genes make sense and are not driven by an artifact.
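To make the point about dispersion concrete, here is a rough Python sketch for a single gene. It is not DESeq2's estimator: DESeq2 fits one dispersion per gene by likelihood across all samples and then shrinks it toward a fitted trend. As a simplified stand-in, the sketch computes a method-of-moments dispersion per group and pools the two with degrees-of-freedom weights. The counts and the function names (moments_dispersion, pooled_dispersion) are made up for illustration.

```python
from statistics import mean, variance


def moments_dispersion(counts):
    """Method-of-moments dispersion alpha from var = mu + alpha * mu^2."""
    mu = mean(counts)
    # Clamp at zero: a sample variance below the mean would give a negative alpha.
    return max((variance(counts) - mu) / mu**2, 0.0)


def pooled_dispersion(groups):
    """Pool per-group estimates, weighting each group by n - 1.

    Assumption: df-weighted pooling is a stand-in for a shared
    per-gene dispersion, not what DESeq2 actually computes.
    """
    dfs = [len(g) - 1 for g in groups]
    alphas = [moments_dispersion(g) for g in groups]
    total_df = sum(dfs)
    pooled = sum(d * a for d, a in zip(dfs, alphas)) / total_df
    weights = [d / total_df for d in dfs]
    return pooled, alphas, weights


# Hypothetical counts for one gene (not real data).
rare = [80, 150, 60, 200, 110]             # 5 replicates, noisy
common = [70, 100, 130, 85, 115] * 40      # 200 replicates

pooled, alphas, weights = pooled_dispersion([rare, common])
print("per-group dispersion:", alphas)
print("pooling weights:", weights)
print("pooled dispersion:", pooled)
```

With 5 versus 200 replicates the large group carries 199 of the 203 degrees of freedom, so the pooled value sits almost on top of that group's own estimate. Downsampling the large group to 5 would leave only 8 degrees of freedom in total, which is one way to see why discarding samples does not help.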

7.4 years ago • Michael Love 43k

Michael, thanks very much for your help! I will proceed without downsampling.

7.4 years ago • krc3004 ▴ 10

I would add that you should also pay attention to outlier filtering/replacement (see the minReplicatesForReplace argument of DESeq) if you expect your large group to be heterogeneous. I've seen people run into trouble with that when handling large groups.

7.4 years ago • igor ▴ 50