Question: DESeq2: any problem with an unbalanced number of samples in a normal/tumor study?
4 months ago by
bharata180320 wrote:


I have downloaded TCGA datasets (HTSeq count files) for several cancer types. I noticed that each dataset has a large number of tumor samples but far fewer normal samples, for example only 60 normal samples versus ~500 or more tumor samples. Will this imbalance cause any problem if I use DESeq2 to find differentially expressed genes? Thank you very much.

modified 4 months ago by Michael Love16k • written 4 months ago by bharata180320

I don't believe there will be any major problem due to imbalance; I'd be more worried about lack of matched tumour:normal samples (seems unlikely that they've taken 9 tumour samples from each patient providing a normal), but that's the nature of public clinical data.

written 4 months ago by Gavin Kelly550
4 months ago by
Michael Love16k
United States
Michael Love16k wrote:

It's not a problem for DESeq2 to have unbalanced sample sizes.

Note that with more than 100 samples per group, there is a substantial speed-up from using a linear model, such as limma-voom, instead of a generalized linear model. I tend to use limma when I have hundreds of samples per group.
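For readers who have not used limma-voom, the following is a minimal sketch of what such an analysis could look like with unbalanced groups. The counts are simulated and the group sizes (60 normal, 500 tumor) simply mirror the question; the object names (`counts`, `condition`, `res`) are illustrative assumptions, not part of the original posts. Real data would normally also be filtered for low-count genes first.

```r
# Sketch only: simulated counts stand in for a real HTSeq count matrix.
library(edgeR)   # DGEList(), calcNormFactors()
library(limma)   # voom(), lmFit(), eBayes(), topTable()

set.seed(1)
n_genes  <- 1000
n_normal <- 60    # assumed group sizes, mirroring the question
n_tumor  <- 500
n_total  <- n_normal + n_tumor

counts <- matrix(
  rnbinom(n_genes * n_total, mu = 100, size = 5),
  nrow = n_genes,
  dimnames = list(paste0("gene", seq_len(n_genes)),
                  paste0("sample", seq_len(n_total)))
)
condition <- factor(rep(c("normal", "tumor"), c(n_normal, n_tumor)),
                    levels = c("normal", "tumor"))

# The design matrix is built the same way whether or not groups are balanced.
design <- model.matrix(~ condition)

dge <- DGEList(counts = counts)
dge <- calcNormFactors(dge)        # TMM normalization factors
v   <- voom(dge, design)           # log2-CPM values with precision weights
fit <- eBayes(lmFit(v, design))    # linear model plus moderated t-statistics

# Tumor vs normal results for all genes.
res <- topTable(fit, coef = "conditiontumor", number = Inf)
head(res)
```

Because the linear model is fitted in closed form per gene, this step typically finishes quickly even with hundreds of samples per group.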

written 4 months ago by Michael Love16k

I am not in a hurry and my computer is quite good. For almost 600 samples it took around 1 hour, so I don't think that is a problem. Getting the log-transformed read counts as expression levels for the samples may take a really long time, though. In this post: "DESeq2 rlog function takes too long", I asked about this problem and you suggested a tweak. I tried that code a long time ago and saw some increase in speed. I will try it again now. Thank you.

written 3 months ago by bharata180320

That tweak is now a fully supported function (I'll make a note on that post):

vsd <- vst(dds, blind=FALSE)
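As a rough illustration of where that call fits, here is a minimal sketch that builds a `DESeqDataSet` from simulated counts and extracts the transformed expression matrix. The group sizes are deliberately small and the names (`counts`, `coldata`, `expr`) are assumptions made for the example, not taken from the thread.

```r
# Sketch only: simulated counts, small unbalanced groups to keep it fast.
library(DESeq2)

set.seed(1)
n_genes <- 2000
groups  <- rep(c("normal", "tumor"), c(10, 80))   # assumed group sizes

counts <- matrix(
  rnbinom(n_genes * length(groups), mu = 200, size = 5),
  nrow = n_genes,
  dimnames = list(paste0("gene", seq_len(n_genes)),
                  paste0("sample", seq_along(groups)))
)
coldata <- data.frame(
  condition = factor(groups, levels = c("normal", "tumor")),
  row.names = colnames(counts)
)

dds <- DESeqDataSetFromMatrix(countData = counts,
                              colData   = coldata,
                              design    = ~ condition)

# blind = FALSE lets the transformation use the design when estimating
# the dispersion trend.
vsd  <- vst(dds, blind = FALSE)

# Matrix of variance-stabilized values (roughly log2 scale), genes x samples.
expr <- assay(vsd)
dim(expr)
```

The resulting `expr` matrix can be used for clustering, PCA, or plotting expression levels per sample.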
written 3 months ago by Michael Love16k