Question

DESeq2: any problem with unbalanced numbers of samples in a normal/tumor study?

bharata1803 ▴ 60

@bharata1803-7698

Last seen 5.1 years ago

Japan

Hello,

I have downloaded TCGA datasets (HTSeq count files) for several cancer types. Each dataset has a large number of tumor samples but far fewer normal samples, for example only 60 normal samples against ~500 or more tumor samples. Will this imbalance cause any problem if I use DESeq2 to find differentially expressed genes? Thank you very much.

deseq2 rnaseq tcga • 2.0k views

ADD COMMENT • link updated 6.6 years ago by Michael Love 42k • written 6.6 years ago by bharata1803 ▴ 60

I don't believe there will be any major problem due to imbalance; I'd be more worried about lack of matched tumour:normal samples (seems unlikely that they've taken 9 tumour samples from each patient providing a normal), but that's the nature of public clinical data.

ADD REPLY • link 6.6 years ago Gavin Kelly ▴ 680

score 0 · Answer 1 · 2017-10-20

Michael Love 42k

@mikelove

Last seen 16 minutes ago

United States

It's not a problem for DESeq2 to have unbalanced sample sizes.
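
As a rough illustration of this point, the sketch below runs the standard DESeq2 workflow on a deliberately unbalanced design. The counts are simulated, and the group sizes (6 normal, 50 tumor), gene count, and object names are assumptions chosen for the example, not the TCGA data from the question.

```r
library(DESeq2)

set.seed(1)
n_genes <- 500

# Hypothetical unbalanced design: far fewer normal than tumor samples
condition <- factor(c(rep("normal", 6), rep("tumor", 50)),
                    levels = c("normal", "tumor"))
n_samples <- length(condition)

# Simulated negative binomial counts; each gene (row) gets its own mean,
# and the dispersion decreases with the mean
mu <- 2^runif(n_genes, 4, 10)
counts <- matrix(
  rnbinom(n_genes * n_samples, size = 1 / (0.1 + 4 / mu), mu = mu),
  nrow = n_genes,
  dimnames = list(paste0("gene", seq_len(n_genes)),
                  paste0("sample", seq_len(n_samples)))
)
coldata <- data.frame(condition = condition, row.names = colnames(counts))

# The design formula is the same as for a balanced experiment
dds <- DESeqDataSetFromMatrix(countData = counts,
                              colData = coldata,
                              design = ~ condition)
dds <- DESeq(dds)

# Tumor vs. normal log2 fold changes and adjusted p-values
res <- results(dds, contrast = c("condition", "tumor", "normal"))
summary(res)
```

Nothing in the calls changes because of the imbalance; the smaller group mainly limits how precisely its mean is estimated.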

Note that with more than 100 samples per group, there is a substantial speed-up from using a linear model, such as limma-voom, instead of a generalized linear model. I tend to use limma when I have hundreds of samples per group.
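
For readers who want to try the linear-model route, here is a minimal limma-voom sketch. It uses simulated counts with assumed group sizes (60 normal, 500 tumor, loosely echoing the question) and illustrative object names; it is one common way to set up voom, not necessarily the exact pipeline meant above.

```r
library(limma)
library(edgeR)

set.seed(1)
n_genes <- 1000

# Hypothetical unbalanced design with hundreds of tumor samples
condition <- factor(c(rep("normal", 60), rep("tumor", 500)),
                    levels = c("normal", "tumor"))
n_samples <- length(condition)

# Simulated counts; each gene (row) has its own mean
mu <- 2^runif(n_genes, 4, 10)
counts <- matrix(
  rnbinom(n_genes * n_samples, size = 5, mu = mu),
  nrow = n_genes,
  dimnames = list(paste0("gene", seq_len(n_genes)),
                  paste0("sample", seq_len(n_samples)))
)

design <- model.matrix(~ condition)

# Filter low-count genes and compute normalization factors
dge <- DGEList(counts = counts, group = condition)
keep <- filterByExpr(dge, design)
dge <- dge[keep, , keep.lib.sizes = FALSE]
dge <- calcNormFactors(dge)

# voom estimates the mean-variance trend and returns precision weights,
# after which an ordinary weighted linear model is fit per gene
v <- voom(dge, design)
fit <- eBayes(lmFit(v, design))

# Tumor vs. normal results for all retained genes
top <- topTable(fit, coef = "conditiontumor", number = Inf)
head(top)
```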

ADD COMMENT • link 6.6 years ago Michael Love 42k

I am not in a hurry and my computer is quite good. For almost 600 samples, it took around 1 hour, so I think that is no problem. Getting the log-transformed read counts as expression levels for the samples, however, may take a really long time. I asked about this in the post "DESeq2 rlog function takes too long", and you suggested a tweak. I tried that code a long time ago and saw some increase in speed. I will try it again now. Thank you.

ADD REPLY • link 6.5 years ago bharata1803 ▴ 60

That tweak is now a fully supported function (I'll make a note on that post):

vsd <- vst(dds, blind=FALSE)
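
To show that call in context, here is a self-contained sketch on simulated data. The `dds` object below is a hypothetical stand-in built from random counts (2000 genes, 6 normal and 50 tumor samples, all assumed for illustration); with real data, `dds` would be the DESeqDataSet already created for the analysis.

```r
library(DESeq2)

set.seed(1)
# vst() fits the dispersion trend on a subset of genes (nsub, 1000 by
# default), so the example simulates more than 1000 well-expressed genes
n_genes <- 2000
condition <- factor(c(rep("normal", 6), rep("tumor", 50)),
                    levels = c("normal", "tumor"))
n_samples <- length(condition)

mu <- 2^runif(n_genes, 4, 10)
counts <- matrix(
  rnbinom(n_genes * n_samples, size = 1 / (0.1 + 4 / mu), mu = mu),
  nrow = n_genes,
  dimnames = list(paste0("gene", seq_len(n_genes)),
                  paste0("sample", seq_len(n_samples)))
)
coldata <- data.frame(condition = condition, row.names = colnames(counts))
dds <- DESeqDataSetFromMatrix(countData = counts,
                              colData = coldata,
                              design = ~ condition)

# blind = FALSE lets the transformation use the design when estimating
# the dispersion trend
vsd <- vst(dds, blind = FALSE)

# Transformed values on a log2-like scale, one row per gene
expr <- assay(vsd)
dim(expr)
```

The resulting matrix can then be used for expression-level summaries, clustering, or PCA in place of `rlog` output.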

ADD REPLY • link 6.5 years ago Michael Love 42k