DESeq2: any problem with an unbalanced number of samples in a normal/tumor study?
bharata1803 ▴ 60
@bharata1803-7698
Last seen 5.1 years ago
Japan

Hello,

I have downloaded TCGA datasets (HTSeq count files) for several cancers. I noticed that each dataset has a large number of tumor samples but far fewer normal samples: for example, only 60 normal samples against ~500 or more tumor samples. Will this imbalance cause any problems if I use DESeq2 to find differentially expressed genes? Thank you very much.

deseq2 rnaseq tcga • 2.0k views

I don't believe the imbalance will cause any major problem. I'd be more worried about the lack of matched tumour:normal samples (it seems unlikely that 9 tumour samples were taken from each patient who provided a normal sample), but that's the nature of public clinical data.

@mikelove
Last seen 5 hours ago
United States

It's not a problem for DESeq2 to have unbalanced sample sizes.
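
To illustrate the point, here is a minimal sketch of a standard DESeq2 run on an unbalanced design. It uses simulated counts from `makeExampleDESeqDataSet()` rather than TCGA data, and the group sizes (4 vs 20) and the use of conditions "A" and "B" as stand-ins for normal and tumor are assumptions made only for this example.

```r
library(DESeq2)

set.seed(1)

# Simulated data set: 1000 genes, 40 samples (20 in condition A, 20 in B).
dds <- makeExampleDESeqDataSet(n = 1000, m = 40)

# Keep 4 samples of A and all 20 of B to mimic an unbalanced study.
# Here "A" stands in for normal and "B" for tumor (assumption for illustration).
dds <- dds[, c(1:4, 21:40)]
table(dds$condition)

# The usual workflow is unchanged by the imbalance.
dds <- DESeq(dds)
res <- results(dds, contrast = c("condition", "B", "A"))
summary(res)
```

Nothing in the calls changes relative to a balanced design; the smaller group simply contributes less information to the estimates.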

Note that with more than 100 samples per group, there is a substantial speed-up from using a linear model, such as limma-voom, instead of a generalized linear model. I tend to use limma when I have hundreds of samples per group.
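
A minimal sketch of the limma-voom route, assuming a simple two-group comparison. The counts are simulated with `rnbinom()`, and the group sizes, gene count, and object names (`counts`, `group`, `tt`) are hypothetical choices for this example, not taken from the TCGA data discussed here.

```r
library(edgeR)  # attaches limma as well

set.seed(1)

# Unbalanced groups: 6 normal vs 50 tumor (sizes chosen only for illustration).
group <- factor(c(rep("normal", 6), rep("tumor", 50)),
                levels = c("normal", "tumor"))
n_genes <- 200

# Simulated negative binomial counts with gene-specific means.
gene_means <- 2^runif(n_genes, min = 5, max = 10)
counts <- matrix(
  rnbinom(n_genes * length(group), mu = gene_means, size = 5),
  nrow = n_genes,
  dimnames = list(paste0("gene", seq_len(n_genes)),
                  paste0("s", seq_along(group)))
)

design <- model.matrix(~ group)

# Normalize, estimate mean-variance weights, then fit a linear model per gene.
dge <- DGEList(counts = counts)
dge <- calcNormFactors(dge)
v   <- voom(dge, design)
fit <- eBayes(lmFit(v, design))

# Tumor vs normal results for every gene.
tt <- topTable(fit, coef = "grouptumor", number = Inf)
head(tt)
```

In a real analysis, low-count genes would usually be filtered before `voom()` (for example with edgeR's `filterByExpr()`); that step is omitted here to keep the sketch short.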


I am not in a hurry and my computer is quite good. For almost 600 samples it took around 1 hour, so I think that is no problem. Getting the log-transformed read counts as expression levels for the samples may take a really long time, though. In this post, "DESeq2 rlog function takes too long", I asked about that problem and you suggested a tweak. I tried that code a long time ago and saw some increase in speed. I will try it again now. Thank you.


That tweak is now a fully supported function (I'll make a note on that post):

vsd <- vst(dds, blind=FALSE)
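
For context, a self-contained sketch of that call on simulated data. The `dds` object here comes from `makeExampleDESeqDataSet()` and its dimensions are assumptions for this example; with real data, `dds` would be your own `DESeqDataSet`.

```r
library(DESeq2)

set.seed(1)

# Simulated stand-in for a real DESeqDataSet: 3000 genes, 12 samples.
dds <- makeExampleDESeqDataSet(n = 3000, m = 12)

# blind = FALSE lets the transformation use the design when estimating
# the dispersion trend.
vsd <- vst(dds, blind = FALSE)

# Transformed values (roughly log2 scale) as a genes x samples matrix.
mat <- assay(vsd)
dim(mat)
```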