DESeq2: DE Analysis with very imbalanced samples per condition
thanos5541 • 0
@thanos5541-22407

Hello everyone,

My group has been conducting a large-scale analysis using TCGA data. I'm using the expression results to identify DE genes, following the DESeq2 vignette along with lfcShrink (apeglm). I run the comparison between healthy and diseased samples separately for multiple organs.

However, for almost every organ the number of healthy samples is only about 1-15% of the number of diseased samples (e.g., 44 healthy vs 525 diseased, 130 vs 903, or even 3 vs 309!). I do get results for almost every organ studied, but I am skeptical about the actual statistical significance of those results and about how much bias such a large difference in sample numbers per condition introduces.

Should I do something differently in the analysis because of this imbalance in samples per condition, or is such an analysis pointless because of it? Are results with an adjusted p-value < 0.1 still considered significant, as indicated by DESeq2? Should I lower the adjusted p-value cutoff to less than 0.05, or find a formula for the significance cutoff?

I have searched for similar cases online, but I could not find any as extremely imbalanced as ours, which is why I am asking here. I have read that DESeq2 does not need equal numbers of samples per condition to provide significant results, but I am not sure whether that covers extreme cases like ours.

Thanks in advance

deseq2 cancer • 266 views
@mikelove

A large imbalance does not require any change to the DESeq2 code.

I will mention that you can also easily rely on nonparametric tests such as Wilcoxon and permutation for FDR computation.
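
To make the nonparametric route a bit more concrete, here is a minimal sketch in plain Python. It is not DESeq2 and not code from this thread: the function names `rank_sum_z` and `permutation_fdr` are made up for illustration, the input is assumed to be already-normalized expression values (one list per gene), and the FDR estimate is a simple "expected null calls divided by observed calls" ratio obtained by shuffling the condition labels.

```python
import random


def rank_sum_z(values, is_case):
    """Standardized Wilcoxon rank-sum statistic for the case group.

    Ties get midranks; the variance has no tie correction (a simplification).
    Assumes both groups contain at least one sample.
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        mid = (i + j) / 2 + 1  # average 1-based rank of this tie block
        for k in range(i, j + 1):
            ranks[order[k]] = mid
        i = j + 1
    n1 = sum(is_case)
    n0 = n - n1
    w = sum(r for r, c in zip(ranks, is_case) if c)
    mean = n1 * (n + 1) / 2
    sd = (n1 * n0 * (n + 1) / 12) ** 0.5
    return (w - mean) / sd


def permutation_fdr(matrix, is_case, n_perm=200, seed=0):
    """Per-gene |z| and a permutation-based FDR estimate at each gene's |z|.

    matrix: list of genes, each a list of per-sample values.
    No monotonicity (q-value) step is applied; this is only a sketch.
    """
    rng = random.Random(seed)
    observed = [abs(rank_sum_z(g, is_case)) for g in matrix]
    labels = list(is_case)
    null = []
    for _ in range(n_perm):
        rng.shuffle(labels)  # break the link between labels and expression
        null.extend(abs(rank_sum_z(g, labels)) for g in matrix)
    fdr = []
    for t in observed:
        expected_false = sum(z >= t for z in null) / n_perm
        called = sum(z >= t for z in observed)  # at least 1: the gene itself
        fdr.append(min(1.0, expected_false / called))
    return observed, fdr


# Toy data with made-up numbers: 3 healthy vs 30 diseased samples, 20 genes.
rng = random.Random(1)
is_case = [False] * 3 + [True] * 30
genes = [[rng.gauss(0, 1) for _ in is_case] for _ in range(20)]
# Shift gene 0 upward in the diseased group so it is truly differential.
genes[0] = [v + (4.0 if c else 0.0) for v, c in zip(genes[0], is_case)]
observed, fdr = permutation_fdr(genes, is_case, n_perm=100)
```

Shuffling the labels keeps the group sizes fixed, so the null distribution reflects the same imbalance as the real comparison. For real data you would more likely use an established implementation than this hand-rolled one.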

I see, thank you very much for your quick response!
