Question

Are unequal sample sizes in differential gene expression (DGE) analysis a problem for edgeR, DESeq2, and NOISeq? Could you give me some advice on how to address this issue?

Jonathan • 0

@864c7192

Last seen 12 months ago

Brazil

I am facing an issue with the design of my experiment. As part of my PhD research, I am running differential expression analyses on RNA-seq samples to evaluate the differences in gene expression between cribriform and non-cribriform prostate cancer.

The problem I am encountering is that my control group (non-cribriform prostate cancer) has 205 samples, whereas my treatment group (cribriform prostate cancer) has only 65 samples. I understand that this imbalance can affect the performance of the methods, but I would like some suggestions on how to adjust the control group without biasing my data.

Could I randomly select 65 samples from the control group? Or could I use a methodology to cluster the count data from the control samples and choose representatives from each cluster to reduce this discrepancy? These ideas have already crossed my mind.

RNA RNASeqPower DEGseq • 573 views

ADD COMMENT • link written 12 months ago by Jonathan • 0

I also tried filtering by another clinical variable, but this roughly halves the sample size of my treatment group (cribriform) from 65. Since it is my limiting group, I do not want to lose samples from it. That is why I am considering reducing only the control group. I look forward to your suggestions. Thank you!

ADD REPLY • link 12 months ago Jonathan • 0

score 0 · Answer 1 · 2024-11-19

Gordon Smyth 53k

@gordon-smyth

Last seen 1 hour ago

WEHI, Melbourne, Australia

The imbalance in sample sizes doesn't affect the performance of any of the methods. It is not a problem. You should not be downsampling; that's just throwing away data for no reason.
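To make the "throwing away data" point concrete, here is a rough back-of-the-envelope sketch. It is not what edgeR, DESeq2, or NOISeq actually compute (those fit count-based models); it only assumes a simple two-group comparison of means with a common per-sample standard deviation, where the standard error of the difference is sigma * sqrt(1/n1 + 1/n2). The function name `se_of_difference` and the value `sigma=1.0` are made up for illustration.

```python
import math


def se_of_difference(n_control, n_treatment, sigma=1.0):
    """Standard error of the difference between two group means.

    Assumes (for illustration only) independent samples and a common
    per-sample standard deviation `sigma` in both groups.
    """
    return sigma * math.sqrt(1.0 / n_control + 1.0 / n_treatment)


# Group sizes from the question: 205 controls, 65 cribriform samples.
full = se_of_difference(205, 65)

# Hypothetical downsampled design: 65 controls picked from the 205.
downsampled = se_of_difference(65, 65)

print(f"all 205 controls: SE = {full:.3f} * sigma")
print(f"only 65 controls: SE = {downsampled:.3f} * sigma")
print(f"ratio (full / downsampled) = {full / downsampled:.3f}")
```

Under these assumptions the unbalanced design gives a standard error of about 0.142 sigma versus about 0.175 sigma after downsampling, so keeping all 205 controls can only tighten the comparison. The 65 cribriform samples remain the limiting factor either way.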

ADD COMMENT • link 12 months ago Gordon Smyth 53k