Are unequal sample sizes in differential gene expression (DGE) analysis a problem for edgeR, DESeq2, and NOISeq? Could you give me some advice on how to address this issue?
Jonathan • 0
@864c7192
Brazil

I am facing an issue with the design of my experiment. I am performing differential gene expression analyses on my RNA-seq samples as part of my PhD research, with the aim of evaluating differences in gene expression between cribriform and non-cribriform prostate cancer.

The problem I am encountering is that my control group (non-cribriform prostate cancer) has 205 samples, whereas my treatment group (cribriform prostate cancer) has only 65 samples. I understand that this imbalance can affect the performance of the methods, but I would like some suggestions on how to adjust the control group without biasing my data.

Could I randomly select 65 samples from the control group? Or could I use a methodology to cluster the count data from the control samples and choose representatives from each cluster to reduce this discrepancy? These ideas have already crossed my mind.

RNA RNASeqPower DEGseq • 241 views

I also tried filtering by another clinical variable, but this cuts the sample size of my treatment group (cribriform) from 65 to half that number. Since it is my limiting group, I do not want to lose samples from it. That is why I am considering reducing only the control group. I look forward to your suggestions. Thank you!

@gordon-smyth
WEHI, Melbourne, Australia

The imbalance in sample sizes doesn't affect the performance of any of the methods. It is not a problem. You should not be downsampling -- that's just throwing away data for no reason.
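
To put a rough number on what downsampling would cost, here is a minimal sketch in Python. It assumes a deliberately simplified setting: a plain two-group comparison of means with the same per-sample standard deviation in both groups, where the standard error of the difference is sigma * sqrt(1/n1 + 1/n2). This is not the negative binomial model that edgeR or DESeq2 actually fit, and the function name `se_of_difference` is made up for illustration; only the group sizes 205 and 65 come from the question.

```python
import math


def se_of_difference(n_control, n_treatment, sigma=1.0):
    """Standard error of a difference in group means.

    Assumes independent samples and a common per-sample standard
    deviation `sigma` in both groups (a simplifying assumption).
    """
    return sigma * math.sqrt(1.0 / n_control + 1.0 / n_treatment)


# Keep all 205 controls versus randomly keeping only 65 of them.
se_all = se_of_difference(205, 65)
se_down = se_of_difference(65, 65)

print(f"all 205 controls:   SE = {se_all:.4f} (in units of sigma)")
print(f"downsampled to 65:  SE = {se_down:.4f} (in units of sigma)")
print(f"SE ratio (downsampled / all): {se_down / se_all:.3f}")
```

Under these assumptions the standard error is roughly 23% larger after downsampling to 65 controls, so the balanced design is the less precise one. The extra controls can only tighten the estimate for the control group; they do not harm the comparison.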
