Dataset imbalance
@14ef1b09
Last seen 8 weeks ago
Egypt

How can I address the problem of dataset imbalance? I have 82 cancer samples and 390 control samples, and limma gives different results when I randomly select an equal number of control samples. I read that imbalanced sample sizes have a significant effect on identifying differentially expressed genes: https://bmcbioinformatics.biomedcentral.com/articles/10.1186/1471-2105-7-S4-S8#Sec9

RNASeq DifferentialExpression RnaSeqSampleSizeData RnaSeqSampleSize
@gordon-smyth
Last seen 2 hours ago
WEHI, Melbourne, Australia

There is no problem with unequal sample sizes. As long as you have enough cancer samples and enough control samples to be representative, the fact that one sample size is greater than the other does not bias the DE results at all.
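As a rough illustration of this point, here is a minimal base R sketch that simulates expression values with no true differences for 82 cancer and 390 control samples, then checks how often a per-gene Welch t-test falls below 0.05. The data are simulated, the gene count and the names `expr`, `p_values` and `false_positive_rate` are hypothetical, and a plain t-test stands in for limma only to keep the example free of package dependencies.

```r
# Simulated null data: no gene is truly differentially expressed.
set.seed(42)

n_genes   <- 2000   # hypothetical number of genes
n_cancer  <- 82     # group sizes taken from the question
n_control <- 390

group <- factor(rep(c("cancer", "control"), times = c(n_cancer, n_control)))

# Rows are genes, columns are samples; all values come from the same distribution.
expr <- matrix(rnorm(n_genes * length(group)), nrow = n_genes)

# Welch t-test per gene (t.test does not assume equal variances by default).
p_values <- apply(expr, 1, function(y) t.test(y ~ group)$p.value)

# Fraction of genes called significant at 0.05 although none differ.
false_positive_rate <- mean(p_values < 0.05)

# Average estimated group difference across genes.
mean_difference <- mean(
  rowMeans(expr[, group == "cancer"]) - rowMeans(expr[, group == "control"])
)

print(false_positive_rate)
print(mean_difference)
```

If the unbalanced design biased the test, the false positive rate would drift away from the nominal 5% or the average difference would move away from zero. This only covers the simple equal-variance null case; it is not a substitute for checking real data.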

With a large human RNA-seq study there are many things that need attention, such as outliers, variation in sample quality, and batch effects, but unequal sample sizes are not among them.

I do not agree with the conclusions of the paper you cite. The paper doesn't compare to limma anyway. The paper seems concerned about unequal variances between the groups but limma can easily handle that through the use of empirical quality weights.
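One way this might look in practice is sketched below. It assumes that "empirical quality weights" refers to limma's `voomWithQualityWeights()`, and that group-level variance differences are modelled through its `var.group` argument (available in recent limma versions); the answer does not name a specific function, so treat both as assumptions. The count matrix is simulated and the object names are hypothetical, so substitute your own counts and sample annotation.

```r
library(limma)
library(edgeR)

set.seed(1)

# Hypothetical simulated counts standing in for a real count matrix.
n_genes <- 1000
group <- factor(rep(c("control", "cancer"), times = c(390, 82)),
                levels = c("control", "cancer"))
counts <- matrix(rnbinom(n_genes * length(group), mu = 100, size = 5),
                 nrow = n_genes,
                 dimnames = list(paste0("gene", seq_len(n_genes)), NULL))

# Design with control as the reference level.
design <- model.matrix(~ group)

# Standard edgeR preprocessing: filter low-count genes, then normalize.
dge  <- DGEList(counts = counts, group = group)
keep <- filterByExpr(dge, design)
dge  <- dge[keep, , keep.lib.sizes = FALSE]
dge  <- calcNormFactors(dge)

# voom with quality weights; var.group = group estimates one variance
# factor per group instead of one per sample (assumed reading of the answer).
v <- voomWithQualityWeights(dge, design, var.group = group)

# Use all 82 + 390 samples; no subsampling of controls.
fit <- lmFit(v, design)
fit <- eBayes(fit)
res <- topTable(fit, coef = "groupcancer", number = Inf)

head(res)
```

Dropping `var.group` makes `voomWithQualityWeights()` estimate a separate weight for each sample, which is the more usual choice when the concern is variable sample quality rather than a group-level variance difference.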
