I'm trying to do differential expression on label-imbalanced data; my case:control ratio is 2:1. I know that regression is at the core of the DESeq2 machinery, and that regression fitting routines can cope with such imbalances through observation weights. Specifically, you can down-weight observations from the over-represented group so that both groups contribute equally to the fit. Is observation weighting available to the user in DESeq2?
# group 'a' over-represented in learning
glm(gene_i ~ effect, data = data[, c('a','a','a','a','b','b')])
# groups 'a' and 'b' equally represented in learning
glm(gene_i ~ effect, data = data[, c('a','a','a','a','b','b')], weights = c(.5, .5, .5, .5, 1, 1))
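To make the idea concrete, here is a minimal, self-contained sketch on simulated counts. The data frame `toy`, the weight vector `w`, and the Poisson family are all hypothetical choices of mine for illustration (DESeq2 itself fits a negative binomial GLM), so this only shows the kind of weighting I mean in plain `glm`, not anything DESeq2 does:

```r
# Hypothetical toy data: 4 samples in group 'a', 2 in group 'b' (2:1 imbalance).
set.seed(1)
toy <- data.frame(
  effect = factor(c('a', 'a', 'a', 'a', 'b', 'b')),
  gene_i = rpois(6, lambda = 50)  # simulated counts for a single gene
)

# Per-observation weights chosen so that each group's weights sum to 2.
w <- ifelse(toy$effect == 'a', 0.5, 1)

# Poisson family is an assumed stand-in for a count model.
fit_unweighted <- glm(gene_i ~ effect, family = poisson, data = toy)
fit_weighted   <- glm(gene_i ~ effect, family = poisson, data = toy, weights = w)

# Print the coefficient tables so the two fits can be compared.
summary(fit_unweighted)$coefficients
summary(fit_weighted)$coefficients
```

Checking `tapply(w, toy$effect, sum)` confirms that the total weight is the same in both groups.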
I'm asking because all of my differentially expressed genes have a negative log fold change. I've controlled for RIN, rRNA, library size/size factor, and a bunch of other things, to no avail. My last best guess is that this is due to the class imbalance.
Thanks in advance,