I am analysing RNA-seq data from human tissue samples from diseased and normal patients. I have three groups in total: disease1 (n=12), disease2 (n=10) and normal (n=20). Some of my samples were sequenced later and I see a clear batch effect. I am using edgeR and my design matrix is: design <- model.matrix(~SeqTag + DiseaseTypes)
My BCV is 0.2, which suggests around 20% variation between replicates. But with tagwise dispersions, no differentially expressed genes were detected at a p-value cutoff of 0.05 or 0.1 for any disease type. My tagwise dispersions range from 0 to 4.4.
If I use only the common dispersion, I get around 50 DE genes. I also see that more than 4000 genes are differentially expressed between the two batches.
My questions are:
(1) Can I use only the common dispersion, even though it is not recommended?
(2) What could be the reason that no DE genes are detected although the BCV is around 0.2?
The value of the BCV doesn't tell you anything about differential expression; it just tells you the variability between your replicates. A low BCV improves your ability to detect DE, because it's easier to distinguish systematic changes in expression between conditions from random noise between samples. However, detection also depends on the actual presence of DE genes. If there isn't any DE between your conditions, there obviously won't be anything to detect. More realistically, there is likely to be some DE between conditions, but if it's weak, the log-fold changes may be obscured by even a small amount of inter-replicate noise.
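To put rough numbers on this, here is a minimal Python sketch (plain arithmetic, not edgeR code). It uses the negative binomial relation var = mu + dispersion * mu^2 with dispersion = BCV^2, plus a crude delta-method approximation for the standard error of a log2 fold change. The function names, the mean count of 100, and the approximation itself are assumptions for illustration; edgeR's actual tests also account for the batch term, library sizes and uncertainty in the dispersion estimates.

```python
import math

def nb_cv(mean_count, bcv):
    """Coefficient of variation of a negative binomial count.

    Uses var = mu + dispersion * mu^2 with dispersion = BCV^2,
    so CV^2 = 1/mu + BCV^2 (Poisson noise plus biological noise).
    """
    return math.sqrt(1.0 / mean_count + bcv ** 2)

def approx_log2fc_se(mean_count, bcv, n1, n2):
    """Rough delta-method standard error of a log2 fold change between
    two group means with n1 and n2 replicates. Illustration only: it
    ignores the batch term, library sizes and dispersion uncertainty."""
    cv = nb_cv(mean_count, bcv)
    return cv * math.sqrt(1.0 / n1 + 1.0 / n2) / math.log(2)

# Group sizes from the question: disease1 (n=12) vs normal (n=20).
# mean_count=100 is a hypothetical expression level.
# 0.04 = 0.2^2 (the reported BCV); 4.4 is the reported tagwise maximum.
for dispersion in (0.04, 1.0, 4.4):
    bcv = math.sqrt(dispersion)
    se = approx_log2fc_se(100, bcv, 12, 20)
    print(f"dispersion={dispersion:<4} BCV={bcv:.2f} approx SE(log2FC)={se:.2f}")
```

The point is only qualitative: the standard error grows with the dispersion, so a gene near the top of your tagwise range needs a much larger fold change to stand out than a gene near the common value, and a weak true effect can be swamped either way.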
And no, don't use the common dispersion. Modelling the mean-dispersion trend is important for the accuracy of the model, and for the edgeR p-values to make statistical sense. Trying to mix and match settings to get the most DE genes tends to increase your false positive rate instead. Sure, you might get a warm and fuzzy feeling when you get a non-empty DE list, but that quickly evaporates when none of them hold up in validation.