Question: edgeR dispersion and FDR
dsarbashis wrote, 4 months ago:

Hi All,

I am analysing RNA-seq data from human tissue samples from diseased and normal patients. In total I have 3 types of data: disease1 (n=12), disease2 (n=10) and normal (n=20). Some of my samples were sequenced later and I see a clear batch effect. I am using edgeR and my design matrix is: design <- model.matrix(~SeqTag + DiseaseTypes)

My BCV value is 0.2, which suggests around 20% variation between replicates. But using the tagwise dispersion, no differentially expressed genes were predicted at a 0.05 or 0.1 p-value cutoff for any disease type. My tagwise dispersions range from 0 to 4.4.

If I use only the common dispersion, then I get around 50 DE genes. I also see that more than 4000 genes are differentially expressed between the two batches.

My questions are

(1) Can I use only the common dispersion, even though this is not recommended?

(2) What could be the reason that no DE genes are predicted although the BCV is around 0.2?

Any other suggestions are welcome. Thanks in advance.


Aaron Lun (Cambridge, United Kingdom) wrote, 4 months ago:

The value of the BCV doesn't tell you anything about differential expression; it just tells you the variability between your replicates. A low BCV improves your ability to detect DE, because it's easier to distinguish systematic changes in expression between conditions from random noise between samples. However, detection also depends on the actual presence of DE genes. If there isn't any DE between your conditions, there obviously won't be anything to detect. More realistically, there is likely to be some DE between conditions, but if it's weak, the log-fold changes may be obscured by even a small amount of inter-replicate noise.
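
To make the link between the BCV and replicate noise concrete, here is the standard negative binomial relationship that edgeR is built on, written out as a short worked example. The symbols (y for a count, mu for its expected value, phi for the dispersion) and the choice of mu = 100 are notation and numbers picked here for illustration; they are not from the original post.

```latex
% Negative binomial variance for a count y with mean \mu and dispersion \phi
\operatorname{Var}(y) = \mu + \phi\,\mu^{2}
\quad\Longrightarrow\quad
\mathrm{CV}^{2}(y) = \frac{1}{\mu} + \phi ,
\qquad
\mathrm{BCV} = \sqrt{\phi}

% Worked example: BCV = 0.2, so \phi = 0.04; for a gene with \mu = 100
\mathrm{CV}^{2} = \frac{1}{100} + 0.04 = 0.05 ,
\qquad
\mathrm{CV} = \sqrt{0.05} \approx 0.22
```

So a BCV of 0.2 describes how much replicate-to-replicate noise remains once counting noise is set aside. It says nothing about the size of the log-fold changes between conditions, which is what determines whether DE can be detected against that noise.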

And no, don't use the common dispersion. Modelling the mean-dispersion trend is important for the accuracy of the model, and for the edgeR p-values to make statistical sense. Trying to mix and match settings to get the most DE genes tends to increase your false positive rate instead. Sure, you might get a warm and fuzzy feeling when you get a non-empty DE list, but that quickly evaporates when none of them hold up in validation.
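
As a rough illustration of what relying on the modelled dispersions looks like in practice, here is a minimal sketch on simulated counts. The data, gene names, batch assignment and sample sizes are all made up (loosely mirroring the group sizes in the question), and the likelihood ratio test at the end is just one of the testing options edgeR offers, not something prescribed above.

```r
# Minimal sketch on simulated data; every object below is hypothetical.
library(edgeR)

set.seed(1)
n_genes <- 2000

# Sample layout loosely mirroring the question: 20 normal, 12 + 10 diseased.
DiseaseTypes <- factor(rep(c("normal", "disease1", "disease2"), c(20, 12, 10)),
                       levels = c("normal", "disease1", "disease2"))
# Assumed batch labels, alternating between two sequencing runs.
SeqTag <- factor(rep(c("batch1", "batch2"), length.out = length(DiseaseTypes)))
n_samples <- length(DiseaseTypes)

# Negative binomial counts with dispersion 0.04 (BCV = 0.2) and no true DE.
mu <- rep(runif(n_genes, 20, 500), times = n_samples)
counts <- matrix(rnbinom(n_genes * n_samples, mu = mu, size = 1 / 0.04),
                 nrow = n_genes,
                 dimnames = list(paste0("gene", seq_len(n_genes)), NULL))

design <- model.matrix(~ SeqTag + DiseaseTypes)

y <- DGEList(counts = counts, group = DiseaseTypes)
keep <- filterByExpr(y, design)          # drop genes with too few counts
y <- y[keep, , keep.lib.sizes = FALSE]
y <- calcNormFactors(y)

# One call estimates the common, trended and tagwise dispersions.
y <- estimateDisp(y, design, robust = TRUE)
bcv <- sqrt(y$common.dispersion)         # overall BCV
plotBCV(y)                               # inspect the mean-dispersion trend

# glmFit picks up the tagwise dispersions stored in y.
fit <- glmFit(y, design)
res <- glmLRT(fit, coef = "DiseaseTypesdisease1")  # disease1 versus normal
topTags(res)
summary(decideTests(res))
```

Because the simulated data contain no real differences between groups, few or no genes should pass the FDR threshold here, even though the BCV is close to 0.2.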
