Hi,
I am trying to find DE genes using two groups (4 biological samples with two different types of tissues) using RNAseq. I obtained a raw gene count table from FeatureCount. I am using both glmLRT and glmQF.
glmLRT identified ~600 DE genes, while glmQF identified only 3. What was also surprising was that there was a big difference in FDR values; FDR values obtained from the glmLRT method were around ~ 4.71E-10, while those from glmQF were around 0.05.
I am aware that glmQF is a lot more conservative than glmLRT so it is understandable that I found a fewer DE genes using glmQF, but I am surprised by the degree of differences, both by the number DE genes and FDR values.
Does anyone happen to know what might have caused these differences?
I have attached my code below, as well as the plots produced by plotBCV() and plotQLDisp().
Many thanks in advance,
Miyako
#################
> x <- DGEList(counts=countData, group=colData$Tissue_Type)
# Filter out genes with no count/lowly expressed genes
> cpm <- cpm(x)
> table(rowSums(x$counts==0)==8) # As I have 8 samples
> keep.exprs <- rowSums(cpm>1)>=2
> x <- x[keep.exprs,]
> dim(x)
[1] 19791 8
> des <- factor(colData$Tissue_Type)
> design <- model.matrix(~0+des)
> colnames(design) <- levels(des)
> design
phloem xylem
1 1 0
2 0 1
3 1 0
4 0 1
5 1 0
6 0 1
7 1 0
8 0 1
attr(,"assign")
[1] 1 1
attr(,"contrasts")
attr(,"contrasts")$des
[1] "contr.treatment"
> contr.matrix <- makeContrasts(PholemvsXylem = phloem-xylem,levels = colnames(design))
> contr.matrix
Contrasts
Levels PholemvsXylem
phloem 1
xylem -1
> x <- calcNormFactors(x)
> x <- estimateDisp(x, design, robust=TRUE)
plotBCV(x)
# glmLRT Method
> fit <- glmFit(x, design, robust=TRUE)
> lrt <- glmLRT(fit, contrast=contr.matrix)
> significant_glm <- topTags(lrt, n=Inf, p.value=0.05) # FDR ~ 4.71E-10
> nrow(significant_glm)
[1] 598
# glmQL Method
> fit <- glmQLFit(x, design, robust=TRUE)
> plotQLDisp(fit)
> qlf <- glmQLFTest(fit, contrast=contr.matrix)
> significant_glmQL <- topTags(qlf, n=Inf, p.value=0.05) # FDR ~ 0.04156011
> nrow(significant_glmQL)
[1] 3
##################
Please use a separate tag for edgeR, otherwise the maintainers won't be notified.
I just corrected it - thank you for pointing that out :)