edgeR Help: Getting no DGEs with an FDR < 0.05
1
0
Entering edit mode
Rebekah • 0
@df254527
Last seen 13 months ago
Canada

Hello,

I have 9 samples of RNAseq data from a non-model organism (a type of fish, not zebrafish or medaka). Of these 9 samples, 4 are control and 5 are treated, all male. I have gone through the full edgeR differential expression analysis pipeline and the full Ballgown pipeline as well but am running into the same issue. I will be focusing on edgeR with my question and code.

My issue is that my analysis does not give me any Differentially Expressed Genes (DEGs) with an FDR below 0.05 if I use GLMQLFit. Two genes have an FDR of 0.058 but the rest have an FDR of 0.25 or higher. When I go through the pipeline exactTest and estimateDisp, it gives me only four DEGs with an FDR of <0.05. I will input my code for both below, with the results of each command, as I am completely lost on what is wrong.

Please help, I am not sure on how to proceed and whether to just conclude that there is almost no differential expression between the control and treated fish (highly unexpected results). My apologies for linking the images of the code outputs, I was unable to embed several images into this post.


library(BiocManager)
library(edgeR)

genes = read.csv("Tgene_count_matrix.csv", row.names = 1)
group = factor(c("C", "R", "C", "R", "C", "R", "R", "C", "R"))
# C is control and R is treated, I listed them based on the row names of my gene count matrix file

#the code and results below are for estimateDisp and exactTest
listDGE = DGEList(counts = genes, group = group)
keep <- filterByExpr(listDGE, group = group)
listDGE_filt <- listDGE[keep, , keep.lib.sizes=FALSE]
listDGE_filtnorm <- calcNormFactors(object = listDGE_filt)
DGE_fnd <- estimateDisp(listDGE_filtnorm, verbose=TRUE, robust=TRUE)
#this gives me the dispersion values in the image below:
#https://imgur.com/tMqLLYd

plotBCV(DGE_fnd)
#https://imgur.com/oD88scr

plotMDS(DGE_fnd, method="bcv", col=as.numeric(DGE_fnd$samples$group))
legend("bottomleft", as.character(unique(DGE_fnd$samples$group)), col=1:3, pch=20)
#https://imgur.com/Rh7bPX7

plotMDS(DGE_fnd, method="logFC", col=as.numeric(DGE_fnd$samples$group))
legend("bottomleft", as.character(unique(DGE_fnd$samples$group)), col=1:3, pch=20)
#https://imgur.com/zhJXO3v

DEGtest <- exactTest(DGE_fnd)
topDGEs <- topTags(DEGtest, n=100000, adjust.method="BH", sort.by="logFC", p.value=0.05)
head(topDGEs)



#the code and results below are what happens if I use GLMQLFit

qlf_fnd = estimateGLMRobustDisp(listDGE_filtnorm)
#https://imgur.com/pOj2Xgu

plotBCV(qlf_fnd)
#https://imgur.com/x2cBrK7

plotMDS(qlf_fnd, method="bcv", col=as.numeric(qlf_fnd$samples$group))
legend("bottomleft", as.character(unique(qlf_fnd$samples$group)), col=1:3, pch=20)
#https://imgur.com/fU1p0pw

plotMDS(qlf_fnd, method="logFC", col=as.numeric(qlf_fnd$samples$group))
legend("bottomleft", as.character(unique(qlf_fnd$samples$group)), col=1:3, pch=20)
#https://imgur.com/qNjwkRN

fit <- glmQLFit(qlf_fnd2, robust=TRUE)
plotQLDisp(fit)
#https://imgur.com/fJjI86d

qlf <- glmQLFTest(fit,coef=2)
topTags(qlf)
qlfDGEs <- topTags(qlf, n=100000, adjust.method="BH", sort.by="PValue", p.value=0.05)
head(qlfDGEs)
edgeR RNASeq RNASeqData Bioconductor • 1.3k views
ADD COMMENT
1
Entering edit mode
@gordon-smyth
Last seen 1 minute ago
WEHI, Melbourne, Australia

Getting no significantly DE genes is not an error. One of the key design features of edgeR is that it should give DE results only when there is good reproducible evidence of differences.

The MDS plots for your data show no separation at all between treated and control samples, so getting no DE is the correct result.

ADD COMMENT
0
Entering edit mode

Thank you so much for the response! That is one of my main concerns, I am wondering whether I am doing something incorrectly and need to further filter (or something like that). I think it is very strange that there are is no separation between the treated and control samples, as this was really not expected (the treatment group was treated chronically with a pharmaceutical), so I was considering the possibility that my MDS and BCV plots may be indicating that there's a mass of outlier genes, which would be inflating the variability of the dispersions? What are your thoughts on this?

ADD REPLY
1
Entering edit mode

The MDS plot show sample A45cm to be a massive outlier, and A23cm is also far removed from the other control samples. There is no consistency between replicates, which is a prerequisite for any DE analysis. Have you explored what is unusual about A45cm?

There is certainly a mass of hugely variable genes, but the problems appear more widespread than that. Data problems like this can't be solved by changing the DE testing method.

ADD REPLY
0
Entering edit mode

Ok, that is very helpful! I am a beginner with differential expression analysis so it is great to get a second opinion on what is going on. Thank you so much and I will definitely be looking more into these issues with my samples!

ADD REPLY

Login before adding your answer.

Traffic: 378 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6