Hello,
I have 9 samples of RNAseq data from a non-model organism (a type of fish, not zebrafish or medaka). Of these 9 samples, 4 are control and 5 are treated, all male. I have gone through the full edgeR differential expression analysis pipeline and the full Ballgown pipeline as well but am running into the same issue. I will be focusing on edgeR with my question and code.
My issue is that my analysis does not give me any Differentially Expressed Genes (DEGs) with an FDR below 0.05 if I use GLMQLFit. Two genes have an FDR of 0.058 but the rest have an FDR of 0.25 or higher. When I go through the pipeline exactTest and estimateDisp, it gives me only four DEGs with an FDR of <0.05. I will input my code for both below, with the results of each command, as I am completely lost on what is wrong.
Please help, I am not sure on how to proceed and whether to just conclude that there is almost no differential expression between the control and treated fish (highly unexpected results). My apologies for linking the images of the code outputs, I was unable to embed several images into this post.
library(BiocManager)
library(edgeR)
genes = read.csv("Tgene_count_matrix.csv", row.names = 1)
group = factor(c("C", "R", "C", "R", "C", "R", "R", "C", "R"))
# C is control and R is treated, I listed them based on the row names of my gene count matrix file
#the code and results below are for estimateDisp and exactTest
listDGE = DGEList(counts = genes, group = group)
keep <- filterByExpr(listDGE, group = group)
listDGE_filt <- listDGE[keep, , keep.lib.sizes=FALSE]
listDGE_filtnorm <- calcNormFactors(object = listDGE_filt)
DGE_fnd <- estimateDisp(listDGE_filtnorm, verbose=TRUE, robust=TRUE)
#this gives me the dispersion values in the image below:
#https://imgur.com/tMqLLYd
plotBCV(DGE_fnd)
#https://imgur.com/oD88scr
plotMDS(DGE_fnd, method="bcv", col=as.numeric(DGE_fnd$samples$group))
legend("bottomleft", as.character(unique(DGE_fnd$samples$group)), col=1:3, pch=20)
#https://imgur.com/Rh7bPX7
plotMDS(DGE_fnd, method="logFC", col=as.numeric(DGE_fnd$samples$group))
legend("bottomleft", as.character(unique(DGE_fnd$samples$group)), col=1:3, pch=20)
#https://imgur.com/zhJXO3v
DEGtest <- exactTest(DGE_fnd)
topDGEs <- topTags(DEGtest, n=100000, adjust.method="BH", sort.by="logFC", p.value=0.05)
head(topDGEs)
#the code and results below are what happens if I use GLMQLFit
qlf_fnd = estimateGLMRobustDisp(listDGE_filtnorm)
#https://imgur.com/pOj2Xgu
plotBCV(qlf_fnd)
#https://imgur.com/x2cBrK7
plotMDS(qlf_fnd, method="bcv", col=as.numeric(qlf_fnd$samples$group))
legend("bottomleft", as.character(unique(qlf_fnd$samples$group)), col=1:3, pch=20)
#https://imgur.com/fU1p0pw
plotMDS(qlf_fnd, method="logFC", col=as.numeric(qlf_fnd$samples$group))
legend("bottomleft", as.character(unique(qlf_fnd$samples$group)), col=1:3, pch=20)
#https://imgur.com/qNjwkRN
fit <- glmQLFit(qlf_fnd2, robust=TRUE)
plotQLDisp(fit)
#https://imgur.com/fJjI86d
qlf <- glmQLFTest(fit,coef=2)
topTags(qlf)
qlfDGEs <- topTags(qlf, n=100000, adjust.method="BH", sort.by="PValue", p.value=0.05)
head(qlfDGEs)
Thank you so much for the response! That is one of my main concerns, I am wondering whether I am doing something incorrectly and need to further filter (or something like that). I think it is very strange that there are is no separation between the treated and control samples, as this was really not expected (the treatment group was treated chronically with a pharmaceutical), so I was considering the possibility that my MDS and BCV plots may be indicating that there's a mass of outlier genes, which would be inflating the variability of the dispersions? What are your thoughts on this?
The MDS plot show sample A45cm to be a massive outlier, and A23cm is also far removed from the other control samples. There is no consistency between replicates, which is a prerequisite for any DE analysis. Have you explored what is unusual about A45cm?
There is certainly a mass of hugely variable genes, but the problems appear more widespread than that. Data problems like this can't be solved by changing the DE testing method.
Ok, that is very helpful! I am a beginner with differential expression analysis so it is great to get a second opinion on what is going on. Thank you so much and I will definitely be looking more into these issues with my samples!