Question

Cannot figure out why the DE genes' FDR > 0.05 in edgeR output

0

Entering edit mode

ylqqmm • 0

@ylqqmm-22708

Last seen 5.7 years ago

Hello there,

I am working on DE analysis on two groups of total reads count from a RNAseq data using edgeR. I am not super experienced in edgeR, but I have used this same R script on some other dataset previously and that worked fine, but this time for some reason after I filtered out the non significantly differentially expressed genes (DE=0) in the edgeR output file, there are still 100+ genes (whose DE= 1 or -1) having FDR values > 0.05. I double checked the script and compared with the edgeR manual, but just could not figure out how that happened. Can anybody please help me?? Thank you so much.

Below is my R script:

x <- read.delim("~/Desktop/miRNA275 project/RNAseq/TotalReads.txt", sep = "\t",row.names = 1)
head (x)
group <- factor(c("3xant", "3xant", "3xant","Scr", "Scr", "Scr"))
group <- relevel(group,ref = "Scr")
y <- DGEList(counts=x,group=group)
keep <- rowSums(cpm(y)>1) >= 2
y <- y[keep, , keep.lib.sizes=FALSE]
y <- calcNormFactors(y)
y$samples
design <- model.matrix(~group)
y <- estimateDisp(y,design)
fit <- glmFit(y,design)
lrt <- glmLRT(fit,coef = 2)
topTags(lrt)
topGenes <- topTags(lrt, n=Inf)
summary(de <- decideTests(lrt))
logcpm <- cpm(y, prior.count = 2, log = TRUE)
DE_data <- topGenes$table
counts <- fit$counts
countsdf <- as.data.frame(counts)
data_combined <- merge(DE_data, countsdf, by="row.names")
gene_names <- subset(x, keep)
data_combined$description <- gene_names$Description
data_combined$de <- de
write.csv(data_combined, file = "~/Desktop/EdgeRResult.csv")

edger rnaseq • 1.0k views

ADD COMMENT • link updated 5.7 years ago by Gordon Smyth 53k • written 5.7 years ago by ylqqmm • 0

0

Entering edit mode

The edgeR part of your script looks fine, but then the script does strange things that don't seem to make any sense. The file you have created is not a list of DE genes.

What are you trying to do? Why don't you just use the topTags table from edgeR?

ADD REPLY • link 5.7 years ago Gordon Smyth 53k

0

Entering edit mode

Thanks for your quick reply Gordon!

I was trying to combine the original read count file with the edgeR output data, so I can look at the read counts under each gene in different samples. I also wanted to have an additional column called "DE". if DE=1, then it's up regulated; if DE=-1, then it's down regulated; if DE=0, it's not significantly different.

My problem was that in the final output there are some genes FDR > 0.05 but was categorized under DE= 1 or - ,which should not happen. I'm not sure what went wrong.

ADD REPLY • link 5.7 years ago ylqqmm • 0

score 0 · Answer 1 · 2020-01-14

If you want a file of DE genes with FDR < 0.05, why would you not simply use:

tab <- topTags(lrt, p=0.05)
write.csv(tab, file = "EdgeRResult.csv")

If you want to add raw counts to the output table, just use

topcounts <- y$counts[row.names(tab),]
tab <- data.frame(tab,topcounts)

The reason why your code doesn't work is because you have sorted the topTable results by p-values but you have left the decideTests unsorted. When you put them together, they naturally don't match up.

To my mind, your code is much more complex than it needs to be. For example, if you wanted to assemble the edgeR DE results for all genes including the counts but without sorting you might use:

tab <- topTags(lrt,n=Inf,sort="none")
de <- decideTests(lrt)
taball <- data.frame(tab,DE=de,y$counts)

If you subsequently wanted a sorted list, you could use

o <- order(taball$PValue)
taball <- taball[o,]

There is no need to ever use complex functions like merge.