Cannot figure out why the DE genes' FDR > 0.05 in edgeR output
1
0
Entering edit mode
ylqqmm • 0
@ylqqmm-22708
Last seen 4.9 years ago

Hello there,

I am working on DE analysis on two groups of total reads count from a RNAseq data using edgeR. I am not super experienced in edgeR, but I have used this same R script on some other dataset previously and that worked fine, but this time for some reason after I filtered out the non significantly differentially expressed genes (DE=0) in the edgeR output file, there are still 100+ genes (whose DE= 1 or -1) having FDR values > 0.05. I double checked the script and compared with the edgeR manual, but just could not figure out how that happened. Can anybody please help me?? Thank you so much.

Below is my R script:

x <- read.delim("~/Desktop/miRNA275 project/RNAseq/TotalReads.txt", sep = "\t",row.names = 1)
head (x)
group <- factor(c("3xant", "3xant", "3xant","Scr", "Scr", "Scr"))
group <- relevel(group,ref = "Scr")
y <- DGEList(counts=x,group=group)
keep <- rowSums(cpm(y)>1) >= 2
y <- y[keep, , keep.lib.sizes=FALSE]
y <- calcNormFactors(y)
y$samples
design <- model.matrix(~group)
y <- estimateDisp(y,design)
fit <- glmFit(y,design)
lrt <- glmLRT(fit,coef = 2)
topTags(lrt)
topGenes <- topTags(lrt, n=Inf)
summary(de <- decideTests(lrt))
logcpm <- cpm(y, prior.count = 2, log = TRUE)
DE_data <- topGenes$table
counts <- fit$counts
countsdf <- as.data.frame(counts)
data_combined <- merge(DE_data, countsdf, by="row.names")
gene_names <- subset(x, keep)
data_combined$description <- gene_names$Description
data_combined$de <- de
write.csv(data_combined, file = "~/Desktop/EdgeRResult.csv")
edger rnaseq • 823 views
ADD COMMENT
0
Entering edit mode

The edgeR part of your script looks fine, but then the script does strange things that don't seem to make any sense. The file you have created is not a list of DE genes.

What are you trying to do? Why don't you just use the topTags table from edgeR?

ADD REPLY
0
Entering edit mode

Thanks for your quick reply Gordon!

I was trying to combine the original read count file with the edgeR output data, so I can look at the read counts under each gene in different samples. I also wanted to have an additional column called "DE". if DE=1, then it's up regulated; if DE=-1, then it's down regulated; if DE=0, it's not significantly different.

My problem was that in the final output there are some genes FDR > 0.05 but was categorized under DE= 1 or - ,which should not happen. I'm not sure what went wrong.

ADD REPLY
0
Entering edit mode
@gordon-smyth
Last seen 4 minutes ago
WEHI, Melbourne, Australia

If you want a file of DE genes with FDR < 0.05, why would you not simply use:

tab <- topTags(lrt, p=0.05)
write.csv(tab, file = "EdgeRResult.csv")

If you want to add raw counts to the output table, just use

topcounts <- y$counts[row.names(tab),]
tab <- data.frame(tab,topcounts)

The reason why your code doesn't work is because you have sorted the topTable results by p-values but you have left the decideTests unsorted. When you put them together, they naturally don't match up.

To my mind, your code is much more complex than it needs to be. For example, if you wanted to assemble the edgeR DE results for all genes including the counts but without sorting you might use:

tab <- topTags(lrt,n=Inf,sort="none")
de <- decideTests(lrt)
taball <- data.frame(tab,DE=de,y$counts)

If you subsequently wanted a sorted list, you could use

o <- order(taball$PValue)
taball <- taball[o,]

There is no need to ever use complex functions like merge.

ADD COMMENT

Login before adding your answer.

Traffic: 541 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6