Hello there,
I am working on DE analysis on two groups of total reads count from a RNAseq data using edgeR. I am not super experienced in edgeR, but I have used this same R script on some other dataset previously and that worked fine, but this time for some reason after I filtered out the non significantly differentially expressed genes (DE=0) in the edgeR output file, there are still 100+ genes (whose DE= 1 or -1) having FDR values > 0.05. I double checked the script and compared with the edgeR manual, but just could not figure out how that happened. Can anybody please help me?? Thank you so much.
Below is my R script:
x <- read.delim("~/Desktop/miRNA275 project/RNAseq/TotalReads.txt", sep = "\t",row.names = 1)
head (x)
group <- factor(c("3xant", "3xant", "3xant","Scr", "Scr", "Scr"))
group <- relevel(group,ref = "Scr")
y <- DGEList(counts=x,group=group)
keep <- rowSums(cpm(y)>1) >= 2
y <- y[keep, , keep.lib.sizes=FALSE]
y <- calcNormFactors(y)
y$samples
design <- model.matrix(~group)
y <- estimateDisp(y,design)
fit <- glmFit(y,design)
lrt <- glmLRT(fit,coef = 2)
topTags(lrt)
topGenes <- topTags(lrt, n=Inf)
summary(de <- decideTests(lrt))
logcpm <- cpm(y, prior.count = 2, log = TRUE)
DE_data <- topGenes$table
counts <- fit$counts
countsdf <- as.data.frame(counts)
data_combined <- merge(DE_data, countsdf, by="row.names")
gene_names <- subset(x, keep)
data_combined$description <- gene_names$Description
data_combined$de <- de
write.csv(data_combined, file = "~/Desktop/EdgeRResult.csv")
The edgeR part of your script looks fine, but then the script does strange things that don't seem to make any sense. The file you have created is not a list of DE genes.
What are you trying to do? Why don't you just use the topTags table from edgeR?
Thanks for your quick reply Gordon!
I was trying to combine the original read count file with the edgeR output data, so I can look at the read counts under each gene in different samples. I also wanted to have an additional column called "DE". if DE=1, then it's up regulated; if DE=-1, then it's down regulated; if DE=0, it's not significantly different.
My problem was that in the final output there are some genes FDR > 0.05 but was categorized under DE= 1 or - ,which should not happen. I'm not sure what went wrong.