edgeR - no sequences are differentially expressed
Idit Buch ▴ 10
@idit-buch-5528
Last seen 9.6 years ago
Dear All,

When using edgeR for differential expression of RNA-Sequences, I do not get any differentially expressed sequences (FDR <= 0.1). The R code is attached. Note that we have 5 libraries and 2 groups, such that the first three libraries (group 1) are compared against the last two libraries (group 2). The results are:

Loading required package: methods
Loading required package: limma
Calculating library sizes from column totals.
initial lib sizes: 125765 256153 462073 445484 773405
initial no. of tags: 379196
reduced lib sizes: 104768 160197 382981 373739 599384
reduced no. of tags: 25749
common dispersion analysis: 0.7386983
down regulated: 0 up regulated: 0
Using grid search to estimate tagwise dispersion.

tagwise dispersion analysis:
    Min.  1st Qu.   Median     Mean  3rd Qu.     Max.
0.001001 0.001001 0.503800 0.435100 0.745200 1.299000
down regulated: 0 up regulated: 0

Apparently, all FDR values are 1.

Can you please advise how to proceed?

Regards
Idit Buch, Ph.D.
Senior bioinformatics researcher

-------------- next part --------------
nsamples <- 5
FDR.cutoff <- 0.1
grpInput <- c(1,1,1,2,2)
readsfile <- "inputMatrix" # The 1st column is the sequence
reads <- read.table(readsfile, header=FALSE, row.names=1, sep="")

library(edgeR)
group <- factor((grpInput), exclude=NULL)
ngroups <- length(levels(group))
prior.n <- 25/(nsamples-ngroups)
data <- DGEList(counts=reads, group=group)
cat("initial lib sizes: ", data$samples$lib.size, "\n")
cat("initial no. of tags: ", dim(data)[1], "\n")

#
# filter out very lowly expressed tags; keep those which are
# expressed in more than half the number of samples
#
cpm.data <- cpm(data)
data <- data[(rowSums(cpm.data > 1)) >= (nsamples/ngroups),]
data <- data[(rowSums(data$counts > 0)) >= (nsamples/ngroups),]
data$samples$lib.size <- colSums(data$counts)
cat("reduced lib sizes: ", data$samples$lib.size, "\n")
cat("reduced no. of tags: ", dim(data)[1], "\n")

data <- calcNormFactors(data)
data <- estimateCommonDisp(data)
de.com <- exactTest(data)
sortedDE.com <- topTags(de.com, n=NROW(de.com))

# SUMMARY
cat("common dispersion analysis: ", data$common.dispersion, "\n")
downReg <- sum(sortedDE.com$table$logFC < 0 & sortedDE.com$table$FDR <= FDR.cutoff)
upReg <- sum(sortedDE.com$table$logFC > 0 & sortedDE.com$table$FDR <= FDR.cutoff)
cat("down regulated: ", downReg, " up regulated: ", upReg, "\n")

# DE tagwise
data <- estimateTagwiseDisp(data, prior.n=prior.n, prop.used=0.5, grid.length=500, verbose=TRUE)
de.tgw <- exactTest(data)
sortedDE.tgw <- topTags(de.tgw, n=NROW(de.tgw))

# SUMMARY
cat("\ntagwise dispersion analysis:\n")
summary(data$tagwise.dispersion)
tgw.dispersion.summary = summary(data$tagwise.dispersion)
downReg <- sum(sortedDE.tgw$table$logFC < 0 & sortedDE.tgw$table$FDR <= FDR.cutoff)
upReg <- sum(sortedDE.tgw$table$logFC > 0 & sortedDE.tgw$table$FDR <= FDR.cutoff)
cat("down regulated: ", downReg, " up regulated: ", upReg, "\n")
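The filter thresholds in the script can be checked by hand. The sketch below uses only the library sizes and constants reported in the post; the variable names `reads.at.1cpm` and `min.libs` are introduced here for illustration and are not part of the original script.

```r
# Values reported in the post.
lib.size <- c(125765, 256153, 462073, 445484, 773405)
nsamples <- 5
ngroups  <- 2

# Number of reads that corresponds to exactly 1 count per million in each library.
reads.at.1cpm <- lib.size / 1e6
print(round(reads.at.1cpm, 3))

# Every library holds fewer than one million reads, so a single read in a
# library already exceeds 1 CPM there.
print(all(reads.at.1cpm < 1))

# rowSums() of a logical matrix is an integer, so ">= nsamples/ngroups"
# (that is, ">= 2.5") behaves like ">= 3" libraries.
min.libs <- ceiling(nsamples / ngroups)
print(min.libs)

# prior.n exactly as computed in the script.
prior.n <- 25 / (nsamples - ngroups)
print(prior.n)
```

With these library sizes, `cpm.data > 1` and `data$counts > 0` appear to select the same tags, so the two filter lines together amount to "at least one read in at least three libraries".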
edgeR edgeR • 960 views
Mark Robinson ▴ 880
@mark-robinson-4908
Last seen 5.5 years ago
Hi Idit,

One possibility is that there is no differential expression between your 2 conditions. A few thoughts, based on what you've described:

-- Your library sizes are "small", relative to many of today's HTS datasets (higher counts are a driving factor in statistical power for these analyses).

-- Your common dispersion is rather high, relative to many datasets I've seen. It's worth checking that replicates are similar to each other (e.g. plotMDS could give some sense of this) and whether normalization should play a role; try also plotSmear() or maPlot().

-- Given the low total counts, maybe you could be more restrictive in the filtering to pay a lesser multiple testing penalty.

I hope that offers some possibilities to try.

Best,
Mark

On 04.10.2012, at 13:01, Idit Buch wrote:
> When using edgeR for differential expression of RNA-Sequences, I do not get
> any differentially expressed sequences (FDR <= 0.1).
> [...]
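The multiple-testing point can be made concrete with a small sketch. It assumes Benjamini-Hochberg adjustment (the FDR method behind `p.adjust(..., method = "BH")`); `n.current` is the tag count reported in the thread, while `n.strict`, `p.best` and the helper `bh.smallest` are hypothetical values and names chosen only for illustration. The second half runs the suggested diagnostics on simulated counts, not on the poster's data, and is skipped if edgeR is not installed.

```r
# How the number of tested tags affects a Benjamini-Hochberg adjusted p-value.
n.current <- 25749   # tags remaining after filtering, as reported in the thread
n.strict  <- 5000    # hypothetical tag count after a stricter filter
p.best    <- 1e-5    # hypothetical raw p-value of the strongest tag

bh.smallest <- function(p, n) {
  # One tag with raw p-value p; all other tags look null (p = 0.5).
  p.adjust(c(p, rep(0.5, n - 1)), method = "BH")[1]
}

fdr.current <- bh.smallest(p.best, n.current)
fdr.strict  <- bh.smallest(p.best, n.strict)
cat("FDR with", n.current, "tags:", fdr.current, "\n")
cat("FDR with", n.strict, "tags:", fdr.strict, "\n")

# Replicate diagnostics on simulated counts (assumed negative binomial,
# 2000 tags, same 3-versus-2 design as in the thread).
if (requireNamespace("edgeR", quietly = TRUE)) {
  library(edgeR)
  set.seed(1)
  sim <- matrix(rnbinom(2000 * 5, mu = 20, size = 1.5), ncol = 5)
  y <- DGEList(counts = sim, group = factor(c(1, 1, 1, 2, 2)))
  y <- calcNormFactors(y)
  plotMDS(y)    # replicates of the same group should sit close together
  plotSmear(y)  # log fold change against average abundance
}
```

In this illustration the same raw p-value gives an adjusted value of about 0.26 among 25749 tags but 0.05 among 5000, so it would pass the FDR <= 0.1 cutoff only after the stricter filter. Whether a stricter filter helps on the real data depends on which tags it removes.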