edgeR - no sequences are differentially expressed

0

Entering edit mode

Idit Buch ▴ 10

@idit-buch-5528

Last seen 9.6 years ago

Dear All, When using edgeR for differential expression of RNA-Sequences, I do not get any differentially expressed sequences (FDR <= 0.1). The R code is attached. Note that we have 5 libraries and 2 groups such that the first three libraries (group 1) are compared against the last two libraries (group 2). The results are: Loading required package: methods Loading required package: limma Calculating library sizes from column totals. initial lib sizes: 125765 256153 462073 445484 773405 initial no. of tags: 379196 reduced lib sizes: 104768 160197 382981 373739 599384 reduced no. of tags: 25749 common dispersion analysis: 0.7386983 down regulated: 0 up regulated: 0 Using grid search to estimate tagwise dispersion. tagwise dispersion analysis: Min. 1st Qu. Median Mean 3rd Qu. Max. 0.001001 0.001001 0.503800 0.435100 0.745200 1.299000 down regulated: 0 up regulated: 0 Apparently, all FDR values are 1. Can you please advise how to proceed ? Regards Idit Buch, Ph.D. Senior bioinformatics researcher -------------- next part -------------- nsamples<-5 FDR.cutoff <- 0.1 grpInput<-c(1,1,1,2,2) readsfile<- "inputMatrix" # The 1st row is the sequence reads<-read.table(readsfile,header=FALSE,row.names=1,sep="") library(edgeR) group<-factor((grpInput),exclude=NULL) ngroups<-length(levels(group)) prior.n<- 25/(nsamples-ngroups) data<-DGEList(counts=reads,group=group) cat("initial lib sizes: ",data$samples$lib.size,"\n") cat("initial no. of tags: ",dim(data)[1],"\n") # # filter out very lowly expressed tags, and those which are # expressed in more than half the number of samples # cpm.data <- cpm(data) data <- data[(rowSums(cpm.data > 1)) >= (nsamples/ngroups),] data <- data[(rowSums(data$count > 0)) >= (nsamples/ngroups),] data$samples$lib.size <- colSums(data$counts) cat("reduced lib sizes: ",data$samples$lib.size,"\n") cat("reduced no. of tags: ",dim(data)[1],"\n") data<-calcNormFactors(data) data<-estimateCommonDisp(data) de.com<-exactTest(data) sortedDE.com<-topTagsde.com,n=NROWde.com)) # SUMMARY cat ("common dispersion analysis: ", data$common.dispersion,"\n") downReg <- sumsortedDE.com$table$logFC < 0 & sortedDE.com$table$FDR <= FDR.cutoff) upReg <- sumsortedDE.com$table$logFC > 0 & sortedDE.com$table$FDR <= FDR.cutoff) cat ("down regulated: ",downReg, " up regulated: ",upReg, "\n") #DE tagwise data<-estimateTagwiseDisp(data, prior.n=prior.n, prop.used=0.5, grid.length=500, verbose=TRUE) de.tgw<-exactTest(data) sortedDE.tgw<-topTags(de.tgw,n=NROW(de.tgw)) # SUMMARY cat ("\ntagwise dispersion analysis:\n") summary(data$tagwise.dispersion) tgw.dispersion.summary = summary(data$tagwise.dispersion) downReg <- sum(sortedDE.tgw$table$logFC < 0 & sortedDE.tgw$table$FDR <= FDR.cutoff) upReg <- sum(sortedDE.tgw$table$logFC > 0 & sortedDE.tgw$table$FDR <= FDR.cutoff) cat ("down regulated: ",downReg, " up regulated: ",upReg, "\n")

edgeR edgeR • 960 views

ADD COMMENT • link updated 11.6 years ago by Mark Robinson ▴ 880 • written 11.6 years ago by Idit Buch ▴ 10

0

Entering edit mode

Mark Robinson ▴ 880

@mark-robinson-4908

Last seen 5.5 years ago

Hi Idit, One possibility is that there is no differential expression between your 2 conditions. A few thoughts, based on what you've described: -- Your library sizes are "small", relative to many of today's HTS datasets (higher counts are a driving factor in statistical power for these analyses) -- Your common dispersion is rather high, relative to many datasets I've seen. It's worth checking that replicates are similar to each other (e.g. plotMDS could give some sense of this) and whether normalization should play a role ? try also plotSmear() or maPlot(). -- Given the low total counts, maybe you could be more restrictive in the filtering to pay a lesser multiple testing penalty I hope that offers some possibilities to try. Best, Mark On 04.10.2012, at 13:01, Idit Buch wrote: > Dear All, > > > > When using edgeR for differential expression of RNA-Sequences, I do not get > any differentially expressed sequences (FDR <= 0.1). > > The R code is attached. Note that we have 5 libraries and 2 groups such > that the first three libraries (group 1) > > are compared against the last two libraries (group 2). The results are: > > > > Loading required package: methods > > Loading required package: limma > > Calculating library sizes from column totals. > > initial lib sizes: 125765 256153 462073 445484 773405 > > initial no. of tags: 379196 > > reduced lib sizes: 104768 160197 382981 373739 599384 > > reduced no. of tags: 25749 > > common dispersion analysis: 0.7386983 > > down regulated: 0 up regulated: 0 > > Using grid search to estimate tagwise dispersion. > > > > > > tagwise dispersion analysis: > > Min. 1st Qu. Median Mean 3rd Qu. Max. > > 0.001001 0.001001 0.503800 0.435100 0.745200 1.299000 > > down regulated: 0 up regulated: 0 > > > > Apparently, all FDR values are 1. > > > > Can you please advise how to proceed ? > > > > Regards > > > > Idit Buch, Ph.D. > > Senior bioinformatics researcher > <edger_code.txt>_______________________________________________ > Bioconductor mailing list > Bioconductor at r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor

ADD COMMENT • link 11.6 years ago Mark Robinson ▴ 880

Login before adding your answer.