edgeR Warings
1
0
Entering edit mode
Guest User ★ 13k
@guest-user-4897
Last seen 10.3 years ago
Recently, I use edgeR in R to find DEG, When I loaded my data and estimated the dispersion with estimateCommonDisp, there are always some warnings for my data: estimateCommonDisp(y,verbose=TRUE) Disp = 99.99477 , BCV = 9.9997 There were 50 or more warnings (use warnings() to see the first 50) warnings() Warning messages: 1: In condLogLikDerSize(y, r, der = 0L) : value out of range in 'lgamma' 2: In condLogLikDerSize(y, r, der = 0L) : value out of range in 'lgamma' 3: In condLogLikDerSize(y, r, der = 0L) : value out of range in 'lgamma' 4: In condLogLikDerSize(y, r, der = 0L) : value out of range in 'lgamma' 5: In condLogLikDerSize(y, r, der = 0L) : value out of range in 'lgamma' 6: In condLogLikDerSize(y, r, der = 0L) : value out of range in 'lgamma' 7: In condLogLikDerSize(y, r, der = 0L) : value out of range in 'lgamma' 8: In condLogLikDerSize(y, r, der = 0L) : value out of range in 'lgamma' 9: In condLogLikDerSize(y, r, der = 0L) : value out of range in 'lgamma' ??????.. But when I change to estiamteGLMCommonDisp, everything looks good, and here is the results: estimateGLMCommonDisp(y,verbose=TRUE) Disp = 0.69044 , BCV = 0.8309 So, For my data (just have two groups), can I use the estimateGLMCommonDisplay to estimate the dispersion and then use the exactTest to find the DEGs? Thanks and looking forward to your reply. Tongwu Here is my scripts in R: setwd("~/Work/Axeq_RNA_seq/DEG/edgeR/") x<-read.delim("../All_htseqCounts.txt2",head=T,check.names=F,row.names =1) head(x) target<-read.table("../design_type.list",head=T) library(edgeR) ## For R and S ### head(x[,1:17]) y=DGEList(counts=x[,1:17], group=target$Sensitivity[1:17],) dim(y) ## Filtering keep<-rowSums(cpm(y)>1) >=5 y<-y[keep,] dim(y) ## Re-compute the library sizes y$samples$lib.size <- colSums(y$counts) ## Normalizing y<-calcNormFactors(y) y$samples ## Data exploration plotMDS(y) ## Estimating the dispersion y<-estimateGLMCommonDisp(y,verbose=TRUE) ##Disp = 0.69044 , BCV = 0.8309 y <- estimateGLMTrendedDisp(y) y <- estimateGLMTagwiseDisp(y) plotBCV(y) ## Differential expression et<-exactTest(y) top<- topTags(et,n=100) top cpm(y)[rownames(top),] ## total number of DE genes at 5% FDR is given by ## S/R summary(de<-decideTestsDGE(et,adjust.method="BH",p.value=0.05)) ##[,1] ##-1 310 (up-regulaiton in R) ##0 12881 ##1 457 detags<-rownames(y)[as.logical(de)] plotSmear(et,de.tags=detags) abline(h=c(-1,1),col="blue") -- output of sessionInfo(): > sessionInfo() R version 2.15.2 (2012-10-26) Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit) locale: [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8 attached base packages: [1] stats graphics grDevices utils datasets methods base other attached packages: [1] edgeR_3.0.6 limma_3.14.3 loaded via a namespace (and not attached): [1] tools_2.15.2 -- Sent via the guest posting facility at bioconductor.org.
edgeR edgeR • 1.5k views
ADD COMMENT
0
Entering edit mode
Mark Robinson ▴ 880
@mark-robinson-4908
Last seen 6.2 years ago
Hi Tongwu, Right, so that is an absurdly high Dispersion/BCV from estimateCommonDisp(). > So, For my data (just have two groups), can I use the estimateGLMCommonDisplay to estimate the dispersion and then use the exactTest to find the DEGs? Short answer is: I probably wouldn't do that. At least not yet. Let's back up a minute and figure out what is going on -- I guess what is not clear from your code is what your grouping (you say two groups) actually gives. Specifically, what does d$samples look like (i.e. after specifying group=target$Sensitivity[1:17]) ? I notice that you've called estimateGLMCommonDisp() with no design matrix. This means it would treat all 17 samples as replicates and calculate dispersion as if this were a single group. This is probably not what you want. In general, using estimateGLMCommonDisp() with no design matrix in a 2-group situation would tend to overestimate the dispersion, but here it is the opposite, at least relative to estimateCommonDisp() -- I wouldn't expect qCML and Cox-Reid dispersion estimates to be the same, but they should be similar. So, that's strange. I wonder if there are some super highly variable features that kills the qCML estimation? I have not experienced this before, but have you done other checks like an MDS plot ? Looking for highly variable features ? Best regards, Mark On 14.03.2013, at 16:06, Tongwu [guest] <guest at="" bioconductor.org=""> wrote: > > Recently, I use edgeR in R to find DEG, When I loaded my data and estimated the dispersion with estimateCommonDisp, there are always some warnings for my data: > estimateCommonDisp(y,verbose=TRUE) > Disp = 99.99477 , BCV = 9.9997 > There were 50 or more warnings (use warnings() to see the first 50) > warnings() > Warning messages: > 1: In condLogLikDerSize(y, r, der = 0L) : value out of range in 'lgamma' > 2: In condLogLikDerSize(y, r, der = 0L) : value out of range in 'lgamma' > 3: In condLogLikDerSize(y, r, der = 0L) : value out of range in 'lgamma' > 4: In condLogLikDerSize(y, r, der = 0L) : value out of range in 'lgamma' > 5: In condLogLikDerSize(y, r, der = 0L) : value out of range in 'lgamma' > 6: In condLogLikDerSize(y, r, der = 0L) : value out of range in 'lgamma' > 7: In condLogLikDerSize(y, r, der = 0L) : value out of range in 'lgamma' > 8: In condLogLikDerSize(y, r, der = 0L) : value out of range in 'lgamma' > 9: In condLogLikDerSize(y, r, der = 0L) : value out of range in 'lgamma' > ??????.. > But when I change to estiamteGLMCommonDisp, everything looks good, and here is the results: > estimateGLMCommonDisp(y,verbose=TRUE) > Disp = 0.69044 , BCV = 0.8309 > So, For my data (just have two groups), can I use the estimateGLMCommonDisplay to estimate the dispersion and then use the exactTest to find the DEGs? > Thanks and looking forward to your reply. > Tongwu > Here is my scripts in R: > setwd("~/Work/Axeq_RNA_seq/DEG/edgeR/") > x<-read.delim("../All_htseqCounts.txt2",head=T,check.names=F,row.nam es=1) > head(x) > target<-read.table("../design_type.list",head=T) > library(edgeR) > ## For R and S ### > head(x[,1:17]) > y=DGEList(counts=x[,1:17], group=target$Sensitivity[1:17],) > dim(y) > ## Filtering > keep<-rowSums(cpm(y)>1) >=5 > y<-y[keep,] > dim(y) > ## Re-compute the library sizes > y$samples$lib.size <- colSums(y$counts) > ## Normalizing > y<-calcNormFactors(y) > y$samples > ## Data exploration > plotMDS(y) > ## Estimating the dispersion > y<-estimateGLMCommonDisp(y,verbose=TRUE) > ##Disp = 0.69044 , BCV = 0.8309 > y <- estimateGLMTrendedDisp(y) > y <- estimateGLMTagwiseDisp(y) > plotBCV(y) > ## Differential expression > et<-exactTest(y) > top<- topTags(et,n=100) > top > cpm(y)[rownames(top),] > ## total number of DE genes at 5% FDR is given by ## S/R > summary(de<-decideTestsDGE(et,adjust.method="BH",p.value=0.05)) > ##[,1] > ##-1 310 (up-regulaiton in R) > ##0 12881 > ##1 457 > detags<-rownames(y)[as.logical(de)] > plotSmear(et,de.tags=detags) > abline(h=c(-1,1),col="blue") > > -- output of sessionInfo(): > >> sessionInfo() > R version 2.15.2 (2012-10-26) > Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit) > > locale: > [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8 > > attached base packages: > [1] stats graphics grDevices utils datasets methods base > > other attached packages: > [1] edgeR_3.0.6 limma_3.14.3 > > loaded via a namespace (and not attached): > [1] tools_2.15.2 > > -- > Sent via the guest posting facility at bioconductor.org. > > _______________________________________________ > Bioconductor mailing list > Bioconductor at r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor ---------- Prof. Dr. Mark Robinson Bioinformatics, Institute of Molecular Life Sciences University of Zurich http://tiny.cc/mrobin
ADD COMMENT

Login before adding your answer.

Traffic: 323 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6