Error in estimateGLMCommonDisp and glmFit
3
0
Entering edit mode
andrluis • 0
@andrluis-11196
Last seen 8.0 years ago

 

Dear edgeR authors and users,

I was wondering if you could help me to figure out a way to solve the following errors when calling the functions:

a)  estimateGLMCommonDisp

Error in .compressOffsets(y, lib.size = lib.size, offset = offset) : 
  offsets should be finite values​

b) glmFit

Error in glmFit.DGEList(dbcan_dge_expressed, design) : 
  No dispersion values found in DGEList object.​

Another thing that I`m finding wrong in my analysis is that the cpm() estimation is too high for my data. I guess this is affetcing the results for the rest of the analysis.

#The commands I used:

My DGEList:

dbcan_dge  <- DGEList(counts = dbcan_all[,1:48], 
                                    group = metagenome_map$FCR, 
                                    genes = rownames(dbcan_all))​

Then, I determined the smallest library size:

dbcan_dge$samples​ 

For this example, 31 is the smallest lilbrary size; however, when:

# I Determine expression level filter, this number jumps to 64516.13 !
func_counts_20LZD2L <- dbcan_dge$counts[, "20LZD2L"] 
cpm_20LZD2L <- cpm(func_counts_20LZD2L)​

# Re-compute the library sizes after filtering
dbcan_dge_expressed$samples$lib.size <- colSums(dbcan_dge_expressed$counts)​

###### Model testing ###########

treatment = as.factor(metagenome_map$FCR) 
treatment = relevel(treatment, ref = "High")

design = model.matrix(~treatment, data=metagenome_map)

dbcan_dge_expressed
dbcan_dge_expressed = calcNormFactors(dbcan_dge_expressed, method="RLE")
dbcan_dge_expressed = estimateGLMCommonDisp(dbcan_dge_expressed, design)

#Here I got the errors described previously for the estimateGLMCommonDisp ()

dbcan_dge_expressed_fit <- glmFit(dbcan_dge_expressed, design)​

#Here I got the errors described previously for ​the glmFit​()

> sessionInfo()
R version 3.3.2 (2016-10-31)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: OS X El Capitan 10.11.6

locale:
[1] en_CA.UTF-8/en_CA.UTF-8/en_CA.UTF-8/C/en_CA.UTF-8/en_CA.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] BiocInstaller_1.24.0 plyr_1.8.4           edgeR_3.16.3        
[4] limma_3.30.4         reshape2_1.4.2       ggplot2_2.2.0       
[7] dplyr_0.5.0         

loaded via a namespace (and not attached):
 [1] locfit_1.5-9.1   Rcpp_0.12.8      lattice_0.20-34  digest_0.6.10   
 [5] assertthat_0.1   grid_3.3.2       R6_2.2.0         gtable_0.2.0    
 [9] DBI_0.5-1        magrittr_1.5     scales_0.4.1     stringi_1.1.2   
[13] lazyeval_0.2.0   labeling_0.3     tools_3.3.2      stringr_1.1.0   
[17] munsell_0.4.3    colorspace_1.3-1 tibble_1.2      

Thank you very much for your help,

Andre

 

 

 

edger • 5.0k views
ADD COMMENT
1
Entering edit mode
Aaron Lun ★ 28k
@alun
Last seen 20 hours ago
The city by the bay

The first error probably comes about because you have library sizes of zero after filtering, or alternatively, the normalization factors were calculated at some non-finite or zero value (which is also most likely due to the presence of low counts). This means that the effective library sizes become invalid offsets upon log-transformation. The second error comes about because estimateGLMCommonDisp fails from the first error, which means you don't have dispersion estimates to use in GLM fitting.

So, in short, avoid getting library sizes of zero. You don't show what you did to get dbcan_dge_expressed, but either you're filtering too aggressively such that one library doesn't have any counts left, or some of your libraries have failed. Library sizes are usually at least ~1 million for RNA-seq experiments, not 31.

ADD COMMENT
0
Entering edit mode
andrluis • 0
@andrluis-11196
Last seen 8.0 years ago

Hi, Aaron:

Thank you for your insight. The dbcan_dge_expressed is not RNA-seq experiment. Actually, it is comprised of enzymes that I retrieved from the dbcan database. So, this is why the counts in this specific library were so low. In general, the maximum count is not higher than 300. I will double check the filter commands and follow your comments to try fixing this problem.

Thank you again, Aaron.

Andre

ADD COMMENT
1
Entering edit mode

The documentation in the edgeR user's guide and elsewhere is written under the assumption that the counts are those of reads in an RNA-seq experiment (or, at least, a genomics experiment). If this is not the case, I can't confidently say whether your analysis is appropriate or not. For example, the counts might follow a distribution that is clearly not negative binomial, or various assumptions in calcNormFactors might not be valid. In short, you had better know what you're doing.

ADD REPLY
0
Entering edit mode
andrluis • 0
@andrluis-11196
Last seen 8.0 years ago

Hi, Aaron:

I improved the filter and also changed the method of the calcNormFactors() from "RLE" to "upperquartile", removing counts of zeros in all libraries.

Now everything is working fine as expected.

Thank you again for your help,

Andre 

ADD COMMENT

Login before adding your answer.

Traffic: 631 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6