Question

Error in estimateGLMCommonDisp and glmFit

0

Entering edit mode

andrluis • 0

@andrluis-11196

Last seen 7.4 years ago

Dear edgeR authors and users,

I was wondering if you could help me to figure out a way to solve the following errors when calling the functions:

a) estimateGLMCommonDisp

Error in .compressOffsets(y, lib.size = lib.size, offset = offset) :
offsets should be finite values

b) glmFit

Error in glmFit.DGEList(dbcan_dge_expressed, design) :
No dispersion values found in DGEList object.

Another thing that I`m finding wrong in my analysis is that the cpm() estimation is too high for my data. I guess this is affetcing the results for the rest of the analysis.

#The commands I used:

My DGEList:

dbcan_dge <- DGEList(counts = dbcan_all[,1:48],
group = metagenome_map$FCR,
genes = rownames(dbcan_all))

Then, I determined the smallest library size:

dbcan_dge$samples

For this example, 31 is the smallest lilbrary size; however, when:

# I Determine expression level filter, this number jumps to 64516.13 !
func_counts_20LZD2L <- dbcan_dge$counts[, "20LZD2L"]
cpm_20LZD2L <- cpm(func_counts_20LZD2L)

# Re-compute the library sizes after filtering
dbcan_dge_expressed$samples$lib.size <- colSums(dbcan_dge_expressed$counts)

###### Model testing ###########

treatment = as.factor(metagenome_map$FCR)
treatment = relevel(treatment, ref = "High")

design = model.matrix(~treatment, data=metagenome_map)

dbcan_dge_expressed
dbcan_dge_expressed = calcNormFactors(dbcan_dge_expressed, method="RLE")
dbcan_dge_expressed = estimateGLMCommonDisp(dbcan_dge_expressed, design)

#Here I got the errors described previously for the estimateGLMCommonDisp ()

dbcan_dge_expressed_fit <- glmFit(dbcan_dge_expressed, design)

#Here I got the errors described previously for the glmFit()

> sessionInfo()
R version 3.3.2 (2016-10-31)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: OS X El Capitan 10.11.6

locale:
[1] en_CA.UTF-8/en_CA.UTF-8/en_CA.UTF-8/C/en_CA.UTF-8/en_CA.UTF-8

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] BiocInstaller_1.24.0 plyr_1.8.4 edgeR_3.16.3
[4] limma_3.30.4 reshape2_1.4.2 ggplot2_2.2.0
[7] dplyr_0.5.0

loaded via a namespace (and not attached):
[1] locfit_1.5-9.1 Rcpp_0.12.8 lattice_0.20-34 digest_0.6.10
[5] assertthat_0.1 grid_3.3.2 R6_2.2.0 gtable_0.2.0
[9] DBI_0.5-1 magrittr_1.5 scales_0.4.1 stringi_1.1.2
[13] lazyeval_0.2.0 labeling_0.3 tools_3.3.2 stringr_1.1.0
[17] munsell_0.4.3 colorspace_1.3-1 tibble_1.2

Thank you very much for your help,

Andre

edger • 4.4k views

ADD COMMENT • link 7.4 years ago andrluis • 0

score 1 · Answer 1 · 2016-11-23

The first error probably comes about because you have library sizes of zero after filtering, or alternatively, the normalization factors were calculated at some non-finite or zero value (which is also most likely due to the presence of low counts). This means that the effective library sizes become invalid offsets upon log-transformation. The second error comes about because estimateGLMCommonDisp fails from the first error, which means you don't have dispersion estimates to use in GLM fitting.

So, in short, avoid getting library sizes of zero. You don't show what you did to get dbcan_dge_expressed, but either you're filtering too aggressively such that one library doesn't have any counts left, or some of your libraries have failed. Library sizes are usually at least ~1 million for RNA-seq experiments, not 31.

score 0 · Answer 2 · 2016-11-23

0

Entering edit mode

andrluis • 0

@andrluis-11196

Last seen 7.4 years ago

Hi, Aaron:

Thank you for your insight. The dbcan_dge_expressed is not RNA-seq experiment. Actually, it is comprised of enzymes that I retrieved from the dbcan database. So, this is why the counts in this specific library were so low. In general, the maximum count is not higher than 300. I will double check the filter commands and follow your comments to try fixing this problem.

Thank you again, Aaron.

Andre

ADD COMMENT • link 7.4 years ago andrluis • 0

1

Entering edit mode

The documentation in the edgeR user's guide and elsewhere is written under the assumption that the counts are those of reads in an RNA-seq experiment (or, at least, a genomics experiment). If this is not the case, I can't confidently say whether your analysis is appropriate or not. For example, the counts might follow a distribution that is clearly not negative binomial, or various assumptions in calcNormFactors might not be valid. In short, you had better know what you're doing.

ADD REPLY • link 7.4 years ago Aaron Lun ★ 28k

score 0 · Answer 3 · 2016-11-23

0

Entering edit mode

andrluis • 0

@andrluis-11196

Last seen 7.4 years ago

Hi, Aaron:

I improved the filter and also changed the method of the calcNormFactors() from "RLE" to "upperquartile", removing counts of zeros in all libraries.

Now everything is working fine as expected.

Thank you again for your help,

Andre

ADD COMMENT • link 7.4 years ago andrluis • 0