Dear edgeR authors and users,
I was wondering if you could help me to figure out a way to solve the following errors when calling the functions:
a) estimateGLMCommonDisp
Error in .compressOffsets(y, lib.size = lib.size, offset = offset) :
offsets should be finite values
b) glmFit
Error in glmFit.DGEList(dbcan_dge_expressed, design) :
No dispersion values found in DGEList object.
Another thing that looks wrong in my analysis is that the cpm() values are far too high for my data. I suspect this is affecting the results of the rest of the analysis.
#The commands I used:
My DGEList:
dbcan_dge <- DGEList(counts = dbcan_all[,1:48],
group = metagenome_map$FCR,
genes = rownames(dbcan_all))
Then, I determined the smallest library size:
dbcan_dge$samples
For this example, 31 is the smallest library size; however, when I compute the expression-level filter for that sample, the number jumps to 64516.13:
func_counts_20LZD2L <- dbcan_dge$counts[, "20LZD2L"]
cpm_20LZD2L <- cpm(func_counts_20LZD2L)
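One possible reading of the 64516.13 figure is that it is not a library size at all but a counts-per-million value: a feature with a count of 2 in a library of only 31 counts gives 2 / 31 * 1e6. The sketch below reproduces that arithmetic by hand in base R. The feature names and individual counts are made up for illustration; only the total of 31 is taken from the post, and the assumption that cpm() on a plain vector treats it as a single library is worth checking against the edgeR documentation.

```r
# Hypothetical counts for one sample whose library size is 31
# (made-up values; only the total of 31 matches the post).
counts <- c(featA = 2, featB = 0, featC = 9, featD = 20)
lib_size <- sum(counts)  # 31

# Counts per million by hand: count / library size * 1e6
cpm_by_hand <- counts / lib_size * 1e6
print(round(cpm_by_hand, 2))  # featA comes out as 64516.13
```

With so few counts in a library, every single count is worth about 32258 CPM, so a CPM-based expression filter tuned for RNA-seq depths may behave very differently here.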
# Re-compute the library sizes after filtering
dbcan_dge_expressed$samples$lib.size <- colSums(dbcan_dge_expressed$counts)
###### Model testing ###########
treatment = as.factor(metagenome_map$FCR)
treatment = relevel(treatment, ref = "High")
design = model.matrix(~treatment, data=metagenome_map)
dbcan_dge_expressed
dbcan_dge_expressed = calcNormFactors(dbcan_dge_expressed, method="RLE")
dbcan_dge_expressed = estimateGLMCommonDisp(dbcan_dge_expressed, design)
# This call fails with error (a) above ("offsets should be finite values")
dbcan_dge_expressed_fit <- glmFit(dbcan_dge_expressed, design)
# This call fails with error (b) above ("No dispersion values found in DGEList object")
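A quick way to narrow down error (a) is to look at what goes into the offsets. By default edgeR uses log(lib.size * norm.factors) as the offset, so any sample with a library size of zero, or with a missing or non-positive normalization factor, gives a non-finite value. The sketch below runs that check in base R on a hypothetical stand-in for dbcan_dge_expressed$samples; the sample names and numbers are invented, and whether either condition actually occurs in the real data is an assumption to verify. Error (b) most likely follows from (a): since estimateGLMCommonDisp() stopped, no dispersion was ever stored in the object passed to glmFit().

```r
# Hypothetical stand-in for dbcan_dge_expressed$samples (values are made up).
samples <- data.frame(
  lib.size     = c(1200, 0, 31),
  norm.factors = c(1.1, 1.0, NA),
  row.names    = c("S1", "S2", "S3")
)

# Default edgeR offset: log of the effective library size
offsets <- log(samples$lib.size * samples$norm.factors)

# Samples whose offset is -Inf, NA or NaN would trigger the error
bad <- rownames(samples)[!is.finite(offsets)]
print(bad)
```

On the real object, the same check can be run on dbcan_dge_expressed$samples right after calcNormFactors(); any sample it flags needs to be removed or the filtering and normalization revisited before the dispersion is estimated.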
> sessionInfo()
R version 3.3.2 (2016-10-31)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: OS X El Capitan 10.11.6
locale:
[1] en_CA.UTF-8/en_CA.UTF-8/en_CA.UTF-8/C/en_CA.UTF-8/en_CA.UTF-8
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] BiocInstaller_1.24.0 plyr_1.8.4 edgeR_3.16.3
[4] limma_3.30.4 reshape2_1.4.2 ggplot2_2.2.0
[7] dplyr_0.5.0
loaded via a namespace (and not attached):
[1] locfit_1.5-9.1 Rcpp_0.12.8 lattice_0.20-34 digest_0.6.10
[5] assertthat_0.1 grid_3.3.2 R6_2.2.0 gtable_0.2.0
[9] DBI_0.5-1 magrittr_1.5 scales_0.4.1 stringi_1.1.2
[13] lazyeval_0.2.0 labeling_0.3 tools_3.3.2 stringr_1.1.0
[17] munsell_0.4.3 colorspace_1.3-1 tibble_1.2
Thank you very much for your help,
Andre
The documentation in the edgeR user's guide and elsewhere is written under the assumption that the counts are those of reads in an RNA-seq experiment (or, at least, a genomics experiment). If this is not the case, I can't confidently say whether your analysis is appropriate or not. For example, the counts might follow a distribution that is clearly not negative binomial, or various assumptions in calcNormFactors might not be valid. In short, you had better know what you're doing.