First post here ever! Never thought I'd be confused enough and not find an answer on the internet for my problem, lol. Anyway!
I am using a public dataset of brain cancer samples and want to create a heatmap of sample-to-sample distances, as described in the DESeq2 vignette: http://bioconductor.org/packages/devel/bioc/vignettes/DESeq2/inst/doc/DESeq2.html#heatmap-of-the-sample-to-sample-distances
My code is the following:
    de = DESeqDataSetFromMatrix(countData = exprs(recountDataCombat),
                                colData = pData(recountDataCombat),
                                design = formula)
    de <- estimateSizeFactors(de)
    de <- estimateDispersionsGeneEst(de)
    dispersions(de) <- mcols(de)$dispGeneEst
    glm_all_nb_combat <- nbinomWaldTest(de)
    res <- results(glm_all_nb_combat, name=resultsNames(glm_all_nb_combat))
    .myMAPlot(res, name=title_figure_2)

    # Sample Distances
    dds <- glm_all_nb_combat
    vsd <- vst(dds, blind=FALSE)
But I get the following error:

    Error in estimateDispersionsFit(object, quiet = TRUE, fitType) :
      all gene-wise dispersion estimates are within 2 orders of magnitude
      from the minimum value, and so the standard curve fitting techniques will not work.
      One can instead use the gene-wise estimates as final estimates:
      dds <- estimateDispersionsGeneEst(dds)
      dispersions(dds) <- mcols(dds)$dispGeneEst
      ...then continue with testing using nbinomWaldTest or nbinomLRT
I am confused by this, since I did follow this advice in my code when I ran:

    de <- estimateSizeFactors(de)
    de <- estimateDispersionsGeneEst(de)
    dispersions(de) <- mcols(de)$dispGeneEst
    glm_all_nb_combat <- nbinomWaldTest(de)
I want to avoid rlog at all costs, since I have around 600 samples. My questions are:
Does anyone know a way around this?
If not, does anyone know where I can find the source code?
Is VST not how I should be normalizing these samples, since the dispersion estimates are so low?
Thank you in advance! :D Mia Altieri