brijon
Hi
I'm having a lot of trouble applying DESeq2 to my metagenome gene abundance data. Any advice on where I may be going wrong would be much appreciated. I'm early in my career, so sorry for my ignorance.
dds <- DESeqDataSetFromMatrix(countData = spc.matrix, colData = env, design = ~ Habitat)
I have tried running estimateSizeFactors like so:
dds <- estimateSizeFactors(dds, type = "iterate")
and get the following error:
Error in estimateDispersionsFit(object, fitType = fitType, quiet = quiet) :
  all gene-wise dispersion estimates are within 2 orders of magnitude
  from the minimum value, and so the standard curve fitting techniques will not work.
  One can instead use the gene-wise estimates as final estimates:
  dds <- estimateDispersionsGeneEst(dds)
  dispersions(dds) <- mcols(dds)$dispGeneEst
  ...then continue with testing using nbinomWaldTest or nbinomLRT
I have tried to then go on to run the estimateDispersionsGeneEst function as advised:
dds <- estimateDispersionsGeneEst(dds)
and get this error...
Error in .local(object, ...) : first calculate size factors, add normalizationFactors, or set normalized=FALSE
I tried to go back and reset the count data with the normalized=FALSE parameter
countdat <- counts(dds, normalized = FALSE)
counts(dds) <- countdat
but I still get the same error when I rerun the estimateDispersionsGeneEst function.
Many thanks,
Briony
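The second error above most likely appears because the failed estimateSizeFactors call never stored size factors in dds, and reassigning the raw counts does not add them. A minimal sketch of the fallback that the first error message describes, assuming size factors are computed first with type = "poscounts" (the option suggested later in this thread). The count matrix and the Habitat levels "A" and "B" here are simulated stand-ins, not the original data:

```r
library(DESeq2)

# Simulated stand-in data (assumption): 200 genes, 8 samples, overdispersed counts.
set.seed(1)
n_genes <- 200
n_samples <- 8
spc.matrix <- matrix(
  rnbinom(n_genes * n_samples, mu = 50, size = 0.5),
  nrow = n_genes,
  dimnames = list(paste0("gene", seq_len(n_genes)),
                  paste0("sample", seq_len(n_samples)))
)
env <- data.frame(
  Habitat = factor(rep(c("A", "B"), each = n_samples / 2)),  # hypothetical levels
  row.names = colnames(spc.matrix)
)

dds <- DESeqDataSetFromMatrix(countData = spc.matrix, colData = env,
                              design = ~ Habitat)

# Size factors must be stored in dds before any dispersion estimation.
dds <- estimateSizeFactors(dds, type = "poscounts")

# Fallback from the error message: use gene-wise estimates as final dispersions.
dds <- estimateDispersionsGeneEst(dds)
dispersions(dds) <- mcols(dds)$dispGeneEst

# Continue with testing, as the message suggests.
dds <- nbinomWaldTest(dds)
res <- results(dds)
head(res[order(res$pvalue), ])
```

Whether gene-wise dispersions without shrinkage are adequate depends on the number of replicates per habitat, so the resulting p-values are worth checking against plots of the raw counts.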
Thanks very much for your advice, Michael. Would you consider DESeq2 to be appropriate for environmental metagenomic samples, or would you suggest it's more appropriate for clinical metagenome samples with less extreme differences in functional profiles?
Cheers again,
Briony
I know this is an unsatisfying answer, but the mileage from the negative binomial (NB) methods depends on various properties of the dataset, and I don't analyze metagenomic data myself, so I'm hard pressed to come up with rules for when the NB methods would be outperformed by other specific software. I'd look at, e.g., the MA plot and the top genes using plotCounts to make sure that the inference makes sense and isn't driven too much by individual samples. Also, you should probably set minReplicatesForReplace=Inf, as the outlier replacement will probably not be appropriate for the count distribution. And I'm fairly sure the "poscounts" normalization will be better than the default, which might end up using very few rows for normalization (or fail if all rows have a 0).
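The suggestions above could be put together roughly as follows. This is only a sketch on simulated sparse counts in which every row is forced to contain a zero; the data, the Habitat levels "A" and "B", and the helper variable names are assumptions for illustration, not the poster's dataset:

```r
library(DESeq2)

# Simulated stand-in for sparse metagenomic counts (assumption).
set.seed(42)
n_genes <- 300
n_samples <- 12
gene_mean <- runif(n_genes, 5, 200)  # one mean per gene
spc.matrix <- matrix(
  rnbinom(n_genes * n_samples, mu = rep(gene_mean, times = n_samples), size = 0.5),
  nrow = n_genes,
  dimnames = list(paste0("gene", seq_len(n_genes)),
                  paste0("sample", seq_len(n_samples)))
)
# Force at least one zero in every row to mimic very sparse data.
zero_col <- sample(n_samples, n_genes, replace = TRUE)
spc.matrix[cbind(seq_len(n_genes), zero_col)] <- 0L

env <- data.frame(
  Habitat = factor(rep(c("A", "B"), each = n_samples / 2)),  # hypothetical levels
  row.names = colnames(spc.matrix)
)
dds <- DESeqDataSetFromMatrix(countData = spc.matrix, colData = env,
                              design = ~ Habitat)

# The default normalization is expected to fail here because every row has a zero;
# the error message is captured instead of stopping the script.
default_try <- tryCatch(estimateSizeFactors(dds),
                        error = function(e) conditionMessage(e))

# "poscounts" normalization tolerates zeros.
dds <- estimateSizeFactors(dds, type = "poscounts")

# Run the pipeline with outlier replacement switched off.
dds <- DESeq(dds, minReplicatesForReplace = Inf)
res <- results(dds)

# Sanity checks: overall MA plot, then raw counts for the top-ranked gene.
plotMA(res)
top_gene <- rownames(res)[order(res$padj)[1]]
plotCounts(dds, gene = top_gene, intgroup = "Habitat")
```

If the plotCounts panel shows the top gene driven by one or two samples, that is the kind of result the advice above says to treat with caution.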