Question

DESeq2 estimateDispersionsGeneEst error

brijon • 0

@brijon-13253

Hi

I'm having a lot of trouble applying DESeq2 to my metagenome gene abundance data; any advice on where I may be going wrong would be much appreciated. I'm early in my career, so sorry for my ignorance.

dds <- DESeqDataSetFromMatrix(countData = spc.matrix, colData = env, design = ~ Habitat)

I have tried running estimateSizeFactors like so:

dds <- estimateSizeFactors(dds, type = "iterate")

and get the following error:

Error in estimateDispersionsFit(object, fitType = fitType, quiet = quiet) : 
  all gene-wise dispersion estimates are within 2 orders of magnitude
  from the minimum value, and so the standard curve fitting techniques will not work.
  One can instead use the gene-wise estimates as final estimates:
  dds <- estimateDispersionsGeneEst(dds)
  dispersions(dds) <- mcols(dds)$dispGeneEst
  ...then continue with testing using nbinomWaldTest or nbinomLRT

I then tried to run the estimateDispersionsGeneEst function as advised:

dds <- estimateDispersionsGeneEst(dds)

and get this error...

Error in .local(object, ...) :
  first calculate size factors, add normalizationFactors, or set normalized=FALSE

I tried to go back and reset the count data with the normalized=FALSE parameter:

countdat <- counts(dds, normalized = FALSE)

counts(dds) <- countdat

but I still get the same error when I reapply the estimateDispersionsGeneEst function.

Many Thanks,

Briony

updated 6.9 years ago by Michael Love 41k • written 6.9 years ago by brijon • 0

score 1 · Answer 1 · 2017-06-14

Michael Love 41k

@mikelove

United States

Try the "poscounts" size factor estimator type, i.e. estimateSizeFactors(dds, type="poscounts"), in version 1.16. This is recommended for metagenomics.
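
A minimal sketch of what that could look like, assuming DESeq2 1.16 or later. The counts here are simulated with makeExampleDESeqDataSet as a stand-in for spc.matrix and env, which are not available in this thread. The second error in the question is consistent with size factors never having been stored because the "iterate" call stopped with an error; once size factors are set, estimateDispersionsGeneEst should be able to run.

```r
library(DESeq2)

# Simulated stand-in for the poster's data (assumption: your own
# DESeqDataSet would be built from spc.matrix and env instead).
set.seed(1)
dds <- makeExampleDESeqDataSet(n = 200, m = 6)

# "poscounts" size factors: the per-gene geometric mean is computed from
# the positive counts only, so genes containing zeros can still contribute.
dds <- estimateSizeFactors(dds, type = "poscounts")

# Size factors are now stored on the object, so gene-wise dispersion
# estimation no longer complains about missing size factors.
dds <- estimateDispersionsGeneEst(dds)

head(sizeFactors(dds))
```

With real data, the same two calls would be applied to the object returned by DESeqDataSetFromMatrix.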

Thanks very much for your advice, Michael. Would you consider DESeq2 to be appropriate for environmental metagenomic samples, or would you suggest it's more appropriate for clinical metagenome samples with less extreme differences in functional profiles?

Cheers again,

Briony

6.9 years ago brijon • 0

I know this is an unsatisfying answer, but the mileage from the negative binomial (NB) methods depends on various properties of the dataset, and I don't analyze metagenomic data myself, so I'm hard pressed to come up with rules for when the NB methods would be outperformed by other specific software. I'd look at, e.g., the MA plot and the top genes using plotCounts to make sure that the inference makes sense and isn't driven too much by individual samples. Also, you should probably set minReplicatesForReplace=Inf, as the outlier replacement will probably not be appropriate for the count distribution. And I'm fairly sure the "poscounts" normalization will be better than the default, which might end up using very few rows for normalization (or fail if all rows have a 0).
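
The checks above could be strung together roughly as follows. This is only a sketch on simulated data (makeExampleDESeqDataSet with its built-in "condition" factor stands in for the real Habitat design), and picking the gene with the smallest p-value as the "top gene" is an illustrative choice, not a recommendation from this thread.

```r
library(DESeq2)

# Simulated stand-in data; replace with your own DESeqDataSet.
set.seed(1)
dds <- makeExampleDESeqDataSet(n = 200, m = 6, betaSD = 1)

# Rows without any zero: the only rows the default ("ratio") size factor
# estimator can use, since a single zero makes the geometric mean zero.
n_zero_free <- sum(rowSums(counts(dds) == 0) == 0)
n_zero_free

# Full pipeline with poscounts size factors and outlier replacement off.
dds <- DESeq(dds, sfType = "poscounts", minReplicatesForReplace = Inf)
res <- results(dds)

# Sanity checks: overall MA plot, then per-sample counts for the top gene
# to see whether a single sample is driving the result.
plotMA(res)
top_gene <- rownames(res)[which.min(res$pvalue)]
plotCounts(dds, gene = top_gene, intgroup = "condition")
```

If n_zero_free is very small or zero on the real data, that would match the situation described above where the default normalization has few or no rows to work with.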

6.9 years ago Michael Love 41k