Search
Question: Estimating group-specific dispersion in DESeq2
0
gravatar for bkellman
6 months ago by
bkellman0
bkellman0 wrote:

Is there a reason that the 'per-sample' dispersion calculation was discontinued for DESeq2? I am working with ASD samples which have been observed [ http://dx.doi.org/10.1371/journal.pone.0016715 ] to exhibit lower variance. I need to estimate the ASD and typical dispersion separately. Can I do this with DESeq2?

I can run DESeq::estimateDispersions(...) but that produces two dispersion vectors. How I'm not sure how to integrate these dispersion vectors into DESeq2. It seems like there is only room from one dispersion vector in the analysis.

Ben

ADD COMMENTlink modified 6 months ago by Michael Love12k • written 6 months ago by bkellman0
0
gravatar for Michael Love
6 months ago by
Michael Love12k
United States
Michael Love12k wrote:

Ok, so let's link to the original post as well:

C: Estimation of dispersion in DeSeq and DeSeq2

As I said, DESeq2 only supports a single dispersion value per gene, and this was for generalizing the code base to all designs. Usually it's fine to use a single dispersion value per gene, because the condition with the larger dispersion (if they are truly different across groups) will raise the single estimated value. Can you give an example of a gene in which the dispersion estimates are very discordant, maybe show the plotCounts() for this gene? Note that the dispersion is not the same as the variance. The single dispersion value allows for different groups to have different variance.

Or you can go ahead and use DESeq (1), or I believe baySeq, to have different dispersion values per gene.

ADD COMMENTlink written 6 months ago by Michael Love12k

I ran a quick preliminary check. It looks like the correlation of dispersions diverges quickly when the min rowSums is raised from 1,10,100 (for 100 samples: 70 ASD, 30 TD). So for genes with a base mean > 1 there is little cohesion between the separate assessments. Let me get an example gene.

> cutoff=1
> tmp0=DESeq::newcountdataset ( conditions=dds_tmp$dx_clean)(counts(dds_tmp)[rowSums(counts(dds_tmp))>cutoff,])
> tmp0=DESeq::estimateSizeFactors(tmp0)
> tmp12 = DESeq::estimateDispersions( tmp0 ,method = "per-condition" )
> tmp22 = DESeq::estimateDispersions( tmp0 ,method = "pooled" )
>cor(data.matrix(data.frame(disp1=tmp12@featureData@data$disp_1,disp2=tmp12@featureData@data$disp_2,pooled=tmp22@featureData@data$disp_pooled)),method='pearson')

           disp1     disp2    pooled
disp1  1.0000000 0.9672386 0.9907651
disp2  0.9672386 1.0000000 0.9759009
pooled 0.9907651 0.9759009 1.0000000

>
> cutoff=10
> tmp0=DESeq::newcountdataset ( conditions=dds_tmp$dx_clean)(counts(dds_tmp)[rowSums(counts(dds_tmp))>cutoff,])
> tmp0=DESeq::estimateSizeFactors(tmp0)
> tmp12 = DESeq::estimateDispersions( tmp0 ,method = "per-condition" )
> tmp22 = DESeq::estimateDispersions( tmp0 ,method = "pooled" )
cor(data.matrix(data.frame(disp1=tmp12@featureData@data$disp_1,disp2=tmp12@featureData@data$disp_2,pooled=tmp22@featureData@data$disp_pooled)),method='pearson')
           disp1     disp2    pooled
disp1  1.0000000 0.8870488 0.9557692
disp2  0.8870488 1.0000000 0.9216594
pooled 0.9557692 0.9216594 1.0000000

>
> cutoff=100
> tmp0=DESeq::newcountdataset ( conditions=dds_tmp$dx_clean)(counts(dds_tmp)[rowSums(counts(dds_tmp))>cutoff,])
> tmp0=DESeq::estimateSizeFactors(tmp0)
> tmp12 = DESeq::estimateDispersions( tmp0 ,method = "per-condition" )
> tmp22 = DESeq::estimateDispersions( tmp0 ,method = "pooled" )
>cor(data.matrix(data.frame(disp1=tmp12@featureData@data$disp_1,disp2=tmp12@featureData@data$disp_2,pooled=tmp22@featureData@data$disp_pooled)),method='pearson')

           disp1     disp2    pooled
disp1  1.0000000 0.6627738 0.8129001
disp2  0.6627738 1.0000000 0.8345221
pooled 0.8129001 0.8345221 1.0000000

 

ADD REPLYlink written 6 months ago by bkellman0

I'd use DESeq2 to estimate the gene's dispersions for each group, these are more accurate per gene than DESeq. See instructions in my previous answer. And then I'm mostly interested in the difference for individual genes and the plotCounts() figure.

ADD REPLYlink written 6 months ago by Michael Love12k

Hi Michael - Is there a cut off for differences in dispersions as to whether the contrast function is suitable to use over testing pairwise groups by inputting the pairs of data in independently?

ADD REPLYlink written 5 weeks ago by bekah0

No strict cutoff. Take a look at the PCA plot and see if the pairs are very different. You can make the plot and post an image here (using imgur.com for example) if you like.

ADD REPLYlink written 5 weeks ago by Michael Love12k



​This was the pca created from logtransformed data using blind = FALSE to make sure biological variation from treatment was not smoothed out excessively. Colours are the experimental treatment groups (i.e. 5 replicates per treatment). I think that the dispersal is too varied and independent pairwise comparison would be best. Thank you again for all your advice.

ADD REPLYlink written 5 weeks ago by bekah0

I agree, they seem fairly different in terms of within-group variability.

ADD REPLYlink written 5 weeks ago by Michael Love12k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.2.0
Traffic: 126 users visited in the last hour