Question

Dispersion plot -Interpretation -DESeq2

1

suhanya ▴ 20

@8ecb7cda

Last seen 2.5 years ago

Poland

I am working on RNA-Seq data from bacterial samples, comprising an untreated group and 2 drug treatment groups (3 replicates each), and I am following the DESeq2 steps. My question: looking at the dispersion plots for fit types parametric and local, can it be inferred that there is less variability among the genes? I also ask because the results table gave about 265 statistically significant genes between the control and treatment 1. With fit type local, the dispersion seems to increase slightly at higher mean counts. Kindly share your feedback on how to understand this plot.

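    # Build the dataset from raw counts and sample information
    ddsObj <- DESeqDataSetFromMatrix(countData = Raw_counts, colData = sampleinfo, design = ~Condition)
    # Keep genes with more than 10 counts in more than 3 samples
    idx <- rowSums(counts(ddsObj) > 10) > 3
    table(idx)

    ddsObj <- ddsObj[idx, ]
    dds <- DESeq(ddsObj)
    plotDispEsts(dds)
    plotDispEsts(dds, fitType = "local")
    dds$sizeFactor
    # Results table, ordered by adjusted p-value
    res <- results(dds)
    res <- res[order(res$padj), ]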

[Figure: dispersion plot, fitType = parametric]

[Figure: dispersion plot, fitType = local]

DESeq2 • 6.6k views

ADD COMMENT • link 2.5 years ago suhanya ▴ 20

score 1 · Answer 1 · 2022-06-01

1

jeroen.gilis ▴ 90

@jeroengilis-21551

Last seen 11 months ago

Belgium

Hi suhanya,

It feels to me that you could be mixing up a couple of concepts here.

First, to my knowledge, fitType is an argument of the estimateDispersions function rather than of the plotDispEsts function. As such, I'm not sure what passing the fitType argument to plotDispEsts is doing in your code snippet.
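A minimal sketch of where fitType can be set, assuming simulated data from makeExampleDESeqDataSet() in place of the real counts. DESeq() also accepts fitType and passes it on to the dispersion estimation step, so the fit type is chosen before plotting, and plotDispEsts() simply shows whatever was fitted. The object names dds_param and dds_local are made up for illustration.

```r
library(DESeq2)

# Simulated counts stand in for the real data (assumption for illustration):
# 1000 genes, 6 samples, two conditions with 3 replicates each.
set.seed(1)
dds <- makeExampleDESeqDataSet(n = 1000, m = 6)

# Choose the dispersion trend when the dispersions are estimated.
dds_param <- DESeq(dds, fitType = "parametric")
dds_local <- DESeq(dds, fitType = "local")

# plotDispEsts() has no fitType argument; it plots the stored fit.
plotDispEsts(dds_param)
plotDispEsts(dds_local)
```

The two resulting plots show the same gene-wise estimates with different fitted trends, which is what a comparison of parametric versus local amounts to.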

Second, the option of changing the default setting fitType = "parametric" to fitType = "local" is mainly there so that dispersion estimates can still be computed if the parametric fit fails. This is explained in the package vignette: http://bioconductor.org/packages/devel/bioc/vignettes/DESeq2/inst/doc/DESeq2.html#differential-expression-analysis. As such, comparing dispersion estimates from the parametric fit with those from the local fit is not comparing variability between treatment groups, as your question suggests.

Third, your question seems to be about differential variability between treatment groups. I don't see how these plots would tell you something about that. What you might need is to make this plot for the treatment groups separately. However, this is only exploratory, and I can imagine there are better visualizations and tests around for studying differential variability.
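One way the per-group plot could be sketched, again on simulated data and purely as an exploratory aid. The assumptions here are mine: size factors are estimated once on the full dataset, each group is then fitted on its own with an intercept-only design, and the list name per_group is hypothetical. With 3 replicates per group the estimates will be noisy.

```r
library(DESeq2)

# Simulated stand-in data (assumption): two conditions, 3 replicates each.
set.seed(1)
dds <- makeExampleDESeqDataSet(n = 1000, m = 6)

# Normalize once on all samples so the groups share size factors.
dds <- estimateSizeFactors(dds)

per_group <- list()
for (grp in levels(dds$condition)) {
  sub <- dds[, dds$condition == grp]   # samples of one group only
  design(sub) <- ~ 1                   # no group effect left within a group
  sub <- estimateDispersions(sub)
  plotDispEsts(sub)                    # one dispersion plot per group
  per_group[[grp]] <- sub
}
```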

Fourth, differential variability is not the same as differential expression. You mention 265 genes with differential expression between groups. This means the average (mean) expression of these genes differs between groups. However, the variability may be the same. To simplify, think of the two-sample t-test. Differential expression would compare the means of the distributions, whereas differential variability would compare the width of the distribution.
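The t-test analogy can be illustrated in base R. This is a toy sketch on simulated normal values, not a method for count data: t.test() compares the means, whereas var.test() (an F test) compares the variances. All numbers below are invented for illustration.

```r
set.seed(42)

# Case 1: same mean, different spread (differential variability only).
a1 <- rnorm(50, mean = 5, sd = 0.5)
b1 <- rnorm(50, mean = 5, sd = 2)

# Case 2: different mean, same spread (differential expression only).
a2 <- rnorm(50, mean = 5, sd = 0.5)
b2 <- rnorm(50, mean = 7, sd = 0.5)

# Test of means versus test of variances for each case.
p_values <- rbind(
  case1 = c(mean = t.test(a1, b1)$p.value, variance = var.test(a1, b1)$p.value),
  case2 = c(mean = t.test(a2, b2)$p.value, variance = var.test(a2, b2)$p.value)
)
print(p_values)
```

In case 1 the variance test picks up the difference, and in case 2 the test of means does, which is the distinction between the two questions.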

Kind regards,

Jeroen

ADD COMMENT • link 2.5 years ago jeroen.gilis ▴ 90

0

The fitType argument was mistakenly typed into the plotDispEsts() call. Thank you for the clarification; I will read more about this.

The plot I have obtained is quite different from other dispersion estimate plots I have seen, where the dispersion decreases with larger mean normalized counts. I would like to ask what could be the reasons that I do not see such a trend in my plot.

ADD REPLY • link 2.5 years ago suhanya ▴ 20

0

It does strike me as odd that for the parametric fitType (top plot) the dispersion estimates are perfectly flat. In this post https://www.seqanswers.com/forum/applications-forums/rna-sequencing/36373-bad-fits-for-deseq-dispersion-estimates?t=41787, Mike Love does mention that "The parametric curve flattens out, which is a good fit for many RNA-Seq datasets but not necessarily for all, hence we provide the local regression."

That said, for your fit with fitType = "local", the trend is also almost flat. This does not need to be a problem; the relationship between mean and dispersion will differ depending on the dataset/experiment. E.g., you are working with bacteria, whereas many plots online will display results for mouse/human. One thing to think about: were all 9 samples run in the same batch? If not, a batch variable should be included in the model, which may also affect the plot (and the analysis as a whole).
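A minimal sketch of adding a batch term, assuming simulated data and hypothetical batch labels. The column name Batch and its values are made up; in the simulated dataset the treatment column is called condition, whereas in the code above it is Condition. The variable of interest goes last in the design so that results() reports it by default.

```r
library(DESeq2)

# Simulated stand-in data (assumption): two conditions, 3 replicates each.
set.seed(1)
dds <- makeExampleDESeqDataSet(n = 1000, m = 6)

# Hypothetical batch labels: each batch contains one sample per condition.
dds$Batch <- factor(rep(c("b1", "b2", "b3"), times = 2))

# Adjust for batch, then test the condition effect.
design(dds) <- ~ Batch + condition
dds <- DESeq(dds)

plotDispEsts(dds)      # dispersions are now estimated given the batch term
res <- results(dds)    # condition effect, adjusted for batch
```

If batch and treatment are completely confounded (for example, each treatment run in its own batch), the design matrix is not full rank and this model cannot be fitted.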

That said, I have seen many mean-variance plots that looked considerably worse than yours, so if you are not missing any covariates I would not hesitate to work with this model.

Jeroen

ADD REPLY • link 2.5 years ago jeroen.gilis ▴ 90

1

Yes, thank you for your time and the clear explanation.

ADD REPLY • link 2.5 years ago suhanya ▴ 20