Question: accounting for technical variation in DESeq2
11 months ago, marco.trizzino83 wrote:

I have a question about DESeq2 data normalization. I know that DESeq2 requires raw read counts and that the software normalizes by sequencing depth.


But what if I want to account for technical variation? Normally I would quantile-normalize the data, but I understand that DESeq2 does not support quantile-normalized data. How can I correct for this kind of variability?


Thanks in advance,



Answer: accounting for technical variation in DESeq2
11 months ago, Peter Langfelder (United States) wrote:

Others may have more systematic answers, but here are my 2 cents regarding specifically quantile normalization. When I check quantiles in DESeq-normalized data (more precisely, normalized and variance-stabilized), the data always look "nearly quantile normalized", in that specific percentiles (I normally use 30%, 50%, 70%, 80%, 90%) vary within a very narrow range, certainly much less than the differences between the percentiles. In other words, in my experience DESeq normalization approximates quantile normalization very well.
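The percentile check described above can be sketched in a few lines. This is a minimal illustration in plain Python rather than R, assuming a small genes-by-samples matrix of raw counts. The median-of-ratios size factors follow the approach DESeq2 uses for depth normalization, while log2(x + 1) is only a simple stand-in for a variance-stabilizing transformation. The function names `size_factors` and `normalized_percentiles` and the toy counts are hypothetical.

```python
import math
from statistics import median, quantiles


def size_factors(counts):
    """Median-of-ratios size factors.

    counts: list of genes, each a list of raw counts (one per sample).
    Only genes with nonzero counts in every sample contribute.
    """
    n_samples = len(counts[0])
    usable = [row for row in counts if all(c > 0 for c in row)]
    # log geometric mean of each usable gene across samples
    log_geo_means = [sum(math.log(c) for c in row) / n_samples for row in usable]
    factors = []
    for j in range(n_samples):
        log_ratios = [math.log(row[j]) - g for row, g in zip(usable, log_geo_means)]
        factors.append(math.exp(median(log_ratios)))
    return factors


def normalized_percentiles(counts, probs=(30, 50, 70, 80, 90)):
    """Per-sample percentiles of log2(normalized count + 1)."""
    sf = size_factors(counts)
    result = []
    for j, s in enumerate(sf):
        values = [math.log2(row[j] / s + 1) for row in counts]
        cuts = quantiles(values, n=100, method="inclusive")  # 99 cut points
        result.append([cuts[p - 1] for p in probs])
    return result


if __name__ == "__main__":
    # Toy data (assumption): 6 genes x 3 samples
    toy = [[10, 22, 9], [100, 190, 120], [5, 12, 4],
           [40, 85, 38], [7, 13, 8], [300, 640, 280]]
    for sample, pcts in enumerate(normalized_percentiles(toy)):
        print(sample, [round(p, 2) for p in pcts])
```

If the percentiles printed for each sample fall within a narrow band, the normalized data are already close to quantile normalized, which is the observation made above.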


Thanks. So you would recommend performing variance stabilization before looking for DE genes (or differentially accessible ATAC-seq regions) if I am concerned about technical variation?

Reply written 11 months ago by marco.trizzino83

You don’t need to apply VST before DE. In fact, you cannot supply transformed data to DESeq2, which uses the original counts and models the heteroskedasticity via the NB GLM.
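The mean-variance relationship that the NB GLM models can be written down directly. A minimal sketch, assuming the usual negative binomial parameterization with mean mu and dispersion alpha, where the variance is mu + alpha * mu^2. The function names and the dispersion value 0.1 are illustrative assumptions, not DESeq2 output.

```python
def nb_variance(mean, dispersion):
    """Variance of a negative binomial count: mean + dispersion * mean^2."""
    return mean + dispersion * mean ** 2


def nb_cv(mean, dispersion):
    """Coefficient of variation (sd / mean) implied by the NB model."""
    return nb_variance(mean, dispersion) ** 0.5 / mean


if __name__ == "__main__":
    # Assumed dispersion of 0.1 for every gene, purely for illustration
    for mu in (5, 50, 500, 5000):
        print(mu, nb_variance(mu, 0.1), round(nb_cv(mu, 0.1), 3))
```

Because the variance depends on the mean in this way, the model needs the counts on their original scale; a transformation applied beforehand would change the relationship the GLM relies on.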

I'll check in later with more links to software for technical variation that can be integrated with DESeq2 (RUV, cqn, sva, etc.).

Reply written 11 months ago (modified 11 months ago) by Michael Love

As I see it, there are two categories of methods for modeling extra technical variation. One is based on covariates, e.g. gene GC content and gene length:

  • cqn
  • EDASeq
  • etc.

and the other based on factor analysis:

  • RUVSeq
  • svaseq
  • etc.

We have examples of incorporating these in the DESeq2 vignette and workflow.

The covariate-based methods are useful if you have biased counts related to per-sample fluctuations in PCR or RNA degradation. If you use Salmon with --gcBias (and --posBias for positional bias) and then tximport, you don't need those methods for that type of technical variation: Salmon has already corrected for these biases during its estimation steps, and the correction is passed along to DESeq2 via tximport. You can assess GC bias and positional bias with MultiQC (using its FastQC modules; Salmon modules are coming soon).

The factor analysis methods are useful for removing additional technical variation regardless of the source, but if the bias is partially confounded with the biological covariates, it's possible to remove some signal. This doesn't happen with Salmon or the covariate-based methods because they work on a per-sample basis and only remove variation that can be explained by gene, transcript or cDNA fragment features.
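The confounding risk mentioned above can be shown with a toy calculation. This is a deliberately naive sketch, assuming a single noise-free gene and a factor that is simply regressed out before comparing groups. It is not how RUVSeq or svaseq estimate their factors, and all names and numbers here are hypothetical.

```python
from statistics import mean


def residualize(y, w):
    """Remove from y the part linearly explained by w (simple least squares)."""
    my, mw = mean(y), mean(w)
    sxx = sum((wi - mw) ** 2 for wi in w)
    slope = sum((wi - mw) * (yi - my) for wi, yi in zip(w, y)) / sxx
    return [(yi - my) - slope * (wi - mw) for wi, yi in zip(w, y)]


def group_difference(values, condition):
    """Mean of group 1 minus mean of group 0."""
    g1 = [v for v, c in zip(values, condition) if c == 1]
    g0 = [v for v, c in zip(values, condition) if c == 0]
    return mean(g1) - mean(g0)


condition = [0, 0, 0, 1, 1, 1]
expression = [0.0, 0.0, 0.0, 2.0, 2.0, 2.0]  # toy gene with a true effect of 2
balanced_factor = [-1, 0, 1, -1, 0, 1]       # unrelated to condition
confounded_factor = [0, 0, 1, 1, 1, 1]       # partially tracks condition

if __name__ == "__main__":
    for name, factor in (("balanced", balanced_factor),
                         ("confounded", confounded_factor)):
        cleaned = residualize(expression, factor)
        print(name, group_difference(cleaned, condition))
```

With the balanced factor the group difference stays at 2, while with the partially confounded factor it drops to about 1: part of the biological signal has been removed along with the "technical" variation.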

Reply written 11 months ago (modified 11 months ago) by Michael Love

No. I recommend checking the normalized (and perhaps VST'd) data, and unless there is good reason to worry about quantile normalization, don't worry about it. As Michael says in his replies, if you feel you have inter-sample technical variation, you can look into SVA, RUVSeq and possibly other approaches that create covariates which can be used within DESeq2 to account for inter-sample variation.

Reply written 11 months ago by Peter Langfelder

Thank you both for the replies, I'll check what you suggested and let you know if I have more questions.

Reply written 11 months ago by marco.trizzino83