Accounting for technical variation in DESeq2
@marcotrizzino83-9987

I have a question about DESeq2 data normalization. I know that DESeq2 requires raw read counts and that the software normalizes for sequencing depth.

 

But what if I want to account for technical variation? Normally I would quantile-normalize the data, but I understand that DESeq2 does not support quantile-normalized input, so how can I correct for this kind of variability?

 

Thanks in advance,

 

Marco

deseq2 normalization
@peter-langfelder-4469

Others may have more systematic answers, but here are my two cents specifically on quantile normalization. When I check quantiles in DESeq-normalized data (more precisely, normalized and variance-stabilized data), they always look "nearly quantile-normalized": specific percentiles (I normally use 30%, 50%, 70%, 80%, 90%) vary across samples within a very narrow range, certainly much narrower than the differences between the percentiles. In other words, in my experience DESeq normalization approximates quantile normalization very well.
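For example, one quick way to run that check (a rough sketch; dds is a placeholder DESeqDataSet built from raw counts):

    library(DESeq2)

    ## dds: a DESeqDataSet built from raw counts (placeholder object)
    vsd <- vst(dds, blind = TRUE)   # normalized, variance-stabilized values

    ## per-sample percentiles; if these barely vary across samples,
    ## the data are already close to quantile-normalized
    apply(assay(vsd), 2, quantile, probs = c(0.3, 0.5, 0.7, 0.8, 0.9))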


Thanks. So you would recommend performing variance stabilization before looking for DE genes (or differentially accessible ATAC-seq regions) if I am concerned about technical variation?


You don't need to apply the VST before DE analysis. In fact, you cannot supply transformed data to DESeq2; it works on the original counts and models the heteroskedasticity via the negative binomial GLM.
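As a minimal sketch of the intended workflow (counts, coldata, and the design are placeholders):

    library(DESeq2)

    ## counts: matrix of raw counts; coldata: sample table (placeholders)
    dds <- DESeqDataSetFromMatrix(countData = counts,
                                  colData   = coldata,
                                  design    = ~ condition)

    dds <- DESeq(dds)     # NB GLM fit directly on the raw counts
    res <- results(dds)   # DE results

    vsd <- vst(dds)       # transformed values, for QC/visualization only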

Let me check in later to add links to software for handling technical variation that integrates well with DESeq2 (RUV, cqn, sva, etc.).


So, as I see it there are two categories of methods for modeling extra technical variation. One is based on covariates, e.g. gene GC content and gene length:

  • cqn
  • EDASeq
  • etc.

and the other based on factor analysis:

  • RUVSeq
  • svaseq
  • etc.

We have examples of incorporating these in the DESeq2 vignette and workflow; a rough sketch of the covariate-based route follows below.
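For the covariate-based route, the pattern with cqn looks roughly like this (a sketch along the lines of the vignette; gene_gc and gene_length are placeholder per-gene covariates):

    library(cqn)
    library(DESeq2)

    ## model counts as a function of GC content and gene length (placeholder covariates)
    cqn_fit <- cqn(counts(dds), x = gene_gc, lengths = gene_length)

    ## turn the cqn offsets into gene- and sample-specific normalization factors
    normFactors <- exp(cqn_fit$glm.offset)
    normFactors <- normFactors / exp(rowMeans(log(normFactors)))
    normalizationFactors(dds) <- normFactors

    dds <- DESeq(dds)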

The covariate-based methods are useful if you have biased counts related to per-sample fluctuations in PCR or RNA degradation. If you use Salmon with --gcBias (and --posBias for positional bias) followed by tximport, you don't need those methods to deal with this type of technical variation: Salmon has already corrected for these biases during its estimation steps, and the correction is passed along to DESeq2 via tximport. You can assess GC bias and positional bias with MultiQC and its FastQC modules (Salmon modules are also coming soon).
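A rough sketch of that route (file paths, the sample table, and the tx2gene mapping are placeholders):

    ## shell: quantify with bias correction, e.g.
    ##   salmon quant -i index -l A -1 r1.fq.gz -2 r2.fq.gz --gcBias --posBias -o quants/sample1

    library(tximport)
    library(DESeq2)

    ## files: named vector of paths to quant.sf; tx2gene: transcript-to-gene table (placeholders)
    txi <- tximport(files, type = "salmon", tx2gene = tx2gene)

    ## the average transcript lengths from Salmon carry the bias corrections into DESeq2
    dds <- DESeqDataSetFromTximport(txi, colData = coldata, design = ~ condition)
    dds <- DESeq(dds)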

The factor analysis methods are useful for removing additional technical variation regardless of its source, but if the unwanted variation is partially confounded with the biological covariates, it's possible to remove some real signal. This doesn't happen with Salmon or the covariate-based methods, because they work on a per-sample basis and only remove variation that can be explained by gene, transcript, or cDNA fragment features.
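As a sketch of the factor-analysis route with svaseq (the number of surrogate variables here is arbitrary, and condition is a placeholder for the biological variable of interest):

    library(sva)
    library(DESeq2)

    dds <- estimateSizeFactors(dds)
    dat <- counts(dds, normalized = TRUE)
    dat <- dat[rowMeans(dat) > 1, ]               # drop very low counts

    mod  <- model.matrix(~ condition, data = as.data.frame(colData(dds)))  # full model
    mod0 <- model.matrix(~ 1, data = as.data.frame(colData(dds)))          # null model
    svseq <- svaseq(dat, mod, mod0, n.sv = 2)     # estimate 2 surrogate variables (arbitrary)

    ## add the surrogate variables as covariates in the design
    dds$SV1 <- svseq$sv[, 1]
    dds$SV2 <- svseq$sv[, 2]
    design(dds) <- ~ SV1 + SV2 + condition
    dds <- DESeq(dds)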


No. I recommend checking the normalized (and perhaps VST'd) data and, unless there is a good reason to worry about quantile normalization, not worrying about it. As Michael says, if you think you have inter-sample technical variation, you can look into SVA, RUVSeq, and possibly other approaches for creating covariates that can be used within DESeq2 to account for it.
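The RUVSeq version looks much the same as the svaseq sketch above (a sketch; control_genes is a placeholder set of empirical control genes, e.g. genes that looked clearly non-DE in a first pass):

    library(RUVSeq)
    library(DESeq2)

    ## estimate k = 2 factors of unwanted variation from control genes (placeholder choice)
    ruv <- RUVg(counts(dds), cIdx = control_genes, k = 2)

    ## use the estimated factors as covariates in the design
    dds$W1 <- ruv$W[, 1]
    dds$W2 <- ruv$W[, 2]
    design(dds) <- ~ W1 + W2 + condition
    dds <- DESeq(dds)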


Thank you both for the replies, I'll check what you suggested and let you know if I have more questions.
