Question

DESeq2 and CQN how to use offsets and get transformations?

0

Entering edit mode

kodream ▴ 20

@kodream-6952

Last seen 7.9 years ago

United States

Hi All,

I am using DESeq2 and I want to use GC content, gene length and size factor offsets using CQN package, as I do various QC visualization plots and statistics, I have been trying to export the data with varianceStabilizingTransformation.

First off I want to say I have used DESeq2 with the estimateSizeFactors function, and this *appears* to produce artifact free smooth results.

However, when I use CQN then set the normalization factors ie:

normalizationFactors(dds) <- exp(offsets$glm.offset)

where offsets is a CQN object, I get a bimodal distribution in a combination of PC1 and PC2.

I have read through the source code for varianceStabilizingTransformtion and getVarianceStabilizedData, and I can see that the fitType must be local in the DESeqDataSet for the offsets to be used in the transformation, so I ran the package with these settings, and it appears that this does not matter.

Furthermore, when I explore the CQN quantile normalized data the principal components look fine, when I compare the CQN RPKM normalized data, the VST data approaches zero smoothly, where the CQN data extends into the negative range, so a zero in the VST could be anywhere from -5 to a 5, I mention this, because I wonder if there is a numerical problerm with negative offsets perhaps effecting larger sizeFactors?

Anyway, DESeq2 has many data transformation options. It appears that rlog does take the offsets into account, however is incredibly slow and I have only been able to use it on a subset of the data, it does not appear to suffer from the bimodal problem *as much*, however it is still noticeable. I need a transformation that leaves the variance associated with condition of interest with other covariates removed, which is one of the reasons normalizeTransform is not appropriate, or using the CQN RPKM directly.

-Robin

deseq2 • 2.9k views

ADD COMMENT • link 8.3 years ago kodream ▴ 20

score 0 · Answer 1 · 2016-11-09

hi Robin,

"I have read through the source code for varianceStabilizingTransformtion and getVarianceStabilizedData, and I can see that the fitType must be local in the DESeqDataSet for the offsets to be used in the transformation"

I just want to point out this is not the case: all of the fitType paths in the VST code use ncounts, which is counts(object, normalized=TRUE), which is using the normalization factors from CQN.

I have a question: did you divide out the geometric mean of the exponentiated CQN offset matrix so the rows are approximately centered on 1? There is code for this in the vignette (Section: Sample-/gene-dependent normalization factors).

score 0 · Answer 2 · 2016-11-09

0

Entering edit mode

kodream ▴ 20

@kodream-6952

Last seen 7.9 years ago

United States

Hi Mike,

I divided by the geometric mean, this appeared to improve the PC1 to PC2 relationship. The bimodal problem went away.

Thank you.

-Robin

FWIW it seems in a different/confusing order in the manual:

normFactors <- normFactors / exp(rowMeans(log(normFactors)))

normalizationFactors(dds) <- normFactors

...

cqnOffset <- cqnObject$glm.offset

cqnNormFactors <- exp(cqnOffset)

ADD COMMENT • link 8.3 years ago kodream ▴ 20

0

Entering edit mode

Dear Mike and Robin,

as Robin pointed out, I am still a bit confused about how to include the normalizationFactors matrix from cqn into DESeq2.

I believe that the right way (this is the only way I see an improvement in the GC and length bias) should be:

cqnOffset <- cqnObject$glm.offset

cqnNormFactors <- exp(cqnOffset)

normalizationFactors(dds) <- cqnNormFactors

Could you confirm that?

Thank you.

Best Regards,

Gianluca

ADD REPLY • link 7.8 years ago glmazzo ▴ 10

0

Entering edit mode

Before inputing normalizationFactors into DESeq2, you should divide out the geometric mean:

normFactors <- normFactors / exp(rowMeans(log(normFactors)))
normalizationFactors(dds) <- normFactors

This is so that mean(counts(dds, normalized=TRUE)[gene,]) is roughly on the same scale as mean(counts(dds)[gene,]).

ADD REPLY • link 7.8 years ago Michael Love 43k

0

Entering edit mode

Dear Mike,

thank you for your help.

I run the code as you suggested.

cqnOffset <- cqnObject$glm.offset

cqnNormFactors <- exp(cqnOffset)

normFactors<- cqnNormFactors / exp(rowMeans(log(cqnNormFactors)))

normalizationFactors(dds)=normFactors

Anyway when I tested the normalized counts with and without using the normalization factors from cqn I noticed that if I divide them by the geometric means the correction becomes almost completely ineffective while if I do not divide out the geometric means the correction seems to be fine.

Please, could you explain to me why? Is this to be expected

Thank you.

Best Regards,

Gianluca

ADD REPLY • link 7.8 years ago glmazzo ▴ 10

0

Entering edit mode

Hi Gianluca,

Maybe you can make a new post where you show how you assess "effective"

ADD REPLY • link 7.8 years ago Michael Love 43k

0

Entering edit mode

Dear Mike,

I attached three plots made with NOISeq to test for GC contenct bias, that I used to test the effect of the cqn correction.

From NOISeq manual:

“The GC content is divided in intervals (bins) containing 200 features. The middle point of each bin is depicted in X axis. For each bin, the 5% trimmed mean of the corresponding expression values is computed and depicted in Y axis… …. Both the model p-value and the coefficient of determination (R2) are shown in the plot as well as the fitted regression curve. If the model p-value is significant and R2 value is high (more than 70%), the expression will depend on the feature GC content and the curve will show the type of dependence”

Fig (A) is obtained with counts(dds,normalized=TRUE) without using the normalization factors from cqn

Fig (B) is obtained with counts(dds,normalized=TRUE) using the normalization factors from cqn without dividing out the geometric mean

Fig (C) is obtained with counts(dds,normalized=TRUE) using the normalization factors from cqn dividing out the geometric mean

Fig B is exactly the same plot of when I test directly the normalized values from cqn with NOISeq.

Thank you.

Gianluca

ADD REPLY • link 7.8 years ago glmazzo ▴ 10

0

Entering edit mode

I agree it looks like it's not working. It's getting hard to follow everything though, could you start a new thread (post new question) and post all the code so I can look it over and see if everything looks correct?

ADD REPLY • link 7.8 years ago Michael Love 43k