Hi All,
I am using DESeq2 and I want to use GC content, gene length and size factor offsets using CQN package, as I do various QC visualization plots and statistics, I have been trying to export the data with varianceStabilizingTransformation.
First off I want to say I have used DESeq2 with the estimateSizeFactors function, and this *appears* to produce artifact free smooth results.
However, when I use CQN then set the normalization factors ie:
normalizationFactors(dds) <- exp(offsets$glm.offset)
where offsets is a CQN object, I get a bimodal distribution in a combination of PC1 and PC2.
I have read through the source code for varianceStabilizingTransformtion and getVarianceStabilizedData, and I can see that the fitType must be local in the DESeqDataSet for the offsets to be used in the transformation, so I ran the package with these settings, and it appears that this does not matter.
Furthermore, when I explore the CQN quantile normalized data the principal components look fine, when I compare the CQN RPKM normalized data, the VST data approaches zero smoothly, where the CQN data extends into the negative range, so a zero in the VST could be anywhere from -5 to a 5, I mention this, because I wonder if there is a numerical problerm with negative offsets perhaps effecting larger sizeFactors?
Anyway, DESeq2 has many data transformation options. It appears that rlog does take the offsets into account, however is incredibly slow and I have only been able to use it on a subset of the data, it does not appear to suffer from the bimodal problem *as much*, however it is still noticeable. I need a transformation that leaves the variance associated with condition of interest with other covariates removed, which is one of the reasons normalizeTransform is not appropriate, or using the CQN RPKM directly.
-Robin
Dear Mike and Robin,
as Robin pointed out, I am still a bit confused about how to include the normalizationFactors matrix from cqn into DESeq2.
I believe that the right way (this is the only way I see an improvement in the GC and length bias) should be:
cqnOffset <- cqnObject$glm.offset
cqnNormFactors <- exp(cqnOffset)
normalizationFactors(dds) <- cqnNormFactors
Could you confirm that?
Thank you.
Best Regards,
Gianluca
Before inputing normalizationFactors into DESeq2, you should divide out the geometric mean:
This is so that
mean(counts(dds, normalized=TRUE)[gene,])
is roughly on the same scale asmean(counts(dds)[gene,])
.Dear Mike,
thank you for your help.
I run the code as you suggested.
cqnOffset <- cqnObject$glm.offset
cqnNormFactors <- exp(cqnOffset)
normFactors<- cqnNormFactors / exp(rowMeans(log(cqnNormFactors)))
normalizationFactors(dds)=normFactors
Anyway when I tested the normalized counts with and without using the normalization factors from cqn I noticed that if I divide them by the geometric means the correction becomes almost completely ineffective while if I do not divide out the geometric means the correction seems to be fine.
Please, could you explain to me why? Is this to be expected
Thank you.
Best Regards,
Gianluca
Hi Gianluca,
Maybe you can make a new post where you show how you assess "effective"
Dear Mike,
I attached three plots made with NOISeq to test for GC contenct bias, that I used to test the effect of the cqn correction.
From NOISeq manual:
“The GC content is divided in intervals (bins) containing 200 features. The middle point of each bin is depicted in X axis. For each bin, the 5% trimmed mean of the corresponding expression values is computed and depicted in Y axis… …. Both the model p-value and the coefficient of determination (R2) are shown in the plot as well as the fitted regression curve. If the model p-value is significant and R2 value is high (more than 70%), the expression will depend on the feature GC content and the curve will show the type of dependence”
Fig (A) is obtained with counts(dds,normalized=TRUE) without using the normalization factors from cqn
Fig (B) is obtained with counts(dds,normalized=TRUE) using the normalization factors from cqn without dividing out the geometric mean
Fig (C) is obtained with counts(dds,normalized=TRUE) using the normalization factors from cqn dividing out the geometric mean
Fig B is exactly the same plot of when I test directly the normalized values from cqn with NOISeq.
Thank you.
Gianluca
I agree it looks like it's not working. It's getting hard to follow everything though, could you start a new thread (post new question) and post all the code so I can look it over and see if everything looks correct?