Regressing out covariates from expression matrix for PCA visualization - can I use coefficients calculated in DESeq2??
tombate94 • 0

Hi everyone,

I'm running some pseudobulk RNA-seq analyses with DESeq2 and I'm trying to get a feel for how known covariates are influencing the model. I particularly like to see the PCA plots with the covariates regressed out from the vst transformed expression matrix.

I have achieved this with the removeBatchEffect() function from the limma package and also by fitting the regression coefficients by ordinary least squares (OLS). Both approaches give the same results when visualized by PCA. I think this is because I have not specified weights for lmFit() within removeBatchEffect().
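For readers following along, the OLS route described above can be sketched in base R. This is only an illustration under stated assumptions: `mat` is a simulated stand-in for `assay(vsd)`, and `condition` and `batch` are hypothetical sample annotations, not the variables from the actual analysis.

```r
set.seed(1)
n_genes <- 50
n_samples <- 12
condition <- factor(rep(c("ctrl", "trt"), each = 6))   # hypothetical variable of interest
batch <- factor(rep(c("a", "b"), times = 6))           # hypothetical covariate to remove

# Simulated stand-in for assay(vsd): genes in rows, samples in columns
mat <- matrix(rnorm(n_genes * n_samples), n_genes, n_samples)
mat[, batch == "b"] <- mat[, batch == "b"] + 2         # additive batch shift

# Full design: intercept + condition (kept) + batch (regressed out)
X <- model.matrix(~ condition + batch)

# One OLS fit per gene; coefficients come back as (coefficients x genes)
beta <- lm.fit(X, t(mat))$coefficients

# Subtract only the fitted covariate part from the expression matrix
cov_cols <- "batchb"
mat_clean <- mat - t(X[, cov_cols, drop = FALSE] %*% beta[cov_cols, , drop = FALSE])
```

As far as I understand, removeBatchEffect() does essentially the same thing but codes the batch factor with sum-to-zero contrasts, so the two results may differ by a per-gene constant. PCA centers each gene, so that constant would not change the plot.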

Either way, I'm not too bothered about the above, because it seems to me that the regression coefficients estimated by DESeq2 will turn out different anyway. As such, visualizing the covariate effects via PCA using the removeBatchEffect() and OLS methods above won't capture the effects of the covariates as estimated by DESeq2, which produce the LFCs that we end up using for inference.

This leads to my question: can we use the coefficients estimated by DESeq2 to regress out the effects of covariates for PCA visualization? To be more explicit, I'm proposing, after running DESeq2, to:

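vsd <- vst(dds, blind = FALSE)
beta <- coef(dds)    # genes x coefficients, on the log2 scale
# Objects labelled 'covariates' keep only the columns for the covariates we wish to regress out:
# design_covariates is samples x k, beta_covariates is genes x k
vsd_clean <- vsd
assay(vsd_clean) <- assay(vsd) - beta_covariates %*% t(design_covariates)
plotPCA(vsd_clean)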

As expected (though perhaps I am wrong), this yields a different PCA plot than the removeBatchEffect() and OLS methods. It's my understanding that DESeq2 fits the GLM log2(q) = X * Beta, where q is proportional to the 'expected true count', which is not the 'true count' we observe in the data. But even so, is it not the case that subtracting from the vst transformed 'true counts' provides a more accurate account of the covariate effects estimated by DESeq2?

I'm not fully literate in GLMs and I might have gotten something very wrong here, so any guidance on this matter would be much appreciated.

Thanks! Tom

@mikelove

q is proportional to the 'expected true count', which is not the 'true count' we observe in the data

Both VST and the GLM in DESeq2 remove the effect of sequencing depth, so no difference here.

is it not the case that subtracting from the vst transformed 'true counts' provides a more accurate account of the covariate effects estimated by DESeq2?

For visualization, we recommend using VST -> removeBatchEffect -> PCA (1)
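A minimal sketch of workflow (1), for anyone who wants something to adapt. It runs on simulated data from makeExampleDESeqDataSet(), and the `batch` column is a hypothetical covariate added purely for illustration; substitute your own pseudobulk object and covariates.

```r
library(DESeq2)
library(limma)

set.seed(1)
# Simulated data standing in for a real pseudobulk DESeqDataSet
dds <- makeExampleDESeqDataSet(n = 2000, m = 12)
dds$batch <- factor(rep(c("a", "b"), times = 6))   # hypothetical covariate
design(dds) <- ~ batch + condition

# Variance stabilizing transformation, using the design when estimating dispersions
vsd <- vst(dds, blind = FALSE)

# Protect the condition effect while removing the batch effect
mm <- model.matrix(~ condition, colData(vsd))
assay(vsd) <- removeBatchEffect(assay(vsd), batch = vsd$batch, design = mm)

# PCA on the adjusted, variance-stabilized values
plotPCA(vsd, intgroup = "condition")
```

For continuous covariates, removeBatchEffect() also takes a `covariates` argument (a numeric matrix or vector) in place of, or alongside, `batch`.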

It's hard to visualize the effect of controlling for covariates in a GLM: first, because the adjustment happens inside the log, and second, because if the covariates are not orthogonal, sequential regressions are not equivalent to fitting all coefficients simultaneously.
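The second point can be illustrated with a small base R simulation. This is a toy linear-model example with made-up variables `x1`, `x2` and `y`, not DESeq2 output: when two covariates are correlated, regressing them out one after another gives a different coefficient than fitting them together.

```r
set.seed(1)
n <- 200
x1 <- rnorm(n)
x2 <- 0.8 * x1 + rnorm(n, sd = 0.6)   # correlated with x1, so not orthogonal
y  <- 1 + 2 * x1 - 1 * x2 + rnorm(n)

# Simultaneous fit: both coefficients estimated together
b_joint <- coef(lm(y ~ x1 + x2))

# Sequential fit: regress out x1 first, then regress the residuals on x2
r1 <- resid(lm(y ~ x1))
b_seq_x2 <- coef(lm(r1 ~ x2))["x2"]

# The two estimates of the x2 effect disagree because x1 and x2 overlap
print(c(joint = unname(b_joint["x2"]), sequential = unname(b_seq_x2)))
```

In this setup the sequential estimate should be pulled toward zero relative to the joint one, because part of the x2 signal was already absorbed when x1 was removed.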

Given that you can't exactly recreate a plot that shows how the estimated coefficients match the data, we recommend (1) as the best option.


Thank you for clarifying this!
