Question

Batch-resolved visualization after RUVg normalization?

ctlong

Hi,

I am currently following the RUVSeq vignette to perform RUVg normalization and to identify potential unwanted variation in my dataset. My RLE plots before and after RUVg normalization suggest that the unwanted variation has been addressed. However, I am now unsure what exactly the RUVg-normalized counts stored in object@assayData$normalizedCounts are, and what kinds of exploratory analyses I can perform with them as input. My understanding (which might be wrong) is that the RUVg-normalized counts are obtained by regressing the original counts on the unwanted factors. How do these normalized counts differ from upperquartile normalization? And if they are essentially raw counts with the unwanted variation regressed out, why aren't they suitable as input for DE analysis?

Furthermore, I would like to draw my own batch-resolved PCA plot and perform hierarchical clustering after RUVg normalization. Should I use object@assayData$normalizedCounts as input for this, or is there a way to do it with the unwanted factors identified by RUVg? I think the variance-stabilizing transformation (VST) of raw counts from DESeq2 does a really nice job for these types of visualizations. In this case, would using the RUVg-normalized counts as input serve the same purpose of generating batch-resolved visualizations? Thanks for the help!

RUVSeq

ctlong, 3 months ago

You can use removeBatchEffect from limma on the VST-transformed counts and pass the RUV factors to it via the covariates argument. This lets you visualize the data with the effect of these factors regressed out.
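For readers wondering what removeBatchEffect does with covariates: as I understand it, it fits each gene on the design and the covariates jointly, then subtracts only the covariate part, so the condition effect in the design is protected. The snippet below is a minimal numpy re-implementation of that idea, not limma's code. The function name remove_covariate_effect, the simulated data and the single factor w (standing in for one W column from RUVg) are all hypothetical. In R you would call limma::removeBatchEffect(assay(vsd), covariates = W, design = model.matrix(~ condition, colData(vsd))) directly, with W holding the W_1, W_2, ... columns from RUVg.

```python
import numpy as np


def remove_covariate_effect(y, design, covariates):
    """Regress numeric covariates out of a genes x samples matrix.

    Sketch of the idea behind limma::removeBatchEffect(x, covariates=...,
    design=...): fit each gene on [design, covariates] jointly, then subtract
    only the covariate part, so effects in `design` (e.g. the condition of
    interest) are not removed. Not limma's code.
    """
    y = np.asarray(y, dtype=float)
    design = np.asarray(design, dtype=float)
    cov = np.asarray(covariates, dtype=float)
    if cov.ndim == 1:
        cov = cov[:, None]                     # samples x k
    cov = cov - cov.mean(axis=0)               # center so each gene keeps its mean
    X = np.hstack([design, cov])               # samples x (p + k)
    # Least-squares fit for all genes at once: y.T ~ X @ coef
    coef, *_ = np.linalg.lstsq(X, y.T, rcond=None)   # (p + k) x genes
    beta_cov = coef[design.shape[1]:, :]       # k x genes: covariate effects
    return y - (cov @ beta_cov).T              # remove only the covariate part


# Hypothetical example: 200 genes x 8 samples on a log scale (like VST values),
# two conditions, and one unwanted factor w standing in for RUVg's W_1.
rng = np.random.default_rng(0)
n_genes, n = 200, 8
group = np.array([0, 0, 0, 0, 1, 1, 1, 1])
w = np.array([0.9, -0.4, 0.3, -0.8, 0.5, -0.6, 1.1, -0.2])
design = np.column_stack([np.ones(n), group])           # ~ condition
true_lfc = np.where(np.arange(n_genes) < 20, 2.0, 0.0)   # 20 DE genes
y = (5.0 + np.outer(true_lfc, group)
     + np.outer(rng.normal(size=n_genes), w)             # unwanted variation
     + rng.normal(scale=0.05, size=(n_genes, n)))

adj = remove_covariate_effect(y, design, w)   # use for PCA/clustering only
```

The adjusted matrix no longer depends on w, but the condition difference survives because it was part of the joint fit. As the answer below explains, such a matrix is for plots, not for DE testing.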

ATpoint, 3 months ago

Oh wow. I never thought about combining limma and DESeq2, though it sounds technically feasible. That being said, do you have any recommended tutorials or vignettes on how to do this correctly? I suspect this procedure is rather unusual, so following a guide seems the safest approach.

ctlong, 3 months ago

It's perfectly ordinary; even the DESeq2 vignette mentions it:

https://bioconductor.org/packages/release/bioc/vignettes/DESeq2/inst/doc/DESeq2.html#why-after-vst-are-there-still-batches-in-the-pca-plot
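The linked vignette section runs removeBatchEffect on the VST matrix and then calls plotPCA. To make the two visualizations from the question concrete, here is a rough numpy sketch of what a plotPCA-style PCA and a sample distance matrix for hierarchical clustering compute on such an adjusted matrix. It assumes, like plotPCA's defaults, that only the 500 most variable genes are used and that genes are centered but not scaled; the function names and the simulated matrix are hypothetical. In R the equivalents are replacing assay(vsd) with the adjusted matrix before plotPCA(vsd), and hclust(dist(t(mat))).

```python
import numpy as np


def pca_top_genes(mat, ntop=500):
    """PCA of samples on the `ntop` most variable genes of a genes x samples
    matrix. Genes are centered but not scaled, similar in spirit to
    DESeq2::plotPCA's defaults; not DESeq2's code."""
    mat = np.asarray(mat, dtype=float)
    rv = mat.var(axis=1, ddof=1)                       # per-gene variance
    keep = np.argsort(rv)[::-1][:min(ntop, mat.shape[0])]
    x = mat[keep].T                                    # samples x genes
    x = x - x.mean(axis=0)                             # center each gene
    u, s, _ = np.linalg.svd(x, full_matrices=False)
    scores = u * s                                     # sample coordinates on PCs
    percent_var = s**2 / np.sum(s**2)                  # fraction of variance per PC
    return scores, percent_var


def sample_distances(mat):
    """Euclidean distances between samples (columns); the usual input to
    hierarchical clustering, e.g. hclust(dist(t(mat))) in R."""
    x = np.asarray(mat, dtype=float).T
    return np.linalg.norm(x[:, None, :] - x[None, :, :], axis=-1)


# Hypothetical adjusted matrix: 1000 genes x 6 samples, 50 genes shifted in group 1.
rng = np.random.default_rng(1)
group = np.array([0, 0, 0, 1, 1, 1])
mat = rng.normal(scale=0.2, size=(1000, 6))
mat[:50, group == 1] += 3.0

scores, percent_var = pca_top_genes(mat)
dist = sample_distances(mat)
print("PC1/PC2 variance explained:", np.round(percent_var[:2], 3))
```

The same two functions work on VST values before or after covariate removal, which makes it easy to compare the plots side by side.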

ATpoint, 3 months ago

Answer 1 · 2024-01-12

It's not clear what you mean by upperquartile normalization (unless you mean the edgeR method implemented in calcNormFactors). Conventionally, normalization factors are estimated and then used as offsets in the GLM in order to control for library size. You can compute normalization factors based on the upper quartile and then use them as offsets, but that does not adjust the counts themselves.
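To illustrate "estimate factors, use them as offsets" as opposed to adjusting counts, the following numpy sketch computes upper-quartile factors and turns them into a per-sample log offset. It is my approximation of edgeR's calcNormFactors(method = "upperquartile") (drop genes that are zero in every sample, take the 75th percentile of each sample's count proportions, rescale so the factors have geometric mean 1), not edgeR's code; the function name and the toy counts are hypothetical.

```python
import numpy as np


def upper_quartile_factors(counts, p=0.75):
    """Upper-quartile normalization factors for a genes x samples count matrix.

    Approximates edgeR's calcNormFactors(method = "upperquartile"): drop genes
    that are zero in every sample, take the p-th quantile of each sample's
    count proportions, then rescale so the factors have geometric mean 1.
    Returns library sizes and factors; the counts are never modified.
    """
    counts = np.asarray(counts, dtype=float)
    lib_size = counts.sum(axis=0)
    expressed = counts[(counts > 0).any(axis=1)]       # ignore all-zero genes
    f = np.quantile(expressed / lib_size, p, axis=0)   # upper quartile per sample
    f = f / np.exp(np.mean(np.log(f)))                 # geometric mean -> 1
    return lib_size, f


# Toy counts (hypothetical): sample 2 is exactly twice sample 1.
counts = np.array([[10, 20, 5],
                   [0, 0, 0],
                   [100, 200, 50],
                   [30, 60, 15],
                   [5, 10, 2]])
lib_size, nf = upper_quartile_factors(counts)
# In a GLM the factors enter through an offset (log effective library size);
# the counts passed to the model are the original, unadjusted ones.
offset = np.log(lib_size * nf)
```

Samples 1 and 2 get identical factors because their composition is identical; only their library sizes, and hence offsets, differ.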

The main reason for not using adjusted counts (from RUVg or any other normalization that changes the counts) is the same as for microarray analyses: you are removing variation (and thereby reducing the available degrees of freedom) without accounting for that reduction in df in your model. This is why the help page for removeBatchEffect says it is for visualization rather than for analysis. You can use the adjusted counts for plots and whatnot, but you shouldn't use them for analysis. This is also what the authors of RUVSeq indicate in the section of the vignette that immediately follows the RUVg section: they use RUVg to estimate the factors of unwanted variation and then include them in the design matrix. This controls for the excess variation while also correctly reducing the df of the model. They say this directly:

The normalized values are stored in the normalizedCounts slot of set and can be accessed with the normCounts method. These counts should be used only for exploration. It is important that subsequent DE analysis be done on the original counts (accessible through the counts method), as removing the unwanted factors from the counts can also remove part of a factor of interest (Gagnon-Bartsch, Jacob, and Speed 2013).
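To make the vignette's warning concrete, here is a small numpy simulation on a log scale. It is a simplified sketch, not RUVSeq's code: the unwanted factor w is assumed known (RUVg would estimate W from control genes), and the back-transformation to counts is skipped. Approach (1) mimics the idea behind the normalized counts, regressing each gene on w alone and subtracting the fit; approach (2) keeps the original values and puts w in the design together with the condition, as the vignette does with W_1. Because w is partly confounded with the condition in this hypothetical setup, (1) removes part of the condition effect while (2) recovers it, and (1) also leaves the model claiming one more residual df than it really has.

```python
import numpy as np

rng = np.random.default_rng(42)
n_genes, n = 500, 8
cond = np.array([0, 0, 0, 0, 1, 1, 1, 1], dtype=float)
# Hypothetical unwanted factor (stand-in for RUVg's W_1), centered and
# partly confounded with the condition.
w = np.array([-1.0, -0.6, -0.2, 0.2, 0.0, 0.4, 0.8, 1.2])
w = w - w.mean()

lfc = np.where(np.arange(n_genes) < 50, 1.5, 0.0)   # 50 truly DE genes
a = rng.normal(size=n_genes)                         # per-gene effect of w
logy = (5.0 + np.outer(lfc, cond) + np.outer(a, w)
        + rng.normal(scale=0.1, size=(n_genes, n)))  # genes x samples, log scale

# (1) "Normalized counts" style: regress each gene on w alone, subtract the fit,
#     then test the condition on the adjusted values with design [1, cond].
alpha = logy @ w / (w @ w)                           # per-gene coefficient on w
adjusted = logy - np.outer(alpha, w)
X_naive = np.column_stack([np.ones(n), cond])
b_naive, *_ = np.linalg.lstsq(X_naive, adjusted.T, rcond=None)

# (2) Vignette style: keep the original values and put w in the design.
X_full = np.column_stack([np.ones(n), cond, w])
b_full, *_ = np.linalg.lstsq(X_full, logy.T, rcond=None)

df_naive = n - X_naive.shape[1]   # 6: ignores that w was already fitted
df_full = n - X_full.shape[1]     # 5: estimating w's effect costs one df

print(f"mean LFC of DE genes, adjusted values: {b_naive[1, :50].mean():.2f}")
print(f"mean LFC of DE genes, w in design:     {b_full[1, :50].mean():.2f}")
```

With this particular w, the algebra predicts the adjusted-values estimate to shrink to about 4/9 of the simulated 1.5 (roughly 0.67), while the design-matrix estimate stays close to 1.5.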