Question: Upper-quartile normalization before RUVg normalization?
13 days ago by
Jon Bråte150
Norway
Jon Bråte150 wrote:

Hi, I'm looking through the RUVSeq manual, and it seems that the set object used for RUVg normalization has first been normalized using betweenLaneNormalization from EDASeq. I tried RUVg normalization both with and without running betweenLaneNormalization first, and I get different results. So I just wanted to confirm: is it recommended to run betweenLaneNormalization before RUVg normalization?

Thanks!

Jon

Answer: Upper-quartile normalization before RUVg normalization?
12 days ago by
davide risso830
Weill Cornell Medicine
davide risso830 wrote:

Hi Jon,

You are right, adjusting for sequencing depth prior to RUV does influence the results. Our recommended workflow is to first run a between-sample normalization to adjust for sequencing depth (e.g., upper-quartile normalization, as implemented in betweenLaneNormalization) and then run RUV. This is also what is suggested in the RUVSeq vignette.
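For readers who want to see the two steps side by side, here is a minimal sketch of that order of operations, loosely following the pattern in the RUVSeq vignette. The count matrix, the sample names, the `ERCC-` row names and the two-group factor `x` are all made up for illustration; only the package functions are real.

```r
library(RUVSeq)  # attaches EDASeq, which provides betweenLaneNormalization()

# Toy data (assumption): 180 genes plus 20 spike-ins across 6 samples.
set.seed(1)
counts <- matrix(rnbinom(200 * 6, mu = 100, size = 2), nrow = 200,
                 dimnames = list(c(paste0("gene", 1:180), paste0("ERCC-", 1:20)),
                                 paste0("S", 1:6)))
spikes <- grep("^ERCC-", rownames(counts), value = TRUE)
x <- factor(rep(c("ctl", "trt"), each = 3))  # made-up grouping

set <- newSeqExpressionSet(counts,
                           phenoData = data.frame(x, row.names = colnames(counts)))

# Step 1: adjust for sequencing depth with upper-quartile normalization.
set <- betweenLaneNormalization(set, which = "upper")

# Step 2: estimate k factors of unwanted variation from the control genes.
set_ruv <- RUVg(set, spikes, k = 1)

pData(set_ruv)  # the estimated factor is appended as column W_1
```

With real data, `counts` would be the filtered count matrix and `spikes` the names of the negative control genes; the choice of `k = 1` here is arbitrary.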

Best, Davide

Thanks for the clarification!

If I may ask another related question: our spike-in set (ERCC genes) has a lot of zero counts, and this causes infinite and missing values after betweenLaneNormalization. We can work around this by adding 1 to every count, but I am not sure how this will affect the results, especially for the genes that are zero in the first place. Would you recommend adding 1 to every count?

Error message after betweenLaneNormalization and RUVg normalization:

Error in svd(Ycenter[, cIdx]) : infinite or missing values in 'x'
In RUVg(counts, cIdx, k, drop, center, round, epsilon, tolerance,  :
The expression matrix does not contain counts.
Please, pass a matrix of counts (not logged) or set isLog to TRUE to skip the log transformation


I would perhaps consider filtering out the spike-ins with a lot of zeros and/or choosing a normalization other than upper-quartile that is more robust to zeros, e.g., TMM or even scran (developed specifically for data with lots of zeros).
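The filtering suggestion could look roughly like the base R sketch below. The helper `filter_spikes`, its `max_zero_frac` threshold of 0.5 and the toy counts are all hypothetical; what counts as "a lot of zeros" depends on the data set.

```r
# Hypothetical helper: keep control genes whose fraction of zero counts
# across samples is at most `max_zero_frac`.
filter_spikes <- function(counts, spikes, max_zero_frac = 0.5) {
  zero_frac <- rowMeans(counts[spikes, , drop = FALSE] == 0)
  names(zero_frac)[zero_frac <= max_zero_frac]
}

# Toy spike-in counts (made up): 4 spike-ins x 4 samples.
counts <- rbind(
  "ERCC-A" = c(120, 95, 130, 110),
  "ERCC-B" = c(0, 3, 0, 0),
  "ERCC-C" = c(15, 0, 22, 18),
  "ERCC-D" = c(0, 0, 0, 0)
)
colnames(counts) <- paste0("S", 1:4)

kept <- filter_spikes(counts, rownames(counts))
kept  # "ERCC-A" "ERCC-C"
```

The surviving names in `kept` would then be passed as the control gene set instead of the full spike-in list.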

Alternatively, you can use RUVg without normalizing the data first. In our experience, it performs slightly worse, but it's still OK. Remember that the first factor usually picks up sequencing depth, so you will probably need to increase your k by 1.
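To illustrate why the first factor tends to absorb sequencing depth when the counts are not normalized, here is a small base R simulation. It is a hand-rolled sketch of the idea (log-transform, centre each control gene, take the SVD), not the package's code, and the depths and abundances are invented.

```r
set.seed(42)
n_ctrl <- 50
depth  <- c(0.5, 0.7, 1, 1.4, 2, 2.8, 4, 5.6)  # made-up relative library sizes
mu     <- exp(runif(n_ctrl, 3, 7))              # made-up control-gene abundances

# Control-gene counts: same abundance in every sample, scaled only by depth.
ctrl <- sapply(depth, function(d) rpois(n_ctrl, mu * d))  # genes x samples

# Log-transform, centre each gene across samples, and take the SVD.
Y       <- t(log(ctrl + 1))                     # samples x genes
Ycenter <- scale(Y, center = TRUE, scale = FALSE)
W1      <- svd(Ycenter)$u[, 1]

# Up to sign, the first factor should be very strongly correlated with log depth.
abs(cor(W1, log(depth)))
```

In practice this is the reason for something like `RUVg(counts, spikes, k = k + 1)` on unnormalized counts, where `counts`, `spikes` and `k` are placeholders for your own matrix, control gene names and the number of factors you would otherwise have used.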