Question: Upper-quartile normalization before RUVg normalization?
Jon Bråte wrote (5 months ago):

Hi, I'm looking through the RUVSeq manual, and it seems that the set object used for RUVg normalization has first been normalized with betweenLaneNormalization from EDASeq. I tried RUVg normalization both with and without running betweenLaneNormalization first, and I get different results. So I just wanted to confirm: is it recommended to run betweenLaneNormalization before RUVg normalization?



Answer: Upper-quartile normalization before RUVg normalization?
davide risso (University of Padova) wrote (5 months ago):

Hi Jon,

You are right: adjusting for sequencing depth prior to RUV does influence the results. Our recommended workflow is to first run a between-sample normalization to adjust for sequencing depth (e.g., the upper-quartile method implemented in betweenLaneNormalization) and then run RUV. This is also what the RUVSeq vignette suggests.
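To make the idea of upper-quartile scaling concrete, here is a minimal, language-independent sketch in plain Python. It is not the EDASeq implementation: it assumes each sample's scaling factor is its 75th-percentile count divided by the mean of those percentiles across samples, and the function names and toy counts are hypothetical. The quantile definition and any rounding in betweenLaneNormalization may differ.

```python
from statistics import mean, quantiles


def upper_quartile(values):
    # 75th percentile of one sample's counts (third of the three quartile cut points).
    return quantiles(values, n=4, method="inclusive")[2]


def upper_quartile_scale(counts):
    """Scale each sample by its upper quartile relative to the mean upper quartile.

    counts: dict mapping sample name -> list of per-gene counts (hypothetical layout).
    Returns (scaled counts, scaling factors).
    """
    uq = {sample: upper_quartile(values) for sample, values in counts.items()}
    if any(q == 0 for q in uq.values()):
        # Dividing by a zero upper quartile is undefined.
        raise ValueError("a sample has a zero upper quartile; scaling is undefined")
    center = mean(list(uq.values()))
    factors = {sample: q / center for sample, q in uq.items()}
    scaled = {
        sample: [x / factors[sample] for x in values]
        for sample, values in counts.items()
    }
    return scaled, factors


# Toy data: sample B was sequenced twice as deeply as sample A.
example = {
    "A": [10, 20, 30, 40, 50],
    "B": [20, 40, 60, 80, 100],
}
scaled, factors = upper_quartile_scale(example)
print(factors)
print(scaled)
```

After scaling, the two toy samples have matching values, which is the depth adjustment that RUV would otherwise have to absorb into one of its factors.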

Best, Davide


Thanks for the clarification!

If I may ask a related question: our spike-in set (ERCC genes) has a lot of zero counts, and this causes infinite and missing values after betweenLaneNormalization. We can solve this by adding 1 to every count, but I am not sure how this will affect the results, especially for the genes that are zero in the first place. Would you recommend adding 1 to every count?

Error message after betweenLaneNormalization and RUVg normalization:

Error in svd(Ycenter[, cIdx]) : infinite or missing values in 'x'
In addition: Warning message:
In RUVg(counts, cIdx, k, drop, center, round, epsilon, tolerance,  :
The expression matrix does not contain counts.
Please, pass a matrix of counts (not logged) or set isLog to TRUE to skip the log transformation
written 5 months ago by Jon Bråte

I would perhaps consider filtering out the spike-ins with a lot of zeros and/or choosing a normalization other than upper-quartile that is more robust to zeros, e.g., TMM or even scran (developed specifically for data with lots of zeros).
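The filtering suggestion can be sketched as follows, again in plain Python rather than R. This is an illustration under assumptions, not RUVSeq or EDASeq code: the spike names, the toy counts, and the `max_zero_fraction` threshold are all hypothetical. It also assumes the upper quartile is computed over the spike-ins alone, in which case a zero-heavy set gives an upper quartile of zero in every sample and dividing by it is undefined, which is one plausible source of infinite or missing values.

```python
from statistics import quantiles


def upper_quartile(values):
    # 75th percentile (third of the three quartile cut points).
    return quantiles(values, n=4, method="inclusive")[2]


def filter_spikes(spike_counts, max_zero_fraction=0.5):
    """Keep spike-ins whose fraction of zero counts across samples is small enough.

    spike_counts: dict mapping spike id -> list of counts, one per sample.
    max_zero_fraction: hypothetical threshold; choose it for your own data.
    """
    kept = {}
    for spike, row in spike_counts.items():
        zero_fraction = sum(1 for x in row if x == 0) / len(row)
        if zero_fraction <= max_zero_fraction:
            kept[spike] = row
    return kept


def per_sample_upper_quartiles(spike_counts):
    # Upper quartile of each sample (column) across the given spike-ins.
    rows = list(spike_counts.values())
    n_samples = len(rows[0])
    return [upper_quartile([row[j] for row in rows]) for j in range(n_samples)]


# Toy spike-in table: 10 spike-ins, 3 samples, mostly zeros.
spikes = {
    "spike_a": [120, 95, 140],
    "spike_b": [30, 0, 45],
    "spike_c": [0, 0, 0],
    "spike_d": [0, 0, 0],
    "spike_e": [0, 0, 0],
    "spike_f": [0, 2, 0],
    "spike_g": [0, 0, 0],
    "spike_h": [0, 0, 0],
    "spike_i": [0, 0, 0],
    "spike_j": [0, 0, 0],
}

before = per_sample_upper_quartiles(spikes)
kept = filter_spikes(spikes, max_zero_fraction=0.5)
after = per_sample_upper_quartiles(kept)

print("upper quartiles before filtering:", before)
print("spike-ins kept:", sorted(kept))
print("upper quartiles after filtering:", after)
```

In this toy table only two spike-ins survive the filter, so the trade-off is visible: the upper quartiles become usable, but very few control genes remain to estimate the unwanted factors.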

Alternatively, you can use RUVg without normalizing the data first. In our experience, it performs slightly worse, but it's still OK. Remember that the first factor usually picks up sequencing depth, so you will probably need to increase your k by 1.

written 5 months ago by davide risso

We tried filtering out those spike-ins, but we were left with very few. Thanks for the advice, we will look into these options!

written 5 months ago by Jon Bråte