Question: Upper-quartile normalization before RUVg normalization?
Jon Bråte wrote, 13 days ago:

Hi, I'm looking through the RUVSeq manual, and it seems that the set object used for RUVg normalization has first been normalized using betweenLaneNormalization from EDASeq. I tried RUVg normalization both with and without running betweenLaneNormalization first, and I get different results. So I just wanted to confirm: is it recommended to run betweenLaneNormalization before RUVg normalization?



Answer: Upper-quartile normalization before RUVg normalization?
davide risso (Weill Cornell Medicine) wrote, 12 days ago:

Hi Jon,

You are right: adjusting for sequencing depth before running RUV does influence the results. Our recommended workflow is to first apply a between-sample normalization to adjust for sequencing depth (e.g., upper-quartile normalization, as implemented in betweenLaneNormalization) and then run RUV. This is also what the RUVSeq vignette suggests.
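To make the idea of upper-quartile scaling concrete, here is a minimal sketch in plain Python. It is an illustration of the general technique only, not the code inside betweenLaneNormalization; the function names, the choice of the mean upper quartile as the common target, and the toy counts are all assumptions made for this example.

```python
from statistics import mean, quantiles


def upper_quartile(values):
    """75th percentile of one sample's counts (inclusive interpolation)."""
    return quantiles(values, n=4, method="inclusive")[2]


def upper_quartile_normalize(counts):
    """Rescale every sample so that all samples share one upper quartile.

    counts: dict mapping sample name -> list of gene counts (same gene order).
    The common target (mean of the per-sample upper quartiles) is an
    illustrative choice, not necessarily what any package uses.
    """
    uq = {sample: upper_quartile(values) for sample, values in counts.items()}
    target = mean(list(uq.values()))
    return {
        sample: [c * target / uq[sample] for c in values]
        for sample, values in counts.items()
    }


# Toy data: sample B was sequenced exactly twice as deep as sample A.
toy = {
    "A": [0, 10, 20, 30, 40, 100],
    "B": [0, 20, 40, 60, 80, 200],
}
normalized = upper_quartile_normalize(toy)
print(normalized["A"])
print(normalized["B"])
```

In this idealized case the two samples differ only by depth, so they become identical after scaling. Note that the scaling divides by each sample's upper quartile, which matters when many counts are zero.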

Best, Davide


Thanks for the clarification!

If I may ask a related question: our spike-in set (ERCC genes) has a lot of zero counts, and this causes infinite and missing values after betweenLaneNormalization. We can work around this by adding 1 to every count, but I am not sure how this will affect the results, especially for the genes that have zero counts to begin with. Would you recommend adding 1 to every count?

Error message after betweenLaneNormalization and RUVg normalization:

Error in svd(Ycenter[, cIdx]) : infinite or missing values in 'x'
In addition: Warning message:
In RUVg(counts, cIdx, k, drop, center, round, epsilon, tolerance,  :
The expression matrix does not contain counts.
Please, pass a matrix of counts (not logged) or set isLog to TRUE to skip the log transformation
written 11 days ago by Jon Bråte

I would consider filtering out the spike-ins with a lot of zeros and/or choosing a normalization that is more robust to zeros than upper-quartile, e.g., TMM or even scran (developed specifically for data with many zeros).
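As a rough illustration of both points, the sketch below filters a spike-in table by a minimum-count rule and shows one way zeros can break upper-quartile scaling: when more than three quarters of the values in a sample are zero, that sample's upper quartile is itself zero, so dividing by it is undefined. Whether this is exactly what happened in the data discussed here is an assumption. The spike names, counts, and thresholds are hypothetical.

```python
from statistics import quantiles


def filter_controls(counts, min_count=5, min_samples=2):
    """Keep genes with more than min_count reads in at least min_samples samples.

    counts: dict mapping gene name -> list of counts, one per sample.
    The default thresholds are illustrative, not a recommendation.
    """
    return {
        gene: row
        for gene, row in counts.items()
        if sum(c > min_count for c in row) >= min_samples
    }


def upper_quartile(values):
    """75th percentile of a list of counts (inclusive interpolation)."""
    return quantiles(values, n=4, method="inclusive")[2]


# Hypothetical spike-in counts: 8 spikes (rows) across 3 samples (columns).
spikes = {
    "spike_1": [0, 0, 0],
    "spike_2": [0, 0, 0],
    "spike_3": [0, 1, 0],
    "spike_4": [0, 0, 2],
    "spike_5": [0, 3, 0],
    "spike_6": [0, 0, 0],
    "spike_7": [0, 12, 9],
    "spike_8": [40, 95, 130],
}

# Upper quartile of the first sample, computed over the spike-ins only.
first_sample = [row[0] for row in spikes.values()]
print("upper quartile of sample 1:", upper_quartile(first_sample))

# Spike-ins that survive the filter.
kept = filter_controls(spikes)
print("kept spikes:", sorted(kept))
```

With these made-up numbers the filter keeps only two spike-ins, which shows the trade-off: a strict filter removes the zeros but can leave very few controls to estimate the unwanted factors from.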

Alternatively, you can use RUVg without normalizing the data first. In our experience, it performs slightly worse, but it's still OK. Remember that the first factor usually picks up sequencing depth, so you will probably need to increase your k by 1.
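To see why the first factor tends to capture sequencing depth on unnormalized data, here is a small noise-free sketch in plain Python. It is not RUVg's implementation: it only mimics the step visible in the error message above (a singular value decomposition of centered log counts for the control genes), using power iteration in place of a full SVD. The depths, gene means, and function name are assumptions for illustration.

```python
import math


def first_factor(log_counts, iterations=200):
    """Leading left singular vector of the gene-centered matrix (power iteration).

    log_counts: list of rows, one per sample; each row holds the log counts
    of the control genes for that sample.
    """
    n = len(log_counts)
    p = len(log_counts[0])
    # Center each gene (column) across samples.
    col_means = [sum(row[j] for row in log_counts) / n for j in range(p)]
    centered = [[row[j] - col_means[j] for j in range(p)] for row in log_counts]
    # Start from a non-constant vector: a constant vector is orthogonal
    # to every column of the centered matrix.
    w = [float(i + 1) for i in range(n)]
    for _ in range(iterations):
        # v = Y^T w (one value per gene), then w = Y v (one value per sample).
        v = [sum(centered[i][j] * w[i] for i in range(n)) for j in range(p)]
        w = [sum(centered[i][j] * v[j] for j in range(p)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        w = [x / norm for x in w]
    return w


# Idealized data: counts are exactly depth * gene mean, with no noise.
depths = [1, 2, 4, 8]
gene_means = [50, 200, 1000]
log_counts = [[math.log(d * m) for m in gene_means] for d in depths]

factor = first_factor(log_counts)
centered_log_depth = [math.log(d) - sum(math.log(x) for x in depths) / len(depths)
                      for d in depths]
print("first factor:      ", [round(x, 3) for x in factor])
print("centered log depth:", [round(x, 3) for x in centered_log_depth])
```

In this idealized setting the first factor is proportional to the centered log depth, so it is spent entirely on depth. That is the intuition behind increasing k by 1 when the normalization step is skipped; real data with noise and other unwanted variation will not separate this cleanly.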

written 11 days ago by davide risso

We tried filtering out those spike-ins, but we were left with very few. Thanks for the advice, we will try these suggestions!

written 11 days ago by Jon Bråte