Upper-quartile normalization before RUVg normalization?
Jon Bråte

Hi, I'm looking through the RUVSeq manual, and it seems that the set object used for RUVg normalization has first been normalized with betweenLaneNormalization from EDASeq. I tried RUVg both with and without running betweenLaneNormalization first, and I get different results. So I just wanted to confirm: is it recommended to run betweenLaneNormalization before RUVg?

Thanks!

Jon

davide risso

Hi Jon,

You are right: adjusting for sequencing depth prior to RUV does influence the results. Our recommended workflow is to first run a between-sample normalization to adjust for sequencing depth (e.g., upper-quartile normalization, as implemented in betweenLaneNormalization) and then run RUV. This is also what the RUVSeq vignette suggests.

Best, Davide
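
The recommended order can be sketched in R as follows. This is a minimal sketch modeled on the calls shown in the RUVSeq vignette; the simulated count matrix, the `ERCC-` row names, the two-group design, and `k = 1` are illustrative assumptions, not values from this thread.

```r
library(RUVSeq)  # also attaches EDASeq, which provides betweenLaneNormalization()

set.seed(1)
# Hypothetical data: 200 genes + 20 spike-ins across 6 samples (simulated).
n_genes <- 200; n_spikes <- 20; n_samples <- 6
counts <- matrix(rpois((n_genes + n_spikes) * n_samples, lambda = 50),
                 ncol = n_samples)
rownames(counts) <- c(paste0("gene", seq_len(n_genes)),
                      paste0("ERCC-", seq_len(n_spikes)))
colnames(counts) <- paste0("sample", seq_len(n_samples))
spikes <- grep("^ERCC", rownames(counts), value = TRUE)

# Assumed two-group design, used only to build the phenoData.
x <- factor(rep(c("ctl", "trt"), each = 3))
set <- newSeqExpressionSet(counts,
                           phenoData = data.frame(x, row.names = colnames(counts)))

# Step 1: adjust for sequencing depth with upper-quartile normalization.
set <- betweenLaneNormalization(set, which = "upper")

# Step 2: estimate k = 1 factor of unwanted variation from the spike-ins.
set_ruv <- RUVg(set, spikes, k = 1)

pData(set_ruv)             # design plus the estimated factor W_1
head(normCounts(set_ruv))  # counts adjusted for W_1
```

The estimated factor in `pData(set_ruv)` can then be added as a covariate in a downstream differential expression model.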


Thanks for the clarification!

If I may ask a related question: our spike-in set (ERCC genes) has a lot of zero counts, which causes infinite and missing values after betweenLaneNormalization. We can avoid this by adding 1 to every count, but I am not sure how that will affect the results, especially for genes whose counts were zero in the first place. Would you recommend adding 1 to every count?

Error message after betweenLaneNormalization and RUVg normalization:

Error in svd(Ycenter[, cIdx]) : infinite or missing values in 'x'
In addition: Warning message:
In RUVg(counts, cIdx, k, drop, center, round, epsilon, tolerance,  :
The expression matrix does not contain counts.
Please, pass a matrix of counts (not logged) or set isLog to TRUE to skip the log transformation
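
One way zeros can lead to this error is sketched below in base R, assuming that upper-quartile scaling divides each sample by its 75th-percentile count (rescaled to mean 1). The helper `upper_quartile_scale` and the toy matrix are hypothetical; they imitate the idea and are not EDASeq's actual code. If more than three quarters of the rows in a sample are zero, that sample's upper quartile is zero and the division produces Inf and NaN.

```r
# Hypothetical helper imitating upper-quartile scaling (not EDASeq's code).
upper_quartile_scale <- function(counts) {
  uq <- apply(counts, 2, quantile, probs = 0.75)  # per-sample 75th percentile
  sweep(counts, 2, uq / mean(uq), "/")            # divide each column by its factor
}

# Toy spike-in matrix: sample s1 is zero in four of five spike-ins.
counts <- cbind(s1 = c(0, 0, 0, 0, 12),
                s2 = c(5, 8, 0, 20, 3))
rownames(counts) <- paste0("spike", 1:5)

scaled <- upper_quartile_scale(counts)
scaled                    # column s1 contains NaN (0/0) and Inf (12/0)
any(!is.finite(scaled))   # TRUE: values like these are what svd() rejects
```

Checking `apply(counts, 2, quantile, probs = 0.75)` on the real matrix passed to the normalization shows quickly whether any sample has a zero upper quartile.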

I would consider filtering out the spike-ins with a lot of zeros and/or choosing a normalization that is more robust to zeros than upper-quartile, e.g., TMM or even scran (developed specifically for data with lots of zeros).
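
Filtering zero-heavy spike-ins can be done with a simple count threshold before normalization. In this base-R sketch, the toy matrix and the rule "more than 5 reads in at least 2 samples" are assumptions to adapt to the data; TMM and scran are not shown.

```r
# Toy spike-in counts (hypothetical): rows are spike-ins, columns are samples.
spike_counts <- rbind("ERCC-A" = c(0, 0, 0, 1),
                      "ERCC-B" = c(12, 0, 9, 15),
                      "ERCC-C" = c(0, 0, 7, 0),
                      "ERCC-D" = c(40, 35, 0, 52))
colnames(spike_counts) <- paste0("sample", 1:4)

# Keep spike-ins with more than min_count reads in at least min_samples samples.
min_count <- 5
min_samples <- 2
keep <- rowSums(spike_counts > min_count) >= min_samples

spikes <- rownames(spike_counts)[keep]  # names to pass to RUVg as controls
spikes
```

Loosening `min_count` or `min_samples` trades robustness for a larger control set, which matters when few spike-ins survive the filter.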

Alternatively, you can use RUVg without normalizing the data first. In our experience, it performs slightly worse, but it's still OK. Remember that, without prior normalization, the first factor usually picks up sequencing depth, so you will probably need to increase your k by 1.
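
Skipping the normalization step could look like the sketch below, again with simulated counts and hypothetical `ERCC-` names. It assumes that k = 1 would have been used after normalization, so k = 2 is requested here to leave room for the depth-related factor.

```r
library(RUVSeq)

set.seed(2)
# Simulated counts (hypothetical): 220 features x 6 samples; last 20 rows are spike-ins.
counts <- matrix(rpois(220 * 6, lambda = 50), ncol = 6,
                 dimnames = list(c(paste0("gene", 1:200), paste0("ERCC-", 1:20)),
                                 paste0("sample", 1:6)))
spikes <- grep("^ERCC", rownames(counts), value = TRUE)

# RUVg also accepts a plain count matrix; no betweenLaneNormalization() here.
k_after_norm <- 1                      # assumed choice for normalized data
ruv <- RUVg(counts, spikes, k = k_after_norm + 1)

head(ruv$W)                 # estimated factors, one column per factor
head(ruv$normalizedCounts)  # counts adjusted for both factors
```

Comparing the first column of `ruv$W` with `colSums(counts)` is one way to check whether that factor is indeed tracking sequencing depth.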


We tried filtering out those spike-ins, but we were left with very few... Thanks for the advice, we will check these options out!
