Question

Raw read counts vs. edgeR-normalized read counts for RUVg normalization


Venu Pullabhatla ▴ 40

@venu-pullabhatla-5550


Hi All

I am working on an RNA-seq project with 2 groups of 7 samples each. The libraries contain spike-in controls (ERCCs) that can be used for normalization, and I am using the RUVSeq package (RUVg) to do this. I understand that the choice of "k" has a strong effect on the estimated factors of unwanted variation. From previous posts and the article, I understand that more of the variation can be captured with values of k from 1 to 6, although k = 6 was used in the article for a larger number of samples. So I tried values of k from 1 to 10 and looked at the PCA and RLE plots. No value of k was convincingly good, especially in the RLE plots, although there was an improvement compared to before normalization. I performed this using raw counts, as suggested in the vignette.

On the other hand, I tried the same using normalized counts from edgeR instead of raw counts, which gave me very nice RLE plots. So my question is whether I can use normalized counts instead of raw counts for RUVg normalization. Can someone please guide me on how to proceed? Many thanks in advance.

Venu

rnaseq ruvseq normalization • 1.2k views

updated 7.4 years ago by davide risso ▴ 950 • written 7.4 years ago by Venu Pullabhatla ▴ 40

score 0 · Answer 1 · 2016-11-18

Dear Venu,

Please note that in our paper (and in our experience) normalizing the data using the ERCC spike-ins as negative controls often does not work as well as using endogenous genes, so it's not surprising that your RLE plots don't look great after RUVg with ERCC as negative controls.

Do you expect to have a global shift in expression in your experiment? I.e., that the majority of the genes are up-regulated in one of the groups? If so, the best option is probably still using the spike-ins, even though the results don't look great; if not, have you tried using only edgeR normalization (i.e., TMM) without RUVg? If the RLE and PCA plots look good you may want to skip RUV altogether (there's no need to correct for batch effects if they're not there!).
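For readers unsure what the RLE plot is checking, here is a minimal sketch of the relative log expression calculation: for each gene, take the log count in each sample minus that gene's median log count across samples. It is written in dependency-free Python purely for illustration (in R, the `plotRLE` function does this for you). The function names, the pseudocount of 1 and the toy counts are assumptions made for this example, not part of the thread.

```python
import math
from statistics import median, quantiles

def rle(counts, pseudocount=1.0):
    """Relative log expression.

    counts: list of genes, each a list of per-sample counts.
    Returns a matrix of the same shape holding, for each gene,
    log(count + pseudocount) minus the gene's median across samples.
    """
    result = []
    for gene in counts:
        logs = [math.log(c + pseudocount) for c in gene]
        med = median(logs)
        result.append([v - med for v in logs])
    return result

def per_sample_summary(rle_matrix):
    """Median and interquartile range of the RLE values in each sample."""
    n_samples = len(rle_matrix[0])
    summary = []
    for j in range(n_samples):
        column = [row[j] for row in rle_matrix]
        q1, _, q3 = quantiles(column, n=4)
        summary.append((median(column), q3 - q1))
    return summary

# Toy data (hypothetical): 4 genes x 3 samples; sample 3 was sequenced
# roughly twice as deeply, so its RLE values sit above zero.
toy = [
    [10, 10, 20],
    [100, 100, 200],
    [50, 50, 100],
    [7, 7, 14],
]
sample_medians = [m for m, _ in per_sample_summary(rle(toy))]
print(sample_medians)
```

In well-normalized data the per-sample boxes should be centred near zero with similar spread, which is the pattern to look for when comparing raw, TMM-normalized and RUVg-normalized counts.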

Generally speaking, it is fine to use normalized counts to estimate the factors of unwanted variation, provided that you also include the normalization as an offset in the model. That's what's done in the RUVSeq vignette.
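To make "include the normalization as an offset" concrete, the log-linear model behind RUV can be sketched as follows. The notation is reproduced from memory of the RUVSeq paper and should be read as an assumption rather than a quotation: Y is the samples-by-genes matrix of counts, X holds the covariates of interest (here, the two groups), W holds the k factors of unwanted variation, and O holds the offsets (for example, the log normalization factors).

$$
\log E[Y \mid W, X, O] = W\alpha + X\beta + O
$$

Under this view, estimating W from normalized counts is consistent as long as the same normalization enters the differential expression model through O, rather than being applied to the counts and then ignored.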