Hi, I'm working on an RNA-seq experiment with spike-ins added and I would like to use the RUVSeq package. I had a look at the documentation and started with the very nice zebrafish example. In this example, RUVg is run on the object obtained after betweenLaneNormalization from the EDASeq package, which is equivalent to running RUVg on the normalized counts produced by this normalization method (I checked). Without this example, I would have run RUVg directly on the counts (either via a matrix or via an object). Are there any guidelines on whether we should run RUVg on counts or on "pre-normalized" counts? Thanks in advance for any help.
In our experience, it is usually preferable to apply RUV to the normalized counts. However, this requires that the assumptions of the normalization method hold in your data.
In particular, normalization methods usually assume that the majority of genes are not differentially expressed (DE) and/or that there is a roughly equal number of up- and down-regulated genes. If you think that these are reasonable assumptions in your experiment, we have found that RUV on normalized data leads to slightly more robust results.
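For the normalized route, a minimal sketch along the lines of the zebrafish example you mention might look like the following. The filtering threshold, the upper-quartile choice (`which = "upper"`), and `k = 1` are illustrative assumptions, not recommendations for your data.

```r
library(RUVSeq)            # provides RUVg
library(EDASeq)            # provides newSeqExpressionSet, betweenLaneNormalization
library(zebrafishRNASeq)   # example data set: zfGenes

data(zfGenes)

# Assumed filter: keep genes with more than 5 reads in at least 2 samples
keep <- apply(zfGenes, 1, function(x) sum(x > 5) >= 2)
filtered <- zfGenes[keep, ]

# ERCC spike-ins serve as negative control genes
spikes <- grep("^ERCC", rownames(filtered), value = TRUE)

# Three control and three treated libraries
x <- as.factor(rep(c("Ctl", "Trt"), each = 3))
set <- newSeqExpressionSet(
  as.matrix(filtered),
  phenoData = data.frame(x, row.names = colnames(filtered))
)

# Normalize first, then estimate one factor of unwanted variation
set_norm <- betweenLaneNormalization(set, which = "upper")
set_ruv  <- RUVg(set_norm, spikes, k = 1)

# The estimated factor is stored as an extra phenoData column
head(pData(set_ruv))
```

The estimated factor (`W_1`) can then be added as a covariate to the design matrix of the differential expression model.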
The fact that you are specifically mentioning spike-ins makes me think that perhaps you expect very different amounts of RNA across the conditions, or a lot of gene deregulation. If that's the case, you might be better off not normalizing the data and applying RUV directly to the raw counts (using the spike-ins as negative controls).
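For the raw-count route, a sketch using the matrix interface of RUVg could look like this. It uses the zebrafish data only as a stand-in for your own counts; the filter and `k = 1` are again assumptions made for illustration.

```r
library(RUVSeq)
library(zebrafishRNASeq)

data(zfGenes)

# Assumed filter: keep genes with more than 5 reads in at least 2 samples
keep <- apply(zfGenes, 1, function(x) sum(x > 5) >= 2)
raw  <- as.matrix(zfGenes[keep, ])

# ERCC spike-ins serve as negative control genes
spikes <- grep("^ERCC", rownames(raw), value = TRUE)

# No between-lane normalization: RUVg is applied to the raw counts.
# For a matrix input, RUVg returns a list with W and normalizedCounts.
ruv_raw <- RUVg(raw, spikes, k = 1)

str(ruv_raw$W)
dim(ruv_raw$normalizedCounts)
```

Here `ruv_raw$W` holds the estimated factor of unwanted variation (one row per sample), which can be included as a covariate in the downstream model fitted to the original counts.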
I hope this answers your question.