RUVSeq empirical negative controls: how many genes to use to span the unwanted variation?


I am using RUVg(eset, k=1, ...) and choosing the in-silico (empirical) negative controls as the genes or transcripts whose Benjamini-Hochberg adjusted p-value is greater than 0.55. A great many highly non-significant entries pass this threshold.

My question: how many of these non-significant entries should I pass to RUVg as empirical negative controls?

My first thought was to take the least significant 10% of genes/transcripts, those with p-values near 0.8. That gives a few hundred entries, far fewer than the flat threshold of p-value > 0.55.


Then I read the RUVSeq manual, where the authors use every gene that is not among the top 5000 genes returned by edgeR.

I do see a difference in the estimated weights depending on how the negative controls are selected, but I am not sure which genes (and how many) best help the factor analysis estimate the space spanned by the unwanted variation.


Any suggestions are greatly appreciated.



Anthony C.



ruvseq RUV ruvnormalize ruvg • 1.3k views
davide risso ▴ 920
University of Padova

Hi Anthony,

When selecting a set of negative controls, there is a tradeoff between having enough genes and having genes that are truly unaffected by the biological factor of interest. Selecting more genes will in principle give more stable estimates of the unwanted variation (UV) factors, but it increases the risk of including genes that are actually differentially expressed (DE).

In practice we see that a few hundred genes are usually enough, so I think your approach of selecting only the bottom 10% of genes ranked by p-value should be fine. However, if the results differ substantially between sets of negative controls, you may want to look more closely at how these genes behave, to see whether the smaller set fails to fully capture the batch effects or the larger set captures some biological signal of interest.
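As a rough illustration of the two selection rules discussed in this thread (the least significant 10% versus everything outside the top 5000), here is a minimal base-R sketch. The p-values are simulated, and every object name (`pvals`, `ctrl_bottom10`, `ctrl_not_top5000`) is made up for the example; in a real analysis the ranking would come from your own first-pass DE fit.

```r
# Simulated stand-in for a first-pass DE result: one p-value per gene.
# ASSUMPTION: these values and names are hypothetical, not from a real fit.
set.seed(1)
n_genes <- 20000
pvals <- runif(n_genes)
names(pvals) <- paste0("gene", seq_len(n_genes))

# Rank gene names from most significant to least significant.
ranked <- names(sort(pvals))

# Rule 1: the least significant 10% of genes.
n_ctrl <- round(0.10 * n_genes)
ctrl_bottom10 <- tail(ranked, n_ctrl)

# Rule 2: every gene that is not among the 5000 most significant.
ctrl_not_top5000 <- ranked[-seq_len(5000)]

# Compare the size of the two candidate control sets and their overlap.
length(ctrl_bottom10)
length(ctrl_not_top5000)
length(intersect(ctrl_bottom10, ctrl_not_top5000))
```

Either character vector could then be passed as the `cIdx` argument of `RUVg`, provided the names match the row names of the expression object, so the resulting factors can be compared side by side.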

The easiest check is to plot the samples in the space of the first few principal components, color-coded by biology and, where possible, by other factors that you know may influence the experiment.
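A minimal sketch of such a plot, using only base R and simulated data, might look like the following. The expression matrix, the `group` and `batch` factors, and the size of the batch shift are all assumptions made for illustration; with real data you would use log-scale normalized counts and draw the same plot for each candidate set of negative controls.

```r
# ASSUMPTION: simulated log-scale expression, 1000 genes x 8 samples.
set.seed(1)
n_genes <- 1000
group <- factor(rep(c("control", "treated"), each = 4))  # biology of interest
batch <- factor(rep(c("b1", "b2"), times = 4))           # known nuisance factor

log_expr <- matrix(rnorm(n_genes * 8), nrow = n_genes,
                   dimnames = list(paste0("gene", seq_len(n_genes)),
                                   paste0("s", 1:8)))

# Add an artificial batch shift to the first 200 genes.
log_expr[1:200, batch == "b2"] <- log_expr[1:200, batch == "b2"] + 2

# PCA with samples as observations, so the matrix is transposed.
pca <- prcomp(t(log_expr), center = TRUE, scale. = FALSE)

# Colour encodes biology, plotting symbol encodes batch.
cols <- c("steelblue", "firebrick")
plot(pca$x[, 1], pca$x[, 2],
     col = cols[as.integer(group)],
     pch = c(16, 17)[as.integer(batch)],
     xlab = "PC1", ylab = "PC2")
legend("topright", legend = levels(group), col = cols, pch = 16)
```

If samples still separate by batch after normalization, the control set may be too small to capture the unwanted variation; if the separation by biology disappears, the control set may be absorbing real signal. The `plotPCA` function used in the RUVSeq vignette gives a similar view directly on a SeqExpressionSet.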

I hope this helps.


