Question: RUVSeq empirical negative controls: how many should I take to find the spanning set?
anthonycolombo60 wrote, 3.3 years ago:

Hi.

So I am using RUVg(eset, k=1, ...) and choosing the in-silico negative controls as the genes or transcripts whose Benjamini-Hochberg (FDR) adjusted p-value is greater than 0.55 (a very large number of highly insignificant entries came up).
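For reference, the threshold rule above can be sketched as follows. This is a minimal Python illustration, assuming a vector of raw p-values from a first-pass DE test (for example edgeR). The helper names bh_adjust and controls_by_threshold are hypothetical; the adjustment follows the same rule as R's p.adjust(method = "BH"), which is what you would call in practice.

```python
import numpy as np


def bh_adjust(pvals):
    """Benjamini-Hochberg adjusted p-values (same rule as R's p.adjust(..., method="BH"))."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)                      # indices of p-values, ascending
    scaled = p[order] * m / np.arange(1, m + 1)
    # cumulative minimum taken from the largest p-value downwards keeps the
    # adjusted values monotone in the original p-values
    scaled = np.minimum.accumulate(scaled[::-1])[::-1]
    adjusted = np.empty(m)
    adjusted[order] = np.minimum(scaled, 1.0)  # cap at 1 and restore gene order
    return adjusted


def controls_by_threshold(pvals, threshold=0.55):
    """Hypothetical helper: indices of genes whose BH-adjusted p-value exceeds threshold."""
    return np.flatnonzero(bh_adjust(pvals) > threshold)
```

With tens of thousands of genes, a flat threshold like this can easily return thousands of controls, which matches what the question describes.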

My question is: how many of these insignificant entries should I pass to RUVg as empirical negative controls?

My first thought is to take the bottom 10% of genes/transcripts, i.e. the least significant ones, with p-values near 0.8 (a few hundred entries, far fewer than the flat threshold of p-value > 0.55 gave).

 

Then, reading the RUVSeq manual, I saw that they took every gene that is not among the top 5000 genes returned by edgeR.
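The two ranking-based rules (keep the least significant 10%, or keep everything outside the top 5000) can be compared side by side. This is a hypothetical Python sketch, not the RUVSeq vignette's R code; it assumes genes are ranked by p-value from a first-pass DE test, and the helper names and defaults are illustrative only.

```python
import numpy as np


def controls_bottom_fraction(pvals, fraction=0.10):
    """Hypothetical helper: indices of the `fraction` least significant genes."""
    p = np.asarray(pvals, dtype=float)
    n_keep = max(1, int(round(fraction * p.size)))
    order = np.argsort(p, kind="stable")       # most significant first
    return np.sort(order[-n_keep:])            # tail of the ranking = largest p-values


def controls_exclude_top(pvals, n_top=5000):
    """Hypothetical helper: indices of every gene NOT among the n_top most significant."""
    p = np.asarray(pvals, dtype=float)
    order = np.argsort(p, kind="stable")
    return np.sort(order[n_top:])
```

Note that the two rules scale differently: the bottom-fraction rule always returns the same share of genes, whereas excluding the top 5000 returns all remaining genes, which can be many thousands for a typical annotation.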

I do notice that the calculated weights differ depending on how the negative controls are selected, but I am not sure which genes (and how many of them) give the factor analysis the best estimate of the space spanned by the unwanted variation.

 

Any suggestions are greatly appreciated.

 

Sincerely,

Anthony C.

Tags: ruvseq, ruv, ruvg, ruvnormalize
Answer by davide risso (University of Padova), 3.3 years ago:

Hi Anthony,

When selecting a set of negative controls, there is a tradeoff between having enough genes and having genes that are truly unaffected by the biological factor of interest. Selecting more genes will, in principle, lead to more stable estimates of the unwanted variation (UV) factors, but it carries the risk of including genes that are actually DE.
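To see why both the number and the purity of the controls matter, it helps to look at how the factors are estimated. The following is a simplified Python sketch of the core RUVg idea as I understand it (log counts, center each gene, SVD restricted to the control genes, W = first k left singular vectors). It is not RUVSeq's code, it omits options such as drop and rounding, and ruvg_factors is a hypothetical name.

```python
import numpy as np


def ruvg_factors(counts, control_idx, k=1, epsilon=1.0):
    """Hypothetical, simplified sketch of RUVg-style factor estimation.

    counts      : genes x samples array of raw counts
    control_idx : row indices of the negative-control genes
    Returns W, a samples x k matrix whose columns span the estimated
    unwanted-variation space.
    """
    # log-transform with an offset; samples become rows, genes columns
    y = np.log(np.asarray(counts, dtype=float) + epsilon).T
    # center every gene across samples
    y = y - y.mean(axis=0, keepdims=True)
    # SVD of the control-gene columns only: biology is assumed absent here,
    # so the leading directions are attributed to unwanted variation
    u, _, _ = np.linalg.svd(y[:, control_idx], full_matrices=False)
    return u[:, :k]
```

Every control gene adds a column to the matrix being decomposed, so more controls average out gene-level noise, but any DE gene in the set pulls biological signal into W.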

In practice, we usually find that a few hundred genes are enough, so I think your approach of selecting only the bottom 10% of genes ranked by p-value should be fine. However, if the results differ a lot between sets of negative controls, you may want to explore the behavior of these genes a bit more, to see whether the smaller set fails to fully capture the batch effects or the larger set captures some of the biological signal of interest.
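One possible way to put a number on "very different results" is to compare the W matrices obtained with the two control sets. Factors are only defined up to sign (and, for k > 1, rotation), so comparing columns directly can mislead; this hypothetical sketch uses the cosines of the principal angles between the two column spaces instead. It is an illustrative suggestion, not something provided by RUVSeq.

```python
import numpy as np


def subspace_agreement(w1, w2):
    """Hypothetical helper: cosines of the principal angles between span(w1) and span(w2).

    w1, w2 : samples x k matrices of unwanted-variation factors.
    Values near 1 mean both control sets recover essentially the same space;
    clearly smaller values flag a real disagreement worth investigating.
    """
    q1, _ = np.linalg.qr(w1)   # orthonormal bases (no-op in effect if already orthonormal)
    q2, _ = np.linalg.qr(w2)
    return np.linalg.svd(q1.T @ q2, compute_uv=False)
```

Because only the spanned space is compared, a sign flip or rescaling of a factor does not register as a difference.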

The easiest way to do this is to plot the samples in the space of the first principal components, color-coded by biology and possibly by other factors that you know may influence the experiment.
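As a sketch of the coordinates behind such a plot, the following Python function computes sample scores on the first principal components of a log-expression matrix. It assumes you pass log-scale values (for example before and after removing unwanted variation with each candidate control set); pca_scores is a hypothetical name, and in R the RUVSeq vignette draws this kind of plot with plotPCA.

```python
import numpy as np


def pca_scores(log_expr, n_pc=2):
    """Hypothetical helper: sample coordinates on the first n_pc principal components.

    log_expr : genes x samples matrix of log-scale expression.
    Returns a samples x n_pc array of PC scores.
    """
    x = np.asarray(log_expr, dtype=float).T      # samples x genes
    x = x - x.mean(axis=0, keepdims=True)        # center each gene
    u, d, _ = np.linalg.svd(x, full_matrices=False)
    return u[:, :n_pc] * d[:n_pc]                # scores = U * singular values
```

Scatter-plot the first score column against the second (for example with matplotlib), coloring points first by biological condition and then by batch or any other known factor; repeating this for each candidate control set shows whether batch still separates the samples or whether biology has been washed out.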

I hope this helps.
