I am using RUVg(eset, k=1, ...) and choosing the in-silico (empirical) negative controls as the genes or transcripts whose p-value, after Benjamini-Hochberg FDR adjustment, is greater than 0.55. A very large number of highly non-significant entries pass this cutoff.
My question: I am not sure how many of these non-significant entries I should pass to RUVg as empirical negative controls.
My first thought was to take the bottom 10% least significant genes/transcripts, with p-values near 0.8 (a few hundred entries, far fewer than the flat p-value > 0.55 threshold gives).
Then I read the RUVSeq manual, where they take every gene that is not among the top 5000 genes as ranked by edgeR.
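To make the comparison concrete, here is a rough sketch of the three selection rules mentioned above, written in base R on simulated p-values. Everything in it is an assumption for illustration: `pvals` stands in for the raw p-values from edgeR, the gene names and the 20000-gene size are made up, and "bottom 10%" is read here as 10% of all genes, which may not match how the few hundred entries above were obtained. The RUVg call at the end is left as a comment because it needs the real expression set.

```r
set.seed(1)
n_genes <- 20000

# Simulated raw p-values standing in for the edgeR results (assumption):
# 2000 genes carry signal (p-values skewed towards 0), the rest are null.
pvals <- c(rbeta(2000, 0.3, 4), runif(n_genes - 2000))
names(pvals) <- paste0("gene", seq_len(n_genes))

# Benjamini-Hochberg adjusted p-values
padj <- p.adjust(pvals, method = "BH")

# Rule 1: flat cutoff on the adjusted p-value
ctl_flat <- names(padj)[padj > 0.55]

# Rule 2: the 10% of genes with the largest raw p-values
n_bottom <- ceiling(0.10 * n_genes)
ctl_bottom <- names(sort(pvals, decreasing = TRUE))[seq_len(n_bottom)]

# Rule 3: everything except the 5000 top-ranked genes
ranked <- names(sort(pvals))
ctl_rank <- setdiff(ranked, ranked[seq_len(5000)])

# How many controls does each rule give?
sapply(list(flat = ctl_flat, bottom10 = ctl_bottom, allButTop5000 = ctl_rank),
       length)

# With the real data, any of these sets could then be passed on, e.g.:
# set2 <- RUVg(eset, ctl_rank, k = 1)
```

The rank-based rule always returns a fixed number of controls regardless of how the p-values are distributed, whereas the flat cutoff can return very different numbers from one dataset to the next.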
I do see a difference in the calculated weights depending on how the negative controls are selected, but I am not sure which elements (and how many of them) best help the factor analysis step estimate the space spanned by the unwanted variation.
Any suggestions are greatly appreciated.