Question: Number of co-variables k in RUVSeq (RUVr)
David Rengel (European Union), 3.9 years ago, wrote:

Hi,

I use RUVSeq and I find it extremely helpful. I have a question concerning
the number of covariables to be used under RUVr. I've realized that
increasing the number of covariables makes the groups I want to see on the
PCA more visible and distinct from each other. It follows that the number of
DE genes also increases with k.
In one of my projects I have 72 samples and I ran RUVr with up to k=50. The
number of DE genes in each of my comparisons increases exponentially up to a
plateau when k is high. Likewise, the common dispersion decreases with
increasing k. It looks so good, both in terms of PCA and DE genes, that I
wonder whether using such high k values might have led to false
interpretations or a high number of false positives.
I also came to ask myself these questions because in the RUVSeq manual the
given example uses k=1, and I wondered why this is the case if increasing k
improves the results.

I would be grateful if you could provide me with any feedback on this.

Thanks!

Tags: rnaseq, edger, ruvseq, ruvr
Answer: Number of co-variables k in RUVSeq (RUVr)
Davide Risso (University of Padova), 3.9 years ago, wrote:

Hi David,

What set of negative control genes are you using?

What you are observing is not too surprising, if you are using a large set of negative controls. RUVr assumes that the factors of unwanted variation are orthogonal to the factor of interest, so with such a large number of factors you are probably removing all the variation that is not explained by the factor of interest. Hence, you get smaller dispersion parameters, and more DE genes.
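As a rough illustration of why k=50 is large for 72 samples, the following is a minimal sketch in Python (not RUVSeq code) of the residual degrees of freedom left once k estimated factors are added to the design. The number of design columns (`N_DESIGN_COLS = 4`) is an assumption made for illustration, since the thread does not state the actual design, and the formula assumes a full-rank design matrix.

```python
def residual_df(n_samples: int, n_design_cols: int, k: int) -> int:
    """Residual degrees of freedom after adding k extra covariates.

    Assumes a full-rank design matrix with n_design_cols columns
    for the factor of interest (including the intercept).
    """
    df = n_samples - n_design_cols - k
    if df <= 0:
        raise ValueError("model is saturated: no residual degrees of freedom left")
    return df


N_SAMPLES = 72      # from the question
N_DESIGN_COLS = 4   # hypothetical: intercept + 3 group coefficients

for k in (1, 3, 10, 50):
    print(f"k={k:2d}  residual df={residual_df(N_SAMPLES, N_DESIGN_COLS, k)}")
```

Under these assumptions, each additional factor takes one degree of freedom away from the residuals that the dispersion estimate relies on, so a large k leaves comparatively little information for estimating it.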

Note that having more DE genes does not mean that the data are well normalized. A large fraction of them are likely to be false positives. A better way of deciding how many factors to use in your dataset is to look at the behavior of positive and negative controls (i.e., genes that you know, or suspect, to be DE and non-DE, respectively) at different values of k. If you see that the fraction of DE positive controls increases while the fraction of DE negative controls doesn't, then you are on the right track. (Note that the negative controls that you use for testing k should be different from the ones you use to estimate the factors of RUV.)
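The check described above can be sketched as follows. This is a minimal, hypothetical Python example, not part of RUVSeq: it assumes you have already run the DE analysis at several values of k and collected the set of genes called DE at each one. All gene names and DE calls below are made-up toy inputs, and the function names `de_fraction` and `summarize_k` are illustrative only.

```python
from typing import Dict, List, Set, Tuple


def de_fraction(de_genes: Set[str], controls: Set[str]) -> float:
    """Fraction of a control set that was called DE."""
    if not controls:
        raise ValueError("control set is empty")
    return len(de_genes & controls) / len(controls)


def summarize_k(
    de_calls_by_k: Dict[int, Set[str]],
    positive_controls: Set[str],
    heldout_negative_controls: Set[str],
) -> List[Tuple[int, float, float]]:
    """Return (k, fraction of positives DE, fraction of held-out negatives DE)."""
    return [
        (
            k,
            de_fraction(de_calls_by_k[k], positive_controls),
            de_fraction(de_calls_by_k[k], heldout_negative_controls),
        )
        for k in sorted(de_calls_by_k)
    ]


# Toy inputs (hypothetical). The held-out negatives must NOT be the genes
# that were used to estimate the RUV factors.
positive = {"posA", "posB", "posC", "posD"}
heldout_negative = {"negA", "negB", "negC", "negD"}
de_calls_by_k = {
    0: {"posA", "g1"},
    2: {"posA", "posB", "posC", "g1", "g2"},
    10: {"posA", "posB", "posC", "posD", "negA", "negB", "g1", "g2", "g3"},
}

for k, pos_frac, neg_frac in summarize_k(de_calls_by_k, positive, heldout_negative):
    print(f"k={k:2d}  positives DE={pos_frac:.2f}  held-out negatives DE={neg_frac:.2f}")
```

In this toy data, moving from k=0 to k=2 recovers more positive controls without calling any held-out negative control DE, whereas k=10 also starts calling negative controls DE, which is the pattern that would suggest k is too high.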

Finally, although this is largely an empirical observation, a few (2-3) factors are usually enough to capture the unwanted variation. In very noisy datasets you can increase this to maybe 5 or 10, but 50 definitely sounds like too many.

Answer: Number of co-variables k in RUVSeq (RUVr)
David Rengel (European Union), 3.9 years ago, wrote:

Hi Davide,

Thanks a lot for the answer. I thought your answer would have been mailed to me, which is why I had not replied.

Actually, I am not working with negative controls, for several reasons. Should I? I mean, aren't negative controls meant to be used with RUVg? It is not so obvious to me how to find robust, non-modulated genes, especially in the 72-sample project. Indeed, that is why I chose RUVr.

Nevertheless, I am verifying some candidate genes that are actually meant to be modulated. In another project (not the one with 72 samples), some genes are being tested by qPCR as I write. I'll see how those behave.

I would appreciate any help with regard to the negative controls.

Kind regards,

David


Although RUVr and RUVs are more robust to the choice of negative controls, they still formally require you to choose a set of such genes. Since, as I said, RUVr is robust to some negative controls not being really "negative", I would suggest that you try with a "general" set of genes, such as the list of housekeeping genes that you can find here:

http://www.stat.berkeley.edu/~johann/ruv/resources/hk.txt

In general, we have had good experience using housekeeping genes as negative controls.

Reply written 3.9 years ago by Davide Risso

Thanks Davide. I'll have a look at housekeeping genes, though it is not that straightforward: the species I am dealing with has not been so thoroughly studied. And I will certainly reduce the number of k variables!

Best,

Reply written 3.8 years ago by David Rengel