Question

RUVSeq: using RUVg with the least differentially expressed genes

tonja.r ▴ 80

@tonjar-7565

Last seen 8.4 years ago

United Kingdom

I was following the RUVSeq protocol for the RUVg method. After a first pass of edgeR differential expression analysis to identify the least differentially expressed genes, I looked at my top table and found that only 7 genes had an FDR < 0.9; all other genes had an FDR > 0.999. The idea behind RUVg is to use the least differentially expressed genes to estimate the factors of unwanted variation. But if only 7 genes have an FDR < 0.9, doesn't that already mean that RUVg will not help me account for the batch effect?

First pass of edgeR:

design <- model.matrix(~x, data=pData(set))
y <- DGEList(counts=counts(set), group=x)
y <- calcNormFactors(y, method="upperquartile")
y <- estimateGLMCommonDisp(y, design)
y <- estimateGLMTagwiseDisp(y, design)
fit <- glmFit(y, design)
lrt <- glmLRT(fit, coef=2)
top <- topTags(lrt, n=nrow(set))$table
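For context, the RUVSeq vignette, as far as I recall, does not pick the empirical controls with an FDR cutoff. It takes every gene except the top-ranked ones from this first pass (5000 in its example). Below is a minimal base-R sketch of that rank-based selection. The helper `pick_empirical_controls` and the toy table are hypothetical and only for illustration; `top` is assumed to be the table produced above, with gene IDs as row names and a `PValue` column.

```r
# Hypothetical helper: return all gene IDs except the n_top most significant.
# Assumes `top` has gene IDs as row names and a numeric PValue column.
pick_empirical_controls <- function(top, n_top) {
  stopifnot(n_top >= 0, n_top < nrow(top))
  ranked <- rownames(top)[order(top$PValue)]   # most significant first
  ranked[seq_len(nrow(top)) > n_top]           # drop the first n_top genes
}

# Toy stand-in for the topTags table (values made up for illustration)
top_toy <- data.frame(PValue = c(0.001, 0.2, 0.9, 0.03, 0.6),
                      row.names = paste0("gene", 1:5))
empirical <- pick_empirical_controls(top_toy, n_top = 2)
print(empirical)

# With the real objects this would feed RUVg, for example:
# set2 <- RUVg(set, empirical, k = 1)
```

Ranking sidesteps the question of where to put an FDR threshold, but it does not solve the power problem: if the first pass is underpowered, some truly DE genes can still end up among the controls.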

updated 9.3 years ago by davide risso ▴ 980 • written 9.3 years ago by tonja.r ▴ 80

Aaron Lun ★ 28k

@alun

Last seen 2 hours ago

The city by the bay

Well, it's hard to say. The lack of significant genes may be due to a batch effect between your replicates, which is inflating your dispersion estimates and reducing detection power for DE. If this is the case, then RUVg might be able to help by removing that batch effect. But, you won't know until you try.

Of course, if this were hypothetically true, then you wouldn't be able to define non-DE genes as those with large adjusted p-values. Even moderately DE genes would have large p-values due to the lack of power from inflated variability. Including DE genes in the control set would probably cause RUVg to remove genuine DE between the conditions of interest, which is not ideal. You could probably get around this by using RUVr instead.
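A minimal sketch of the RUVr route, following the pattern I remember from the RUVSeq vignette: fit the GLM with only the covariate of interest, take the deviance residuals, and pass them to `RUVr` with all genes as the "control" set. The simulated counts, the sample labels, and the names `counts_mat` and `set_r` are assumptions for illustration only. Treat this as a starting point, not a definitive recipe.

```r
library(RUVSeq)  # its dependencies EDASeq and edgeR are attached as well

set.seed(1)
# Toy data (made up): 200 genes, 3 control and 3 treated samples
counts_mat <- matrix(rnbinom(200 * 6, mu = 100, size = 5), nrow = 200,
                     dimnames = list(paste0("gene", 1:200), paste0("s", 1:6)))
x <- factor(rep(c("ctl", "trt"), each = 3))
set <- newSeqExpressionSet(counts_mat,
                           phenoData = data.frame(x, row.names = colnames(counts_mat)))

# First-pass GLM fit with the covariate of interest only
design <- model.matrix(~x, data = pData(set))
y <- DGEList(counts = counts(set), group = x)
y <- calcNormFactors(y, method = "upperquartile")
y <- estimateGLMCommonDisp(y, design)
y <- estimateGLMTagwiseDisp(y, design)
fit <- glmFit(y, design)

# Deviance residuals should carry the unwanted variation but not the effect of x
res <- residuals(fit, type = "deviance")

# All genes are used here, so no negative-control list is required
set_r <- RUVr(set, rownames(set), k = 1, res)
print(pData(set_r))  # the estimated factor appears as an extra column, W_1
```

The estimated factor can then be added to the design for the second pass, for example `model.matrix(~x + W_1, data = pData(set_r))`.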

written 9.3 years ago by Aaron Lun ★ 28k

score 1 · Accepted Answer · 2015-12-02

An alternative strategy would be to use a general list of housekeeping genes, like the one that you can find here: http://www.stat.berkeley.edu/~johann/ruv/resources/hk.txt (the list is for human; it should work fine for mouse too, but may not for other organisms).
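A rough base-R sketch of turning such a list into a control set: keep only the housekeeping genes that are actually present in the data. The helper `match_controls` and the toy vectors are hypothetical. The case-insensitive matching is my own assumption, added so that human-style symbols can match mouse-style ones, and the file is assumed, without verification, to hold one gene symbol per line.

```r
# Hypothetical helper: return the gene IDs in the data that appear in the
# housekeeping list, ignoring case and surrounding whitespace.
match_controls <- function(hk_genes, gene_ids) {
  hk <- unique(toupper(trimws(hk_genes)))
  gene_ids[toupper(gene_ids) %in% hk]
}

# With the real file, something like this (assumes one symbol per line):
# hk_genes <- readLines("hk.txt")

# Toy stand-ins (made up for illustration)
hk_toy  <- c("ACTB", "GAPDH ", "PGK1")
ids_toy <- c("Actb", "Gapdh", "Myc", "Tp53")
controls <- match_controls(hk_toy, ids_toy)
print(controls)

# With the real objects the controls would then go to RUVg, for example:
# set2 <- RUVg(set, match_controls(hk_genes, rownames(set)), k = 1)
```

If the row names are Ensembl or Entrez IDs rather than symbols, they would need to be mapped to symbols first. It is also worth checking how many housekeeping genes survive any expression filtering before relying on them.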

If you have replicate samples, you can consider using RUVs. We find that it is usually quite robust to the set of negative controls, so it should not be a problem even if your set of genes is not strictly a set of negative controls.
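A minimal sketch of the RUVs call, assuming the replicate structure is given by the condition factor `x`. `makeGroups` builds the matrix of replicate indices that `RUVs` expects. The simulated counts and the names `counts_mat` and `set_s` are made up for illustration, and using all genes as the control set is an assumption that leans on the robustness noted above.

```r
library(RUVSeq)  # its dependencies EDASeq and edgeR are attached as well

set.seed(2)
# Toy data (made up): 200 genes, 3 replicates in each of 2 conditions
counts_mat <- matrix(rnbinom(200 * 6, mu = 100, size = 5), nrow = 200,
                     dimnames = list(paste0("gene", 1:200), paste0("s", 1:6)))
x <- factor(rep(c("ctl", "trt"), each = 3))
set <- newSeqExpressionSet(counts_mat,
                           phenoData = data.frame(x, row.names = colnames(counts_mat)))

# One row per replicate group, holding the column indices of its samples
differences <- makeGroups(x)
print(differences)

# Estimate one factor of unwanted variation from the replicate samples
set_s <- RUVs(set, rownames(set), k = 1, differences)
print(pData(set_s))  # the estimated factor appears as an extra column, W_1
```

As with the other RUV variants, `W_1` would then be included as a covariate in the edgeR design for the final differential expression test.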