Question

Can normalized CPM values be used as RUVs input?

sooby • 0

@sooby-11430

Last seen 7.6 years ago

Hi,

I am considering removing the batch effect in my samples to decrease the variation among replicates. RUVSeq is great and easy to use, and its documentation gives very specific details on how to use it for DE gene analysis. But I still have a question: I need to use the batch-corrected gene expression values in downstream analysis. I have one solution, but I am not sure whether it is valid. Please help me.

My solution is:

1. Use edgeR to generate normalized CPM values.

2. Pass the normalized CPM values to RUVSeq and use the RUVs function to remove the unwanted variation (because I have three replicates for each condition).

3. Use the normCounts function to get the corrected gene expression values.

After that, I get normalized CPM values adjusted by removing the unwanted variation. I will only use these values to explore the expression profiles of some specific genes in my samples. For the DE gene analysis, I will follow the examples shown in the RUVSeq manual.
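As a rough illustration of what step 1 produces, here is a minimal Python sketch of normalized CPM, assuming the usual definition: each count divided by the effective library size (library size times a per-sample normalization factor), scaled to one million. This is not edgeR's implementation; the function name `normalized_cpm`, the counts, and the normalization factors are all made up for illustration.

```python
# Toy sketch of normalized CPM (counts per million).
# Assumption: effective library size = column sum * per-sample normalization factor.
# This is an illustration only, not edgeR's code.

def normalized_cpm(counts, norm_factors):
    """counts: list of gene rows, each a list of per-sample counts.
    norm_factors: one scaling factor per sample (hypothetical values here)."""
    n_samples = len(norm_factors)
    # Library size of each sample = sum of its counts over all genes.
    lib_sizes = [sum(row[j] for row in counts) for j in range(n_samples)]
    # Effective library size rescales each sample by its normalization factor.
    eff_sizes = [lib_sizes[j] * norm_factors[j] for j in range(n_samples)]
    return [[row[j] / eff_sizes[j] * 1e6 for j in range(n_samples)]
            for row in counts]

# Two genes (rows) measured in three samples (columns); made-up numbers.
counts = [
    [10, 20, 30],
    [90, 180, 270],
]
norm_factors = [1.0, 0.8, 1.25]  # hypothetical per-sample factors

cpm = normalized_cpm(counts, norm_factors)
for gene_row in cpm:
    print([round(v, 1) for v in gene_row])
```

In the workflow described above, a matrix like `cpm` is what would be handed to the unwanted-variation step, and the adjusted values would then be used for exploration only.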

Is my solution possible?

Or, if you have any suggestions, please let me know.

Best,

Sooby.

rnaseq ruvseq • 1.6k views

ADD COMMENT • link updated 7.6 years ago by davide risso ▴ 950 • written 7.6 years ago by sooby • 0

score 0 · Answer 1 · 2016-09-06

davide risso ▴ 950

@davide-risso-5075

Last seen 5 weeks ago

University of Padova

I believe that your approach is reasonable for exploratory data analysis. As you point out, we recommend a different approach for differential expression, but your proposed solution will work if the objective is data exploration.

ADD COMMENT • link 7.6 years ago davide risso ▴ 950

Thanks for your fast reply, Davide.

I found that when I used the RUVs function, the W values I got for all samples were negative. Is that normal? Additionally, do you have any suggestions about the best value of k in an RUV analysis? I found that a bigger k results in better clustering of the replicates. But I think that if we use a big k, we may lose some true DE genes. What is your opinion?

Sooby.

ADD REPLY • link 7.6 years ago sooby • 0

The values in W are not very informative: since we need to estimate both W and alpha, they are unidentifiable (e.g., swapping signs between alpha and W will give you the same solution). Increasing k too much may be a problem, especially if the negative control genes are not a perfect set (we generally find that k <= 3 is on the safe side).

ADD REPLY • link 7.6 years ago davide risso ▴ 950

Let me rephrase that: the values in W are actually informative, but not on an absolute scale, i.e., they are only informative in relation to one another and not for their actual value.
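The sign point can be checked with a small toy example. The sketch below is plain Python, not RUVSeq code, and it assumes only that the unwanted variation enters the fit through the product of W and alpha; the numbers and the helper names `matmul` and `negate` are made up for illustration.

```python
# Toy check: flipping the signs of both W and alpha leaves their product unchanged,
# so the sign of W on its own carries no meaning.
# Assumption: the unwanted-variation term is the matrix product W * alpha.

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]

def negate(m):
    """Return a copy of matrix m with every entry sign-flipped."""
    return [[-v for v in row] for row in m]

# 3 samples x k=1 factor, all negative as in the question (made-up values).
W = [[-0.5], [-0.2], [-0.9]]
# k=1 factor x 4 genes (made-up values).
alpha = [[2.0, -1.0, 0.5, 4.0]]

fit = matmul(W, alpha)
fit_flipped = matmul(negate(W), negate(alpha))
same_fit = fit == fit_flipped

# The gap between two samples keeps its size under the flip; only its sign changes.
gap = W[0][0] - W[1][0]
gap_flipped = negate(W)[0][0] - negate(W)[1][0]
same_spacing = abs(gap) == abs(gap_flipped)

print(same_fit, same_spacing)
```

So an all-negative W describes the same fit as an all-positive one with alpha negated, while the spacing between samples, which is the informative part, is preserved.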

ADD REPLY • link 7.6 years ago davide risso ▴ 950

Thanks, Davide,

I think I get your point about the W values, and I will pay attention to the value of k.

Thanks again,

Sooby.

ADD REPLY • link 7.6 years ago sooby • 0