Question: How to use RUVSeq in clustering problems?
bioinfo20014, 6 weeks ago, wrote:

I have successfully used RUVSeq to correct for batch effects in samples from a "classical" control vs. treatment experiment. RUVr, RUVg, RUVs and svaseq all gave similar, satisfactory results.

Now I want to use RUVSeq in a clustering problem and I understand I can only use RUVs.

I obtained public RNA-seq data from various tissues, with replicates. After running RUVs, the resulting PCA does not separate samples by tissue, whereas the rlog-transformed uncorrected counts and the svaseq-corrected counts both clustered by tissue as expected.
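The PCA check described above can be sketched in base R. This is only an illustration on simulated log-scale data: the tissue names, the matrix `log_expr` and all numbers are made up, and `prcomp` plus `hclust` stand in for whatever plotting pipeline was actually used.

```r
set.seed(1)

# Simulated log-scale expression: genes in rows, samples in columns (made-up data).
tissues <- rep(c("liver", "brain", "heart"), each = 3)
n_genes <- 100
tissue_means <- matrix(rnorm(3 * n_genes, mean = 8, sd = 2), nrow = n_genes,
                       dimnames = list(NULL, unique(tissues)))
log_expr <- tissue_means[, tissues] +
  matrix(rnorm(n_genes * length(tissues), sd = 0.3), nrow = n_genes)
colnames(log_expr) <- paste0(tissues, "_", seq_along(tissues))

# PCA on samples: prcomp expects observations (samples) in rows.
pca <- prcomp(t(log_expr), center = TRUE, scale. = FALSE)
var_explained <- pca$sdev^2 / sum(pca$sdev^2)

plot(pca$x[, 1], pca$x[, 2], col = as.integer(factor(tissues)), pch = 19,
     xlab = sprintf("PC1 (%.0f%%)", 100 * var_explained[1]),
     ylab = sprintf("PC2 (%.0f%%)", 100 * var_explained[2]))

# Numeric check: do clusters on the first two PCs coincide with tissue labels?
clusters <- cutree(hclust(dist(pca$x[, 1:2])), k = 3)
table(clusters, tissues)
```

If the corrected matrix has lost the tissue signal, the cross-tabulation mixes tissues within clusters instead of showing one tissue per cluster.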

My question is: how to use RUVs in clustering problems? My code:

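# round the DESeq2-derived matrix so that RUVs receives integer values
counts_norm <- round(counts_deseq, digits = 0)
# one row per tissue, holding the column indices of its replicates (padded with -1)
differences <- makeGroups(groups)
# all genes are passed as controls (cIdx); scIdx holds the replicate groups
batch_ruv_reps <- RUVs(counts_norm, cIdx = rownames(counts_norm), k = 3, scIdx = differences)
counts_ruvseq <- batch_ruv_reps$normalizedCounts  # plot PCA using this matrix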

`groups` is a vector of tissue names; `counts_deseq` is a matrix of counts normalized using DESeq2's rlog function.

differences is:

     [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
[1,]    1    2    3   -1   -1   -1   -1   -1
[2,]    4    5   -1   -1   -1   -1   -1   -1
[3,]    6    7    8    9   -1   -1   -1   -1
[4,]   23   24   25   26   27   28   29   30
[5,]   10   11   12   -1   -1   -1   -1   -1
[6,]   15   16   17   18   19   20   21   22
[7,]   13   14   -1   -1   -1   -1   -1   -1

I changed k and didn't get better results. Is there anything else I should be doing?

Thank you.

davide risso (Weill Cornell Medicine), 6 weeks ago, wrote:

I'm not sure if this is the only problem, but you should not use RUVs on the rlog-transformed values from DESeq2. RUVs assumes that your input matrix contains counts, and rounded transformed data from DESeq2 are not the expected input for RUVs (which will internally take the log of the counts).
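A small numeric illustration of why already-transformed input is a problem. It assumes that the internal transform is `log(x + 1)` (an epsilon of 1), and it uses `log2(x + 1)` as a crude stand-in for rlog, which is a more elaborate transformation; the count values are made up.

```r
raw_counts <- c(5, 50, 500, 5000, 50000)

# Stand-in for an rlog-like, log2-scale transformation (assumption, not rlog itself).
log_scale <- log2(raw_counts + 1)

# What the question's code does next: round the transformed values ...
rounded <- round(log_scale)

# ... after which a count-based method would take the log a second time.
relogged <- log(rounded + 1)

comparison <- data.frame(
  raw          = raw_counts,
  log_once     = log(raw_counts + 1),  # scale the method expects to work on
  log_of_round = relogged              # scale it actually gets
)
comparison

# Spread between the lowest and highest gene on each scale.
diff(range(comparison$log_once))
diff(range(comparison$log_of_round))
```

The double log strongly compresses the differences between lowly and highly expressed genes, and rounding discards most of the remaining resolution.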

A better pipeline for what you are trying to do is to take the transformed data from DESeq2 (without rounding) and use the RUVnormalize package, which implements a linear-model version of RUV. Its `naiveReplicateRUV` function is the analog of RUVs; alternatively, have a look at the `iterativeRUV` function, which estimates the factors of unwanted variation and the clusters iteratively.
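To make the idea of a replicate-based, linear-model RUV on log-scale data concrete, here is a base R sketch. It does not call RUVnormalize and is not its implementation: `replicate_ruv_sketch`, the simulated matrix `Y` (samples in rows, genes in columns) and the hidden `batch` factor are all assumptions made for illustration, and every gene is treated as a control, as in the question.

```r
set.seed(42)

# Simulated log-scale expression, samples x genes (all values are made up).
groups  <- rep(c("liver", "brain", "heart"), each = 4)
batch   <- rep(c(0, 1), times = 6)            # hidden nuisance factor
n_genes <- 200
tissue_effect <- matrix(rnorm(3 * n_genes, sd = 2), nrow = 3,
                        dimnames = list(unique(groups), NULL))
batch_loading <- rnorm(n_genes, sd = 1.5)
Y <- tissue_effect[groups, ] + outer(batch, batch_loading) +
  matrix(rnorm(length(groups) * n_genes, sd = 0.2), nrow = length(groups))

replicate_ruv_sketch <- function(Y, groups, k) {
  # Subtract each group's mean: what varies among replicates is unwanted.
  within <- Y - apply(Y, 2, function(gene) ave(gene, groups))
  # Top-k right singular vectors give the gene loadings (alpha) of that variation.
  alpha <- t(svd(within, nu = 0, nv = k)$v)              # k x genes
  # Regress every sample on alpha, treating all genes as controls.
  W <- Y %*% t(alpha) %*% solve(alpha %*% t(alpha))      # samples x k
  list(W = W, corrected = Y - W %*% alpha)
}

fit <- replicate_ruv_sketch(Y, groups, k = 1)
abs(cor(fit$W[, 1], batch))  # close to 1 if the nuisance factor was recovered
```

The corrected matrix stays on the log scale and can go straight into PCA or clustering. Because the replicate labels define what counts as unwanted variation, this only helps when the group labels are trusted; when the clusters themselves are unknown, an iterative scheme such as the one `iterativeRUV` is described as providing is the more natural fit.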


Thanks for your reply. Contrary to what I said, I'm actually using the DESeq2 normalized counts, not rlog. I also tried raw counts with the betweenLaneNormalization function but got similar results.

I am going to try what you suggested.

Reply written 6 weeks ago by bioinfo20014