Question: Issue with RUVs
gravatar for asif.zubair
2.8 years ago by
asif.zubair0 wrote:


I am trying to use RUVs to normalise my data. I have 12 samples which are replicated in pairs. 

i.e. I can specify my differences like this:

differences = matrix(data= seq(1,12,by=1), byrow=TRUE, nrow=6)

where each row is a biological replicate

I create a "counts" matrix which is n X 12 matrix of the read counts of my genes. I create a vector "geneNames" which lists all the genes in the count matrix. 

I call the RUVs function as such:

RUVs(counts, geneNames, k=1, differences)

However, I am getting the following error:

Error in Y[scIdx[ii, jj], , drop = FALSE] : 
  no 'dimnames' attribute for array

Please note my "counts" matrix is not a data.frame but simply a numeric matrix. 

I then tried using the newSeqExpressionSet function:

x = as.factor(rep(c(1,2,3,4,5,6), each=2))

set = newSeqExpressionSet(as.matrix(counts), phenoData = data.frame(x,row.names=geneNames))

and run 

RUVs(set, genes, k=1, differences)

However, I get the following error:

Error in Y[scIdx[ii, jj], , drop = FALSE] : subscript out of bounds

Could someone suggest to me how I could modify my commands to make this work ? 

Thank you ! 

ADD COMMENTlink modified 2.7 years ago • written 2.8 years ago by asif.zubair0
gravatar for davide risso
2.8 years ago by
davide risso370
Weill Cornell Medicine
davide risso370 wrote:

Hi Asif,

my guess is that your count matrix doesn't have the "rownames" attribute. You should modify your matrix to have row names the same gene IDs that you use in the "genes" vector. Alternatively, you can use a numeric index instead of a character string to identify genes.  

I hope this helps.


ADD COMMENTlink written 2.8 years ago by davide risso370

Hi Davide,

Thank you, it worked ! Can I use my normalized counts directly for an analysis by say edgeR ? 



ADD REPLYlink written 2.7 years ago by asif.zubair0

No. We don't recommend using the normalized counts (except for exploration/visualization). You should include the estimated factors (the W matrix) in edgeR's GLM design matrix (see the RUVSeq vignette for a detailed example).

ADD REPLYlink written 2.7 years ago by davide risso370
gravatar for asif.zubair
2.7 years ago by
asif.zubair0 wrote:

I plotted the PCA for my normalized counts but it doesn't give anything sensible. I did normalization using DESeq's size factor estimation algorithm and it gives me sensible results. The function call I made is below:

ruv_de = RUVs(data.matrix(de), genes, k=1, differences)


EDASeq::plotPCA(data.matrix(ruv_de$normalizedCounts), col=colors[x], xlim=c(-0.5,0.5), cex=0.90)

Is there something I am doing wrong ? 

ADD COMMENTlink written 2.7 years ago by asif.zubair0

What do you mean by sensible? As you can see, there are some choices you have to make: for instance you can try different values of k.

It's hard to comment on this without knowing more about your experiment. If you're happy with the DESeq size factors, maybe it means your dataset doesn't need any other normalization more than just scaling for sequencing depth, then you should stick with DESeq normalization. But again, it's hard to tell without looking at the data.


ADD REPLYlink written 2.7 years ago by davide risso370

Yes, I think you are right. 

By sensible I meant that the replicates don't cluster together as I would like. However, simply by using the DESeq size factors, it works. 

I like RUVSeq approach and so wanted to try it on my dataset as well. Particularly interesting was the counter intuitive argument that you make about the spike-in genes in your paper. 

I am currently using only part of the dataset, perhaps this normalisation will be of more use when I use the whole dataset ? 

Thank you for your help. Appreciate it. 

ADD REPLYlink written 2.7 years ago by asif.zubair0
Please log in to add an answer.


Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.2.0
Traffic: 111 users visited in the last hour