Question

Issue with RUVs

0

Entering edit mode

asif.zubair • 0

@asifzubair-6770

Last seen 7.6 years ago

Hi,

I am trying to use RUVs to normalise my data. I have 12 samples which are replicated in pairs.

i.e. I can specify my differences like this:

differences = matrix(data= seq(1,12,by=1), byrow=TRUE, nrow=6)

where each row is a biological replicate

I create a "counts" matrix which is n X 12 matrix of the read counts of my genes. I create a vector "geneNames" which lists all the genes in the count matrix.

I call the RUVs function as such:

RUVs(counts, geneNames, k=1, differences)

However, I am getting the following error:

Error in Y[scIdx[ii, jj], , drop = FALSE] :
no 'dimnames' attribute for array

Please note my "counts" matrix is not a data.frame but simply a numeric matrix.

I then tried using the newSeqExpressionSet function:

x = as.factor(rep(c(1,2,3,4,5,6), each=2))

set = newSeqExpressionSet(as.matrix(counts), phenoData = data.frame(x,row.names=geneNames))

and run

RUVs(set, genes, k=1, differences)

However, I get the following error:

Error in Y[scIdx[ii, jj], , drop = FALSE] : subscript out of bounds

Could someone suggest to me how I could modify my commands to make this work ?

Thank you !

ruvseq RUVs EDA • 2.4k views

ADD COMMENT • link 9.6 years ago asif.zubair • 0

0

Entering edit mode

asif.zubair • 0

@asifzubair-6770

Last seen 7.6 years ago

I plotted the PCA for my normalized counts but it doesn't give anything sensible. I did normalization using DESeq's size factor estimation algorithm and it gives me sensible results. The function call I made is below:

ruv_de = RUVs(data.matrix(de), genes, k=1, differences)

x=rep(c(1,2,3,4,5,6),2)

EDASeq::plotPCA(data.matrix(ruv_de$normalizedCounts), col=colors[x], xlim=c(-0.5,0.5), cex=0.90)

Is there something I am doing wrong ?

ADD COMMENT • link 9.6 years ago asif.zubair • 0

0

Entering edit mode

What do you mean by sensible? As you can see, there are some choices you have to make: for instance you can try different values of k.

It's hard to comment on this without knowing more about your experiment. If you're happy with the DESeq size factors, maybe it means your dataset doesn't need any other normalization more than just scaling for sequencing depth, then you should stick with DESeq normalization. But again, it's hard to tell without looking at the data.

ADD REPLY • link 9.6 years ago davide risso ▴ 950

0

Entering edit mode

Yes, I think you are right.

By sensible I meant that the replicates don't cluster together as I would like. However, simply by using the DESeq size factors, it works.

I like RUVSeq approach and so wanted to try it on my dataset as well. Particularly interesting was the counter intuitive argument that you make about the spike-in genes in your paper.

I am currently using only part of the dataset, perhaps this normalisation will be of more use when I use the whole dataset ?

Thank you for your help. Appreciate it.

ADD REPLY • link 9.6 years ago asif.zubair • 0

score 1 · Accepted Answer · 2014-09-23

1

Entering edit mode

davide risso ▴ 950

@davide-risso-5075

Last seen 5 weeks ago

University of Padova

Hi Asif,

my guess is that your count matrix doesn't have the "rownames" attribute. You should modify your matrix to have row names the same gene IDs that you use in the "genes" vector. Alternatively, you can use a numeric index instead of a character string to identify genes.

I hope this helps.

davide

ADD COMMENT • link 9.6 years ago davide risso ▴ 950

0

Entering edit mode

Hi Davide,

Thank you, it worked ! Can I use my normalized counts directly for an analysis by say edgeR ?

Best,

Asif

ADD REPLY • link 9.6 years ago asif.zubair • 0

1

Entering edit mode

No. We don't recommend using the normalized counts (except for exploration/visualization). You should include the estimated factors (the W matrix) in edgeR's GLM design matrix (see the RUVSeq vignette for a detailed example).

ADD REPLY • link 9.6 years ago davide risso ▴ 950