Question: Normalised counts from RUVg in RUVSeq
Pauly Lin (University of New South Wales, Australia) wrote, 2.5 years ago:

Hi, 

I'm using the function RUVg in the RUVSeq package to analyse my RNA-seq data. The output of RUVg consists of the factors of unwanted variation W and the normalised counts N. Could someone please tell me how the normalised counts N are calculated? See the formula below.

log E[Y | W, X, O] = Wα + Xβ + O

Are the normalised counts simply log(Y)-Wα-O? If so, how can I get α from the output of RUVg? Also, log(Y)-Wα-O wouldn't consist of integers, so why does the normalised counts matrix N produced by RUVg consist of counts?

Thanks!

Paul

Modified 16 months ago by cbcb • written 2.5 years ago by Pauly Lin
Pauly Lin (University of New South Wales, Australia) wrote, 2.5 years ago:

I have figured this out myself, so no need to answer this question any more. 

Cheers

Paul

Modified 2.5 years ago • written 2.5 years ago by Pauly Lin
davide risso (Weill Cornell Medicine) wrote, 2.5 years ago:

Hi Paul,

I'm glad you figured it out yourself. I'll just post an answer anyway in case somebody else has a similar issue.

The normalized counts are indeed simply the residuals from an ordinary least squares regression of log Y on W (with the offset if needed). The output from RUV is then exponentiated (and optionally rounded) so that it can be interpreted as "normalized counts". Please note that these are intended only for visualization and exploration: the RUV model has been tested only on supervised problems, and it works better when W is included in the model with the original counts (rather than modeling the normalized counts).
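
To make the computation above concrete, here is a minimal base R sketch of the idea: take logs, estimate W from the control genes, regress log Y on W by ordinary least squares, subtract the fitted part, then exponentiate and round. It is an illustration under stated assumptions and is not copied from the package. The simulated counts, the choice of control genes (`control_idx`), the SVD step for W, and the pseudo-count `epsilon` are all assumptions made for the example, and the offset O is left out.

```r
# Illustrative sketch only: simulated data, hypothetical names, no offset term.
set.seed(1)

# Simulated counts: 50 genes x 6 samples; the first 10 genes play the role of controls.
n_genes <- 50
n_samples <- 6
counts <- matrix(rpois(n_genes * n_samples, lambda = 100), nrow = n_genes)
control_idx <- 1:10   # hypothetical negative control genes
k <- 1                # number of unwanted factors
epsilon <- 1          # pseudo-count added before taking logs (assumption)

# Work on the log scale with samples in rows, matching log E[Y] = W alpha + X beta + O.
log_y <- t(log(counts + epsilon))

# Estimate W from the centred control genes via SVD (assumed estimation step).
centred <- scale(log_y, center = TRUE, scale = FALSE)
W <- svd(centred[, control_idx])$u[, 1:k, drop = FALSE]

# Ordinary least squares of log Y on W gives alpha; the residuals are log Y - W alpha.
alpha <- solve(t(W) %*% W) %*% t(W) %*% log_y
corrected_log <- log_y - W %*% alpha

# Exponentiate and round so the result can be read as "normalized counts".
normalized <- round(exp(corrected_log) - epsilon)
normalized[normalized < 0] <- 0
normalized <- t(normalized)   # back to genes x samples
```

The rounding step is the reason the result consists of integers even though the residuals themselves are not integers.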

Similarly, the alpha estimated in the first step of RUV probably shouldn't be used directly (and hence is not returned); rather, alpha and beta will be (re-)estimated by the full model on W and X. If you're using edgeR, you should find the estimated parameters in the output of glmFit (coefficients).
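
As a toy illustration of estimating alpha and beta together on the original counts, the sketch below fits a single simulated gene with base R's `glm` and a Poisson family. This is a deliberate stand-in for edgeR's negative binomial `glmFit`, chosen so the example runs without extra packages; in edgeR the analogous step would be to include W in the design matrix. The data, the factor `x`, and the values in `W` are all made up for the example.

```r
# Illustrative sketch only: Poisson GLM as a stand-in for edgeR, simulated data.
set.seed(2)
n <- 8
x <- factor(rep(c("ctl", "trt"), each = n / 2))   # covariate of interest (X)
W <- rnorm(n)                                     # hypothetical unwanted factor

# Simulate one gene whose mean depends on both x and W.
mu <- exp(4 + 0.5 * (x == "trt") + 0.3 * W)
y <- rpois(n, mu)

# Fit X and W jointly on the original counts.
fit <- glm(y ~ x + W, family = poisson)

# The coefficient for "xtrt" plays the role of beta, the one for "W" the role of alpha.
coef(fit)
```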

 

Written 2.5 years ago by davide risso
cbcb wrote, 16 months ago:

Hello,

How can I apply RUVg normalization to my FPKM or counts table and then export (write) the normalized data frame to a directory?

Many thanks for your kind help in advance.

Chris

 

 

Written 16 months ago by cbcb

Hi Chris,

Please use the "Add Answer" field only to answer the originally posted question. If you have a new question, please open a new thread.

In this particular case, it looks like the best place for you to start is by reading the RUVg vignette, available here:

https://www.bioconductor.org/packages/release/bioc/vignettes/RUVSeq/inst/doc/RUVSeq.pdf

Reply written 16 months ago by davide risso

Hi Davide,

Many thanks for your quick reply.

I did start with the RUVSeq vignette, but I can't find how to export the normalized counts. I have kinetics data and I don't know how to apply DESeq to such data. Hence, I would like to export the RUVg normalized data frame.

 

Thanks,

 

Chris

Reply written 16 months ago by cbcb

The way you extract normalized counts depends on what object you're working with. If you're starting with a matrix, RUVg will return a list with two elements, one of which is the matrix of normalized counts.

If you're working with a SeqExpressionSet object, then you can use the normCounts() method to extract the normalized data. This is all documented in the RUVg manual page, available with:

?RUVSeq::RUVg

Have a look also at

?EDASeq::normCounts
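
For the matrix case, exporting comes down to pulling the `normalizedCounts` element out of the returned list and writing it with base R. The sketch below uses a small made-up list as a stand-in for the RUVg result so that it runs without the package; the object `ruv_result`, its values, and the output file name are hypothetical.

```r
# Illustrative sketch only: ruv_result is a made-up stand-in for RUVg's return value.
ruv_result <- list(
  W = matrix(c(-0.5, 0.5, 0.5, -0.5), ncol = 1, dimnames = list(NULL, "W_1")),
  normalizedCounts = matrix(
    c(10, 0, 12, 1, 9, 2, 11, 0), nrow = 2,
    dimnames = list(c("geneA", "geneB"), c("A", "B", "C", "D"))
  )
)

# Extract the matrix of normalized counts from the list.
norm_counts <- ruv_result$normalizedCounts

# Write a tab-separated file; col.names = NA leaves an empty header cell above the row names.
out_file <- file.path(tempdir(), "ruvg_normalized_counts.tsv")
write.table(norm_counts, file = out_file, sep = "\t", quote = FALSE, col.names = NA)

# Read the file back to check the round trip.
check <- read.table(out_file, sep = "\t", header = TRUE, row.names = 1)
```

Replace `tempdir()` with the directory you want the file written to.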
Reply written 16 months ago by davide risso
cbcb wrote, 16 months ago:

Could you please comment on the commands below? Is this the right way to get the RUVg normalized data?

thanks,

 

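library(RUVSeq)
my_data <- read.table(my_data_file, sep = "\t", header = TRUE)

# Use the ERCC spike-ins as negative controls and estimate k = 1 unwanted factor.
ercc_rows <- grep("^ERCC", rownames(my_data))
my_data_ruv_object_k1 <- RUVg(x = as.matrix(my_data), cIdx = ercc_rows, k = 1)
# Extract the normalized matrix from the returned list.
my_data_ruv_k1 <- my_data_ruv_object_k1$normalizedCounts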

 

Written 16 months ago by cbcb

That's correct. 

Reply written 16 months ago by davide risso

> tail(my_data)
             A_count   B_count     D_count    C_count
ERCC-00163   38.02860   42.57420   30.858200    29.8872
ERCC-00164    0.00000    1.12037    0.907595     0.0000
ERCC-00165  184.29200  211.75100  155.199000   174.3420
ERCC-00168    2.92528    1.12037    0.907595     0.0000
ERCC-00170   48.75460   62.74100   66.254400    78.7031
ERCC-00171 8049.39000 6976.57000 5932.040000  6338.0900

 

> tail(my_data_ruv_k1 )
           A_count    B_count    D_count     C_count
ERCC-00163 3.65079862  3.7640702  3.5198576  3.3956702
ERCC-00164 0.08228729  0.8149703  0.2887788  0.2113975
ERCC-00165 5.19581388  5.3400049  5.1644678  5.0996380
ERCC-00168 1.39912058  0.7759935  0.5083635  0.0813939
ERCC-00170 3.90363981  4.1521606  4.2235097  4.3694117
ERCC-00171 8.97126617  8.8333496  8.7846650  8.6974335

 

Why do my 0 read counts get assigned non-zero values after RUVg normalization?

Is this expected to happen?

Many thanks
 

Reply written 16 months ago by cbcb

Yes, this can happen: the "normalized data" are simply the residuals of a linear model, and nothing in the model constrains a zero to stay zero. Note that in practice these values will all be close to 0.

Also, keep in mind that the normalized values are on the log(x+1) scale.
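
A small base R sketch shows why a zero need not stay zero. It takes the ERCC-00164 row from the table above, moves it to the log(x+1) scale, and removes the part explained by a single factor W. The values in `W` are made up for illustration, so the numbers will not match the RUVg output shown earlier.

```r
# Illustrative sketch only: W is a hypothetical unwanted factor for the 4 samples.
W <- matrix(c(-0.5, 0.5, 0.5, -0.5), ncol = 1)

# ERCC-00164 values for the samples A, B, D, C, taken from the table above.
y <- c(0, 1.12037, 0.907595, 0)
log_y <- matrix(log(y + 1), ncol = 1)

# Ordinary least squares of log(y + 1) on W, then subtract the fitted part.
alpha <- solve(t(W) %*% W) %*% t(W) %*% log_y
corrected <- log_y - W %*% alpha

# The two samples that started at 0 now carry the non-zero term -W * alpha.
corrected
```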

Reply written 16 months ago by davide risso

many thanks

Reply written 16 months ago by cbcb