RUV4 for microarrays
Hi everyone, I am trying to remove an unknown batch effect. After reading several articles and posts, I found RUV (Remove Unwanted Variation). First, I used the RUVnormalize package with 11 housekeeping genes as negative controls, but the batch effect remained. Then I turned to the ruv package and its RUV4 function, since I saw that I could derive the control genes from my own data and use them for the correction.

However, RUV4 doesn't return an adjusted data matrix; instead, it returns a list of different values. I don't know how to obtain the adjusted data matrix to carry out downstream analysis, such as DE.

Can anyone help me?

# Log transformation
exprlog <- log2(exprbygene)

res <- diffExprAnalysis(dat = exprlog, ed = ed, condition = "title")
sum(res$`CONTROL-RIF`$adj.P.Val > 0.999) # 30

# Genes with no evidence of differential expression serve as control genes
control_genes <- rownames(res$`CONTROL-RIF`[res$`CONTROL-RIF`$adj.P.Val > 0.999, ])

# Normalization with RUV
cIdx <- which(rownames(exprlog) %in% control_genes)
k <- 3
design <- model.matrix(~0 + ed$title)

nsY <- RUV4(Y = t(exprlog), ctl = cIdx, X = ed$title, k = 1)

The ruv package isn't a Bioconductor package, though maybe it's Bioconductor-adjacent. Technically, since it's a CRAN package, you should be asking on R-help or Biostars or wherever. Anyway, this is an example of why you need to read the help pages closely for any package you want to use: the information is there, but it's usually pretty terse and every word counts.

You are asking how to get the adjusted data matrix for downstream analysis, apparently without realizing that RUV4 is already carrying out the analysis for you. From ?RUV4:

Arguments:

Y: The data.  A m by n matrix, where m is the number of samples
and n is the number of features.

X: The factor(s) of interest.  A m by p matrix, where m is the
number of samples and p is the number of factors of interest.
Very often p = 1.  Factors and dataframes are also
permissible, and converted to a matrix by 'design.matrix'.

## and further down

Details:

Implements the RUV-4 algorithm as described in Gagnon-Bartsch,
Jacob, and Speed (2013), using the SVD as the factor analysis
routine.  Unwanted factors W are estimated using control genes.  Y
is then regressed on the variables X, Z, and W.

This pretty clearly states that the function does the regression for you. That said, ruv could use a vignette, because the object RUV4 returns isn't very useful on its own. If you look at ?ruv_summary it becomes somewhat clearer:

RUV Summary

Description:

Post-process and summarize the results of call to RUV2, RUV4,
RUVinv, or RUVrinv.

Usage:

ruv_summary(Y, fit, rowinfo=NULL, colinfo=NULL, colsubset=NULL, sort.by="F.p",
var.type=c("ebayes", "standard", "pooled"),
p.type=c("standard", "rsvar", "evar"), min.p.cutoff=10e-25)

## and further down

Details:

This function post-processes the results of a call to
RUV2/4/inv/rinv and then nicely summarizes the output.  The
post-processing step primarily consists of a call to
'variance_adjust', which computes various adjustments to variances,
t-statistics, and p-values.  See variance_adjust for details.
The 'var.type' and 'p.type' options determine which of these
adjustments are used.  An additional post-processing step is that
the column means of the 'Y' matrix are computed, both before and
after the call to 'RUV1' (if 'eta' was specified).

After post-processing, the results are summarized into a list
containing 4 objects: 1) the data matrix 'Y'; 2) a dataframe 'R'
containing information about the rows (samples); 3) a dataframe
'C' containing information about the columns (features, e.g.
genes), and 4) a list 'misc' of other information returned by
RUV2/4/inv/rinv.
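
To make the two help pages concrete, here is a minimal sketch of the RUV4 then ruv_summary workflow on simulated data. Everything about the data (the sizes, the single simulated batch factor, which genes are controls, and the names m, n, batch, ctl, fit, summ) is an assumption made up for illustration; only the RUV4 and ruv_summary calls come from the ruv package, which is assumed to be installed from CRAN.

```r
# Minimal sketch on simulated data; all object names here are hypothetical.
library(ruv)  # CRAN package, assumed to be installed

set.seed(1)
m <- 20    # samples (rows of Y)
n <- 200   # features (columns of Y)

# Factor of interest as a one-column 0/1 matrix: 10 untreated, 10 treated
X <- matrix(rep(c(0, 1), each = m / 2), ncol = 1)

# One unwanted factor (e.g. an unrecorded batch) affecting every feature
batch <- rnorm(m)
Y <- matrix(rnorm(m * n), nrow = m) + outer(batch, rnorm(n))
dimnames(Y) <- list(paste0("sample", 1:m), paste0("gene", 1:n))

# True treatment effect only on genes 101-120; genes 1-50 act as negative controls
Y[X[, 1] == 1, 101:120] <- Y[X[, 1] == 1, 101:120] + 2
ctl <- 1:50

# RUV4 estimates the unwanted factors and fits the regression in one call
fit <- RUV4(Y = Y, X = X, ctl = ctl, k = 1)

dim(fit$W)        # estimated unwanted factor(s): samples by k
dim(fit$betahat)  # estimated coefficients for X: factors by features

# ruv_summary post-processes the fit into per-sample and per-feature tables
summ <- ruv_summary(Y, fit)
names(summ)       # per the help page quoted above: Y, R, C and misc
head(summ$C)      # per-feature results, sorted by the default sort.by = "F.p"
```

In the setup from the question, Y would correspond to t(exprlog) and ctl to cIdx, so the per-gene results for downstream DE would come from the 'C' dataframe rather than from an adjusted expression matrix.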

There are other functions in ruv that are presumably useful too, but I leave that further exploration to you.

