Entering edit mode

Hi guys,

I'm using SVA package for identifying unwanted variation and then removing batch effects from my RNA-Seq data.

Let's say I found my ten SV with sva function; my question is: how can I calculate the proportion of variance explained for each SV, as for PCs in PCA?

Thanks a lot,

Mattia

Thanks Jeff for your precious suggestion. Actually, I tried to do something similar to PCA, using a part of your "sva code":

dat <- data_normalizedpprob <- svaobj$pprob.gam*(1-svaobj$pprob.b)

dats <- dat*pprob

dats <- dats - rowMeans(dats)uu <- eigen(t(dats)%*%dats)

I added this line:

uu_val <-uu$values / sum(uu$values)which could express the fraction of each eigenvalue over the total sum. Can I interpret the object

as the "Percent of explained variance" for each SV?uu_valI finally produced the figure where each SV is plotted against the corrisponding

.uu_valx_val <- 1:ncol(dats)expl_var_plot <- as.data.frame(cbind(x_val,uu_val))ggplot(expl_var_plot, aes(x_val,uu_val)) +geom_point(size=3,pch=19, color="blue") +

scale_color_gradient() +

geom_text(aes(label=rownames(expl_var_plot)),hjust=0.5,vjust=2,size=3) +

xlab("SV") +

ylab("uu$values / sum(uu$values)")

Thanks for your time and consideration.

Mattia

Jeff,

I followed your suggestion. I created a simulated RNA-Seq dataset (15000 genes and 30 samples for each of the 2 conditions) with 3 batch effects (strong, medium and weak effects). Then, after VST normalization, I calculated:

, as discussed above;uu_valuu_val2 <-uu$values^2 / sum(uu$values^2)):expl_var_pcaPC_res <- prcomp(data_normalized)expl_var_pca <- (PC_res$sdev)^2 / sum(PC_res$sdev^2)Then, I calculated the correlation between uu_val2 vs expl_var_pca which resulted equal to 0.95.

Moreover, I found that the first 2 SVs (corresponding to the first 2 uu_val) highly correlate (cor>0.9) with the strong and medium batches, whereas I need 6 SVs to correct the weak batch.

So, it seems to me that uu_val2 behaves like Percentage of Explained variance in PCA.

Thank you again!