SVA and proportion of variance explained
Mattia
@mattia-9769
Milano

Hi guys,

I'm using the sva package to identify unwanted variation and then remove batch effects from my RNA-seq data.

Say I found ten surrogate variables (SVs) with the sva() function. My question is: how can I calculate the proportion of variance explained by each SV, as is done for the PCs in a PCA?

Thanks a lot,

Mattia

Tags: sva, svd, explained variance, pca
Jeff Leek
@jeff-leek-5015
United States
Hi Mattia,

There is currently no approach for calculating the percent of variance explained for SVs. I think you can get a pretty good approximation by looking at the percent of variance explained by the equivalent number of PCs, but I haven't explored this carefully.

Jeff

(Email reply to the question of Thu, Oct 20, 2016.)
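The approximation suggested above can be sketched in R as follows. This is a minimal illustration under stated assumptions: `expr` is a genes x samples matrix of normalized expression, `n_sv` is the number of SVs (the `n.sv` element of the list returned by sva()), and `pca_variance_for_svs()` is a hypothetical helper, not part of sva. It runs PCA with samples as observations, which is one common convention.

```r
# Sketch of the suggested approximation: the share of variance captured
# by as many principal components as there are SVs.
# Assumptions: `expr` is a genes x samples matrix of normalized expression,
# `n_sv` is the number of SVs (n.sv in the list returned by sva()).
# pca_variance_for_svs() is a hypothetical helper, not part of sva.

pca_variance_for_svs <- function(expr, n_sv) {
  # samples as observations: transpose so each row is a sample
  pc <- prcomp(t(expr), center = TRUE, scale. = FALSE)
  prop <- pc$sdev^2 / sum(pc$sdev^2)          # proportion of variance per PC
  list(per_pc = prop[seq_len(n_sv)],          # first n_sv PCs individually
       cumulative = sum(prop[seq_len(n_sv)])) # and taken together
}

# Toy usage on simulated noise (not real RNA-seq data)
set.seed(1)
expr <- matrix(rnorm(2000 * 12), nrow = 2000, ncol = 12)
res <- pca_variance_for_svs(expr, n_sv = 3)
print(res$per_pc)
print(res$cumulative)
```

In a real analysis, `expr` would be the normalized matrix passed to sva() and `n_sv` would be `svaobj$n.sv`.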

Thanks, Jeff, for your valuable suggestion. I tried something similar to PCA, reusing part of your sva code:

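# data_normalized: genes x samples matrix of normalized (e.g. VST) values
# svaobj: the list returned by sva()
dat <- data_normalized
# per-gene weight: probability of being affected by unwanted variation (pprob.gam)
# times probability of NOT being affected by the variables in mod (1 - pprob.b)
pprob <- svaobj$pprob.gam * (1 - svaobj$pprob.b)
dats <- dat * pprob                # scale each gene (row) by its weight

dats <- dats - rowMeans(dats)      # center each gene
uu <- eigen(t(dats) %*% dats)      # eigendecomposition of the samples x samples matrix

uu_val <- uu$values / sum(uu$values)   # each eigenvalue as a fraction of the total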

uu_val expresses each eigenvalue as a fraction of the sum of all eigenvalues. Can I interpret uu_val as the "percent of variance explained" by each SV?

Finally, I produced a figure that plots each SV against its corresponding uu_val:

library(ggplot2)

x_val <- seq_len(ncol(dats))       # one eigenvalue per sample

expl_var_plot <- data.frame(x_val = x_val, uu_val = uu_val)

ggplot(expl_var_plot, aes(x_val, uu_val)) +
  geom_point(size = 3, shape = 19, color = "blue") +
  geom_text(aes(label = x_val), hjust = 0.5, vjust = 2, size = 3) +  # label each point with its index
  xlab("SV") +
  ylab("uu$values / sum(uu$values)")

Thanks for your time and consideration.

Mattia
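One fact that may help when reading uu_val: the eigenvalues of t(dats) %*% dats are the squared singular values of dats, so for a gene-centered matrix uu_val coincides with the PCA proportion of variance of that same weighted matrix when samples are treated as observations. The sketch below checks this numerically on simulated data; the weights `w` are a hypothetical stand-in for svaobj$pprob.gam * (1 - svaobj$pprob.b), not output from sva.

```r
# Sketch: eigenvalue fractions from the sva-style computation (uu_val) versus
# PCA proportions of variance from prcomp() with samples as observations.
# Simulated data; `w` is a hypothetical stand-in for sva's per-gene weights.

set.seed(42)
n_genes <- 500
n_samples <- 12
dat <- matrix(rnorm(n_genes * n_samples), nrow = n_genes, ncol = n_samples)
w <- runif(n_genes)                             # hypothetical weights in [0, 1]

dats <- dat * w                                 # scale each row (gene) by its weight
dats <- dats - rowMeans(dats)                   # center each gene
uu <- eigen(crossprod(dats), symmetric = TRUE)  # same matrix as t(dats) %*% dats
uu_val <- uu$values / sum(uu$values)

# PCA with samples as rows; prcomp() centers each gene (already centered here)
pc <- prcomp(t(dats))
pca_prop <- pc$sdev^2 / sum(pc$sdev^2)

# compare the two vectors (they differ only by floating-point error)
print(all.equal(uu_val, pca_prop))
```

prcomp() divides the squared singular values by (number of samples - 1), but that constant cancels when each value is divided by the total.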

Hey Mattia,

That's a clever idea, and it clearly summarizes the percent of variance explained in the upweighted part of the matrix in sva, but I'm not sure it estimates the percent of variance explained the way the PCs do. I'd have to consider it much more carefully. You might try it on some simulated data to check whether it works?

Jeff

(Email reply to the comment of Mon, Oct 24, 2016.)

Jeff, I followed your suggestion.
I created a simulated RNA-seq dataset (15,000 genes and 30 samples for each of the 2 conditions) with 3 batch effects (strong, medium and weak). Then, after VST normalization, I calculated:

1. uu_val, as discussed above;
2. uu_val2 <- uu$values^2 / sum(uu$values^2);
3. the percentage of explained variance from PCA (expl_var_pca):

PC_res <- prcomp(data_normalized)
expl_var_pca <- PC_res$sdev^2 / sum(PC_res$sdev^2)

Then I calculated the correlation between uu_val2 and expl_var_pca, which was 0.95.

Moreover, I found that the first 2 SVs (corresponding to the first 2 uu_val values) correlate highly (cor > 0.9) with the strong and medium batches, whereas I need 6 SVs to correct for the weak batch.

So it seems to me that uu_val2 behaves like the percentage of explained variance in PCA.

Thank you again!

Cool! :)

Jeff

(Email reply to the comment of Tue, Oct 25, 2016.)
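A rough, base-R sketch of the kind of simulation check described in this thread follows. It is not Mattia's code: Gaussian noise stands in for VST-normalized counts, the gene counts and effect sizes are arbitrary choices, and every gene gets weight 1 in place of sva's pprob.gam * (1 - pprob.b), so the eigen step here reduces to an ordinary PCA of the gene-centered data.

```r
# Rough simulation sketch in base R only. Assumptions (not the original code):
# Gaussian data stand in for VST-normalized counts, batch effects are added to
# random gene subsets, and all genes get weight 1 instead of sva's weights.

set.seed(123)
n_genes <- 2000
n_samples <- 60                                   # 30 per condition
condition <- rep(c(0, 1), each = n_samples / 2)

# three hypothetical binary batch labels with strong, medium and weak effects
batches <- replicate(3, sample(rep(c(0, 1), each = n_samples / 2)))
effect_size <- c(strong = 3, medium = 1.5, weak = 0.5)

dat <- matrix(rnorm(n_genes * n_samples), nrow = n_genes)
dat[1:200, ] <- dat[1:200, ] + outer(rep(1, 200), condition)  # condition effect
for (k in 1:3) {
  genes_k <- sample(n_genes, 300)                 # genes affected by batch k
  dat[genes_k, ] <- dat[genes_k, ] +
    effect_size[k] * outer(rep(1, 300), batches[, k])
}

# placeholder weights; a real run would use svaobj$pprob.gam * (1 - svaobj$pprob.b)
w <- rep(1, n_genes)
dats <- dat * w
dats <- dats - rowMeans(dats)                     # center each gene
uu <- eigen(crossprod(dats), symmetric = TRUE)

uu_val  <- uu$values / sum(uu$values)
uu_val2 <- uu$values^2 / sum(uu$values^2)

PC_res <- prcomp(dat)                             # genes as rows, as in the post
expl_var_pca <- PC_res$sdev^2 / sum(PC_res$sdev^2)

print(cor(uu_val2, expl_var_pca))

# absolute correlation of the leading eigenvectors with each simulated batch
print(round(abs(cor(uu$vectors[, 1:3], batches)), 2))
```

With real data, `dat` would be the normalized matrix and `w` the weights from an sva() fit; the last line then shows which leading eigenvectors track which batch, as in the comparison reported above.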