Applying SVA to RNA-Seq dataset
L_K • 0
@l_k-14850

Dear Bioconductor community,

I'm currently working on an RNA-seq differential expression project with an sRNA-seq dataset (for miRNA differential expression analysis) and an mRNA-seq dataset (for mRNA differential expression analysis). My condition of interest has three levels with n=8, n=5 and n=6.

The question arose whether I should use SVA to account for potential batch effects. I don't expect severe batch effects (at least not from library prep and sequencing). However, since this is a non-model organism study (veterinary field), I thought that SVA might account for mixed breeds within the groups or other unknown effects that contribute unwanted variation in gene expression.

Thus I would love to get your opinion on that question. Is it a problem to apply SVA to a data set with a small sample size?

To investigate the effect of SVA on the dataset I generated two PCA plots: one before and one after subtracting the significant surrogate variables (4 SVs were detected) from the dataset.

The PCA plot on the right illustrates the dataset after removal of 4 surrogate variables.

Unfortunately, the differentially expressed genes/miRNAs from the two analyses differ, and neither set is a subset of the other, so I really don't know which path to take or how to justify it.

Are there any possible analyses/quality controls I could run to answer my question?

And an additional small question: Would you suggest adding the RNA extraction day as a covariate in the linear model? (On each extraction day, one sample from each of the three conditions was extracted, and I have these batch dates.)

Thank you very much for your help.

-Matt

 

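edit: Here is the code I use to subtract the surrogate variables (using a function published by Jaffe et al. 2015):

# Regress the surrogate variables out of y (features x samples) while
# protecting the first P columns of the design (the known model terms).
cleaningP <- function(y, mod, svaobj, P = ncol(mod)) {
  X <- cbind(mod, svaobj$sv)            # known covariates + surrogate variables
  Hat <- solve(t(X) %*% X) %*% t(X)     # least-squares projection
  beta <- Hat %*% t(y)                  # coefficients x features
  # subtract only the fitted surrogate-variable contribution
  cleany <- y - t(X[, -seq_len(P), drop = FALSE] %*% beta[-seq_len(P), , drop = FALSE])
  return(cleany)
}

# mat, svseq and dds2 are defined elsewhere (not shown)
mod <- model.matrix(~ sex + condition, data = colData(dds2))
cleanp <- cleaningP(mat, mod, svseq)
pca <- prcomp(t(cleanp))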
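
To make clearer what this kind of cleaning does, here is a minimal base R sketch on simulated data. Everything in it is an assumption made for illustration: the condition labels, the single simulated `hidden` factor standing in for a surrogate variable, and the number of features. It is not the real dataset and not the output of sva. It only shows that fitting the known terms and the hidden factor jointly, then subtracting just the hidden-factor part, leaves the condition coefficients unchanged.

```r
set.seed(1)
n <- 19                                    # samples, matching group sizes 8 + 5 + 6
condition <- factor(rep(c("A", "B", "C"), times = c(8, 5, 6)))  # hypothetical labels
hidden <- rnorm(n)                         # stand-in for one surrogate variable
mod <- model.matrix(~ condition)           # known model terms
P <- ncol(mod)

# 50 simulated features: condition effect + hidden-factor effect + noise
n_feat <- 50
cond_eff <- matrix(rnorm(n_feat * P), n_feat)   # features x coefficients
hid_eff <- rnorm(n_feat, sd = 2)
y <- cond_eff %*% t(mod) + hid_eff %o% hidden +
  matrix(rnorm(n_feat * n, sd = 0.5), n_feat)

# Same idea as cleaningP: joint least-squares fit, then subtract only the
# fitted contribution of the columns after the first P
X <- cbind(mod, hidden)
beta <- solve(crossprod(X), crossprod(X, t(y)))           # coefficients x features
cleaned <- y - t(X[, -seq_len(P), drop = FALSE] %*%
                   beta[-seq_len(P), , drop = FALSE])

# Refit on the cleaned data: the hidden-factor coefficients should be
# numerically zero, and the condition coefficients should be unchanged
beta_clean <- solve(crossprod(X), crossprod(X, t(cleaned)))
max(abs(beta_clean[P + 1, ]))
all.equal(beta_clean[seq_len(P), ], beta[seq_len(P), ])
```

Note that cleaned data of this kind is meant for visualization such as PCA. For the differential expression test itself, the usual approach is to include the surrogate variables as covariates in the design rather than to test on the cleaned matrix.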

 

 

DESeq2 sva svaseq batch effect correction • 3.6k views
@ryan-c-thompson-5618
Icahn School of Medicine at Mount Sinai…

It looks to me like SVA is helping quite a bit for this data set. I always like to plot the surrogate variables against any known confounding factors (such as RNA extraction date, in your case). If you can show that SVA is capturing the variation due to known confounders, that gives you confidence that SVA is capturing real effects in your data that should be corrected for.

Other things you can plot your SVs against include RNA QC statistics like RIN, total read count, and percent of reads aligned to genes.
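
As a rough sketch of that check, the base R code below measures how much of each surrogate variable is explained by each known covariate and draws two example plots. The sample table `meta`, its column names, the number of extraction days and the simulated matrix `sv` are all hypothetical placeholders. With real data, `sv` would be the surrogate variable matrix from the sva result (for example `svseq$sv`) and `meta` would come from the sample annotation.

```r
set.seed(42)
n <- 19

# Hypothetical sample annotation; replace with the real sample table
meta <- data.frame(
  extraction_day = factor(rep(paste0("day", 1:8), length.out = n)),
  RIN            = runif(n, 6, 10)
)

# Hypothetical surrogate variables: SV1 tracks extraction day, SV2 tracks RIN,
# SV3 and SV4 are pure noise
sv <- cbind(
  SV1 = as.numeric(meta$extraction_day) + rnorm(n, sd = 0.3),
  SV2 = meta$RIN + rnorm(n, sd = 0.3),
  SV3 = rnorm(n),
  SV4 = rnorm(n)
)

# R^2 of each surrogate variable regressed on each known covariate
sv_assoc <- sapply(names(meta), function(v) {
  apply(sv, 2, function(s) summary(lm(s ~ meta[[v]]))$r.squared)
})
print(round(sv_assoc, 2))

# Visual check: boxplot for a categorical covariate, scatter for a continuous one
op <- par(mfrow = c(1, 2))
boxplot(sv[, "SV1"] ~ meta$extraction_day,
        xlab = "RNA extraction day", ylab = "SV1")
plot(meta$RIN, sv[, "SV2"], xlab = "RIN", ylab = "SV2")
par(op)
```

With only 19 samples, R^2 against a factor with many levels is inflated even for a pure-noise variable, so it is worth looking at the plots or at adjusted R^2 and p-values as well, rather than at raw R^2 alone.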


Thank you very much for your input! Highly appreciated.
