Question: Applying SVA to RNA-Seq dataset
12 months ago, L_K wrote:

Dear Bioconductor community,

I'm currently working on an RNA-seq differential expression project with an sRNA-seq dataset (for miRNA differential expression analysis) and an mRNA-seq dataset (for mRNA differential expression analysis). My condition of interest has three levels with n=8, n=5 and n=6.

Anyhow, the question arose whether I should use SVA to account for potential batch effects. I would not expect severe batch effects (at least from library prep and sequencing). However, since this is a non-model organism study (veterinary field), I thought that SVA might account for mixed breeds within the groups or other unknown effects contributing unwanted variation in gene expression.

Thus I would love to get your opinion on that question. Is it a problem to apply SVA to a data set with a small sample size?

To investigate the effect of SVA on the dataset I generated two PCA plots: one of the original data and one after subtracting the significant surrogate variables (4 SVs were detected) from the dataset.

The PCA plot on the right illustrates the dataset after removal of 4 surrogate variables.

As the differentially expressed genes/miRNAs obtained with and without SVA are unfortunately not subsets of each other but differ, I really don't know which path to take and how to justify it.

Are there any possible analyses/quality controls I could run to answer my question?

And an additional small question: would you suggest adding the RNA extraction day as a covariate in the linear model? (One sample from each of the three conditions was always extracted on the same day, and I have these batch dates.)

Thank you very much for your help

-Matt

Edit: This is the code I use to subtract the surrogate variables (a function published by Jaffe et al. 2015):

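library(SummarizedExperiment)  # provides colData()

# Regress the surrogate variables out of y (genes x samples) while keeping
# the effects of the first P columns of the design matrix.
cleaningP <- function(y, mod, svaobj, P = ncol(mod)) {
  X <- cbind(mod, svaobj$sv)              # design plus surrogate variables
  Hat <- solve(t(X) %*% X) %*% t(X)       # least-squares projection
  beta <- Hat %*% t(y)                    # coefficients, one column per gene
  # subtract only the fitted contribution of the surrogate variables
  cleany <- y - t(as.matrix(X[, -c(1:P)]) %*% beta[-c(1:P), ])
  return(cleany)
}

# dds2, mat and svseq were created earlier in the analysis (not shown)
mod <- model.matrix(~ sex + condition, data = colData(dds2))
cleanp <- cleaningP(mat, mod, svseq)
pca <- prcomp(t(cleanp))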
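
One way to sanity-check what this cleaning step does is to run it on simulated data where the hidden factors are known. The sketch below is only an illustration in base R: the data, the two "hidden" factors and the `fake_sva` list standing in for an SVA result are all made up, and only the group sizes (8, 5, 6) come from the post. It refits the full model after cleaning and checks that the surrogate-variable coefficients are zero while the condition coefficients are unchanged.

```r
set.seed(1)
n_genes <- 200
condition <- factor(rep(c("A", "B", "C"), times = c(8, 5, 6)))  # group sizes from the post
n <- length(condition)

# Hypothetical hidden factors; they play the role of two surrogate variables
hidden <- matrix(rnorm(n * 2), ncol = 2)
mod <- model.matrix(~ condition)

# Simulated genes x samples matrix: noise + hidden-factor effects + a condition effect
y <- matrix(rnorm(n_genes * n), n_genes, n)
y <- y + matrix(rnorm(n_genes * 2, sd = 2), ncol = 2) %*% t(hidden)
y[1:20, condition == "B"] <- y[1:20, condition == "B"] + 3

# Same logic as the function in the post, repeated so the sketch is self-contained
cleaningP <- function(y, mod, svaobj, P = ncol(mod)) {
  X <- cbind(mod, svaobj$sv)
  Hat <- solve(t(X) %*% X) %*% t(X)
  beta <- Hat %*% t(y)
  y - t(as.matrix(X[, -c(1:P)]) %*% beta[-c(1:P), ])
}

fake_sva <- list(sv = hidden)  # stand-in for an SVA result object (assumption)
cleaned <- cleaningP(y, mod, fake_sva)

# Refit the full model before and after cleaning
X <- cbind(mod, hidden)
coef_before <- solve(crossprod(X), crossprod(X, t(y)))
coef_after <- solve(crossprod(X), crossprod(X, t(cleaned)))

sv_rows <- (ncol(mod) + 1):ncol(X)
max_sv_coef_after <- max(abs(coef_after[sv_rows, ]))
design_coef_unchanged <- isTRUE(all.equal(coef_before[-sv_rows, ],
                                          coef_after[-sv_rows, ]))
```

Note that the cleaned matrix is suited to visualisation such as PCA; whether to use it for testing, or to put the SVs into the model instead, is a separate decision that this sketch does not address.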

 

 

Answer: Applying SVA to RNA-Seq dataset
12 months ago, Ryan C. Thompson (The Scripps Research Institute, La Jolla, CA) wrote:

It looks to me like SVA is helping quite a bit for this data set. I always like to plot the surrogate variables against any known confounding factors (such as RNA extraction date, in your case). If you can show that SVA is capturing the variation due to known confounders, that gives you confidence that SVA is capturing real effects in your data that should be corrected for.
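
As a rough sketch of that check for a categorical confounder, the base R code below relates each surrogate variable to extraction day using a one-way ANOVA. Everything here is simulated for illustration: `extraction_day`, the number of days and the `sv` matrix are assumptions (in a real analysis `sv` would be the SV matrix from the SVA result). A high R-squared for an SV suggests it is tracking the known factor.

```r
set.seed(42)

# Hypothetical annotation: 19 samples spread over 8 extraction days
extraction_day <- factor(rep(paste0("day", 1:8), length.out = 19))

# Hypothetical surrogate variables: SV1 follows the day, SV2 is unrelated noise
day_effect <- rnorm(nlevels(extraction_day))[as.integer(extraction_day)]
sv <- cbind(SV1 = day_effect + rnorm(19, sd = 0.2),
            SV2 = rnorm(19))

# For each SV: variance explained by extraction day and the ANOVA p-value
sv_vs_day <- t(apply(sv, 2, function(v) {
  fit <- lm(v ~ extraction_day)
  c(r_squared = summary(fit)$r.squared,
    p_value = anova(fit)[["Pr(>F)"]][1])
}))
print(round(sv_vs_day, 3))

# Visual check, drawn only in an interactive session
if (interactive()) {
  boxplot(sv[, "SV1"] ~ extraction_day,
          xlab = "RNA extraction day", ylab = "SV1")
}
```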

Other things you can plot your SVs against include RNA QC statistics like RIN, total read count, and percent of reads aligned to genes.
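
For continuous QC metrics like these, a correlation table is a compact complement to the plots. The following is a minimal sketch with simulated values: the `qc` data frame, its column names and the `sv` matrix are hypothetical placeholders, not output from this dataset. Spearman correlation is used here simply as a robust default.

```r
set.seed(7)
n <- 19

# Hypothetical per-sample QC table
qc <- data.frame(
  rin = runif(n, 6, 10),
  total_reads = round(rnorm(n, mean = 2e7, sd = 3e6)),
  pct_in_genes = runif(n, 60, 90)
)

# Hypothetical surrogate variables: SV1 tracks RIN, SV2 is unrelated noise
sv <- cbind(SV1 = as.numeric(scale(qc$rin)) + rnorm(n, sd = 0.3),
            SV2 = rnorm(n))

# Rows are SVs, columns are QC metrics
sv_qc_cor <- cor(sv, qc, method = "spearman")
print(round(sv_qc_cor, 2))

# Scatter plot of the strongest pairing, drawn only in an interactive session
if (interactive()) {
  plot(qc$rin, sv[, "SV1"], xlab = "RIN", ylab = "SV1")
}
```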


Thank you very much for your input! Highly appreciated.

(Reply written 12 months ago by L_K)