Dear Bioconductor community,

I'm currently working on an RNA-seq differential expression project with two datasets: a small RNA-seq (sRNA-seq) dataset for miRNA differential expression analysis and an mRNA-seq dataset for mRNA differential expression analysis. My condition of interest has three levels with n=8, n=5 and n=6.

The question arose whether I should use SVA to account for potential batch effects. I don't expect severe batch effects, at least not from library prep and sequencing. However, since this is a non-model organism study (veterinary field), I thought SVA might account for mixed breeds within the groups or other unknown sources of unwanted variation in gene expression.

Thus I would love to get your opinion on that question. Is it a problem to apply SVA to a data set with a small sample size?

To investigate the effect of SVA on the dataset, I generated two PCA plots: one before and one after subtracting the significant surrogate variables (4 SVs were detected) from the dataset.

The PCA plot on the right illustrates the dataset after removal of 4 surrogate variables.

Unfortunately, the differentially expressed genes/miRNAs obtained with and without SVA are not subsets of each other but differ, so I really don't know which path to take or how to justify it.

Are there any possible analyses/quality controls I could run to answer my question?
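One check that could help here is to see whether the surrogate variables track anything already known about the samples (condition, sex, extraction day). Below is a minimal sketch on made-up data. The data frame `pheno`, the layout of its `sex` and `day` columns, and the matrix `sv` (a stand-in for `svseq$sv`) are assumptions for illustration only, not the real sample sheet.

```r
set.seed(1)

# Hypothetical sample sheet: group sizes 8/5/6 as in the post,
# sex and day assignments are invented
n <- 19
pheno <- data.frame(
  condition = factor(rep(c("A", "B", "C"), times = c(8, 5, 6))),
  sex       = factor(rep_len(c("f", "f", "m"), n)),
  day       = factor(rep_len(paste0("d", 1:8), n))
)

# Stand-in for svseq$sv: 4 columns of noise, with the first one
# deliberately driven by extraction day so the check has something to find
sv <- matrix(rnorm(n * 4), nrow = n, dimnames = list(NULL, paste0("SV", 1:4)))
sv[, 1] <- as.numeric(pheno$day) + rnorm(n, sd = 0.3)

# R^2 of each surrogate variable regressed on each known covariate
r2 <- sapply(names(pheno), function(v) {
  apply(sv, 2, function(s) summary(lm(s ~ pheno[[v]]))$r.squared)
})
round(r2, 2)
```

A surrogate variable with a high R^2 against a known covariate is probably capturing that covariate. A high R^2 against `condition` would be a warning sign that the variable overlaps with the biology of interest.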

And an additional small question: would you suggest adding the RNA extraction day as a covariate in the linear model? (On each extraction day, one sample from each of the three conditions was extracted, and I have these batch dates.)
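To make the cost of that covariate concrete, here is a rough sketch of how the design could be checked before fitting. Everything in `pheno` except the group sizes is invented: the number of extraction days (8 here), the assignment of samples to days, and the sex pattern are assumptions, so the numbers only illustrate the bookkeeping.

```r
# Hypothetical sample sheet: group sizes from the post; sex and day are invented
n <- 19
pheno <- data.frame(
  condition = factor(rep(c("A", "B", "C"), times = c(8, 5, 6))),
  sex       = factor(rep_len(c("f", "f", "m"), n)),
  day       = factor(c(1:8, 1:5, 1:6))  # at most one sample per condition per day
)

design_base <- model.matrix(~ sex + condition, data = pheno)
design_day  <- model.matrix(~ sex + day + condition, data = pheno)

# Full column rank means all coefficients are estimable
is_full_rank <- qr(design_day)$rank == ncol(design_day)

# Residual degrees of freedom left for estimating the variance
df_base   <- n - ncol(design_base)  # 19 samples minus 4 coefficients
df_day    <- n - ncol(design_day)   # 19 samples minus 11 coefficients
df_day_sv <- df_day - 4             # if 4 surrogate variables were added on top
```

With this made-up layout, a day factor alone uses 7 coefficients, so combining it with 4 surrogate variables would leave very few residual degrees of freedom for 19 samples.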

Thank you very much for your help

-Matt

Edit: here is the code I use to subtract the surrogate variables (the function was published by Jaffe et al. 2015):

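# Regress out the surrogate variables while keeping the first P design columns
cleaningP <- function(y, mod, svaobj, P = ncol(mod)) {
  X <- cbind(mod, svaobj$sv)         # full design: known covariates + SVs
  Hat <- solve(t(X) %*% X) %*% t(X)  # least-squares projection
  beta <- Hat %*% t(y)               # coefficients, one column per gene
  # subtract only the part of the fit that belongs to the surrogate variables
  cleany <- y - t(as.matrix(X[, -c(1:P)]) %*% beta[-c(1:P), ])
  return(cleany)
}

mod <- model.matrix(~ sex + condition, data = colData(dds2))
cleanp <- cleaningP(mat, mod, svseq)
pca <- prcomp(t(cleanp))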
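To show what this function does and does not change, here is a small sketch on simulated data. The hidden factors `h1` and `h2`, the list `toy_sv` (a stand-in for `svseq`), the matrix `y`, and the helper `fit_coef` are all made up for illustration and are not part of my real analysis.

```r
set.seed(42)

cleaningP <- function(y, mod, svaobj, P = ncol(mod)) {
  X <- cbind(mod, svaobj$sv)
  Hat <- solve(t(X) %*% X) %*% t(X)
  beta <- Hat %*% t(y)
  y - t(as.matrix(X[, -c(1:P)]) %*% beta[-c(1:P), ])
}

n <- 19
genes <- 50
condition <- factor(rep(c("A", "B", "C"), times = c(8, 5, 6)))
mod <- model.matrix(~ condition)

# Two simulated unwanted factors, playing the role of surrogate variables
h1 <- rnorm(n)
h2 <- rnorm(n)
toy_sv <- list(sv = cbind(h1, h2))

# Expression = hidden-factor effects + noise, plus a condition effect in 5 genes
y <- outer(rnorm(genes), h1) + outer(rnorm(genes), h2) +
  matrix(rnorm(genes * n, sd = 0.1), genes, n)
y[1:5, condition == "C"] <- y[1:5, condition == "C"] + 2

clean <- cleaningP(y, mod, toy_sv)

# Refit the full model on raw and cleaned data and compare coefficients
fit_coef <- function(y, X) solve(t(X) %*% X, t(X) %*% t(y))
X <- cbind(mod, toy_sv$sv)
b_raw   <- fit_coef(y, X)
b_clean <- fit_coef(clean, X)

sv_left   <- max(abs(b_clean[4:5, ]))                 # SV coefficients after cleaning
cond_diff <- max(abs(b_clean[1:3, ] - b_raw[1:3, ]))  # change in condition coefficients

pca <- prcomp(t(clean))
```

In this sketch the surrogate-variable coefficients of the cleaned matrix are numerically zero, while the intercept and condition coefficients stay the same as in the full model fitted to the raw data.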

Thank you very much for your input! Highly appreciated.